Technology


Technology | 26 Mar 2008 12:00 am

A lame video on TechCrunch today inspired me to go hunting for the original argument between Linus Torvalds and (Professor) Andy Tanenbaum, and here it is. Titled Linux is Obsolete, it’s a 1992 post by the author of Minix telling Linus he’s just created an obsolete OS that runs on obsolete hardware (the 386) that won’t be around in a few years.

Andy’s ideas are a great example of how an academic approach to software design can lead to layers of abstraction that kill performance. You see this mistake often in web applications because web development teams are separate from the operations team and don’t have to think about performance under load. So their focus stays on the manageability of the code base rather than its performance. They make language choices and design decisions that help them write beautiful code in as few lines as possible that any university professor would be proud of.

Find me an ops guy who loves Ruby on Rails and I’ll find you a dev who loves hand-crafting SQL statements.

Here is AST’s original email:

Subject: LINUX is obsolete
From: ast@cs.vu.nl (Andy Tanenbaum)
Date: 29 Jan 92 12:12:50 GMT
Newsgroups: comp.os.minix
Organization: Fac. Wiskunde & Informatica, Vrije Universiteit, Amsterdam

I was in the U.S. for a couple of weeks, so I haven't commented much on
LINUX (not that I would have said much had I been around), but for what
it is worth, I have a couple of comments now.

As most of you know, for me MINIX is a hobby, something that I do in the
evening when I get bored writing books and there are no major wars,
revolutions, or senate hearings being televised live on CNN.  My real
job is a professor and researcher in the area of operating systems.

As a result of my occupation, I think I know a bit about where operating
are going in the next decade or so.  Two aspects stand out:

1. MICROKERNEL VS MONOLITHIC SYSTEM
   Most older operating systems are monolithic, that is, the whole operating
   system is a single a.out file that runs in 'kernel mode.'  This binary
   contains the process management, memory management, file system and the
   rest. Examples of such systems are UNIX, MS-DOS, VMS, MVS, OS/360,
   MULTICS, and many more.

   The alternative is a microkernel-based system, in which most of the OS
   runs as separate processes, mostly outside the kernel.  They communicate
   by message passing.  The kernel's job is to handle the message passing,
   interrupt handling, low-level process management, and possibly the I/O.
   Examples of this design are the RC4000, Amoeba, Chorus, Mach, and the
   not-yet-released Windows/NT.

   While I could go into a long story here about the relative merits of the
   two designs, suffice it to say that among the people who actually design
   operating systems, the debate is essentially over.  Microkernels have won.
   The only real argument for monolithic systems was performance, and there
   is now enough evidence showing that microkernel systems can be just as
   fast as monolithic systems (e.g., Rick Rashid has published papers comparing
   Mach 3.0 to monolithic systems) that it is now all over but the shoutin`.

   MINIX is a microkernel-based system.  The file system and memory management
   are separate processes, running outside the kernel.  The I/O drivers are
   also separate processes (in the kernel, but only because the brain-dead
   nature of the Intel CPUs makes that difficult to do otherwise).  LINUX is
   a monolithic style system.  This is a giant step back into the 1970s.
   That is like taking an existing, working C program and rewriting it in
   BASIC.  To me, writing a monolithic system in 1991 is a truly poor idea.

2. PORTABILITY
   Once upon a time there was the 4004 CPU.  When it grew up it became an
   8008.  Then it underwent plastic surgery and became the 8080.  It begat
   the 8086, which begat the 8088, which begat the 80286, which begat the
   80386, which begat the 80486, and so on unto the N-th generation.  In
   the meantime, RISC chips happened, and some of them are running at over
   100 MIPS.  Speeds of 200 MIPS and more are likely in the coming years.
   These things are not going to suddenly vanish.  What is going to happen
   is that they will gradually take over from the 80x86 line.  They will
   run old MS-DOS programs by interpreting the 80386 in software.  (I even
   wrote my own IBM PC simulator in C, which you can get by FTP from
   ftp.cs.vu.nl =  192.31.231.42 in dir minix/simulator.)  I think it is a
   gross error to design an OS for any specific architecture, since that is
   not going to be around all that long.

   MINIX was designed to be reasonably portable, and has been ported from the
   Intel line to the 680x0 (Atari, Amiga, Macintosh), SPARC, and NS32016.
   LINUX is tied fairly closely to the 80x86.  Not the way to go.

Don`t get me wrong, I am not unhappy with LINUX.  It will get all the people
who want to turn MINIX in BSD UNIX off my back.  But in all honesty, I would
suggest that people who want a **MODERN** "free" OS look around for a
microkernel-based, portable OS, like maybe GNU or something like that.

Andy Tanenbaum (ast@cs.vu.nl)

P.S. Just as a random aside, Amoeba has a UNIX emulator (running in user
space), but it is far from complete.  If there are any people who would
like to work on that, please let me know.  To run Amoeba you need a few 386s,
one of which needs 16M, and all of which need the WD Ethernet card.

Technology and Startups | 24 Mar 2008 12:36 pm

[Thanks Sam for the idea for this entry] Ever heard of IP Anycasting? Thanks to my recent change from godaddy (frowny face and no link) to dnsmadeeasy (happy face and they get a link) I’m now using a DNS provider that provides anycasting. What is it and should you care?

IP Anycasting means assigning the same IP address to multiple instances of the same service at strategic points in the network. For example, if you are a DNS provider, you might have servers in New York, London and Los Angeles with the same IP address. Then when a surfer in San Diego (about 120 miles south of Los Angeles) makes a request to your DNS system, the network routes it to the nearest instance, so the server in Los Angeles answers and the traffic never has to travel to New York or London.

Anycasting is generally used to distribute load geographically and to mitigate the effect of distributed denial of service attacks. It’s been used by the F root server since November 2002 and has saved good ole F from getting taken down by several DDoS attacks.

I was using dnspark.net a couple of years ago and we had a few hours of down-time while they were hit by a DDoS attack - so it’s not as uncommon as you think. [They obviously don’t use anycasting]

Anycasting is suitable for DNS because DNS uses a connectionless transport layer protocol called UDP. One packet is sent, a response is received and hey, if the response isn’t received the client just tries another DNS server. [This is how the vast majority of DNS queries work. There are a small number of exceptions where DNS uses TCP.]
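
Here’s a rough Python sketch of that “one packet out, one packet back, otherwise try the next server” behaviour. It’s only an illustration: the function names and the query id are made up for this example, and a real resolver also handles retries, truncation and the TCP fallback.

```python
import socket
import struct

def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for an A record."""
    # Header: id, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name: each label is prefixed by its length, then a terminating zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (Internet)
    return header + qname + struct.pack("!HH", 1, 1)

def udp_send(server, packet, timeout=2.0):
    """Send one UDP datagram to port 53 and wait for one reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        data, _ = sock.recvfrom(512)
        return data

def resolve_with_fallback(packet, servers, send=udp_send):
    """Try each server in turn; no connection state is kept between tries."""
    for server in servers:
        try:
            return server, send(server, packet)
        except (TimeoutError, socket.timeout):
            continue  # no answer: just move on to the next server
    raise TimeoutError("no DNS server answered")
```

Because every query is a single self-contained datagram, it makes no difference which anycast instance answers it. You would call it as `resolve_with_fallback(build_dns_query("example.com"), [first_server_ip, second_server_ip])` with the addresses of your own DNS servers.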

Anycasting is not ideally suited for TCP connections like web browser-server communication because TCP is connection oriented. For example, TCP requires a three-way handshake to establish the connection. If the network topology changes mid-handshake and one packet is sent to the Los Angeles server and the next is sent to New York, it breaks TCP because the New York server doesn’t know about the session that Los Angeles has started establishing.
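
To make that failure concrete, here is a toy Python simulation. It is not a real TCP stack: the `Server` class and `handshake` function are invented for this sketch, and the handshake is reduced to the two packets the client sends.

```python
class Server:
    """One anycast instance. It only knows about handshakes it saw itself."""

    def __init__(self, name):
        self.name = name
        self.half_open = set()    # clients that sent SYN and still owe an ACK
        self.established = set()  # clients with a completed handshake

    def receive(self, client, flag):
        if flag == "SYN":
            self.half_open.add(client)
            return "SYN-ACK"
        if flag == "ACK" and client in self.half_open:
            self.half_open.discard(client)
            self.established.add(client)
            return "ESTABLISHED"
        # A packet for a connection this instance has never heard of
        return "RST"

def handshake(client, syn_route, ack_route):
    """Send the SYN to one instance and the final ACK to another (or the same)."""
    return syn_route.receive(client, "SYN"), ack_route.receive(client, "ACK")

la = Server("Los Angeles")
ny = Server("New York")

# Stable topology: both packets reach Los Angeles and the connection opens.
print(handshake("client-a", la, la))
# Topology changes mid-handshake: New York gets an ACK it knows nothing about.
print(handshake("client-b", la, ny))
```

The second call ends in a reset because the half-open state lives only on the Los Angeles instance.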

That’s the theory anyway, but if the network topology stays reasonably stable and you don’t mind a few sessions breaking when the topology does change then perhaps you’ll consider using Anycasting with your web servers. But don’t get too creative and launch a content delivery network. Akamai might sue you and they’ll probably win. They own patent No. 6,108,703 which covers a “global hosting system” in which “a base HTML document portion of a Web page is served from the Content Provider’s site while one or more embedded objects for the page are served from the hosting servers, preferably, those hosting servers near the client machine.” Akamai just won a case against competitor Limelight for violating that patent and the case is now heading to the appeal courts.

There are other connectionless protocols that are well suited for Anycasting, like SNTP (time synchronization) and SNMP (network management), but there isn’t much demand for these because they don’t experience the massive load that more public protocols like DNS, SMTP and HTTP get.

Deploying an anycast network is not something you’re likely to consider in the near future unless you’re eBay or Google, but outsourcing some of your services like DNS to an anycast provider is something that’s worked well for me and might work for you.

Tech News and Technology | 16 Dec 2007 10:57 pm

“My machine overnight could process my in-box, analyze which ones were probably the most important, but it could go a step further,” he said. “It could interpret some of them, it could look at whether I’ve ever corresponded with these people, it could determine the semantic context, it could draft three possible replies. And when I came in in the morning, it would say, hey, I looked at these messages, these are the ones you probably care about, you probably want to do this for these guys, and just click yes and I’ll finish the appointment.” ~Craig Mundie from Microsoft in today’s NY Times

Sounds like Microsoft is working on a Positronic Brain rather than writing software for multi-core processors.

Tech News and Technology | 23 Oct 2007 11:26 am

Paciolan is managing ticket sales for the Colorado Rockies. Their servers were hit with over 1500 requests per second and it took down not only the Rockies ticket sales infrastructure, but all of Paciolan’s other customers too.

They claim to have been hit by a DDoS attack, but that’s something that’s hard to prove or disprove when you have corporate firewalls and AOL firewalls sending many requests from a single IP - it looks just like a DDoS attack but it actually isn’t.

Is 1500 requests per second a lot? No. Feedjit (my site) peaks at 140 requests per second and it does it with just two servers - and the data it’s serving is dynamic.

So a cluster of 10 to 30 servers should easily handle the load they’ve described - especially if all it’s doing is queueing visitors and only letting a handful through, which is what Paciolan’s ticketing software does.
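
The back-of-envelope math behind that range, as a small Python sketch. The big assumption is that one of their servers can handle about what one Feedjit server does; the `headroom` factor is a hypothetical safety margin, not a number from Paciolan.

```python
import math

def servers_needed(target_rps, measured_rps, measured_servers, headroom=1.0):
    """Scale a measured requests-per-second figure up to a target load."""
    per_server = measured_rps / measured_servers  # 140 / 2 = 70 req/s each
    return math.ceil(target_rps * headroom / per_server)

# Feedjit: 140 requests per second on two servers. Paciolan: 1500 per second.
print(servers_needed(1500, 140, 2))                # 22
print(servers_needed(1500, 140, 2, headroom=1.3))  # 28 with 30% spare capacity
```

Both results land inside the 10 to 30 range, and a queueing front end should need less per request than serving dynamic data.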

The result? Police are erecting barricades around Coors Field. Here’s a quote from cNet:

“…many fans are apparently converging near Coors Field in hopes that the team will sell tickets in person through the box office; so many in fact that the police have closed streets around the ballpark and are erecting barricades, the paper reported.”

Ticketmaster is trying to buy Paciolan - the deal is currently under government review. Ticketmaster runs mod_perl (and so does Feedjit) and some very smart people who know a lot about scalability (and who I used to work with) work for Ticketmaster. So hopefully the deal will go through and mod_perl will come to the rescue.

btw, I’m doing a short talk in 2 days on how to scale your web servers fast based on my experience scaling Feedjit.

Technology | 01 Oct 2007 09:33 pm

I take my dev server and my workstation everywhere with me in a single small backpack.
My “dev server” is an intel macbook that dual-boots Linux and OSX. My workstation is a windows laptop. Most of my work is done with the macbook booted into Linux and my windows laptop running an SSH client that I use to write all my code on the linux/macbook. I do this because most of my users run Windows and I can write code in my ssh client and test immediately on Firefox, IE7, IE6 (using Virtual PC) and Opera. I also do graphic design in Fireworks on my windoze laptop.

So most of the time my macbook sits quietly in the corner and plays server, unless I need to boot into OSX to test something in Safari.

On my recent road trip I needed to do some dev and the wireless network at the hotel had a firewall set up so that one machine on the wireless network could not connect to another machine on the same network. So my windows laptop couldn’t connect to my macbook server.

I had two ethernet cables with me and no hub. So I googled how to make a crossover cable to connect the two laptops directly to each other using a single ethernet cable and no hub.

I started cutting up one of the cables with my leatherman and after I’d stripped the insulation off but right before I’d cut the first wire, just for the hell of it I plugged the cable into the two machines without it being crossed over.

…and it worked.

I couldn’t believe it. Finally someone has designed an ethernet port that auto-detects if it needs to be in crossover or regular mode. As far as I can tell it’s the macbook that’s doing this piece of pure genius.

So now I have a half trashed crossover cable - but it still works - and I can connect my dev workstation and “server” directly to each other with any old ethernet cable.

Sometimes Apple sucks. But sometimes they well and truly rock!!

Technology | 14 Aug 2007 08:41 pm

I can’t log in to either of my hosted gmail accounts. Anyone else?

UPDATE: I contacted gmail support and apparently they occasionally lock accounts due to suspicious activity. I think I had two different hosted gmail accounts open in tabs in the same browser. Very suspicious.

Their suggestion: contact them. The response: occasionally we lock accounts - see our help page for detail. The help page suggests you contact them.

So I’m stuck in a loop and it’s pissing me off because I haven’t had access to mark at linebuzz.com for going on 16 hours now.

If you’re thinking of moving your corporate email to hosted GMail, think twice. You may be up the creek for 16 hours waiting for a locked account to time out.


Rants and Tech News and Trash Talking and Technology and Randomness | 30 Jul 2007 07:51 pm

I rant, Tony rants, Alan ranted.

With surprisingly similar space-time coordinates.

Our love of Facebook is duly recanted.

We’re no longer Zuckerberg’s subordinates.

Technology | 26 Jul 2007 08:43 am

A quick article about how to record a remote interview and how to fix the audio levels after the interview.

I got a few questions about the equipment I used to record the podcast interview with Tony yesterday. I recorded it remotely using Skype - Tony was in West Seattle and I’m in Sammamish. We were both wearing headsets which I recommend because even though Skype is good at cutting out feedback from a PC speaker, some noise does get through if you’re not wearing a headset.

I used Pamela to record the audio. I recommend the Pro version because the other versions limit your recording to 30 minutes or less. Pamela is free for the first 30 days and it’s about $12 after that. A tip when using Pamela: To get to the mp3 audio files, right-click on a recording and click “open call recording folder”. It took me a while to figure that out.

The only complaint I have about Pamela is that it doesn’t regulate the volume of the caller vs. the callee. So my voice was very loud and Tony’s was much softer. It’s taking the audio directly from Skype, so perhaps that’s too much to ask. I also haven’t experimented with the Skype audio settings. Fixing this was time consuming:

I used Audacity, an open source sound editor, to fix the difference in audio volume, and besides the actual interview, this occupied most of my time putting the podcast together. Using Audacity you can see the waveform and it’s quite clear where the audio level is much lower. So I selected the parts in the audio where Tony speaks and applied the Amplify effect. Amplify automatically detects the largest peak in the selection and sets the amplification so that the peak won’t clip - in other words it won’t over-amplify and cause distortion. I recommend using the default number it gives you, and if that’s too low, look at the area of the clip you’ve selected and you’ll probably see a spike in the waveform that’s causing Amplify to give you a low amplification number. Just select around that spike and you’ll be able to boost the signal more.
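
The idea behind that default number is simple enough to sketch in Python. This is not Audacity’s code, just the peak-based calculation as I understand it, with samples as floats between -1.0 and 1.0 and made-up sample values.

```python
def max_gain_without_clipping(samples, full_scale=1.0):
    """Largest multiplier that keeps the loudest sample at or below full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return 1.0  # pure silence: nothing to scale
    return full_scale / peak

def amplify(samples, gain):
    """Apply the same gain to every sample in the selection."""
    return [s * gain for s in samples]

# A quiet stretch of speech with one spike (0.5) in the middle.
selection = [0.05, -0.1, 0.08, 0.5, -0.07]
print(max_gain_without_clipping(selection))       # the spike limits the gain

# Selecting around the spike allows a much bigger boost.
without_spike = [0.05, -0.1, 0.08, -0.07]
print(max_gain_without_clipping(without_spike))
```

A single spike caps the gain for the whole selection, which is why selecting around it helps.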

I’m sure there’s an easier way to do this, but I tried using Leveller and a couple of other tools and the results weren’t as good as Amplify.

Next time, I’m going to make darn sure my levels are much lower and as close as possible to the person I’m calling. Pamela has a level indicator when you’re recording, so I might try and use that as a visual guide and tweak Skype’s audio settings.

Once I’d finished working with the clip in Audacity, I saved it as a WAV file rather than using Audacity’s ‘save-as mp3’ option and I used RazorLame to convert the WAV to mp3. That gave me more control over the mp3 quality. Under Edit/LAME options, select 24kbit as the bitrate and ‘mono’ as the mode.

Then I just uploaded the file to my blog server and presto!

Technology | 25 Jul 2007 10:12 am

I logged onto my blog this morning and it wouldn’t load. I tried to ping the server and it was still up. Then I tried ssh’ing into the server and it connected. I hit reload again in my browser and started mumbling WTF.

Then I ran ‘uptime’ on the server and got something like this:

09:52:40 up 325 days, 6:45, 2 users, load average: 0.28, 0.28, 0.27

That’s a little high, so I checked how many apache processes there were and it was at MaxClients, so apache was working pretty hard. I checked my Analytics stats and by 7am today I had already done as much traffic as all of yesterday.

So I tailed the web server log file and it just flew off the screen.

I figured out Reddit was the source. Someone posted a blog entry I wrote yesterday about Rescuetime and it’s getting a few votes.

I run a standard Wordpress.org install (newest version). My server has 1G of RAM and is an AMD Athlon XP 2100. It’s on a 10 Megabit backbone, so has plenty of bandwidth. So I made some basic changes to the server.

Apache needed to handle more concurrent connections, and I had MaxClients set to 15. But the server was using too much memory for me to increase MaxClients, and MySQL was the memory hog. So I changed the mysql config to use less memory because fetching blog entries from disk is not that much hard work.

My my.cnf file (the config file for mysql) has the following settings now:

key_buffer = 50M
sort_buffer_size = 5M
read_buffer_size = 1M
read_rnd_buffer_size = 1M
myisam_sort_buffer_size = 5M
query_cache_size = 4M

That’s a fairly small number for the key buffer and the other caches are very low too, but I’m just serving around 300 blog entries, so I could probably do away with the key buffer completely and just rely on disk access and it would still be ok. I left the query cache at 4M in the hope that it would save me some disk access when fetching blog entries.
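
For a rough feel of what those settings add up to, here’s a Python sketch using the usual rule of thumb: global buffers are allocated once, per-connection buffers can be allocated for each connection. The connection count of 20 is a number I picked for illustration, not something from my config, and I’ve left out myisam_sort_buffer_size because it’s used for index repairs and rebuilds rather than normal queries.

```python
def parse_size(value):
    """Convert a my.cnf style size such as '50M' into bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip().upper()
    if value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)

SETTINGS = {
    "key_buffer": "50M",
    "sort_buffer_size": "5M",
    "read_buffer_size": "1M",
    "read_rnd_buffer_size": "1M",
    "query_cache_size": "4M",
}

GLOBAL_BUFFERS = ("key_buffer", "query_cache_size")
PER_CONNECTION = ("sort_buffer_size", "read_buffer_size", "read_rnd_buffer_size")

def estimate_mysql_memory(settings, max_connections):
    """Worst-case estimate: global buffers plus per-connection buffers."""
    global_bytes = sum(parse_size(settings[k]) for k in GLOBAL_BUFFERS)
    per_conn_bytes = sum(parse_size(settings[k]) for k in PER_CONNECTION)
    return global_bytes + per_conn_bytes * max_connections

# Hypothetical 20 connections: 54M global + 7M x 20 = 194M
print(estimate_mysql_memory(SETTINGS, 20) // 1024 ** 2, "MB")
```

Per-connection buffers are only allocated when a query needs them, so real usage should sit well below this ceiling. Whatever MySQL doesn’t take is what’s left for Apache children on a 1G box.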

I changed Apache’s config from this:

MinSpareServers 15
MaxSpareServers 15
StartServers 15
MaxClients 30

to this:

MinSpareServers 15
MaxSpareServers 45
StartServers 30
MaxClients 60

It fixed it immediately and my blog is now blazingly fast. :) Right now apache has 49 children, so it’s still getting a lot of traffic, but it’s not hitting MaxClients which means it’s not turning away users.


Technology and Randomness | 22 Jul 2007 01:48 pm

This little guy quietly spun this masterpiece while I was snoozing on the couch below him last night - and he got me thinking about The Web and what the word really means these days. Perhaps I’ve been in the entrepreneurial game for too long now, but it’s beginning to mean:

  • Design
  • User interfaces
  • SEO
  • SEM
  • Traffic
  • Competitive analysis
  • Bounce rates
  • Return rates
  • Content optimization
  • etc…etc…

…when it really means one thing:

COMMUNICATION

Whether it’s buyers communicating with sellers or mining the world’s collective knowledge via search or blog trackbacks or inline comments - it’s all just new ways for us all to communicate. Communication is the web’s raison d’être. It’s really that simple.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.