Author: mark

Linux is Obsolete!

A lame video on TechCrunch today inspired me to go hunting for the original argument between Linus Torvalds and (Professor) Andy Tanenbaum, and here it is. Titled “LINUX is obsolete”, it’s a 1992 post by the author of Minix telling Linus he’s just created an obsolete OS running on obsolete hardware (the 386) that won’t be around in a few years.

Andy’s ideas are a great example of how an academic approach to software design can lead to layers of abstraction that kill performance. You see this mistake often in web applications because web development teams are separate from the operations team and don’t have to think about performance under load. So their focus stays on the manageability of the code base rather than its performance. They make language choices and design decisions that help them write beautiful code in as few lines as possible that any university professor would be proud of.

Find me an ops guy who loves Ruby on Rails and I’ll find you a dev who loves hand-crafting SQL statements.

Here is AST’s original email:

Subject: LINUX is obsolete
From: ast@cs.vu.nl (Andy Tanenbaum)
Date: 29 Jan 92 12:12:50 GMT
Newsgroups: comp.os.minix
Organization: Fac. Wiskunde & Informatica, Vrije Universiteit, Amsterdam

I was in the U.S. for a couple of weeks, so I haven't commented much on
LINUX (not that I would have said much had I been around), but for what
it is worth, I have a couple of comments now.

As most of you know, for me MINIX is a hobby, something that I do in the
evening when I get bored writing books and there are no major wars,
revolutions, or senate hearings being televised live on CNN.  My real
job is a professor and researcher in the area of operating systems.

As a result of my occupation, I think I know a bit about where operating
are going in the next decade or so.  Two aspects stand out:

1. MICROKERNEL VS MONOLITHIC SYSTEM
   Most older operating systems are monolithic, that is, the whole operating
   system is a single a.out file that runs in 'kernel mode.'  This binary
   contains the process management, memory management, file system and the
   rest. Examples of such systems are UNIX, MS-DOS, VMS, MVS, OS/360,
   MULTICS, and many more.

   The alternative is a microkernel-based system, in which most of the OS
   runs as separate processes, mostly outside the kernel.  They communicate
   by message passing.  The kernel's job is to handle the message passing,
   interrupt handling, low-level process management, and possibly the I/O.
   Examples of this design are the RC4000, Amoeba, Chorus, Mach, and the
   not-yet-released Windows/NT.

   While I could go into a long story here about the relative merits of the
   two designs, suffice it to say that among the people who actually design
   operating systems, the debate is essentially over.  Microkernels have won.
   The only real argument for monolithic systems was performance, and there
   is now enough evidence showing that microkernel systems can be just as
   fast as monolithic systems (e.g., Rick Rashid has published papers comparing
   Mach 3.0 to monolithic systems) that it is now all over but the shoutin`.

   MINIX is a microkernel-based system.  The file system and memory management
   are separate processes, running outside the kernel.  The I/O drivers are
   also separate processes (in the kernel, but only because the brain-dead
   nature of the Intel CPUs makes that difficult to do otherwise).  LINUX is
   a monolithic style system.  This is a giant step back into the 1970s.
   That is like taking an existing, working C program and rewriting it in
   BASIC.  To me, writing a monolithic system in 1991 is a truly poor idea.

2. PORTABILITY
   Once upon a time there was the 4004 CPU.  When it grew up it became an
   8008.  Then it underwent plastic surgery and became the 8080.  It begat
   the 8086, which begat the 8088, which begat the 80286, which begat the
   80386, which begat the 80486, and so on unto the N-th generation.  In
   the meantime, RISC chips happened, and some of them are running at over
   100 MIPS.  Speeds of 200 MIPS and more are likely in the coming years.
   These things are not going to suddenly vanish.  What is going to happen
   is that they will gradually take over from the 80x86 line.  They will
   run old MS-DOS programs by interpreting the 80386 in software.  (I even
   wrote my own IBM PC simulator in C, which you can get by FTP from
   ftp.cs.vu.nl =  192.31.231.42 in dir minix/simulator.)  I think it is a
   gross error to design an OS for any specific architecture, since that is
   not going to be around all that long.

   MINIX was designed to be reasonably portable, and has been ported from the
   Intel line to the 680x0 (Atari, Amiga, Macintosh), SPARC, and NS32016.
   LINUX is tied fairly closely to the 80x86.  Not the way to go.

Don`t get me wrong, I am not unhappy with LINUX.  It will get all the people
who want to turn MINIX in BSD UNIX off my back.  But in all honesty, I would
suggest that people who want a **MODERN** "free" OS look around for a
microkernel-based, portable OS, like maybe GNU or something like that.

Andy Tanenbaum (ast@cs.vu.nl)

P.S. Just as a random aside, Amoeba has a UNIX emulator (running in user
space), but it is far from complete.  If there are any people who would
like to work on that, please let me know.  To run Amoeba you need a few 386s,
one of which needs 16M, and all of which need the WD Ethernet card.

March 26, 2008

Guy's house cleaned out by ad on Craigslist

This is too much. A guy in Jacksonville, Oregon had his house cleaned out thanks to a malicious ad placed on Craigslist. Someone posted an ad saying that the house had been declared abandoned and all the belongings, including a horse, were free to good homes. So the entire neighborhood rocked up and started carting stuff off. When the guy arrived, the looters were armed with printouts of the ad and refused to hand the guy’s stuff over. From the Seattle Times:

The independent contractor was at Emigrant Lake when he got a call from a woman who had stopped by his house to claim his horse.

On his way home he stopped a truck loaded down with his work ladders, lawn mower and weed eater.

“I informed them I was the owner, but they refused to give the stuff back,” Salisbury said. “They showed me the Craigslist printout and told me they had the right to do what they did.”

The driver sped away after rebuking Salisbury. On his way home he spotted other cars filled with his belongings.

Once home he was greeted by close to 30 people rummaging through his barn and front porch.

The trespassers, armed with printouts of the ad, tried to brush him off. “They honestly thought that because it appeared on the Internet it was true,” Salisbury said. “It boggles the mind.”

Full article here.

March 25, 2008
Obamagirl's latest

Nice touch getting Bill to play the sax. 🙂 [obamagirl video below] And in case you haven’t seen it, check out Hillary’s latest faux pas on youtube today. And now I need to go flagellate myself for an hour for violating my not-blogging-about-politics rule.

March 25, 2008
Anycasting anyone?

[Thanks Sam for the idea for this entry] Ever heard of IP Anycasting? Thanks to my recent change from godaddy (frowny face and no link) to dnsmadeeasy (happy face and they get a link) I’m now using a DNS provider that provides anycasting. What is it and should you care?

IP Anycasting means assigning the same IP address to multiple instances of the same service at strategic points in the network. For example, if you are a DNS provider, you might have servers in New York, London and Los Angeles with the same IP address. Then when a surfer in San Diego (about 120 miles south of Los Angeles) makes a request to your DNS system, the server in Los Angeles answers and saves the network from having to route traffic to New York or London.
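To make the idea concrete, here is a toy sketch in Python. It assumes geographic distance decides which site answers, which is a simplification: real anycast relies on BGP route selection, and the shortest network path is not always the closest city. The IP address and coordinates are illustrative values only.

```python
import math

# Hypothetical anycast sites: every site announces the same service IP.
ANYCAST_IP = "192.0.2.53"  # documentation-range address, not a real DNS server
SITES = {
    "New York": (40.71, -74.01),
    "London": (51.51, -0.13),
    "Los Angeles": (34.05, -118.24),
}

def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def nearest_site(client):
    """Pick the site closest to the client; a stand-in for BGP route selection."""
    return min(SITES, key=lambda name: distance_km(client, SITES[name]))

san_diego = (32.72, -117.16)
print(f"{ANYCAST_IP} is answered from {nearest_site(san_diego)}")
```

The client only ever sees one address; which machine answers is decided by the network, not by the client.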

Anycasting is generally used to distribute load geographically and to mitigate the effect of distributed denial of service attacks. It’s been used by the F root server since November 2002 and has saved good ole F from getting taken down by several DDoS attacks.

I was using dnspark.net a couple of years ago and we had a few hours of down-time while they were hit by a DDoS attack – so it’s not as uncommon as you think. [They obviously don’t use anycasting]

Anycasting is suitable for DNS because DNS uses a connectionless transport layer protocol called UDP. One packet is sent, a response is received and hey, if the response isn’t received the client just tries another DNS server. [This occurs in the vast majority of DNS queries. There are a small number of exceptions where DNS uses TCP.]
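The one-packet-out, one-packet-back pattern is easy to see in code. Below is a minimal sketch, not a real resolver: `build_query` and `resolve_raw` are names made up for this example, the query format follows RFC 1035, and the reply is returned unparsed. Each server gets a single datagram and, if nothing comes back, the next server is tried.

```python
import socket
import struct

def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is length-prefixed, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)

def resolve_raw(hostname, servers, timeout=2.0):
    """Send one datagram per server in turn; return the first raw reply, or None."""
    packet = build_query(hostname)
    for server in servers:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(packet, (server, 53))
                reply, _addr = sock.recvfrom(512)
                return reply
            except OSError:
                continue  # no answer or unreachable: just try the next server
    return None
```

Because there is no connection state to carry between packets, it makes no difference which anycast site receives the datagram.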

Anycasting is not ideally suited for TCP connections like web browser-server communication because TCP is connection oriented. For example, TCP requires a 3 way handshake to establish the connection. If the network topology changes and one packet is sent to the Los Angeles server and another is sent to New York it breaks TCP because the New York server doesn’t know about the session that Los Angeles has started establishing.

That’s the theory anyway, but if the network topology stays reasonably stable and you don’t mind a few sessions breaking when the topology does change then perhaps you’ll consider using Anycasting with your web servers. But don’t get too creative and launch a content delivery network. Akamai might sue you and they’ll probably win. They own patent No. 6,108,703 which covers a “global hosting system” in which “a base HTML document portion of a Web page is served from the Content Provider’s site while one or more embedded objects for the page are served from the hosting servers, preferably, those hosting servers near the client machine.” Akamai just won a case against competitor Limelight for violating that patent and the case is now heading to the appeal courts.

There are other protocols that are connectionless and therefore well suited for Anycasting like SNTP and SNMP but there isn’t much demand for these because they’re network management protocols and don’t experience the massive load that more public protocols like DNS, SMTP and HTTP get.

Deploying an anycast network is not something you’re likely to consider in the near future unless you’re eBay or Google, but outsourcing some of your services like DNS to an anycast provider is something that’s worked well for me and might work for you.

March 24, 2008
Very high performance web servers

Have you ever tried to get Apache to handle 10,000 concurrent connections? For example, you have a very busy website and you enable keepalive on your web server. Then you set the timeout to something high like 300 seconds for ridiculously slow clients (sounds crazy, but I think that’s the default for Apache’s Timeout directive; the KeepAliveTimeout default is much lower). All of a sudden when you run netstat it tells you that you have thousands of clients with established connections to your machine.

Apache can’t handle 10,000 connections efficiently because it uses a one-thread-per-connection model (or if you’re using prefork then one process per connection).

If you want to allow your clients to use keepalive on your very busy website you need to use a server that uses an event notification model. That means that you have a single thread or process that manages thousands of sockets or connections. The sockets don’t block the execution of the thread but instead sit quietly until something happens and then have a way of notifying the thread that something happened and it better come take a look.

Most of us use Linux these days – of course there are the BSD die hards but whatever. The Linux 2.6 kernel introduced something called epoll, an event notification system for applications that want to manage lots of file descriptors without blocking execution and be notified when something changes.
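Here is a rough sketch of that model in Python, using the standard `selectors` module, which picks epoll on Linux and kqueue on the BSDs. It is a toy echo server and not how lighttpd or nginx are actually written; the function names and the port are made up for the example.

```python
import selectors
import socket

def poll_once(sel, listener, timeout=1.0):
    """Handle every socket that is ready right now; return how many events fired."""
    events = sel.select(timeout)
    for key, _mask in events:
        sock = key.fileobj
        if sock is listener:
            # A new client is waiting: accept it and watch it for readable data.
            conn, _addr = listener.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(4096)
            if data:
                sock.sendall(data)  # fine for a sketch; real servers buffer writes
            else:
                # Empty read means the client hung up.
                sel.unregister(sock)
                sock.close()
    return len(events)

def serve_forever(host="127.0.0.1", port=8080):
    """Single-threaded event loop: idle connections cost a file descriptor, not a thread."""
    sel = selectors.DefaultSelector()
    with socket.create_server((host, port)) as listener:
        listener.setblocking(False)
        sel.register(listener, selectors.EVENT_READ)
        while True:
            poll_once(sel, listener)
```

Thousands of keepalive connections can sit registered with the selector and the loop only does work for the handful that have something to say.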

lighttpd and nginx are two very fast web servers that use epoll and a non-blocking event notification model to manage thousands of connections with a single thread and just a few megs of RAM (RAM consumption is the real reason you can’t use Apache for high concurrency). You can also run more than one worker process on both servers if you’d like them to use more than one processor or CPU core.

I used to use lighttpd 1.4.x but its configuration really sucks because it’s so inflexible. I love nginx’s configuration because it’s very intuitive and very flexible. It also has some very cool modules including an experimental embedded perl module. The performance I’m getting out of it is nothing short of spectacular. I run 8 worker processes and each process consumes about 7 megs of RAM with a few modules loaded.

So my config looks like:

request ==> nginx_with_keepalive –> apache/appserver_nokeepalive

If you’d like to read more about server models for handling huge numbers of clients, check out Dan Kegel’s page on the so called c10k problem where he documents a few other event models for servers and has a history lesson on event driven IO.

Also, if you’re planning on running a high traffic server with high concurrency you should probably optimize your IP stack – here are a few suggestions I made a while back on how to do that.

March 23, 2008
The irrelevance of microsoft's search

I put some cross-cluster traffic throttling in place yesterday using memcached – which rocks btw. In the last 12 hours I’ve blocked three sources – two were rogue crawlers from broadband ISPs. The other was MSN’s Live Search crawler, which was requesting more than 1 page per second sustained over 30 seconds. If it was Google I’d probably care, but Google has polite crawlers and, unlike Google, Live Search only sends me about 2% of my total search traffic.
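For the curious, the rule boils down to a counter per source per time window. The sketch below is an illustration of the idea and not my production code: a plain dict stands in for memcached, and the class name, key format and default limits are assumptions chosen to match the "more than 1 page per second over 30 seconds" rule.

```python
import time

class FixedWindowThrottle:
    """Block a source that exceeds `limit` requests within one `window`-second bucket."""

    def __init__(self, limit=30, window=30, store=None):
        self.limit = limit
        self.window = window
        # A plain dict stands in for memcached here (an assumption for the sketch).
        self.store = {} if store is None else store

    def allow(self, source_ip, now=None):
        """Count this request and return True if the source is still under the limit."""
        now = time.time() if now is None else now
        bucket = int(now // self.window)
        key = f"throttle:{source_ip}:{bucket}"
        # With memcached this would be an atomic incr on a key that expires after `window`.
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key] <= self.limit
```

Because every web server in the cluster talks to the same cache, a crawler that spreads its requests across machines still hits one shared counter.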

March 23, 2008
How to fix munin's netstat passive connections increasing constantly

Another thing I googled until I was all googled out and couldn’t find an answer, so for future explorers who pass by here, here’s the fix…

If you’re running munin and you suddenly notice the number of netstat passive connections is constantly increasing in a linear fashion, rest assured it’s not your server that’s busy beating itself into oblivion. It’s a munin bug that’s easily fixed.

If you run netstat and get something like this:

netstat -s|grep passive
3339672 passive connection openings
7574 passive connections rejected because of time stamp

…then it’s the passive connections rejected that’s confusing munin.

To fix this edit:

/usr/share/munin/plugins/netstat

and change the line

netstat -s | awk '/active connections/ { print "active.value " $1 } /passive connection/ { print "passive.value " $1 } /failed connection/ { print "failed.value " $1 } /connection resets/ { print "resets.value " $1 } /connections established/ { print "established.value " $1 }'

to

netstat -s | awk '/active connections/ { print "active.value " $1 } /passive connection openings/ { print "passive.value " $1 } /failed connection/ { print "failed.value " $1 } /connection resets/ { print "resets.value " $1 } /connections established/ { print "established.value " $1 }'
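If you want to see why the looser pattern trips munin up, here is a small Python sketch that mimics what the awk patterns do with the sample output above. The helper name is made up for the example; it just returns the first field of every matching line, like awk’s $1.

```python
import re

SAMPLE = """\
    3339672 passive connection openings
    7574 passive connections rejected because of time stamp
"""

def passive_values(netstat_output, pattern):
    """Return the first field of every line matching `pattern`, as awk's $1 would."""
    return [line.split()[0]
            for line in netstat_output.splitlines()
            if re.search(pattern, line)]

print(passive_values(SAMPLE, r"passive connection"))           # both lines match
print(passive_values(SAMPLE, r"passive connection openings"))  # only the counter munin wants
```

The original pattern matches both lines, so the plugin prints two passive.value lines; the tighter pattern prints exactly one.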

March 15, 2008
Sergio and Muse

My good friend Sergio, who is an extremely accomplished musician, who morphed himself from a spectacular bassist into a spectacular drummer, and who can put most lead guitarists to shame, once told me that Muse is the best rock band that has ever existed.

Personally I don’t have the balls or the knowledge to make far reaching statements like that. And reading this I know you’re enumerating the thousands (millions?) of rock bands that have existed since African American slave communities sang their first question/answer folk songs and created the foundation for blues and then rock.

But Serge is a smart guy and his opinion is not to be taken lightly. Go buy Muse – “Hysteria” and “Supermassive Black Hole” on iTunes and let me know what you think.

March 14, 2008
Why Free?

A great article on wired about the free web economy.

Interesting quote:

“Anything you can consistently convert to cash is a form of currency itself, and Google plays the role of central banker for these new economies.”

March 14, 2008
I'm so dumb

Don’t ever leave a website that starts to get any kind of traffic on the joke that calls itself GoDaddy. As a registrar they’re not bad but their DNS tool is very broken.

I won’t bore you with tales of my screaming match with a manager there at 2am when a simple A record IP address change caused my image server’s address to drop in and out of their DNS at random. Or how the crankier I got the more he called me sir. Or how his colleague explained that if I choose to use their DNS service I need to know intuitively that I can’t make more than one change a day or their zone file gets corrupted – and how it’s standard procedure that you call them to do a “zone file refresh”. Or how he explained that a record I hadn’t changed at all dropped off their servers and the reason was because it’s an “Internet Thing”.

I moved over to dnsmadeeasy.com today and so far they rock. They’re the lowest cost host that offers Anycast on their servers which gives pretty good protection against DDoS attacks – something that took out dnspark a while back when I used to use them.

March 11, 2008