How to handle 1000s of concurrent users on a 360MB VPS

There has been some recent confusion about how much memory you need in a web server to handle a huge number of concurrent requests. I also made a performance claim on the STS list that got me an unusual number of private emails.

Here’s how you run a highly concurrent website on a shoe-string budget:

The first thing you’ll do is get a Linode server because they have the fastest CPU and disk.

Install Apache with your web application running under mod_php, mod_perl or some other persistence engine for your language. Then you get famous and start getting emails about people not being able to access your website.

You increase the number of Apache threads or processes (depending on which Apache MPM you’re using) until you can’t anymore because you only have 360MB of memory in your server.

Then you’ll lower the KeepAliveTimeout and eventually disable KeepAlive so that more users can access your website without tying up your Apache processes. Your users will slow down a little because they now have to establish a new connection for every piece of your website they want to fetch, but you’ll be able to serve more of them.
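In Apache config terms, the dial you're turning looks something like this (the numbers are illustrative, not recommendations):

```apache
# httpd.conf - first shorten the keepalive window...
KeepAlive On
KeepAliveTimeout 2    # down from the old default of 15 seconds

# ...and when that's no longer enough, turn it off entirely:
# KeepAlive Off
```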

But as you scale up you will get a few more emails about your server being down. Even though you’ve disabled keepalive, it still takes time for each Apache child to send data to users, especially if they’re on slow or high-latency connections. Here’s what you do next:

Install Nginx on your new Linode box and get it to listen on Port 80. Then reconfigure Apache so that it listens on another port – say port 81 – and can only be accessed from the local machine. Configure Nginx as a reverse proxy to Apache listening on port 81 so that it sits in front of Apache like so:

YourVisitor <—–> Nginx:Port80 <—–> Apache:Port81

Enable keepalive on Nginx and set the keepalive timeout as high as you’d like. Disable KeepAlive on Apache – this is just in case, because Nginx’s proxy engine doesn’t support keepalive to the back-end servers anyway.
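A minimal sketch of that front-end configuration (the server name and timeout value are placeholders; see the Nginx wiki for a complete example):

```nginx
# nginx.conf - Nginx answers on port 80 and proxies to Apache on 127.0.0.1:81
http {
    keepalive_timeout 300;    # keepalive to browsers stays on, and can be generous

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_pass http://127.0.0.1:81;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```

On the Apache side, change Listen 80 to Listen 127.0.0.1:81 so only the local machine can reach it.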

The 10 or so Apache children you’re running will now be getting requests from a client (Nginx) that is running locally. Because there is almost zero latency and a huge amount of bandwidth (it’s a loopback request), the only time Apache spends on a request is the CPU time it actually takes to generate the response. Apache children are no longer tied up with clients on slow connections, so each request is handed off in a tiny fraction of a second, freeing up each child to do a hell of a lot more work.

Nginx will occupy about 5 to 10MB of memory. You’ll see thousands of users concurrently connected to it. If you have Munin loaded on your server, check out the netstat graph. Bitchin’, isn’t it? You’ll also notice that Nginx uses very little CPU – almost nothing, in fact. That’s because Nginx uses an event-driven model in which a single thread handles a huge number of connections. It can do this with little CPU usage because it relies on a Linux kernel facility called epoll.

Footnotes:

Lack of time forced me to leave out all explanations of how to install and configure Nginx (I’m assuming you know Apache already) – but the Nginx wiki is excellent, even if the translation from the original Russian is a little rough.

I’ve also purposely left out all references to solving disk bottlenecks (as I’ve left out a discussion of browser caching) because a lot has been written about this, and depending on what app or app server you’re running, there are some very standard ways to solve IO problems already: e.g. Memcached, the InnoDB buffer pool for MySQL, PHP’s Alternative PHP Cache, persistence engines that keep your compiled code in memory, etc.

This technique works to speed up any back-end application server that uses a one-thread-per-connection model. It doesn’t matter if it’s Ruby via FastCGI, Mod_Perl on Apache or some crappy little Bash script spitting out data on a socket.

This is a very standard config for most high-traffic websites today. It’s how they are able to leave keepalive enabled and handle a huge number of concurrent users with a relatively small app server cluster. Lighttpd and Nginx are the two most popular free event-driven (FSM/epoll) web servers out there, and Nginx is the fastest growing, best designed (IMHO) and the one I use to serve 400 requests per second on a small Apache cluster. It’s also what guys like WordPress.com use.

No-latency SSH sessions on a 5GHz WiFi router with a 250mW radio

Disclaimer: You may brick your fancy new Linksys router by following the advice in this blog entry. A large number of folks have installed this software successfully including me. But consider yourself warned in case you’re the unlucky one.

I use SSH a lot. My wife and nephew love streaming video like Hulu instead of regular cable. For the last few years there’s been a cold war simmering. I’m working late, they start streaming, and my SSH session to my server gets higher latency. So every time I hit a keystroke it takes 0.3 seconds to appear instead of 0.01. Try hitting 10,000 keystrokes in an evening and you’ll begin to understand why this sucks.

I’ve tried screwing with the QoS settings on my Linksys routers but it doesn’t help at all. I ran across a bunch of articles explaining how it’s useless to try to use QoS because it only modifies your outgoing bandwidth and can’t change the speed at which routers on the Internet send you traffic.

Well that’s all bullshit. Here’s how you fix it:

Upgrade the firmware on your router to DD-WRT. Here’s the list of supported devices. I have a Linksys WRT320N router. It’s a newer router that has both a 2.4GHz and a 5GHz radio. Many routers that look new and claim to support “N” actually just have 2.4GHz radios in them.

The DD-WRT firmware for the WRT320N router is very very new, but it works perfectly. Here’s how you upgrade:

Read Eko’s (DD-WRT author) announcement about WRT320N support here. The standard DD-WRT installation instructions are here so you may want to reference them too. Here’s how I upgraded without bricking my router:

  1. Download the ‘mini’ DD-WRT here.
  2. Open all the links in this blog entry in other browser windows in case you need to refer to them for troubleshooting. You’re about to lose your Internet access.
  3. Visit your router’s web interface and take note of all settings – not just your wireless SSID and keys but the current MAC address on your Internet interface too. I had to clone this once DD-WRT started up because my ISP hard-codes MAC addresses on their side and filters out any unauthorized MACs. I’d suggest printing the settings directly from your web browser.
  4. Use the web interface (visit http://192.168.1.1/ usually) and reset your router to factory default settings.
  5. You’ll need to log into your router again. For Linksys the default login is a blank username and the password ‘admin’.
  6. Use Internet Explorer to upgrade the firmware using your router’s web interface. Apparently Firefox has a bug on some Linksys routers so don’t use that.
  7. Wait for the router to reboot.
  8. Hit http://192.168.1.1/ with your web browser and change your router’s default username and password.
  9. Go to the Clone MAC address option and set it to your old Internet MAC address.
  10. Set up your wireless with the old SSID and key.
  11. Confirm you can connect to the router via WiFi and have Internet Access.

Now the fun part:

  1. Go to Wireless, Advanced settings, and scroll down to TX Power. You can boost your transmit signal all the way to 251mW. Boosting it by about 70mW should be safe according to the help. I’ve actually left mine as is to increase my radio’s life, but it’s nice to know I have the option.
  2. Go to the NAT/QoS menu and hit the QoS tab on the right. Enable QoS. Add your machine’s MAC address. Set the priority to Premium (not Exempt because that does nothing). Hit Apply Settings. Every other machine now has a default priority of Standard and your traffic will be expedited.
  3. For Linux geeks: Click the Services tab and enable SSHd. Then ssh to your router’s IP, usually 192.168.1.1. Log in as root with whatever password you chose for your router. I actually changed my username to ‘admin’ but the username seems to stay root for SSH.

You can use a lot of standard Linux commands over SSH – it’s BusyBox Linux. Type:

cat /proc/net/ip_conntrack | grep <YourIPAddress>

Close to the end of each line you’ll see a mark= field. For your IP address it should have mark=10 for all your connections. Everyone else should be mark=0. The values mean:

  • Exempt: 100
  • Premium: 10
  • Express: 20
  • Standard: 30
  • Bulk: 40
  • (no QoS matched): 0

Remember if no QoS rule is matched the traffic is Standard priority if you have QoS enabled on the router. So you are Premium and everyone else is standard. Much more detail is available on the QoS DD-WRT Wiki here.
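You can boil that check down to a one-liner. Since most readers aren't sitting on the router, the sketch below runs the same parsing over two inlined sample lines; on DD-WRT you'd feed it /proc/net/ip_conntrack instead (the file name and field layout are from that era's kernels, so treat this as a sketch):

```shell
# Print "source-IP mark" for each tracked connection.
# These sample lines stand in for `cat /proc/net/ip_conntrack`.
conntrack_sample='tcp 6 431999 ESTABLISHED src=192.168.1.100 dst=10.0.0.1 sport=51515 dport=22 [ASSURED] mark=10 use=1
udp 17 30 src=192.168.1.101 dst=8.8.8.8 sport=5353 dport=53 mark=0 use=1'

printf '%s\n' "$conntrack_sample" | awk '{
    src = ""; mark = ""
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^src=/ && src == "") src = substr($i, 5)   # first src= is the original direction
        if ($i ~ /^mark=/)             mark = substr($i, 6)
    }
    print src, mark
}'
# → 192.168.1.100 10
# → 192.168.1.101 0
```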

The Linux distro is quite amazing. There are over 1000 packages available for DD-WRT, including Perl, PHP and MySQL, in case you’d like to write a blogging platform for your Linksys router. To use this you’re going to have to upgrade your firmware to the ‘big’ version of the WRT320N binary. Don’t upgrade directly from Linksys firmware to the ‘big’ DD-WRT – Eko recommends upgrading to ‘mini’ first and then upgrading to ‘big’. Also note I haven’t tried running ‘big’ on the WRT320N because I’m quite happy with QoS and a more powerful radio.

There are detailed instructions on how to get Optware up and running once you’re running ‘big’ on the Wiki. It includes info on how to install a throttling HTTP server, Samba2 for windows networking and a torrent client.

If you’d like to run your WRT320N at 5GHz, the DD-WRT forums suggest switching wireless network mode to ‘NA-only’, but that didn’t work for my Snow Leopard OS X machine. When I was running the Linksys firmware I had to use 802.11a to make 5GHz work for my MacBook, and likewise for this router I run A-only. You can confirm you’re at 5GHz by holding down the ‘option’ key on your MacBook and clicking the WiFi icon at top right.

I prefer 5GHz because the spectrum is quieter, but 5GHz doesn’t travel as far through air as 2.4GHz does. So boosting your TX power will give you the same range with a clear spectrum while all your neighbors fight over the 2.4GHz band.

What the Web Sockets Protocol means for web startups

Ian Hickson’s latest draft of the Web Sockets Protocol (WSP) is up for your reading pleasure. It got me thinking about the tangible benefits the protocol is going to offer over the long polling that my company and others have been using for our real-time products.

The protocol works as follows:

Your browser accesses a web page and loads, let’s say, a Javascript application. Then the Javascript application decides it needs a constant flow of data to and from its web server. So it sends an HTTP request that looks like this:

GET /demo HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: example.com
Origin: http://example.com
WebSocket-Protocol: sample

The server responds with an HTTP response that looks like this:

HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
WebSocket-Origin: http://example.com
WebSocket-Location: ws://example.com/demo
WebSocket-Protocol: sample

Now data can flow between the browser and server without having to send HTTP headers until the connection is torn down again.

Remember that at this point, the connection has been established on top of a standard TCP connection. The TCP protocol provides a reliable delivery mechanism so the WSP doesn’t have to worry about that. It can just send or receive data and rest assured the very best attempt will be made to deliver it – and if delivery fails it means the connection has broken and WSP will be notified accordingly. WSP is not limited to any frame size because TCP takes care of that by negotiating an MSS (maximum segment size) when it establishes the connection. WSP is just riding on top of TCP and can shove as much data in each frame as it likes and TCP will take care of breaking that up into packets that will fit on the network.

The WSP sends data using very lightweight frames. There are two ways the frames can be structured. The first frame type starts with a 0x00 byte (a zero byte), followed by the payload as UTF-8 text, and ends with a 0xFF byte.
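That text frame is simple enough to build by hand. A sketch in shell, hex-dumping the result so the sentinel bytes are visible:

```shell
# Wrap a UTF-8 string in a WSP text frame: a 0x00 byte, the payload, a 0xFF byte.
ws_text_frame() { printf '\000%s\377' "$1"; }

ws_text_frame "Hi" | od -An -tx1    # bytes: 00 48 69 ff
```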

The second WSP frame type starts with a byte in the range 0x80 to 0xFF, meaning the byte has its high bit (the left-most binary bit) set to 1. The data length then follows as a series of bytes, each contributing its 7 low-order bits to the length; every length byte has the high bit set except the last one, which has it clear. The data follows and is exactly the length specified. This second WSP frame type is presumably for binary data and is designed to provide some future-proofing.
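The length encoding amounts to writing the length in base 128. Here's a sketch that encodes a payload length into those 7-bits-per-byte chunks (for example, 300 encodes as 0x82 0x2c, since 2 x 128 + 44 = 300):

```shell
# Encode a WSP binary-frame length as base-128 digits, most significant first,
# with the high bit set on every byte except the last.
wsp_length() {
    awk -v n="$1" 'BEGIN {
        len = 0
        do { chunk[len++] = n % 128; n = int(n / 128) } while (n > 0)
        for (i = len - 1; i >= 0; i--)
            printf "%02x%s", chunk[i] + (i > 0 ? 128 : 0), (i > 0 ? " " : "\n")
    }'
}

wsp_length 300    # prints: 82 2c
wsp_length 5      # prints: 05
```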

If you’re still with me, here’s what this all means. Let’s say you have a web application that has a real-time component. Perhaps it’s a chat application, perhaps it’s Google Wave, perhaps it’s something like my Feedjit Live that is hopefully showing a lot of visitors arriving here in real-time. Let’s say you have 100,000 people using your application concurrently.

The application has been built to be as efficient as possible using the current HTTP specification. So your browser connects and the server holds the connection open and doesn’t send the response until there is data available. That’s called long polling and it avoids the old situation of your browser reconnecting every few seconds and getting told there’s no data yet, along with a full load of HTTP headers moving back and forth.

Let’s assume that every 10 seconds the server or client has some new data to send to the other. Each time, a full set of client and server headers is exchanged. They look like this:

GET / HTTP/1.1
User-Agent: ...some long user agent string...
Host: markmaunder.com
Accept: */*

HTTP/1.1 200 OK
Date: Sun, 25 Oct 2009 17:32:19 GMT
Server: Apache
X-Powered-By: PHP/5.2.3
X-Pingback: http://markmaunder.com/xmlrpc.php
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

That’s 373 bytes of data. Some simple math tells us that 100,000 people generating 373 bytes of data every 10 seconds gives us a network throughput of 29,840,000 bits per second or roughly 30 Megabits per second.

That’s 30 Mbps just for HTTP headers.

With the WSP every frame only has 2 bytes of packaging. 100,000 people X 2 bytes = 200,000 bytes per 10 seconds or 160 Kilobits per second.

So WSP takes 30 Mbps down to 160 Kbps for 100,000 concurrent users of your application. And that’s what Hickson and the WSP team are trying to do for us.
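The arithmetic, spelled out (100,000 users, one exchange every 10 seconds, 373 header bytes vs. a 2-byte frame):

```shell
awk 'BEGIN {
    users = 100000; interval = 10    # seconds between exchanges
    http_hdr = 373; wsp_frame = 2    # bytes of overhead per exchange

    http_bps = users * http_hdr * 8 / interval
    wsp_bps  = users * wsp_frame * 8 / interval
    printf "HTTP headers: %d bps (~%.0f Mbps)\n", http_bps, http_bps / 1000000
    printf "WSP framing:  %d bps (%d Kbps)\n",  wsp_bps,  wsp_bps / 1000
}'
# → HTTP headers: 29840000 bps (~30 Mbps)
# → WSP framing:  160000 bps (160 Kbps)
```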

Google would be the single biggest winner if the WSP became standard in browsers and browser APIs like Javascript. Google’s goal is to turn the browser into an operating system and give their applications the ability to run on any machine that has a browser. Operating systems have two advantages over browsers: They have direct access to the network and they have local file system storage. If you solve the network problem you also solve the storage problem because you can store files over the network.

Hickson is also working on the HTML 5 specification for Google, but the recommendation isn’t currently expected to be ratified until 2022. WSP is also going to take time to be ratified and then incorporated into Javascript (and other) APIs. But it is so strategically important for Google that I expect to see it in Chrome and in Google’s proprietary web servers in the near future.

Routers treat HTTPS and HTTP traffic differently

OSI Network Model

Well the title says it all. Internet routers live at Layer 3 [the Network Layer] of the OSI model which I’ve included to the left. HTTP and HTTPS live at Layer 7 (Application layer) of the OSI model, although some may argue HTTPS lives at Layer 6.

So how is it that Layer 3 devices like routers treat HTTPS traffic differently?

Because HTTPS servers set the DF or Do Not Fragment IP flag on packets and regular HTTP servers do not.

This matters because HTTP and HTTPS usually transfer a lot of data. That means that the packets are usually quite large and are often the maximum allowed size.

So if a server sends out a very big HTTP packet and it goes through a route on the network that does not allow packets that size, then the router in question simply breaks the packet up.

But if a server sends out a big HTTPS packet and it hits a route that doesn’t allow packets that size, the routers on that route can’t break the packet up. So they drop the packet and send back an ICMP message telling the machine that sent the big packet to adjust its MTU (maximum transmission unit) and resend. This is called Path MTU Discovery.

This can create some interesting problems that don’t exist with plain HTTP. For example, if your ops team has gotten a little overzealous with security and decided to filter out all ICMP traffic, your web server won’t receive any of those ICMP messages I’ve described above telling it to break up its packets and resend them. So large packets sent partway through a secure HTTPS connection will simply be dropped, and visitors who reach you across network paths that need packets broken into smaller pieces will see half-loaded pages from the secure part of your site.

If you have the problem I’ve described above there are two solutions: If you’re a webmaster, make sure your web server can receive ICMP messages [you need to allow ICMP type 3, code 4: "Fragmentation needed and DF bit set"]. If you’re a web surfer (client) and are trying to access a secure site that has ICMP disabled, adjust your network card’s MTU to be smaller than the default (usually 1500 for Ethernet).
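Sketches of both fixes (the interface name and firewall tool are assumptions; adapt them to your own setup):

```shell
# Server side: make sure "fragmentation needed" ICMP (type 3, code 4)
# is allowed in, so Path MTU Discovery can work:
iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT

# Client side: drop your interface MTU below the Ethernet default of 1500
# (eth0 is a placeholder for your network interface):
ip link set dev eth0 mtu 1400
```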

But the bottom line is that if everything else is working fine and you are having a problem sending or receiving HTTPS traffic, know that the big difference with HTTPS traffic over regular web traffic is that the packets can’t be broken up.

How to upgrade your server BIOS on Linux without a floppy drive

This is another thing I just couldn’t find no matter how hard I googled. Here’s the story behind this post. Scroll down if you want to get at the useful stuff.

I run a cluster of Dell 2950s and I just ordered second CPUs (Intel Xeon E5410, 64-bit) for all the machines. I test-upgraded one of them and the LCD on the front came up orange with an error message, and the chassis cooling fan cranked all the way up to high. Of course I’d ignored the instructions that came with the CPUs that said UPGRADE THE BIOS AND BMC BEFORE YOU INSTALL THIS.

ATTEMPT #1: I tried to create a bootable USB flash (pen) drive using various utilities from HP and elsewhere, but I couldn’t get my Dell 2950s to boot from the drive. I even bought an HP Flash Floppy Key and couldn’t get my workstations to boot from it when switched into floppy mode. I didn’t try it on the Dells because by then I’d discovered the method below. Interestingly, once I upgraded my Dell 2950 BIOSes I noticed the new firmware actually REMOVES the option to boot from a USB device from the BIOS menu. So using the method below with Linux and Grub is definitely preferable – and it probably boots slightly faster because hard drives are faster than USB 2.0.

ATTEMPT #2: I got hold of a USB floppy drive, made a DOS bootable disk and upgraded the BIOS and BMC. A week earlier I was in Fry’s joking with my wife, holding up a box of 1.44″ disks and saying “Who uses these?!” Now I know. The problem was that the BIOS and BMC upgrades were very, very slow from a floppy disk. It took forever to load the BIOS upgrade software into memory, and that meant a lot of down-time for our users while I upgraded the whole cluster.

Here’s the solution:

–USEFUL STUFF–

If you’re running any flavour of Linux using Grub as your boot loader and you need to upgrade your BIOS from a floppy drive, and you don’t have a USB floppy drive or you don’t want to use one because they’re so damn slow, then here’s the trick. This is taken from David Backeberg’s page at MIT which seems to be offline at the moment. I had a very hard time finding his advice so I’m echoing much of it here. I’ve removed the steps to compile memdisk because they’re unnecessary, and I also don’t use autoexec.bat because I prefer to manually launch the BIOS upgrade on each machine so that I can shut it down immediately afterwards to upgrade the hardware.

  1. Go to FreeDOS floppies and download the OEM bootdisk. (NOTE: I’ve tried to use the 2.88 Disk that FreeDOS provides but it doesn’t mount with dosemu)
  2. Unzip the file you downloaded: unzip FDOEM.144.imz
  3. Rename the image to something useful: mv FDOEM.144.img dell_bios_floppy.img
  4. Setup the loopback device (Try /dev/loop0 if loop2 doesn’t exist): losetup /dev/loop2 dell_bios_floppy.img
  5. Install dosemu. Instructions for Ubuntu: (apt-get install dosemu)
  6. Edit /etc/dosemu/dosemu.conf and add (or edit) the floppy_a line to say: $_floppy_a = "threeinch:/dev/loop2"
  7. Check where the c_drive is in your dosemu.conf. It’s usually at /root/.dosemu/c_drive
  8. Copy your BIOS flash executable to the fake C drive and give it an 8.3-style name: cp PE123456789.EXE /root/.dosemu/c_drive/BIOSUP.EXE
  9. Start dosemu: dosemu
  10. If you start Dosemu and you see a blank screen, try typing ‘cls’ and hit enter.
  11. Feels good being in a DOS shell on linux doesn’t it? Don’t ask me why – nostalgia maybe.
  12. Copy your BIOS exe from C drive to your A drive image: copy C:\BIOSUP.EXE a:\
  13. Type exitemu to exit dosemu
  14. Unloop your loopback device: losetup -d /dev/loop2 (or loop0 if you used that)

You now have a floppy image you can boot into that contains your BIOS exe file. If you are also upgrading your BMC or other components that require booting into a floppy and executing files, you can try to fit those files on the floppy using the above steps. If they don’t fit then you need to create a second floppy image using the above steps and add a second entry to your menu.lst file in the steps below.

Now you need to set up Grub to give you the option to boot into your new floppy image when you reboot your machine:

  1. First install memdisk. If you’re running Ubuntu, memdisk is in the syslinux package: apt-get install syslinux
  2. Copy your dell BIOS floppy into /boot: cp /root/dell_bios_floppy.img /boot/
  3. I like to put a copy of memdisk into /boot:  cp /usr/lib/syslinux/memdisk /boot/
  4. Edit Grub’s menu.lst file. On Ubuntu it’s /boot/grub/menu.lst. Add the following lines – and change (hd0,4) to whatever your hard drive setting is; look at other entries in menu.lst to figure it out.

title DELL Bios flash 1
root (hd0,4)
kernel /boot/memdisk
initrd /boot/dell_bios_floppy.img

That’s it! Reboot. Hit ESC when you see the Grub menu. There should be a new option labeled “DELL Bios flash 1”. Select it and boot into FreeDOS. Run your BIOS update.

Please add comments if you have any tips for other flavors of Linux.

Linux is Obsolete!

A lame video on TechCrunch today inspired me to go hunting for the original argument between Linus Torvalds and (Professor) Andy Tanenbaum, and here it is. Titled Linux is Obsolete, it’s a 1992 post by the author of Minix telling Linus he’s just created an obsolete OS running on obsolete hardware (the 386) that won’t be around in a few years.

Andy’s ideas are a great example of how an academic approach to software design can lead to layers of abstraction that kill performance. You see this mistake often in web applications because web development teams are separate from the operations team and don’t have to think about performance under load. So their focus stays on the manageability of the code base rather than its performance. They make language choices and design decisions that help them write beautiful code in as few lines as possible that any university professor would be proud of.

Find me an ops guy who loves Ruby on Rails and I’ll find you a dev who loves hand-crafting SQL statements.

Here is AST’s original email:

Subject: LINUX is obsolete
From: ast@cs.vu.nl (Andy Tanenbaum)
Date: 29 Jan 92 12:12:50 GMT
Newsgroups: comp.os.minix
Organization: Fac. Wiskunde & Informatica, Vrije Universiteit, Amsterdam

I was in the U.S. for a couple of weeks, so I haven't commented much on
LINUX (not that I would have said much had I been around), but for what
it is worth, I have a couple of comments now.

As most of you know, for me MINIX is a hobby, something that I do in the
evening when I get bored writing books and there are no major wars,
revolutions, or senate hearings being televised live on CNN.  My real
job is a professor and researcher in the area of operating systems.

As a result of my occupation, I think I know a bit about where operating
are going in the next decade or so.  Two aspects stand out:

1. MICROKERNEL VS MONOLITHIC SYSTEM
   Most older operating systems are monolithic, that is, the whole operating
   system is a single a.out file that runs in 'kernel mode.'  This binary
   contains the process management, memory management, file system and the
   rest. Examples of such systems are UNIX, MS-DOS, VMS, MVS, OS/360,
   MULTICS, and many more.

   The alternative is a microkernel-based system, in which most of the OS
   runs as separate processes, mostly outside the kernel.  They communicate
   by message passing.  The kernel's job is to handle the message passing,
   interrupt handling, low-level process management, and possibly the I/O.
   Examples of this design are the RC4000, Amoeba, Chorus, Mach, and the
   not-yet-released Windows/NT.

   While I could go into a long story here about the relative merits of the
   two designs, suffice it to say that among the people who actually design
   operating systems, the debate is essentially over.  Microkernels have won.
   The only real argument for monolithic systems was performance, and there
   is now enough evidence showing that microkernel systems can be just as
   fast as monolithic systems (e.g., Rick Rashid has published papers comparing
   Mach 3.0 to monolithic systems) that it is now all over but the shoutin`.

   MINIX is a microkernel-based system.  The file system and memory management
   are separate processes, running outside the kernel.  The I/O drivers are
   also separate processes (in the kernel, but only because the brain-dead
   nature of the Intel CPUs makes that difficult to do otherwise).  LINUX is
   a monolithic style system.  This is a giant step back into the 1970s.
   That is like taking an existing, working C program and rewriting it in
   BASIC.  To me, writing a monolithic system in 1991 is a truly poor idea.

2. PORTABILITY
   Once upon a time there was the 4004 CPU.  When it grew up it became an
   8008.  Then it underwent plastic surgery and became the 8080.  It begat
   the 8086, which begat the 8088, which begat the 80286, which begat the
   80386, which begat the 80486, and so on unto the N-th generation.  In
   the meantime, RISC chips happened, and some of them are running at over
   100 MIPS.  Speeds of 200 MIPS and more are likely in the coming years.
   These things are not going to suddenly vanish.  What is going to happen
   is that they will gradually take over from the 80x86 line.  They will
   run old MS-DOS programs by interpreting the 80386 in software.  (I even
   wrote my own IBM PC simulator in C, which you can get by FTP from
   ftp.cs.vu.nl =  192.31.231.42 in dir minix/simulator.)  I think it is a
   gross error to design an OS for any specific architecture, since that is
   not going to be around all that long.

   MINIX was designed to be reasonably portable, and has been ported from the
   Intel line to the 680x0 (Atari, Amiga, Macintosh), SPARC, and NS32016.
   LINUX is tied fairly closely to the 80x86.  Not the way to go.

Don`t get me wrong, I am not unhappy with LINUX.  It will get all the people
who want to turn MINIX in BSD UNIX off my back.  But in all honesty, I would
suggest that people who want a **MODERN** "free" OS look around for a
microkernel-based, portable OS, like maybe GNU or something like that.

Andy Tanenbaum (ast@cs.vu.nl)

P.S. Just as a random aside, Amoeba has a UNIX emulator (running in user
space), but it is far from complete.  If there are any people who would
like to work on that, please let me know.  To run Amoeba you need a few 386s,
one of which needs 16M, and all of which need the WD Ethernet card.

Anycasting anyone?

[Thanks Sam for the idea for this entry] Ever heard of IP Anycasting? Thanks to my recent change from godaddy (frowny face and no link) to dnsmadeeasy (happy face and they get a link) I’m now using a DNS provider that provides anycasting. What is it and should you care?

IP Anycasting is assigning the same IP address to multiple instances of the same service at strategic points in the network. For example, if you are a DNS provider, you might have servers in New York, London and Los Angeles with the same IP address. Then when a surfer in San Diego (about 80 miles south of Los Angeles) makes a request to your DNS system, the server in Los Angeles answers and saves the network from having to route traffic to New York or London.

Anycasting is generally used to distribute load geographically and to mitigate the effect of distributed denial of service attacks. It’s been used by the F root server since November 2002 and has saved good ole F from getting taken down by several DDoS attacks.

I was using dnspark.net a couple of years ago and we had a few hours of down-time while they were hit by a DDoS attack – so it’s not as uncommon as you think. [They obviously don't use anycasting]

Anycasting is suitable for DNS because DNS uses a connectionless session layer protocol called UDP. One packet is sent, a response is received and hey, if the response isn’t received the client just tries another DNS server. [This occurs in the vast majority of DNS queries. There are a small number of exceptions where DNS uses TCP.]

Anycasting is not ideally suited for TCP connections like web browser-server communication because TCP is connection oriented. For example, TCP requires a 3 way handshake to establish the connection. If the network topology changes and one packet is sent to the Los Angeles server and another is sent to New York it breaks TCP because the New York server doesn’t know about the session that Los Angeles has started establishing.

That’s the theory anyway, but if the network topology stays reasonably stable and you don’t mind a few sessions breaking when the topology does change then perhaps you’ll consider using Anycasting with your web servers. But don’t get too creative and launch a content delivery network. Akamai might sue you and they’ll probably win. They own patent No. 6,108,703 which covers a “global hosting system” in which “a base HTML document portion of a Web page is served from the Content Provider’s site while one or more embedded objects for the page are served from the hosting servers, preferably, those hosting servers near the client machine.” Akamai just won a case against competitor Limelight for violating that patent and the case is now heading to the appeal courts.

There are other protocols that are connectionless and therefore well suited for Anycasting like SNTP and SNMP but there isn’t much demand for these because they’re network management protocols and don’t experience the massive load that more public protocols like DNS, SMTP and HTTP get.

Deploying an anycast network is not something you’re likely to consider in the near future unless you’re eBay or Google, but outsourcing some of your services like DNS to an anycast provider is something that’s worked well for me and might work for you.

Microsoft Buzzquotes

“My machine overnight could process my in-box, analyze which ones were probably the most important, but it could go a step further,” he said. “It could interpret some of them, it could look at whether I’ve ever corresponded with these people, it could determine the semantic context, it could draft three possible replies. And when I came in in the morning, it would say, hey, I looked at these messages, these are the ones you probably care about, you probably want to do this for these guys, and just click yes and I’ll finish the appointment.” ~Craig Mundie from Microsoft in today’s NY Times

Sounds like Microsoft is working on a Positronic Brain rather than writing software for multi-core processors.

Server Downtime == Police Barricades and Angry World Series Fans

Paciolan is managing ticket sales for the Colorado Rockies. Their servers were hit with over 1500 requests per second, and it took down not only the Rockies’ ticket sales infrastructure but all of Paciolan’s other customers too.

They claim to have been hit by a DDoS attack, but that’s hard to prove or disprove when you have corporate firewalls and AOL firewalls sending many requests from a single IP – it looks just like a DDoS attack, but it isn’t.
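One rough way to tell the two apart is to look at the variety of traffic behind each IP. This is a sketch of my own, not Paciolan’s setup – the log format, function name and threshold are all assumptions – but many distinct User-Agents behind one address usually means a proxy or gateway full of real users, not a single attacking host:

```python
# Heuristic: count distinct User-Agents per IP in an access log.
# Many different UAs from one IP suggests a corporate/AOL-style proxy;
# one UA hammering away suggests a single client (or a bot).
from collections import defaultdict

def classify(log_lines, ua_threshold=5):
    agents_per_ip = defaultdict(set)
    for line in log_lines:
        ip, user_agent = line.split(" ", 1)   # assumed "IP UA" format
        agents_per_ip[ip].add(user_agent)
    return {
        ip: ("likely proxy/NAT" if len(agents) >= ua_threshold
             else "single client")
        for ip, agents in agents_per_ip.items()
    }

log = (["10.0.0.1 Mozilla/%d" % i for i in range(20)]   # 20 different UAs
       + ["10.0.0.2 EvilBot/1.0"] * 20)                 # one UA, repeated
print(classify(log))
# 10.0.0.1 -> likely proxy/NAT, 10.0.0.2 -> single client
```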

Is 1500 requests per second a lot? No. Feedjit (my site) peaks at 140 requests per second and it does it with just two servers – and the data it’s serving is dynamic.

So a cluster of 10 to 30 servers should easily handle the load they’ve described – especially if all it’s doing is queueing visitors and only letting a handful through, which is what Paciolan’s ticketing software does.
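The back-of-envelope math (assuming perfectly even load balancing, which real clusters never quite achieve):

```python
# Per-server load for the figures above.
peak_rps = 1500                       # Paciolan's reported peak

for servers in (10, 30):
    per_server = peak_rps / servers
    print(f"{servers} servers -> {per_server:.0f} requests/sec each")

# For comparison, Feedjit's peak spread over its two servers:
print(f"Feedjit: {140 / 2:.0f} requests/sec per server")
```

Even the small end of that range works out to 150 requests/sec per server – busy, but nothing exotic.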

The result? Police are erecting barricades around Coors Field. Here’s a quote from CNET:

“…many fans are apparently converging near Coors Field in hopes that the team will sell tickets in person through the box office; so many in fact that the police have closed streets around the ballpark and are erecting barricades, the paper reported.”

Ticketmaster is trying to buy Paciolan – the deal is currently under government review. Ticketmaster runs mod_perl (and so does Feedjit), and some very smart people who know a lot about scalability (and whom I used to work with) work for Ticketmaster. So hopefully the deal will go through and mod_perl will come to the rescue.

btw, I’m doing a short talk in two days on how to scale your web servers fast, based on my experience scaling Feedjit.

MacBooks auto-detect crossover/non-crossover ethernet modes

I take my dev server and my workstation everywhere with me in a single small backpack.
My “dev server” is an Intel MacBook that dual-boots Linux and OS X. My workstation is a Windows laptop. Most of my work is done with the MacBook booted into Linux and my Windows laptop running an SSH client that I use to write all my code on the Linux MacBook. I do this because most of my users run Windows, and I can write code in my SSH client and test immediately in Firefox, IE7, IE6 (using Virtual PC) and Opera. I also do graphic design in Fireworks on my windoze laptop.

So most of the time my MacBook sits quietly in the corner and plays server, unless I need to boot into OS X to test something in Safari.

On my recent road trip I needed to do some dev work, and the hotel’s wireless network was firewalled so that one machine on the network couldn’t connect to another machine on the same network. So my Windows laptop couldn’t connect to my MacBook server.

I had two ethernet cables with me and no hub, so I googled how to make a crossover cable to connect the two laptops directly to each other with a single ethernet cable.

I started cutting up one of the cables with my Leatherman. After I’d stripped off the insulation, but right before I cut the first wire, just for the hell of it I plugged the cable into the two machines without it being crossed over.

…and it worked.

I couldn’t believe it. Finally someone has designed an ethernet port that auto-detects whether it needs to be in crossover or regular mode (the feature is called Auto-MDIX). As far as I can tell it’s the MacBook that’s doing this piece of pure genius.

So now I have a half-trashed crossover cable – but it still works – and I can connect my dev workstation and “server” directly to each other with any old ethernet cable.
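One caveat if you try this yourself: a direct cable link has no DHCP server on it, so you may need to assign static addresses on both ends before the machines can talk. A rough sketch, assuming Linux on one side and OS X on the other – the interface names (eth0/en0) and the 192.168.2.x addresses are my assumptions, so adjust for your machines:

```shell
# On the Linux "server" end of the cable:
sudo ifconfig eth0 192.168.2.1 netmask 255.255.255.0 up

# On the OS X end (Windows would use the network control panel or netsh):
sudo ifconfig en0 192.168.2.2 netmask 255.255.255.0 up

# Then verify the link from either side:
ping -c 3 192.168.2.1
```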

Sometimes Apple sucks. But sometimes they well and truly rock!!