How to handle 1000s of concurrent users on a 360MB VPS

There has been some recent confusion about how much memory you need in a web server to handle a huge number of concurrent requests. I also made a performance claim on the STS list that got me an unusual number of private emails.

Here’s how you run a highly concurrent website on a shoe-string budget:

The first thing you’ll do is get a Linode server because they have the fastest CPU and disk.

Install Apache with your web application running under mod_php, mod_perl or some other persistence engine for your language. Then you get famous and start getting emails about people not being able to access your website.

You increase the number of Apache threads or processes (depending on which Apache MPM you’re using) until you can’t anymore because you only have 360MB of memory in your server.
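As a rough sketch of that tuning step (directive names are for Apache's prefork MPM; the actual numbers are illustrative and depend on how big each of your children is, which you should measure with ps or top):

```apache
# Assuming the prefork MPM and roughly 25-30MB resident per Apache child:
# 360MB total, minus OS and database overhead, leaves room for only ~10 children.
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       3
    MaxSpareServers       5
    MaxClients           10
    MaxRequestsPerChild 1000
</IfModule>
```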

Then you’ll lower the KeepaliveTimeout and eventually disable Keepalive so that more users can access your website without tying up your Apache processes. Your users will slow down a little because they now have to re-establish a new connection for every piece of your website they want to fetch, but you’ll be able to serve more of them.
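Those knobs live in httpd.conf; a hedged example of the progression described above (the 2-second value is just an illustration):

```apache
# Step 1: shorten the keepalive window from the old default of 15 seconds...
KeepAlive On
KeepAliveTimeout 2

# Step 2: ...and eventually turn it off entirely, so an idle client
# can't pin an Apache child between requests.
# KeepAlive Off
```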

But as you scale up you will get a few more emails about your server being down. Even though you’ve disabled keepalive, it still takes time for each Apache child to send data to users, especially if they’re on slow or high-latency connections. Here’s what you do next:

Install Nginx on your new Linode box and get it to listen on Port 80. Then reconfigure Apache so that it listens on another port – say port 81 – and can only be accessed from the local machine. Configure Nginx as a reverse proxy to Apache listening on port 81 so that it sits in front of Apache like so:

YourVisitor <—–> Nginx:Port80 <—–> Apache:Port81

Enable Keepalive on Nginx and set the Keepalive timeout as high as you’d like. Disable Keepalive on Apache – this is just-in-case because Nginx’s proxy engine doesn’t support Keepalive to the back-end servers anyway.
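A minimal sketch of that front-end/back-end split (names and paths are illustrative):

```nginx
# nginx.conf -- Nginx owns port 80 and keeps client keepalives open
http {
    keepalive_timeout 65;            # as high as you'd like

    server {
        listen 80;
        location / {
            # Forward everything to Apache on the loopback interface
            proxy_pass http://127.0.0.1:81;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
        }
    }
}

# And in httpd.conf, bind Apache to the loopback only:
#   Listen 127.0.0.1:81
#   KeepAlive Off
```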

The 10 or so Apache children you’re running will be getting requests from a client (Nginx) that is running locally. Because there is zero latency and a huge amount of bandwidth (it’s a loopback request), the only time Apache takes to handle the request is the amount of CPU time it actually takes to handle the request. Apache children are no longer tied up with clients on slow connections. So each request is handled in a few microseconds, freeing up each child to do a hell of a lot more work.

Nginx will occupy about 5 to 10 Megs of Memory. You’ll see thousands of users concurrently connected to it. If you have Munin loaded on your server check out the netstat graph. Bitchin isn’t it? You’ll also notice that Nginx uses very little CPU – almost nothing in fact. That’s because Nginx is designed using a single threaded model where one thread handles a huge number of connections. It can do this with little CPU usage because it uses a feature in the Linux kernel called epoll().
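To make the epoll point concrete, here is a tiny self-contained sketch (Python on Linux, since select.epoll wraps the same kernel facility; Nginx’s real event loop is C, this just shows the register-then-poll pattern that lets one thread watch many connections):

```python
import select
import socket

# A connected pair of sockets stands in for a client connection.
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

ep = select.epoll()
ep.register(a.fileno(), select.EPOLLIN)  # watch one end for readability

b.send(b"hello")  # the peer writes some data

# poll() returns only the descriptors that are actually ready,
# no matter how many thousands are registered.
received = None
for fd, mask in ep.poll(timeout=1):
    if fd == a.fileno() and mask & select.EPOLLIN:
        received = a.recv(1024)

print(received)  # b'hello'

ep.close()
a.close()
b.close()
```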

Footnotes:

Lack of time forced me to leave out explanations of how to install and configure Nginx (I’m assuming you know Apache already) – but the Nginx Wiki is excellent, even if the Russian translation is a little rough.

I’ve also purposely left out all references to solving disk bottlenecks (as I’ve left out a discussion about browser caching) because a lot has been written about this, and depending on what app or app server you’re running, there are some very standard ways to solve IO problems already: e.g. Memcached, the InnoDB buffer pool for MySQL, PHP’s Alternative PHP Cache, persistence engines that keep your compiled code in memory, and so on.

This technique works to speed up any back-end application server that uses a one-thread-per-connection model. It doesn’t matter if it’s Ruby via FastCGI, Mod_Perl on Apache or some crappy little Bash script spitting out data on a socket.

This is a very standard config for most high traffic websites today. It’s how they are able to leave keepalive enabled and handle a huge number of concurrent users with a relatively small app server cluster. Lighttpd and Nginx are the two most popular free event-driven (epoll-based) web servers out there, and Nginx is the fastest growing, best designed (IMHO) and the one I use to serve 400 requests per second on a small Apache cluster. It’s also what guys like WordPress.com use.

Costs and Startups – Advice for your CFO

In any company if you save $1 it goes straight to your bottom line. Meaning it’s as if you just earned another $1. The company that my wife and I have been running for about 2 years now serves over 30 Million page requests per day. We’ve invested a lot of time in getting more performance out of our hardware but about 6 months ago we started hitting pesky issues like limits on the speed of light and electrons.

So we’ve had to keep growing without going out and buying a Google-size web cluster. A lot of the wins we’ve had have been simply using every spare drop of capacity we can find. I’ve noticed a pattern during the last 6 months. It goes something like this:

Kerry (My wife, our CFO, and keeper of the graphs): Server 12 is hitting 10. [Meaning it has a load average of 10 on an 8 CPU machine which is 125% load]

Me: OK Dell has this great special on these new R410 servers that are about twice as fast as the previous generation.

Kerry: What about the other machines in the cluster?

Me: They’re already at 80%.

Kerry: OK what else do we have?

Me: Well the crawlers are maxed, the mail server’s maxed, the proxy’s maxed out, the load balancer is maxed….

Kerry: What about 25 and 26? They’re sitting at 2.

Me: Well we’d have to [technical and managerial speak explaining how complicated it's going to be to implement]

Kerry: OK, so let’s do that.

Me: [More bullcrap this time rolling out the big guns desperately trying to get money for new toys]

Kerry: …[waits it out]

Me: OK, so let’s do that.

If you’re a CFO approving purchase decisions in your company, take it from me: geeks and CEOs alike love buying new stuff. I assure you there isn’t a web cluster or database cluster on this planet that you can’t squeeze a little more capacity out of without breaking things. So before you take the [technical and managerial bullcrap from your geeks and CEO] at face value, sit down with your team, have them explain all the data to you, and go through all your resources with a fine-tooth comb. Then, if you absolutely have to, spend some money.

And if you don’t have a CFO, nominate someone immediately!! It doesn’t matter how small you are, someone had better be the keeper of the cash-flow plan or you’re going to run out of money and wonder why.

Incidentally, this is the load decrease on one of the busiest servers in our cluster when we brought online some ‘found’ capacity earlier today.

[Screenshot: Munin load graph, 2009-11-01 2:29 PM]

Posted on Hacker News.

How to integrate PHP, Perl and other languages on Apache

I have this module that a great group of guys in Malaysia have put together. But their language of choice is PHP and mine is Perl. I need to modify it slightly to integrate it. For example, I need to add my own session code so that their code knows if my user is logged in or not and who they are.

I started writing PHP but quickly started duplicating code I’d already written in Perl. Fetch the session from the database, de-serialize the session data, that sort of thing. I also ran into issues trying to recreate my Perl decryption routines in PHP. [I use non-mainstream ciphers]

Then I found ways to run Perl inside PHP and vice-versa. But I quickly realized that’s a very bad idea. Not only are you creating a new Perl or PHP interpreter for every request, but you’re still duplicating code, and you’re using a lot more memory to run interpreters in addition to what mod_php and mod_perl already run.

Eventually I settled on creating a very lightweight wrapper function in PHP called doPerl. It looks like this:

$associativeArrayResult = doPerl($functionName, $associativeArrayWithParameters);

function doPerl($func, $arrayData) {
 $ch = curl_init();
 $ip = '127.0.0.1';
 // Keys must be quoted strings; bare json and auth would raise PHP notices.
 $postData = array(
  'json' => json_encode($arrayData),
  'auth' => 'myPassword',
 );
 curl_setopt($ch, CURLOPT_POST, TRUE);
 curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
 curl_setopt($ch, CURLOPT_URL, "http://" . $ip . "/webService/" . $func . "/");
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
 $output = curl_exec($ch);
 curl_close($ch);
 $data = json_decode($output, TRUE);
 return $data;
}

On the other side I have a very fast mod_perl handler that only allows connections from 127.0.0.1 (the local machine). I deserialize the incoming JSON data using Perl’s JSON::from_json(). I use eval() to execute the function name that is, as you can see above, part of the URL. I reserialize the result using Perl’s JSON::to_json($result) and send it back to the PHP app as the HTTP response body.
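The original handler uses Perl’s eval() to call whatever function the URL names; here is the same dispatch idea sketched in Python, using an explicit whitelist instead of eval (all names and the password are illustrative, not from the original code):

```python
import json

# Whitelisted functions the web service is allowed to call.
def hello(params):
    return {"greeting": "hello " + params.get("name", "world")}

DISPATCH = {"hello": hello}

def handle_request(func_name, raw_json, auth, remote_ip):
    """Mimic the mod_perl handler: localhost + auth check, JSON in, JSON out."""
    if remote_ip != "127.0.0.1" or auth != "myPassword":
        return json.dumps({"error": "forbidden"})
    func = DISPATCH.get(func_name)
    if func is None:
        return json.dumps({"error": "unknown function"})
    result = func(json.loads(raw_json))
    return json.dumps(result)

response = handle_request("hello", json.dumps({"name": "Kerry"}),
                          "myPassword", "127.0.0.1")
print(response)  # {"greeting": "hello Kerry"}
```

A dispatch table is slightly more typing than eval() but closes the arbitrary-code-execution hole the footnote below warns about.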

This is very very fast because all PHP and Perl code that executes is already in memory under mod_perl or mod_php. The only overhead is the connection creation, sending of packet data across the network connection and connection breakdown. Some of this is handled by your server’s hardware. [And of course the serialization/deserialization of the JSON data on both ends.]

The connection creation is a three-way handshake, but because there’s no latency on the link it’s almost instantaneous. Transferring the data is faster than over a real network because the MTU on your lo interface (the 127.0.0.1 interface) is 16436 bytes instead of the normal 1500 bytes. That means the entire request or response usually fits inside a single packet. And connection termination is again just two packets from each side, and because of the zero latency it’s super fast.
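You can see the whole cycle – handshake, request, response, teardown – over loopback in a quick self-contained sketch (Python here just for illustration; timings will vary by machine so I’ve left them out):

```python
import socket
import threading

# A throwaway echo server on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def serve_once():
    conn, _ = server.accept()
    conn.sendall(conn.recv(1024))    # echo the request straight back
    conn.close()

t = threading.Thread(target=serve_once)
t.start()

# Full cycle: three-way handshake, request, response, teardown.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"GET / HTTP/1.0\r\n\r\n")
reply = client.recv(1024)
client.close()
t.join()
server.close()
```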

I use JSON because it’s less bulky than XML and on average it’s faster to parse across all languages. Both PHP and Perl’s JSON routines are ridiculously fast.

My final implementation on the PHP side is a set of wrapper classes that use the doPerl() function to do their work. Inside the classes I use caching liberally, either in instance variables, or if the data needs to persist across requests I use PHP’s excellent APC cache to store the data in shared memory.

Update: On request I’ve posted the Perl web service handler for this here. The Perl code allows you to send parameters via POST either as a query parameter called ‘json’ containing escaped JSON that will get deserialized and passed to your function, or as regular POST-style name/value pairs that will be sent as a hashref to your function. I’ve included one test function called hello() in the code.

Please note this web service module lets you execute arbitrary Perl code in the web service module’s namespace and doesn’t filter out double colons, so really you can just do whatever the hell you want. So I’ve included two very simple security mechanisms that I strongly recommend you don’t remove: it only allows requests from localhost, and you must include an ‘auth’ POST parameter containing a password (currently set to ‘password’). You’re going to have to implement the MM::Util::getIP() routine to make this work, and it’s really just a one-liner:

sub getIP {
 my $r = shift @_;
 # Prefer X-Forwarded-For (set by the Nginx front end), otherwise
 # fall back to the remote host of the connection itself.
 return $r->headers_in->{'X-Forwarded-For'}
     ? $r->headers_in->{'X-Forwarded-For'}
     : $r->connection->get_remote_host();
}