Category: Scaling

  • How to limit website visitor bandwidth by country

    This technique is great if you have no customers from countryX but are being targeted by a DoS, unwanted crawlers, bots, scrapers and other baddies. Please don’t use this to discriminate against less profitable countries. The web should be open for all. Thanks.

    If you’re not already using Nginx, you should get it even if you already have a great web server. Put it in front and get it to act as a reverse proxy.

    First grab this Perl script, which you will use to convert MaxMind’s GeoIP database into a format usable by Nginx.

    Then download MaxMind’s latest GeoLite Country database in CSV format from this page.

    Then run:

    geo2nginx.pl < maxmind.csv > nginxGeo.txt

    Copy nginxGeo.txt into your nginx config directory.
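
    For reference, the converted file is just a list of CIDR-to-country mappings in the format the geo module expects. The entries below are made up to show the shape of the output; your real file will have tens of thousands of lines:

    # hypothetical entries; the real lines come from geo2nginx.pl
    192.0.2.0/24     US;
    198.51.100.0/24  CA;
    203.0.113.0/24   ES;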

    Then add the following text in the ‘http’ section of your nginx.conf file:

    geo $country {
        default no;
        include nginxGeo.txt;
    }

    Then add the following in the ‘server’ section of your nginx.conf file:

    if ($country ~ ^(?:US|CA|ES)$) {
        set $limit_rate 10k;
    }
    if ($country ~ ^(?:BR|ZA)$) {
        set $limit_rate 20k;
    }

    This limits anyone from the USA, Canada and Spain to a maximum of 10 kilobytes per second of bandwidth (limit_rate values are in bytes per second, so 10k means 10 KB/s, not kilobits). It gives anyone from Brazil and South Africa 20 KB/s. Every other country gets full speed.

    You could use an exclamation mark before the tilde (!~) to negate the match. In other words, if you’re NOT from the US, Canada or Spain, you get 10 KB/s, although I strongly advise against this policy.
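
    For illustration, the inverted rule would look roughly like this:

    if ($country !~ ^(?:US|CA|ES)$) {
        set $limit_rate 10k;
    }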

    Remember that $limit_rate only limits per connection, so the bandwidth each visitor can actually use is $limit_rate multiplied by the number of connections they open; a visitor holding four connections at 10 KB/s each can pull 40 KB/s. See below for limiting connections.

    Another interesting directive is limit_rate_after. The documentation on it is very sparse, but it is size based rather than time based: the first chunk of a response (whatever size you specify) is sent at full speed, and the limiting only kicks in after that. Great for streaming sites I would think.
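
    Here’s a sketch of how the two directives might be combined, with a hypothetical /downloads/ location and made-up numbers:

    location /downloads/ {
        limit_rate_after 1m;    # first megabyte goes out at full speed
        limit_rate 50k;         # after that, throttle to 50 KB/s
    }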

    There are two other great modules in Nginx, but neither of them works inside ‘if’ blocks, which means you can’t use them to limit by country. They are the Limit Zone module, which lets you limit the number of concurrent connections, and the Limit Requests module, which lets you limit the number of requests over a period of time. The Limit Requests module also has a burst parameter which is very useful. Once again the documentation is sparse, but this comment from Igor (the Nginx author) sheds some light on how bursting works.
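
    Here’s a rough sketch of both modules with made-up zone names and limits. Note that the directive names have changed across Nginx versions (limit_zone was later renamed limit_conn_zone), so check the docs for the release you’re running:

    # in the 'http' section: define the shared memory zones
    limit_zone gulag $binary_remote_addr 10m;
    limit_req_zone $binary_remote_addr zone=flood:10m rate=5r/s;

    # in the 'server' or 'location' section: apply the limits
    limit_conn gulag 10;             # at most 10 concurrent connections per IP
    limit_req zone=flood burst=20;   # absorb short bursts above 5 requests/second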

    I’ve enabled all three features on our site: bandwidth limiting by country, limiting concurrent connections and limiting requests over a time period. I serve around 20 to 40 million requests a day on a single Nginx box and I haven’t noticed much performance degradation with the new config. It has quadrupled the size of each Nginx process, though, to about 46MB per process, but that’s still a lot smaller than most web server processes.

  • How to handle 1000's of concurrent users on a 360MB VPS

    There has been some recent confusion about how much memory you need in a web server to handle a huge number of concurrent requests. I also made a performance claim on the STS list that got me an unusual number of private emails.

    Here’s how you run a highly concurrent website on a shoe-string budget:

    The first thing you’ll do is get a Linode server because they have the fastest CPU and disk.

    Install Apache with your web application running under mod_php, mod_perl or some other persistence engine for your language. Then you get famous and start getting emails about people not being able to access your website.

    You increase the number of Apache threads or processes (depending on which Apache MPM you’re using) until you can’t anymore because you only have 360MB of memory in your server.

    Then you’ll lower the KeepAliveTimeout and eventually disable KeepAlive so that more users can access your website without tying up your Apache processes. Your users will slow down a little because they now have to establish a new connection for every piece of your website they want to fetch, but you’ll be able to serve more of them.

    But as you scale up you will get a few more emails about your server being down. Even though you’ve disabled keepalive, it still takes time for each Apache child to send data to users, especially if they’re on slow or high-latency connections. Here’s what you do next:

    Install Nginx on your new Linode box and get it to listen on Port 80. Then reconfigure Apache so that it listens on another port – say port 81 – and can only be accessed from the local machine. Configure Nginx as a reverse proxy to Apache listening on port 81 so that it sits in front of Apache like so:

    YourVisitor <—–> Nginx:Port80 <—–> Apache:Port81

    Enable Keepalive on Nginx and set the Keepalive timeout as high as you’d like. Disable Keepalive on Apache – this is just-in-case because Nginx’s proxy engine doesn’t support Keepalive to the back-end servers anyway.
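
    Here’s a minimal sketch of the Nginx side of that setup; the hostname and timeout are just placeholders:

    http {
        keepalive_timeout 120;                    # keepalive stays enabled at the front end

        server {
            listen 80;
            server_name www.example.com;          # placeholder hostname

            location / {
                proxy_pass http://127.0.0.1:81;   # Apache, bound to localhost on port 81
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
            }
        }
    }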

    The 10 or so Apache children you’re running will be getting requests from a client (Nginx) that is running locally. Because there is zero latency and a huge amount of bandwidth (it’s a loopback request), the only time Apache takes to handle the request is the amount of CPU time it actually takes to handle the request. Apache children are no longer tied up with clients on slow connections. So each request is handled in a few microseconds, freeing up each child to do a hell of a lot more work.

    Nginx will occupy about 5 to 10 MB of memory. You’ll see thousands of users concurrently connected to it. If you have Munin loaded on your server, check out the netstat graph. Bitchin’ isn’t it? You’ll also notice that Nginx uses very little CPU, almost nothing in fact. That’s because Nginx is designed around a single-threaded model where one worker handles a huge number of connections. It can do this with so little CPU because it uses a feature in the Linux kernel called epoll.

    Footnotes:

    Lack of time forced me to leave out all explanations of how to install and configure Nginx (I’m assuming you know Apache already), but the Nginx wiki is excellent, even if the translation from Russian is a little rough.

    I’ve also purposely left out all references to solving disk bottlenecks (as I’ve left out a discussion of browser caching) because a lot has been written about this already, and depending on what app or app server you’re running, there are some very standard ways to solve IO problems: Memcached, the InnoDB cache for MySQL, PHP’s Alternative PHP Cache, persistence engines that keep your compiled code in memory, etc.

    This technique works to speed up any back-end application server that uses a one-thread-per-connection model. It doesn’t matter if it’s Ruby via FastCGI, Mod_Perl on Apache or some crappy little Bash script spitting out data on a socket.

    This is a very standard config for most high-traffic websites today. It’s how they are able to leave keepalive enabled and handle a huge number of concurrent users with a relatively small app server cluster. Lighttpd and Nginx are the two most popular free FSM/epoll web servers out there, and Nginx is the fastest growing, best designed (IMHO) and the one I use to serve 400 requests per second in front of a small Apache cluster. It’s also what guys like WordPress.com use.

  • Troubleshooting

    It’s funny how, when you’re troubleshooting a performance issue on your servers that suddenly made the load average spike to 14 (350% with four CPU cores) at 6:30am on a Sunday morning (yay!), all the stats look like garbage until you figure out what it is, and then it’s so glaringly obvious that you spend the rest of the day kicking yourself for not seeing it immediately.

    Note to self: Next time make coffee, drink coffee, and only then log on to munin to troubleshoot.

  • Very high performance web servers

    Have you ever tried to get Apache to handle 10,000 concurrent connections? For example, you have a very busy website and you enable keepalive on your web server. Then you set the timeout to something high like 300 seconds for ridiculously slow clients (sounds crazy but I think that’s Apache’s default). All of a sudden when you run netstat it tells you that you have thousands of clients with established connections to your machine.

    Apache can’t handle 10,000 connections efficiently because it uses a one-thread-per-connection model (or if you’re using prefork then one process per connection).

    If you want to allow your clients to use keepalive on your very busy website, you need to use a server that uses an event notification model. That means you have a single thread or process that manages thousands of sockets or connections. The sockets don’t block the execution of the thread; instead they sit quietly until something happens and then notify the thread that something happened and it had better come take a look.

    Most of us use Linux these days – of course there are the BSD die hards, but whatever. The Linux 2.6 kernel introduced something called epoll, an event notification interface for applications that want to manage lots of file descriptors without blocking execution and be notified when something changes.

    Lighttpd and Nginx are two very fast web servers that use epoll and a non-blocking event notification model to manage thousands of connections with a single thread and just a few megs of RAM (RAM consumption is the real reason you can’t use Apache for high concurrency). You can also spawn more than one worker on both servers if you’d like them to use more than one processor or CPU core.

    I used to use lighttpd 1.4.x, but its configuration really sucks because it’s so inflexible. I love Nginx’s configuration because it’s very intuitive and very flexible. It also has some very cool modules, including an experimental embedded Perl module. The performance I’m getting out of it is nothing short of spectacular: I run 8 worker processes and each process consumes about 7 MB of RAM with a few modules loaded.
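
    The relevant part of that config is tiny. This is only a sketch; the connection count is an arbitrary starting point, not a recommendation:

    worker_processes 8;              # roughly one worker per CPU core

    events {
        use epoll;                   # the Linux 2.6 event notification interface
        worker_connections 10240;    # arbitrary starting point; tune for your traffic
    }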

    So my config looks like:

    request ==> nginx_with_keepalive ==> apache/appserver_nokeepalive

    If you’d like to read more about server models for handling huge numbers of clients, check out Dan Kegel’s page on the so-called C10K problem, where he documents a few other event models for servers and gives a history lesson on event-driven IO.

    Also, if you’re planning on running a high traffic server with high concurrency you should probably optimize your IP stack – here are a few suggestions I made a while back on how to do that.