Blog

  • How to handle 1000's of concurrent users on a 360MB VPS

    There has been some recent confusion about how much memory you need in a web server to handle a huge number of concurrent requests. I also made a performance claim on the STS list that got me an unusual number of private emails.

    Here’s how you run a highly concurrent website on a shoestring budget:

    The first thing you’ll do is get a Linode server because they have the fastest CPU and disk.

    Install Apache with your web application running under mod_php, mod_perl or some other persistence engine for your language. Then you get famous and start getting emails about people not being able to access your website.

    You increase the number of Apache threads or processes (depending on which Apache MPM you’re using) until you can’t anymore because you only have 360MB of memory in your server.

    Then you’ll lower the KeepaliveTimeout and eventually disable Keepalive so that more users can access your website without tying up your Apache processes. Your users will slow down a little because they now have to re-establish a new connection for every piece of your website they want to fetch, but you’ll be able to serve more of them.

    But as you scale up you will get a few more emails about your server being down. Even though you’ve disabled keepalive it still takes time for each Apache child to send data to users, especially if they’re on slow connections or connections with high latency. Here’s what you do next:

    Install Nginx on your new Linode box and get it to listen on Port 80. Then reconfigure Apache so that it listens on another port – say port 81 – and can only be accessed from the local machine. Configure Nginx as a reverse proxy to Apache listening on port 81 so that it sits in front of Apache like so:

    YourVisitor <--> Nginx:Port80 <--> Apache:Port81

    Enable Keepalive on Nginx and set the Keepalive timeout as high as you’d like. Disable Keepalive on Apache. This is just in case, because at the time of writing Nginx’s proxy engine doesn’t support Keepalive to the back-end servers anyway.
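
    Here is a minimal sketch of the Nginx side of that setup. It is an illustration, not a tuned production config: the server name and the 65 second timeout are placeholder assumptions, and the forwarding headers are a common convention rather than something this setup requires. On the Apache side the matching changes would be along the lines of `Listen 127.0.0.1:81` and `KeepAlive Off`.

```nginx
# Goes inside the http { } context of nginx.conf.
keepalive_timeout 65;          # client-facing keepalive; example value only

server {
    listen 80;
    server_name example.com;   # hypothetical name, use your own

    location / {
        # Hand every request to Apache on the loopback interface.
        proxy_pass http://127.0.0.1:81;

        # Pass the original host and client address through to Apache.
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

    Binding Apache to 127.0.0.1 is what makes it reachable only from the local machine.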

    The 10 or so Apache children you’re running will be getting requests from a client (Nginx) that is running locally. Because there is zero latency and a huge amount of bandwidth (it’s a loopback request), the only time Apache takes to handle the request is the amount of CPU time it actually takes to handle the request. Apache children are no longer tied up with clients on slow connections. So each request is handled in a few microseconds, freeing up each child to do a hell of a lot more work.

    Nginx will occupy about 5 to 10 Megs of Memory. You’ll see thousands of users concurrently connected to it. If you have Munin loaded on your server check out the netstat graph. Bitchin isn’t it? You’ll also notice that Nginx uses very little CPU – almost nothing in fact. That’s because Nginx is designed using a single threaded model where one thread handles a huge number of connections. It can do this with little CPU usage because it uses a feature in the Linux kernel called epoll().
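
    To get a feel for the one-thread-many-connections idea, here is a toy sketch in Python. It is not how Nginx itself is written. It uses the standard `selectors` module, which picks epoll on Linux, and the function name `serve_many` and the ping/pong exchange are made up for the illustration. All client connections are opened first and held open, then a single thread answers every one of them.

```python
import selectors
import socket


def serve_many(num_clients: int = 50) -> int:
    """Answer num_clients simultaneously open connections from one thread."""
    sel = selectors.DefaultSelector()  # picks epoll on Linux
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(num_clients)
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    port = listener.getsockname()[1]

    # Open every client connection up front so they are all held at once.
    clients = []
    for _ in range(num_clients):
        client = socket.create_connection(("127.0.0.1", port), timeout=5)
        client.sendall(b"ping")
        clients.append(client)

    answered = 0
    while answered < num_clients:
        events = sel.select(timeout=5)
        if not events:
            break  # nothing became ready within the timeout; give up
        for key, _mask in events:
            if key.fileobj is listener:
                # A new connection is waiting: accept it and watch it too.
                conn, _addr = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                # An accepted connection has data: reply and hang up.
                conn = key.fileobj
                if conn.recv(1024):
                    conn.sendall(b"pong")
                    answered += 1
                sel.unregister(conn)
                conn.close()

    replies = [client.recv(4) for client in clients]
    for client in clients:
        client.close()
    sel.unregister(listener)
    listener.close()
    sel.close()
    return replies.count(b"pong")


if __name__ == "__main__":
    print(serve_many(50), "connections answered by one thread")
```

    The thread never blocks on any single client. It only touches a socket when the kernel reports that socket as ready, which is why slow clients cost almost nothing.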

    Footnotes:

    Lack of time forced me to leave out all explanations on how to install and configure Nginx (I’m assuming you know Apache already) – but the Nginx Wiki is excellent, even if the Russian translation is a little rough.

    I’ve also purposely left out all references to solving disk bottlenecks (as I’ve left out a discussion about browser caching) because there has been a lot written about this and depending on what app or app-server you’re running, there are some very standard ways to solve IO problems already, e.g. Memcached, the InnoDB cache for MySQL, PHP’s Alternative PHP Cache, persistence engines that keep your compiled code in memory, etc.

    This technique works to speed up any back-end application server that uses a one-thread-per-connection model. It doesn’t matter if it’s Ruby via FastCGI, Mod_Perl on Apache or some crappy little Bash script spitting out data on a socket.

    This is a very standard config for most high traffic websites today. It’s how they are able to leave keepalive enabled and handle a huge number of concurrent users with a relatively small app server cluster. Lighttpd and Nginx are the two most popular free FSM/epoll web servers out there and Nginx is the fastest growing, best designed (IMHO) and the one I use to serve 400 requests per second on a small Apache cluster. It’s also what guys like WordPress.com use.

  • Fly Fishing on lake Samm

    My neighbor John Winkler just emailed this to me. He took this while I was fly fishing off our dock on Lake Sammamish earlier this year.

    meFlyfishingOnLakeSamm

  • A legend just died

    About a year ago I received two old servers from Snapvine, my former neighbors (now acquired by Whitepages). Snapvine got them from another startup in Seattle, I don’t recall which. I named them Rex1 and Rex2 and put them to work immediately. The graph below shows the disk activity during the last year on Rex1:

    rex1-cpu-year

    The pink stuff in the graph is the amount of time the CPU spends waiting for the disk – showing disk activity. The application on Rex1 is one of the processes that produces the Feedjit Geoblogosphere (what people are reading in your city).

    Well, because Rex1 has such a long history with Seattle startups and has given us such a kick-ass run, I feel it’s only appropriate that I give Rex1’s first disk fail a fitting eulogy. We’ll miss you buddy!

    [Yeah that’s right, I’m killing time on this blog entry waiting for Rex1 to come back up so I can copy the data over to Rex2]

  • Spooky fun with ssh on OS X

    Want to freak out your wife/husband/kids?

    On a mac that you have access to:

    Go to ‘System Preferences’. Under ‘Internet & Networking’ there is a ‘Sharing’ icon. Run that. In the list that appears, check the ‘Remote Login’ option.

    Then ssh into the mac from another machine while someone is working on it. On a PC, download PuTTY; on another mac, launch a terminal and run “ssh username@ip.address” without the quotes.

    Once you’re logged in:

    Crank up the volume by running:
    sudo osascript -e "set volume 10"

    Then run:

    sudo osascript -e 'say "I am watching you." using "Zarvox"'

    Or if that doesn’t work, try:

    sudo osascript -e 'say "I am watching you." using "Cellos"'

    Make sure you have an automatic emergency defibrillator handy.

  • Why we breathe

    Free Diver in Limasol

    Hold your breath for a moment.

    In about 10 to 30 seconds you’ll be feeling a strong desire to take a breath. That’s not caused by lack of oxygen. It’s caused by excess carbon dioxide buildup in your blood.

    [Ok you can breathe again.]

    The trigger in mammals that causes us to want to take a breath is an excess buildup of CO2. In reptiles the trigger is lack of O2. Free divers don’t hyperventilate to get more O2 into their bloodstream. They do it to flush out excess CO2 and remove that breathing trigger. That’s also what causes shallow water black-out as you’re surfacing, so don’t try it without a buddy.

    I’ve worked in more startups than I care to count where the lack of endurance was not caused by lack of oxygen, but an excess buildup of waste. Getting a larger office, buying excess server capacity early on that isn’t needed, hiring excess people to manage that server capacity, hiring managers to manage the people, hiring an ad agency and PR firm and a small team to manage them.

    Once you start down the path of waste you may still have enough oxygen in your bloodstream to surface, but the excess CO2 in your business creates a strong demand for more Oxygen which causes you to raise another round of funding, producing more CO2 and the cycle continues.

    So start your business by hyperventilating to flush out all excess CO2, take a deep breath and beware of shallow water blackout as you’re approaching the surface.

    [Photo credit: My good friend Bruno Stichini who hosted a free diving world record attempt in Limasol, Cyprus back in 2000]

  • Revenue and Runway – Why every cent matters

    A month ago on Techcrunch, Michael Arrington wrote about “Twitter’s Revenue Dilemma”: “Your valuation can actually go down once you turn on revenue.”

    “Turning on revenue” frames it as a binary thing. You’re either making money or you’re not. It completely disregards the most important variable in finance: Time.

    With the tiniest trickle of revenue you can extend your runway infinitely. That means you never have to raise another cent and you even have money to fund your growth. Let’s take an example:

    Say you’re a consumer web business. You have some growth and some traction. You close an angel round for $400k in Month 1. In month 2 you start spending it and your burn rate is $25k for salaries, office and hosting. It takes you 4 months to get the product into shape and launch.

    In your first month of launch you make a meagre $500. And let’s say you suck at marketing and your revenue increases by $1000 per month so that a year after you launch your product (17 months after getting funded) you’re making $12,500 per month in revenue.

    Even two years after getting funded you’re still only making $19,500 which is far from breaking even.

    But what this does is slow your burn rate enough, and buy you enough time, that you never run out of money. That means you can keep paying yourself a full salary and growing your business. In month 29 your bank balance drops down to $12,500, but then it starts increasing again because in Month 30 you break even.

    If you didn’t generate any revenue at all, you would run out of money in month 17.
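
    If you want to check the arithmetic, here is a rough sketch of both scenarios in Python. The function name and parameters are mine, and it rests on assumptions the post only implies: the round closes in month 1 with nothing spent, the $25k burn starts in month 2, and the first $500 of revenue lands in month 5, which is what makes month 17 come out at $12,500.

```python
def simulate(funding=400_000, burn=25_000, first_revenue_month=5,
             first_revenue=500, revenue_growth=1_000, months=36):
    """Return a list of (month, revenue, end_of_month_balance) tuples.

    Pass first_revenue_month=None to model the no-revenue scenario.
    """
    balance = funding              # month 1: money in the bank, nothing spent
    rows = [(1, 0, balance)]
    for month in range(2, months + 1):
        if first_revenue_month is not None and month >= first_revenue_month:
            revenue = first_revenue + revenue_growth * (month - first_revenue_month)
        else:
            revenue = 0
        balance += revenue - burn  # net cash flow for the month
        rows.append((month, revenue, balance))
    return rows


if __name__ == "__main__":
    with_revenue = simulate()
    low_month, _, low_balance = min(with_revenue, key=lambda row: row[2])
    break_even = next(m for m, rev, _ in with_revenue if rev >= 25_000)
    print(f"lowest balance: ${low_balance:,} in month {low_month}")
    print(f"revenue first covers the burn in month {break_even}")

    no_revenue = simulate(first_revenue_month=None)
    broke = next(m for m, _, bal in no_revenue if bal <= 0)
    print(f"without revenue the cash is gone in month {broke}")
```

    Under those assumptions the balance bottoms out at $12,500 in month 29, revenue covers the burn from month 30, and the no-revenue case hits zero in month 17.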

    You might argue this approach stifles growth. So be more aggressive, increase your burn rate to $200k and raise $3 Million. The same logic applies. Early cash-flow that is far from break-even can extend your runway to infinity (and beyond).

    This matters for founders more than anyone else because it means you can raise a single round and never have to give away any of your equity ever again.

    The sheet below shows the two scenarios – with and without revenue. [I’ve reoriented the flows vertically for readability]

  • Possible leads in the shooting of officer Brenton

    The memorial procession is happening right now for Officer Tim Brenton. Details on the West Seattle Blog. More importantly, please read this entry on the same blog. It contains a detailed description from SPD on how you can help catch the shooter(s). The comment thread has some great data.

    “TK” has increased the size of SPD’s photos of the suspected vehicle which is a 1980-83 Datsun 210 according to SPD.

    A few commenters have suggested it may be a Toyota Celica. I’m not sure I agree.

    Someone found this B210 with Washington plates online.

    From SPD: “It is important that if anyone has recently sold a vehicle of this type or had one stolen that they call Seattle Police at 206 233-5000.”

    The full SPD blog entry is here. Please read and distribute widely among your fellow Seattleites.

  • How much traffic do the biggest typo domains get?

    There’s an article on searchengineland today about domaining and how Google and Yahoo “make money off a twitter typo domain”. I’m not sure I’m as excited about exposing this travesty of justice as SEL is, but I was curious how much traffic typo domains get:

    Alexa domain typo traffic

    In my brief research I found facebok.com was by far the biggest winner with twiter.com running a distant second. But their traffic dropped off to a trickle in the middle of this year. I wonder if facebook themselves or a popular app mistyped a URL somewhere and then fixed it.

    Other variations of facebook, twitter, google and myspace didn’t yield much. I entered a high traffic site whose exact numbers I have access to for comparison and by my estimates facebok.com was getting just under half a million uniques per month. Nothing compared to the real FB, but slapping remnant advertising on there would yield $1000 to $5000 per month. Twiter.com gets around a quarter million uniques per month netting around $500 to $2500 on remnant ads.
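
    Those two estimates imply the same remnant ad rate, which is easy to sanity check. A quick sketch, rounding “just under half a million” up to 500,000 as an assumption; the helper name is made up for the example.

```python
def revenue_per_thousand_uniques(monthly_uniques, monthly_revenue):
    """Implied monthly revenue per 1000 unique visitors."""
    return monthly_revenue / (monthly_uniques / 1000)


if __name__ == "__main__":
    # (uniques per month, low revenue estimate, high revenue estimate)
    estimates = {
        "facebok.com": (500_000, 1_000, 5_000),  # 500,000 is a rounded assumption
        "twiter.com": (250_000, 500, 2_500),
    }
    for domain, (uniques, low, high) in estimates.items():
        low_rate = revenue_per_thousand_uniques(uniques, low)
        high_rate = revenue_per_thousand_uniques(uniques, high)
        print(f"{domain}: ${low_rate:.2f} to ${high_rate:.2f} per 1000 uniques")
```

    Both domains work out to roughly $2 to $10 per thousand uniques per month.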

  • Chrysler's new logo: A trademark lawsuit about to happen

    My education in trademark law a few years ago taught me this: The test of whether a case is winnable or not is that the plaintiff has to prove actual instances of customer confusion. TM law is not there to protect your trademark. Its goal is to protect the consumer – hence this test.

    Here’s Chrysler’s new logo. What do you think? [Rest assured that when the suit is filed, attorneys from both sides will be taking your comments into account]

    989678915

    Aston Martin logo 01

  • WTF is up with Dell?

    Screen shot 2009-11-05 at 11.01.06 AM

    [Update at end of post] I love Dell servers. In fact I even love their network hardware. I’ve spent 0000’s (yeah that’s four zeros) with them during the last 2 years and Mick my old sales guy rocked! As did his hardware team.

    I’m ready to spend more money. Yup, I’m going to take my hard earned dollars and hand it over for more of that great hardware they have. Unfortunately Mick got laid off. So now I’m dealing with a team of 5 people.

    I get almost daily emails from someone called Loree Brown reminding me of how much Dell rocks and telling me about the great deals they have. I even have this cute guy with his cute chin fluff, silk tie and his cutesy-pie little smile appearing in my emails. He’s really been a huge influence on my buying decision. He looks like he just got laid and I’d like to look like that right after I buy my servers.

    But I’ve been emailing “Loree Brown” @ Dell with requests for quotes and I get nada response. Nothing. Hell I even put the $4000 server that I want to buy in my shopping cart ready for him or her or it to turn into a quote.

    I’ve sent reminder emails. Follow up emails. To multiple addresses. Nothing.

    So really what’s happened is that they fired Mick and “streamlined” operations and I’m going to end up buying my machines somewhere else. Which means getting rid of Mick cost them probably several 0000’s (four zeros again) over the next 2 years from me alone.

    Get your shit together guys, I want my quote!!

    Update: Got a call from Loree’s manager Reed West apologizing profusely for the confusion. Apparently the problem is that Dell’s marketing emails come from username@midmarket.dell.com and the reps don’t actually receive emails at those addresses. I had sent several emails to Loree at the ‘midmarket’ address instead of her real address. Their real email addresses are username@dell.com. So Reed has undertaken to fix that issue – their marketing emails will now come from real email addresses. Nice to know Dell reads the blogosphere and twittersphere and responds – they got back to me less than 3 hours after I posted this. As I mentioned in the original post, their servers are unbeatable – looking forward to a better relationship with the new sales team.