The thing about running a widget business is that you serve as many web server requests as all your users' websites combined. And if one of your users gets Dugg or Slashdotted, you get Slashdotted too.
After I launched FEEDJIT on Thursday (five days ago), traffic started picking up on Friday, and by Saturday morning my server was groaning under the strain. Some of the highest-traffic blogs were Japanese (there are more Japanese bloggers than English-language ones), and by mid-morning the Japanese were going to sleep, which gave me a welcome reprieve.
The first thing I did was reduce Apache's KeepAlive timeout (the KeepAliveTimeout directive) to 2 seconds. KeepAlive lets a client hang on to a connection, and the server process behind it, that someone else could be using. If a client uses keepalive properly it can give you a nice performance boost, but set the timeout low so slow clients don't waste server resources.
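Why a short timeout helps can be estimated with Little's law: the number of server processes busy at once is roughly the arrival rate times how long each connection holds a process. The sketch below is a worst-case illustration, not a measurement: it assumes every visitor opens one connection, makes a single widget request, and then sits idle until the KeepAlive timeout expires. The 40 requests per second is the rate FEEDJIT reached by Sunday; the 50 ms service time and the 15-second comparison timeout are hypothetical values, and `busy_slots` is an illustrative name.

```python
def busy_slots(requests_per_sec: float, service_sec: float,
               keepalive_timeout_sec: float) -> float:
    """Little's law (L = lambda * W) estimate of how many Apache processes are tied up.

    Worst-case assumption: each client makes one request per connection and then
    idles for the full KeepAlive timeout before the server closes the connection.
    """
    time_held = service_sec + keepalive_timeout_sec  # W: seconds one connection holds a process
    return requests_per_sec * time_held


if __name__ == "__main__":
    rate = 40.0     # requests per second
    service = 0.05  # hypothetical 50 ms to generate one widget
    for timeout in (15.0, 2.0):
        print(f"timeout {timeout:.0f}s -> ~{busy_slots(rate, service, timeout):.0f} busy processes")
```

Under these assumptions a 15-second timeout ties up about 602 processes while a 2-second timeout needs about 82, which is the difference between exhausting a typical process pool and fitting comfortably inside it.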
Then I added HTML caching for the widget-serving routine using Perl's Cache::FileCache. This gave me a huge speed increase, but the stats on our widgets lagged by up to a minute, and that sucked.
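Cache::FileCache keeps each entry as a file on disk with an expiry time. The Python sketch below is not that module's API; it is a hypothetical `TTLFileCache` class and `serve_widget` helper that show the same idea and why the stats went stale: once a widget is rendered, the cached HTML is served until its 60-second TTL runs out, so hits recorded in the meantime don't appear. The clock is injectable so expiry can be tested without waiting.

```python
import hashlib
import os
import pickle
import tempfile
import time
from typing import Callable, Optional


class TTLFileCache:
    """Hypothetical file-backed cache: one pickle file per key, each with an expiry time."""

    def __init__(self, root: str, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.root = root
        self.ttl = ttl_seconds
        self.clock = clock
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary strings map to safe file names.
        return os.path.join(self.root, hashlib.sha1(key.encode("utf-8")).hexdigest())

    def get(self, key: str) -> Optional[str]:
        try:
            with open(self._path(key), "rb") as f:
                expires_at, value = pickle.load(f)
        except (FileNotFoundError, EOFError, pickle.UnpicklingError):
            return None  # a missing or unreadable entry counts as a miss
        if self.clock() >= expires_at:
            return None  # entry has expired
        return value

    def set(self, key: str, value: str) -> None:
        # Write to a temp file, then rename, so readers never see a half-written entry.
        fd, tmp_path = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "wb") as f:
            pickle.dump((self.clock() + self.ttl, value), f)
        os.replace(tmp_path, self._path(key))


def serve_widget(cache: TTLFileCache, blog_id: str,
                 render: Callable[[str], str]) -> str:
    """Return cached widget HTML, re-rendering only when the entry is missing or expired."""
    html = cache.get(blog_id)
    if html is None:
        html = render(blog_id)
        cache.set(blog_id, html)
    return html
```

The TTL is the whole trade-off: a longer one means fewer renders and a lighter server, but the visitor counts embedded in the HTML can be up to `ttl_seconds` out of date.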
By Saturday night I'd rolled out the new caching code and the server was a lot faster, but I knew it wouldn't work long term, and non-realtime stats were not an option for FEEDJIT.
By Sunday I was getting 40 hits per second and rising, and the server was groaning again. I had to make some fundamental changes to the app's architecture. The old mod_perl2, MySQL and Apache2 combination wasn't going to cut it.
So I basically redesigned the data storage routines from the ground up, moving from MySQL to a home-grown data access method.
I can’t tell you how gratifying it was when I rolled out the new code last night and watched the server load average drop from 2.5 to 0.3 (unix load where 1.0 = 100%) and hold there as our traffic continued to rise.
We have several high-traffic blogs now, and our busiest blogs generate around 1.5 widget loads (pageviews) per second. I'm confident that if for some reason TechCrunch added us tomorrow, we'd handle the traffic without breaking a sweat.
1.0 != 100%. It’s the average number of processes waiting in the run queue.
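To make the commenter's point concrete: on Linux the 1-minute load average is an exponentially damped moving average of the number of runnable (and uninterruptibly sleeping) tasks, sampled roughly every 5 seconds. The sketch below reproduces that recurrence in floating point rather than the kernel's fixed-point arithmetic; `update_load`, `simulate` and `per_cpu` are illustrative names. Because the figure counts tasks rather than a percentage, what it means depends on the CPU count: a steady load of 1.0 saturates one core but uses only about a quarter of a four-core machine.

```python
import math
import os
from typing import Iterable

SAMPLE_INTERVAL = 5.0  # seconds between samples (the kernel samples about every 5 s)
WINDOW = 60.0          # averaging window for the 1-minute figure


def update_load(load: float, runnable: int,
                interval: float = SAMPLE_INTERVAL, window: float = WINDOW) -> float:
    """One step of the damped average: keep exp(-interval/window) of the old value."""
    decay = math.exp(-interval / window)
    return load * decay + runnable * (1.0 - decay)


def simulate(samples: Iterable[int]) -> float:
    """Feed a sequence of run-queue lengths through the recurrence, starting from 0."""
    load = 0.0
    for n in samples:
        load = update_load(load, n)
    return load


def per_cpu(load: float, cpus: int) -> float:
    """Rough utilization figure: load divided by the number of CPUs."""
    return load / cpus


if __name__ == "__main__":
    steady = simulate([1] * 120)  # ten minutes with exactly one runnable task
    print(f"1-minute load: {steady:.2f}, per CPU on 4 cores: {per_cpu(steady, 4):.2f}")
    if hasattr(os, "getloadavg"):  # available on Unix only
        print("this machine's 1/5/15-minute load:", os.getloadavg())
```

After a task stops running, the figure decays by a factor of about e every minute rather than dropping to zero at once, which is why a load average lags behind what the machine is actually doing.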
Commented on August 22, 2007 at 7:23 am
Have you considered switching to memcached?
(and saving the statistics to disk once in a while)
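The commenter's suggestion amounts to a write-behind counter: keep hit totals in memory so every read is real time, and persist them to disk only periodically. The sketch below uses a plain Python `Counter` as a stand-in for memcached (it does not use any memcached client), and `StatsBuffer`, `flush_every` and the JSON file layout are hypothetical. Hits recorded since the last flush are lost if the process dies, which is the price of this design.

```python
import json
import os
import tempfile
import threading
import time
from collections import Counter
from typing import Callable, Dict


class StatsBuffer:
    """Hypothetical in-memory hit counters, snapshotted to a JSON file every flush_every seconds."""

    def __init__(self, path: str, flush_every: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.path = path
        self.flush_every = flush_every
        self.clock = clock
        self.lock = threading.Lock()        # guards totals, dirty and last_flush
        self.flush_lock = threading.Lock()  # serializes disk writes so snapshots land in order
        self.totals: Counter = Counter(self._load())
        self.dirty = False
        self.last_flush = clock()

    def _load(self) -> Dict[str, int]:
        try:
            with open(self.path, encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def hit(self, blog_id: str) -> None:
        """Record one widget load; flush to disk if the interval has elapsed."""
        with self.lock:
            self.totals[blog_id] += 1
            self.dirty = True
            due = self.clock() - self.last_flush >= self.flush_every
        if due:
            self.flush()

    def count(self, blog_id: str) -> int:
        """Real-time total served straight from memory."""
        with self.lock:
            return self.totals[blog_id]

    def flush(self) -> None:
        """Atomically replace the stats file with a snapshot of the current totals."""
        with self.flush_lock:
            with self.lock:
                self.last_flush = self.clock()
                if not self.dirty:
                    return
                snapshot = dict(self.totals)
                self.dirty = False
            directory = os.path.dirname(os.path.abspath(self.path))
            fd, tmp_path = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(snapshot, f)
            os.replace(tmp_path, self.path)
```

Calling `flush()` once more on shutdown narrows the loss window to a crash; shortening `flush_every` narrows it further at the cost of more disk writes.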
Commented on August 22, 2007 at 9:14 am
I’m honestly shocked that you don’t consider “1 minute behind real time” to be real time. That’s completely within acceptable boundaries, yo.
Commented on August 23, 2007 at 1:11 am