Month: December 2009

Crowdsourcing a real-time solution to air terrorism
At 09:28 on the morning of September 11, 2001, a Lebanese born engineering student named Ziad Jarrah and three “muscle” hijackers started moving the passengers of flight 93 to the rear of the plane and assaulting the cockpit of flight 93. Flights 11 and 175 had already crashed into the world trade center and flight 77 was within minutes of crashing into the Pentagon.

Two minutes after the hijacking started, passengers and crew started making phone calls to officials and family members using GTE Airphones and cellphones. A total of 35 airphone calls and two cellphone calls were made. 27 minutes later, the passengers had determined that flight 93 was a suicide mission, had voted to rush the hijackers and went to work taking back the plane.

On Christmass day this year, Jasper Schuringa, a Dutch passenger on Northwest flight 253, ran forward and tackled Umar Farouk Abdulmutallab as he was attempting to detonate 80 grams of PETN, the active ingredient in plastic explosives.

The passengers on these flights are lauded as heroes, but their actions have not been acknowledged by policy changes in the TSA and other security agencies around the world. On any flight that is the victim of a terrorist attack or hijacking, the vast majority of the passengers on the flight are capable “good-guys” that are also highly motivated to prevent the attack from succeeding. Yet the TSA continues to treat passengers as self-loading cargo that may harbor a terrorist.

In the online world we have been using the wisdom of crowds for years to determine what is good and what is bad. Google uses it to filter spam by having the millions of GMail users report spam messages and filtering out messages from sources reported frequently. My company, Feedjit, uses the wisdom of more than a million visitors monthly to filter out adult content and web spam from our site. Reddit, Slashdot.org, Digg, SumbleUpon and many other websites use crowd wisdom to expose the best of the web and to filter out malicious attempts to game the system.

Crowds are smart and most of the people in a crowd are good and are willing to help. Airline security agencies can harness this wisdom and this willingness to help by first acknowledging that it exists. Then implementing a few simple policy changes.
1. Give passengers a way to quickly and discreetly report suspicious activity to officials, both before boarding and during the flight. A button that simply reports suspicious activity within a 3 passenger radius may be all that’s needed.
2. Give crew a way to constantly monitor an aggregation of these reports in real-time and develop response policies depending on the severity or intensity of the reports about an individual.
3. Encourage passengers to engage their fellow passengers in conversation, but not in the context of preventing terrorism. For example: Include a simple statement during the pre-flight briefing of: “Northwest airlines passengers are known for their friendliness in the skies and many of us traveling to and from far off destinations. We encourage our passengers to compare travel notes and to get to know each other….etc..etc…”
With these simple changes, you have now created an on-board distributed intelligence gathering network that is reporting back to a central authority in real-time.

With these changes the recent Christmass day attack may have played out as follows:
1. A passenger sitting next to Umar Farouk Abdulmutallab tries to engage him in polite conversation and discoveres he can’t speak a word of english.
2. The passenger has the opportunity to get a good look at Umar and notices he is sweating and his hands are shaking.
3. The passenger discreetly hits a button in their arm-rest alerting flight attendants to something suspicious.
4. The passenger catches the eye of a woman in the isle seat and she notices it too and hits her button.
5. Flight attendants come over and ask Umar if he’s feeling OK and would like a drink of water.
6. Flight attendants verify that Umar is exhibiting symptoms of someone who is under extreme stress.
7. Flight attendants start a protocol of closely monitoring the passenger’s behavior by always having someone within 15 ft.
8. When Umar starts to rise to go to the toilet, flight attendants request that he come to the back of the plane so that they can do a cursory body search before he enters the toilet.
9. Flight attendants discover the PETN and the story has a very different ending.
The TSA can start by acknowledging that a vast untapped human intelligence resource exists on every flight. Then change their policy approach from “every passenger could be a terrorist” to “Only one in a million passengers are a potential terrorist and everyone else on the flight is smart, capable and highly motivated to prevent terrorism on their flight.”
December 30, 2009
How to handle 1000's of concurrent users on a 360MB VPS

There has been some recent confusion about how much memory you need in a web server to handle a huge number of concurrent requests. I also made a performance claim on the STS list that got me an unusual number of private emails.

Here’s how you run a highly concurrent website on a shoe-string budget:

The first thing you’ll do is get a Linode server because they have the fastest CPU and disk.

Install Apache with your web application running under mod_php, mod_perl or some other persistence engine for your language. Then you get famous and start getting emails about people not being able to access your website.

You increase the number of Apache threads or processes (depending on which Apache MPM you’re using) until you can’t anymore because you only have 360MB of memory in your server.

Then you’ll lower the KeepaliveTimeout and eventually disable Keepalive so that more users can access your website without tying up your Apache processes. Your users will slow down a little because they now have to re-establish a new connection for every piece of your website they want to fetch, but you’ll be able to serve more of them.

But as you scale up you will get a few more emails about your server being down. Even though you’ve disabled keepalive it still takes time for each Apache child to send data to users, especially if they’re on slow connections or connections with high latency. Here’s what you do next:

Install Nginx on your new Linode box and get it to listen on Port 80. Then reconfigure Apache so that it listens on another port – say port 81 – and can only be accessed from the local machine. Configure Nginx as a reverse proxy to Apache listening on port 81 so that it sits in front of Apache like so:

YourVisitor <—–> Nginx:Port80 <—–> Apache:Port81

Enable Keepalive on Nginx and set the Keepalive timeout as high as you’d like. Disable Keepalive on Apache – this is just-in-case because Nginx’s proxy engine doesn’t support Keepalive to the back-end servers anyway.

The 10 or so Apache children you’re running will be getting requests from a client (Nginx) that is running locally. Because there is zero latency and a huge amount of bandwidth (it’s a loopback request), the only time Apache takes to handle the request is the amount of CPU time it actually takes to handle the request. Apache children are no longer tied up with clients on slow connections. So each request is handled in a few microseconds, freeing up each child to do a hell of a lot more work.

Nginx will occupy about 5 to 10 Megs of Memory. You’ll see thousands of users concurrently connected to it. If you have Munin loaded on your server check out the netstat graph. Bitchin isn’t it? You’ll also notice that Nginx uses very little CPU – almost nothing in fact. That’s because Nginx is designed using a single threaded model where one thread handles a huge number of connections. It can do this with little CPU usage because it uses a feature in the Linux kernel called epoll().

Footnotes:

Lack of time forced me to leave out all explanations on how to install and configure Nginx (I’m assuming you know Apache already) – but the Nginx Wiki is excellent, even if the Russain translation is a little rough.

I’ve also purposely left out all references to solving disk bottlenecks (as I’ve left out a discussion about browser caching) because there has been a lot written about this and depending on what app or app-server you’re running, there are some very standard ways to solve IO problems already. e.g. Memcached, the InnoDB cache for MySQL, PHP’s Alternative PHP Cache, perstence engines that keep your compiled code in memory, etc..etc..

This technique works to speed up any back-end application server that uses a one-thread-per-connection model. It doesn’t matter if it’s Ruby via FastCGI, Mod_Perl on Apache or some crappy little Bash script spitting out data on a socket.

This is a very standard config for most high traffic websites today. It’s how they are able to leave keepalive enabled and handle a huge number of concurrent users with a relatively small app server cluster. Lighttpd and Nginx are the two most popular free FSM/epoll web servers out there and Nginx is the fastest growing, best designed (IMHO) and the one I use to serve 400 requests per second on a small Apache cluster. It’s also what guys like WordPress.com use.

December 1, 2009

Month: December 2009

Crowdsourcing a real-time solution to air terrorism

How to handle 1000's of concurrent users on a 360MB VPS