What the Web Sockets Protocol means for web startups

Ian Hickson’s latest draft of the Web Sockets Protocol (WSP) is up for your reading pleasure. It got me thinking about the tangible benefits the protocol is going to offer over the long polling that my company and others have been using for our real-time products.

The protocol works as follows:

Your browser accesses a web page and loads, let’s say, a JavaScript application. Then the JavaScript application decides it needs a constant flow of data to and from its web server. So it sends an HTTP request that looks like this:

GET /demo HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: example.com
Origin: http://example.com
WebSocket-Protocol: sample

The server responds with an HTTP response that looks like this:

HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
WebSocket-Origin: http://example.com
WebSocket-Location: ws://example.com/demo
WebSocket-Protocol: sample

Now data can flow between the browser and server without having to send HTTP headers until the connection is closed.
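Here’s a rough Python sketch of that exchange, just to make it concrete. The function names are my own inventions for illustration, the header names are the ones from the example above, and the check on the response is deliberately minimal (status 101, the Upgrade header and the echoed origin), so treat it as a sketch rather than a complete client.

```python
def build_handshake_request(host, path, origin, protocol=None):
    """Build the client's upgrade request, mirroring the example above."""
    lines = [
        f"GET {path} HTTP/1.1",
        "Upgrade: WebSocket",
        "Connection: Upgrade",
        f"Host: {host}",
        f"Origin: {origin}",
    ]
    if protocol is not None:
        lines.append(f"WebSocket-Protocol: {protocol}")
    # HTTP header lines end with CRLF; an empty line ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_handshake_response(raw):
    """Return (status_code, headers) from the server's handshake response."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("ascii")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


def handshake_accepted(raw, origin):
    """Minimal check: 101 status, WebSocket upgrade, and our origin echoed back."""
    status_code, headers = parse_handshake_response(raw)
    return (
        status_code == 101
        and headers.get("upgrade", "").lower() == "websocket"
        and headers.get("websocket-origin") == origin
    )
```

Once handshake_accepted returns True, both sides stop speaking HTTP and start exchanging frames over the same TCP connection.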

Remember that at this point, the connection has been established on top of a standard TCP connection. The TCP protocol provides a reliable delivery mechanism so the WSP doesn’t have to worry about that. It can just send or receive data and rest assured the very best attempt will be made to deliver it – and if delivery fails it means the connection has broken and WSP will be notified accordingly. WSP is not limited to any frame size because TCP takes care of that by negotiating an MSS (maximum segment size) when it establishes the connection. WSP is just riding on top of TCP and can shove as much data in each frame as it likes and TCP will take care of breaking that up into packets that will fit on the network.

The WSP sends data using very lightweight frames. There are two ways the frames can be structured. The first frame type starts with a 0x00 byte (zero byte), followed by UTF-8 text, and ends with a 0xFF byte.
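To show how little work this first frame type involves, here is a small Python sketch. The function names are hypothetical, and the decoder assumes it is handed a buffer that contains only frames of this first type. It relies on the fact that the byte 0xFF never occurs in valid UTF-8, which is why it can safely mark the end of a frame.

```python
def encode_text_frame(text):
    """Wrap UTF-8 text in a 0x00 start byte and a 0xFF end byte."""
    return b"\x00" + text.encode("utf-8") + b"\xff"


def decode_text_frames(buffer):
    """Split a byte buffer into complete text frames.

    Returns (messages, leftover), where leftover holds the bytes of a
    frame that has not fully arrived yet.
    """
    messages = []
    while buffer:
        if buffer[0] != 0x00:
            raise ValueError("expected a frame starting with 0x00")
        end = buffer.find(b"\xff", 1)
        if end == -1:
            break  # frame is incomplete; wait for more data
        messages.append(buffer[1:end].decode("utf-8"))
        buffer = buffer[end + 1:]
    return messages, buffer
```

The two extra bytes per message are the whole cost of the framing, no matter how long the text is.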

The second WSP frame type starts with a byte that ranges from 0x80 to 0xFF, meaning the byte has the high bit (the left-most binary bit) set to 1. Then comes the data length, spread over one or more bytes: every length byte except the last has the high bit set, the last one doesn’t, and the 7 right-most bits of each byte are strung together to form the length. The data follows and is exactly the length specified. This second WSP frame type is presumably for binary data and is designed to provide some future proofing.
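The length encoding is easier to see in code than in words. This Python sketch is my reading of the draft, with hypothetical function names, and it assumes the most significant 7-bit group of the length is sent first.

```python
def encode_length(n):
    """Encode a length as 7-bit groups, most significant group first.

    Every byte except the last has its high bit set.
    """
    if n < 0:
        raise ValueError("length must be non-negative")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def encode_binary_frame(payload, frame_type=0x80):
    """Build a frame: type byte (high bit set), length, then the data."""
    if not 0x80 <= frame_type <= 0xFF:
        raise ValueError("frame type must have the high bit set")
    return bytes([frame_type]) + encode_length(len(payload)) + payload


def decode_binary_frame(data):
    """Return (frame_type, payload) for one complete frame."""
    frame_type = data[0]
    if frame_type & 0x80 == 0:
        raise ValueError("not a length-prefixed frame")
    length = 0
    pos = 1
    while True:
        if pos >= len(data):
            raise ValueError("truncated length")
        byte = data[pos]
        pos += 1
        length = length * 128 + (byte & 0x7F)
        if byte & 0x80 == 0:
            break  # a clear high bit marks the final length byte
    payload = data[pos:pos + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return frame_type, payload
```

A payload of up to 127 bytes needs a single length byte, so the overhead for small messages is again just 2 bytes.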

If you’re still with me, here’s what this all means. Let’s say you have a web application that has a real-time component. Perhaps it’s a chat application, perhaps it’s Google Wave, perhaps it’s something like my Feedjit Live that is hopefully showing a lot of visitors arriving here in real-time. Let’s say you have 100,000 people using your application concurrently.

The application has been built to be as efficient as possible using the current HTTP specification. So your browser connects and the server holds the connection open and doesn’t send the response until there is data available. That’s called long-polling and it avoids the old situation of your browser reconnecting every few seconds and getting told there’s no data yet, along with a full load of HTTP headers moving back and forth.
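Stripped of all the HTTP plumbing, the server side of long-polling boils down to something like the following Python sketch. The function name and the use of an in-memory queue as the source of new data are assumptions made purely for illustration.

```python
import queue


def long_poll(pending, timeout_seconds):
    """Hold the request open until data arrives or the timeout expires.

    Returns the next message, or None if nothing arrived in time, in
    which case the browser simply issues a new request.
    """
    try:
        return pending.get(timeout=timeout_seconds)
    except queue.Empty:
        return None
```

Either way the request ends with a full HTTP response, and the browser has to send a full HTTP request to start waiting again.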

Let’s assume that every 10 seconds the server or client has some new data they need to send to each other. Each time, a full set of client and server headers is exchanged. They look like this:

GET / HTTP/1.1
User-Agent: ...some long user agent string...
Host: markmaunder.com
Accept: */*

HTTP/1.1 200 OK
Date: Sun, 25 Oct 2009 17:32:19 GMT
Server: Apache
X-Powered-By: PHP/5.2.3
X-Pingback: http://markmaunder.com/xmlrpc.php
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

That’s 373 bytes of data. Some simple math tells us that 100,000 people generating 373 bytes of data every 10 seconds gives us a network throughput of 29,840,000 bits per second or roughly 30 Megabits per second.

That’s 30 Mbps just for HTTP headers.

With the WSP, every frame has only 2 bytes of packaging. 100,000 people x 2 bytes = 200,000 bytes per 10 seconds or 160 Kilobits per second.

So WSP takes 30 Mbps down to 160 Kbps for 100,000 concurrent users of your application. And that’s what Hickson and the WSP team are trying to do for us.
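If you want to plug in your own numbers, the arithmetic fits in a few lines of Python. The function name is made up for this post; the inputs are the ones used above (100,000 users, one message every 10 seconds, 373 bytes of HTTP headers versus 2 bytes of WSP framing).

```python
def overhead_bits_per_second(users, overhead_bytes, interval_seconds):
    """Protocol overhead in bits per second across all users."""
    return users * overhead_bytes * 8 / interval_seconds


http_overhead = overhead_bits_per_second(100_000, 373, 10)  # 29,840,000 bps
wsp_overhead = overhead_bits_per_second(100_000, 2, 10)     # 160,000 bps
reduction = http_overhead / wsp_overhead                    # 186.5x less overhead
```

This only counts the packaging. The application data itself is the same size either way.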

Google would be the single biggest winner if the WSP became standard in browsers and browser APIs like JavaScript. Google’s goal is to turn the browser into an operating system and give their applications the ability to run on any machine that has a browser. Operating systems have two advantages over browsers: They have direct access to the network and they have local file system storage. If you solve the network problem you also solve the storage problem because you can store files over the network.

Hickson is also working on the HTML 5 specification for Google, but the recommendation is currently not expected to be ratified until 2022. WSP is also going to take time to be ratified and then incorporated into JavaScript (and other) APIs. But it is so strategically important for Google that I expect to see it in Chrome and in Google’s proprietary web servers in the near future.