Bandwidth providers: Please follow Google’s lead in helping startups, the environment and yourselves

There’s a post on Hacker News today pointing to a few open source javascript libraries that Google is hosting on their content distribution network. ScriptSrc.net has a great UI that gives you an easy way to link to the libs from your web pages. Developers and companies can link to these scripts from their own websites and gain the following benefits:

  • Your visitor may have already cached the script on another website so your page will load faster
  • The script is hosted on a different domain which allows your browser to create more concurrent connections while fetching your content – another speed increase.
  • It saves you the bandwidth of having to serve that content up yourself which can result in massive cost savings if you’re a high traffic site.
  • Just like your visitor already cached the content, their workstation or local DNS server may also have the CDN’s IP address cached which further speeds load time.

While providing a service like this does cost Google or the providing company more in hosting, it provides an overall efficiency gain. Less bandwidth and CPU is used on the Web as a whole by Google providing this service. That means less cooling is required in data centers, less networking hardware needs to be manufactured to support the traffic on the web and so on.

The environment benefits as a whole by Google or another large provider hosting these frequently loaded scripts for us.

The savings are passed on to lone developers and startups who are using the scripts. For smaller companies who are trying to minimize costs while dealing with massive growth this can result in a huge cost savings that helps them to continue to innovate.

The savings are also passed on to bandwidth providers like NTT, AT&T, Comcast, Time Warner, Qwest and other bandwidth providers who’s customers consume less bandwidth as a result.

So my suggestion is that Google and bandwidth providers collaborate to come up with a package of the most used open source components online and keep the list up to date. Then provide local mirrors of each of these packages with a fallback mechanism if the package isn’t available. Google should define an IP address similar to their easy to remember DNS ip address 8.8.8.8 that hosts these scripts. Participating ISP’s route traffic destined for that IP address to a local mirror using a system similar to IP Anycast. An alternative URL is provided via a query string. e.g.

http://9.9.9.9/js/prototype.1.5.0.js?fallback=http://mysite.com/myjs/myprototype.1.5.0.js

If the local ISP isn’t participating the request is simply routed to Google’s 9.9.9.9 server as per normal.

If the local ISP (or Google) doesn’t have a copy of the script in their mirror it just returns a 302 redirect to the fallback URL which the webmaster has provided and which usually points to the webmaster’s own site. A mechanism for multiple fallbacks can easily be created e.g. fallback1, fallback2, etc.

Common scripts, icon libraries and flash components can be hosted this way. There may even be scenarios where a company (like Google) is used by such a large percentage of the Net population that it makes sense to put them on the 9.9.9.9 mirror system so that local bandwidth providers can serve up commonly used components rather than have to fetch them from via their upstream providers. Google’s logo for example.

What the Web Sockets Protocol means for web startups

Ian Hickson’s latest draft of the Web Sockets Protocol (WSP) is up for your reading pleasure. It got me thinking about the tangible benefits the protocol is going to offer over the long polling that my company and others have been using for our real-time products.

The protocol works as follows:

Your browser accesses a web page and loads, lets say, a javascript application. Then the javascript application decides it needs a constant flow of data to and from it’s web server. So it sends an HTTP request that looks like this:

GET /demo HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: example.com
Origin: http://example.com
WebSocket-Protocol: sample

The server responds with an HTTP response that looks like this:

HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
WebSocket-Origin: http://example.com
WebSocket-Location: ws://example.com/demo
WebSocket-Protocol: sample

Now data can flow between the browser and server without having to send HTTP headers until the connection is broken down again.

Remember that at this point, the connection has been established on top of a standard TCP connection. The TCP protocol provides a reliable delivery mechanism so the WSP doesn’t have to worry about that. It can just send or receive data and rest assured the very best attempt will be made to deliver it – and if delivery fails it means the connection has broken and WSP will be notified accordingly. WSP is not limited to any frame size because TCP takes care of that by negotiating an MSS (maximum segment size) when it establishes the connection. WSP is just riding on top of TCP and can shove as much data in each frame as it likes and TCP will take care of breaking that up into packets that will fit on the network.

The WSP sends data using very lightweight frames. There are two ways the frames can be structured. The first frame type starts with a 0×00 byte (zero byte), consists of UTF-8 text and ends with a 0xFF byte with the UTF-8 text in between.

The second WSP frame type starts with a byte that ranges from 0×80 to 0xFF, meaning the byte has the high-bit (or left-most binary bit) set to 1. Then there is a series of bytes that all have the high-bit set and the 7 right most bits define the data length. Then there’s a final byte that doesn’t have the high-bit set and the data follows and is the length specified. This second WSP frame type is presumably for binary data and is designed to provide some future proofing.

If you’re still with me, here’s what this all means. Lets say you have a web application that has a real-time component. Perhaps it’s a chat application, perhaps it’s Google Wave, perhaps it’s something like my Feedjit Live that is hopefully showing a lot of visitors arriving here in real-time. Lets say you have 100,000 people using your application concurrently.

The application has been built to be as efficient as possible using the current HTTP specification. So your browser connects and the server holds the connection open and doesn’t send the response until there is data available. That’s called long-polling and it avoids the old situation of your browser reconnecting every few seconds and getting told there’s no data yet along with a full load of HTTP headers moving back and forward.

Lets assume that every 10 seconds the server or client has some new data they need to send to each other. Each time a full set of client and server headers are exchanged. They look like this:

GET / HTTP/1.1
User-Agent: ...some long user agent string...
Host: markmaunder.com
Accept: */*

HTTP/1.1 200 OK
Date: Sun, 25 Oct 2009 17:32:19 GMT
Server: Apache
X-Powered-By: PHP/5.2.3
X-Pingback: http://markmaunder.com/xmlrpc.php
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

That’s 373 bytes of data. Some simple math tells us that 100,000 people generating 373 bytes of data every 10 seconds gives us a network throughput of 29,840,000 bits per second or roughly 30 Megabits per second.

That’s 30 Mbps just for HTTP headers.

With the WSP every frame only has 2 bytes of packaging. 100,000 people X 2 bytes = 200,000 bytes per 10 seconds or 160 Kilobits per second.

So WSP takes 30 Mbps down to 160 Kbps for 100,000 concurrent users of your application. And that’s what Hickson and the WSP team and trying to do for us.

Google would be the single biggest winner if the WSP became standard in browsers and browser API’s like Javascript. Google’s goal is to turn the browser into an operating system and give their applications the ability to run on any machine that has a browser. Operating systems have two advantages over browsers: They have direct access to the network and they have local file system storage. If you solve the network problem you also solve the storage problem because you can store files over the network.

Hickson is also working on the HTML 5 specification for Google, but the current date the recommendation is expected to be ratified is 2022. WSP is also going to take time to be ratified and then incorporated into Javascript (and other) API’s. But it is so strategically important for Google that I expect to see it in Chrome and in Google’s proprietary web servers in the near future.