Where’s the Disruption from the Change in Startup Economics?

It’s been a year-long break from blogging, and getting so many new visitors this soon after getting back to writing is cool. [Thanks HN!]

 

This blog runs on the smallest available Linode 512 instance for $20/month. It hosts several sites, including family blogs and hobby sites. I run Nginx on the front end and reverse-proxy to 5 Apache children, which saves me from having to run roughly 100 Apache children to handle the brief spikes of around 20 hits per second I saw yesterday.

 

Technologies like event servers (Nginx, node.js, etc.) and cheap, reliable virtualization may seem like old hat, but in 2005 Linode was charging $40/month for a 128MB instance (it’s now $20/month for 512MB, 88% cheaper per megabyte), and Nginx wouldn’t hit mainstream use for another two years. In fact, Nginx only hit version 1.0 last month.

Five years ago, many companies or bloggers would have used a physical box with 3.5GB of memory to run 100 Apache instances plus the database for this kind of traffic. That’s about $300/month based on current pricing for physical dedicated servers from ServerBeach, which hasn’t changed much since 2005.

With the move from hardware and multiprocess servers to virtualization and event servers, hosting costs have dropped to roughly 7% of what they were 5 years ago ($20/month vs. $300/month). A drop of about 93% in a variable cost changes the economics of any sector in a way that usually causes disruption and innovation.
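The arithmetic behind these percentages is easy to check. A minimal Python sketch, using only the prices quoted in this post (they are the post’s figures, not live quotes):

```python
# Figures quoted in this post (2005 vs. now); not live prices.
LINODE_2005_USD, LINODE_2005_MB = 40, 128
LINODE_NOW_USD, LINODE_NOW_MB = 20, 512
DEDICATED_BOX_USD = 300   # ~3.5GB physical server, ServerBeach-style pricing
VPS_USD = 20              # Linode 512 running Nginx + 5 Apache children


def pct_cheaper(old_price_per_unit: float, new_price_per_unit: float) -> float:
    """Percentage drop from the old unit price to the new one."""
    return 100 * (1 - new_price_per_unit / old_price_per_unit)


# Price per MB of RAM on Linode: $0.3125 then, about $0.039 now.
per_mb_drop = pct_cheaper(LINODE_2005_USD / LINODE_2005_MB, LINODE_NOW_USD / LINODE_NOW_MB)
# Whole-stack monthly hosting cost for the same traffic.
hosting_ratio = VPS_USD / DEDICATED_BOX_USD

print(f"Linode RAM: {per_mb_drop:.1f}% cheaper per MB")   # 87.5% cheaper per MB
print(f"Hosting now costs {hosting_ratio:.1%} of 2005")    # 6.7% of 2005
```

The per-megabyte drop rounds to the 88% figure, and the $20 vs. $300 ratio works out to about 6.7%.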

So where is the disruption and innovation happening now that anyone can afford a million-hits-a-month server?

 

Footnotes: An unstable version of Nginx was available in 2005/2006, and Lighttpd was also an alternative for reverse proxying back then. But it was for hardcore hackers who didn’t mind relatively unstable, bleeding-edge configurations. The mainstream configuration in 2005 was big-memory dedicated machines running a huge number of Apache children. Sadly, much of the web is still run this way. I shudder to think of the environmental impact of all those front-end web boxes.

I also didn’t address Keep-Alive on Apache above. Disabling Keep-Alive gets you a lot more bang for your hardware (specifically, you need less memory because you run fewer Apache children) at the cost of some browser performance. The norm in 2005 was to leave Keep-Alive enabled but set to a short timeout. With Keep-Alive set to 15 seconds, my estimate of 100 Apache instances for 20 hits per second is probably way too optimistic. Even with Keep-Alive disabled, you would barely handle 20 requests per second with 100 children once you take into account per-request latency for slower connections.

Bandwidth cost is also a consideration, but gzip, compressed code, CDN-hosted copies of libraries like jQuery that someone else serves, and a stripped-down site with few images all help. [Think Craigslist] With a page size of 100K, Linode’s 400GB bandwidth allowance gives you 4,194,304 pageviews.
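The Keep-Alive estimates above follow from Little’s law: the number of busy Apache children is roughly the arrival rate multiplied by how long each hit ties up a child. A minimal Python sketch, assuming (pessimistically) that every hit holds its child for the full 15-second Keep-Alive window, and using a hypothetical 5 seconds per request for slow clients with Keep-Alive off. The bandwidth figure uses binary units (1GB = 1024 × 1024 KB):

```python
def busy_workers(requests_per_sec: float, seconds_held: float) -> float:
    """Little's law: average concurrency = arrival rate x time each request holds a worker."""
    return requests_per_sec * seconds_held


def pageviews_per_month(allowance_gb: float, page_kb: float) -> int:
    """Pageviews that fit in a transfer allowance, using binary units (1GB = 1024 * 1024 KB)."""
    return int(allowance_gb * 1024 * 1024 // page_kb)


# Prefork Apache: assume each hit keeps a child tied up for the whole Keep-Alive window.
print(busy_workers(20, 15))            # 300 children at 20 hits/s with a 15 s Keep-Alive
# Keep-Alive off: a child is held only while serving; 5 s is a hypothetical slow-client time.
print(busy_workers(20, 5))             # 100 children
print(pageviews_per_month(400, 100))   # 4194304
```

This is also roughly why the Nginx front end helps: Nginx holds the slow and idle client connections, so each Apache child is only busy while it generates a page.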

 

It’s OK to make an extra $2k per month if you’re a programmer. Here’s how.

This quote, which went viral 2 months ago and that Steinbeck probably never said, has stuck with me:

“Socialism never took root in America because the poor see themselves not as an exploited proletariat but as temporarily embarrassed millionaires.” ~Maybe not Steinbeck, but it’s cool and it’s true.

As temporarily embarrassed millionaire programmers, I feel we sometimes don’t pursue projects that could be paying for awesome toys every month, making up for that underwater mortgage or adding valuable incremental income. Projects in this space aren’t the next Facebook or Twitter, so they don’t pass the knock-it-out-of-the-park test.

There are so many ideas in this neglected space that have worked and continue to work. Here’s a start:

  1. Do a site:.gov search on Google for downloadable government data.
  2. Come up with a range of data that you can republish in directory form. Spend a good few hours doing this and create a healthy collection of options.
  3. You might try a site:.edu search too and see if universities have anything interesting.
  4. site:.ac.uk site:.ac.za – you get the idea.
  5. Experiment with Google’s Keyword Tool.
  6. Make sure you’re signed in.
  7. Click Traffic Estimator on the left.
  8. Enter keywords that describe the data sets you’ve come up with. Enter a few to get a good indication of each category or sector’s potential.
  9. Look at search volume to find sectors that are getting high search volumes.
  10. Look at CPC to find busy sectors that also have advertisers that are paying top dollar for clicks.
  11. Finally, look at the Competition column to get an idea of how many advertisers are competing in the sector.
  12. First prize is high search volume, high CPC, high competition. Sometimes you can’t have it all, but get as close as you can.
  13. Now that you’ve chosen a lucrative sector with lots of spendy advertisers and have government or academic data you can republish, figure out a way to generate thousands of pages of content out of that data and solve someone’s problem. The problem could be “Why can’t I find a good site about XYZ when I google for such-and-such.”
  14. Give the site a good solid SEO link structure with breadcrumbs and cross-linking. Emphasize relevant keywords with the correct html tags and avoid duplicate content. Make sure the site performance is wicked fast or you’ll get penalized. Nginx reverse-proxying Apache is always a good bet.
  15. Tell the right people about your site and tell them regularly via great blog entries, insightful tweets, and networking in your site’s category.
  16. Keep monitoring Googlebot crawl activity and how your site is being indexed, and keep tweaking it for 6 months until it’s all indexed, ranking, and getting around 50K visits per month (about 1,666 visits per day).
  17. That’s 150,000 page views per month at 3 pages per visit average.
  18. At a 1.6% CTR with a $0.85 CPC from AdSense, you’re earning $2,040 per month.
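The revenue estimate in steps 16 to 18 is a simple chain of multiplications. A Python sketch, treating CTR as clicks per page view (which is how the figures above multiply out); all inputs are the post’s example numbers, not measured data:

```python
def monthly_ad_revenue(visits: int, pages_per_visit: float, ctr: float, cpc_usd: float) -> float:
    """Rough ad revenue: page views x click-through rate x earnings per click."""
    pageviews = visits * pages_per_visit   # 50,000 visits x 3 pages = 150,000 views
    clicks = pageviews * ctr               # 150,000 x 1.6% = 2,400 clicks
    return clicks * cpc_usd


revenue = monthly_ad_revenue(visits=50_000, pages_per_visit=3, ctr=0.016, cpc_usd=0.85)
print(f"${revenue:,.2f} per month")   # $2,040.00 per month
```

Swapping in the search volume and CPC you found in the Traffic Estimator gives a quick way to compare candidate sectors before building anything.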

Update: To clarify, “competition” above refers to competition among advertisers paying for clicks in a sector. More competition is a good thing for publishers because it means higher CPC and more ad inventory i.e. a higher likelihood an ad will be available for a specific page with specific subject matter in your space. [Thanks Bill!]

Update2: My very good mate Joe Heitzeberg runs MediaPiston which is a great way to connect with high quality authors of original content. If you do have a moderate budget and are looking for useful and unique content to get started, give Joe and his crew a shout! They have great authors and have really nailed the QA and feedback process with their platform.

What the Web Sockets Protocol means for web startups

Ian Hickson’s latest draft of the Web Sockets Protocol (WSP) is up for your reading pleasure. It got me thinking about the tangible benefits the protocol is going to offer over the long polling that my company and others have been using for our real-time products.

The protocol works as follows:

Your browser accesses a web page and loads, let’s say, a Javascript application. Then the Javascript application decides it needs a constant flow of data to and from its web server. So it sends an HTTP request that looks like this:

GET /demo HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: example.com
Origin: http://example.com
WebSocket-Protocol: sample

The server responds with an HTTP response that looks like this:

HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
WebSocket-Origin: http://example.com
WebSocket-Location: ws://example.com/demo
WebSocket-Protocol: sample
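A server-side sketch of this exchange in Python, assuming the header names shown above from Hickson’s draft (later versions of the protocol, including the final RFC 6455, replaced this handshake with Sec-WebSocket-Key/Sec-WebSocket-Accept). The exact-match header checks and the function names are simplifications for illustration:

```python
from __future__ import annotations


def parse_request_headers(raw: str) -> tuple[str, dict[str, str]]:
    """Split a raw HTTP request into its request line and a header dict (names lower-cased)."""
    lines = raw.split("\r\n")
    headers = {}
    for line in lines[1:]:
        if not line:                      # blank line ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def handshake_response(raw_request: str) -> str:
    """Build the draft-style 101 response for a WebSocket upgrade request."""
    request_line, headers = parse_request_headers(raw_request)
    method, path, _ = request_line.split(" ", 2)
    if method != "GET" or headers.get("upgrade") != "WebSocket" or headers.get("connection") != "Upgrade":
        raise ValueError("not a WebSocket upgrade request")
    lines = [
        "HTTP/1.1 101 Web Socket Protocol Handshake",
        "Upgrade: WebSocket",
        "Connection: Upgrade",
        f"WebSocket-Origin: {headers['origin']}",
        f"WebSocket-Location: ws://{headers['host']}{path}",
    ]
    if "websocket-protocol" in headers:   # echo the subprotocol the client asked for
        lines.append(f"WebSocket-Protocol: {headers['websocket-protocol']}")
    return "\r\n".join(lines) + "\r\n\r\n"


request = (
    "GET /demo HTTP/1.1\r\n"
    "Upgrade: WebSocket\r\n"
    "Connection: Upgrade\r\n"
    "Host: example.com\r\n"
    "Origin: http://example.com\r\n"
    "WebSocket-Protocol: sample\r\n"
    "\r\n"
)
print(handshake_response(request))
```

After this response is written, the server stops speaking HTTP on that socket and switches to reading and writing frames.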

Now data can flow between the browser and server without either side sending HTTP headers again, until the connection is torn down.

Remember that at this point, the connection has been established on top of a standard TCP connection. The TCP protocol provides a reliable delivery mechanism so the WSP doesn’t have to worry about that. It can just send or receive data and rest assured the very best attempt will be made to deliver it – and if delivery fails it means the connection has broken and WSP will be notified accordingly. WSP is not limited to any frame size because TCP takes care of that by negotiating an MSS (maximum segment size) when it establishes the connection. WSP is just riding on top of TCP and can shove as much data in each frame as it likes and TCP will take care of breaking that up into packets that will fit on the network.

The WSP sends data using very lightweight frames, which can be structured in two ways. The first frame type starts with a 0x00 byte, contains UTF-8 text, and ends with a 0xFF byte.
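Because 0xFF can never appear inside valid UTF-8, a receiver can find the end of a text frame just by scanning for that byte. A minimal Python sketch of both directions (the function names are hypothetical), which also keeps any incomplete trailing frame in the buffer until more bytes arrive over TCP:

```python
from __future__ import annotations


def encode_text_frame(message: str) -> bytes:
    """Wrap a message in text framing: 0x00, UTF-8 payload, 0xFF."""
    return b"\x00" + message.encode("utf-8") + b"\xff"


def decode_text_frames(buffer: bytes) -> tuple[list[str], bytes]:
    """Pull every complete 0x00 ... 0xFF frame out of buffer; return (messages, leftover bytes)."""
    messages = []
    while buffer:
        if buffer[0] != 0x00:
            raise ValueError("expected a text frame starting with 0x00")
        end = buffer.find(b"\xff")
        if end == -1:             # frame not finished yet; wait for more bytes from TCP
            break
        messages.append(buffer[1:end].decode("utf-8"))
        buffer = buffer[end + 1:]
    return messages, buffer


stream = encode_text_frame("hello") + encode_text_frame("wörld") + b"\x00par"
print(decode_text_frames(stream))   # (['hello', 'wörld'], b'\x00par')
```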

The second WSP frame type starts with a byte in the range 0x80 to 0xFF, meaning the byte has the high bit (the left-most binary bit) set to 1. Next comes the data length, encoded as a series of bytes: the 7 right-most bits of each byte contribute to the length, and the high bit is set on every length byte except the last. Once the byte without the high bit has been read, the data follows, and it is exactly the length specified. This second WSP frame type is presumably for binary data and is designed to provide some future-proofing.
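The length encoding is a base-128 number, most significant 7-bit group first. A sketch under the reading above, assuming the whole frame is already in memory and using 0x80 as an example frame type (the function names are illustrative, not from the draft):

```python
from __future__ import annotations


def encode_binary_frame(payload: bytes, frame_type: int = 0x80) -> bytes:
    """Frame type byte (high bit set), base-128 length (high bit = more length bytes follow), payload."""
    if not 0x80 <= frame_type <= 0xFF:
        raise ValueError("binary frame types have the high bit set")
    length = len(payload)
    groups = [length & 0x7F]                    # lowest 7 bits go in the last length byte
    length >>= 7
    while length:
        groups.append((length & 0x7F) | 0x80)   # earlier bytes carry the continuation bit
        length >>= 7
    return bytes([frame_type]) + bytes(reversed(groups)) + payload


def decode_binary_frame(frame: bytes) -> tuple[int, bytes]:
    """Read one binary frame; return (frame_type, payload)."""
    frame_type, pos, length = frame[0], 1, 0
    if (frame_type & 0x80) == 0:
        raise ValueError("not a length-prefixed frame")
    while True:
        b = frame[pos]
        pos += 1
        length = length * 128 + (b & 0x7F)      # every length byte contributes 7 bits
        if (b & 0x80) == 0:                     # high bit clear: that was the last length byte
            break
    return frame_type, frame[pos:pos + length]


frame = encode_binary_frame(b"x" * 300)
print(frame[:3].hex())                               # 80822c (300 = 2 * 128 + 44)
print(decode_binary_frame(frame) == (0x80, b"x" * 300))   # True
```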

If you’re still with me, here’s what this all means. Let’s say you have a web application that has a real-time component. Perhaps it’s a chat application, perhaps it’s Google Wave, perhaps it’s something like my Feedjit Live that is hopefully showing a lot of visitors arriving here in real time. Let’s say you have 100,000 people using your application concurrently.

The application has been built to be as efficient as possible using the current HTTP specification. So your browser connects, and the server holds the connection open and doesn’t send the response until there is data available. That’s called long polling, and it avoids the old situation of your browser reconnecting every few seconds and being told there’s no data yet, with a full load of HTTP headers moving back and forth each time.

Let’s assume that every 10 seconds the server or client has some new data to send to the other. Each time, a full set of client and server headers is exchanged. They look like this:

GET / HTTP/1.1
User-Agent: ...some long user agent string...
Host: markmaunder.com
Accept: */*

HTTP/1.1 200 OK
Date: Sun, 25 Oct 2009 17:32:19 GMT
Server: Apache
X-Powered-By: PHP/5.2.3
X-Pingback: http://markmaunder.com/xmlrpc.php
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

That’s 373 bytes of data. Some simple math tells us that 100,000 people generating 373 bytes of data every 10 seconds gives us a network throughput of 29,840,000 bits per second or roughly 30 Megabits per second.

That’s 30 Mbps just for HTTP headers.

With WSP, every text frame has only 2 bytes of packaging (the leading 0x00 and the trailing 0xFF). 100,000 people × 2 bytes = 200,000 bytes per 10 seconds, or 160 kilobits per second.

So WSP takes 30 Mbps down to 160 Kbps for 100,000 concurrent users of your application. And that’s what Hickson and the WSP team are trying to do for us.
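The header-overhead comparison can be reproduced with a few lines of Python; the 373-byte header size and the 10-second message interval are this post’s assumptions:

```python
def overhead_bits_per_sec(users: int, bytes_per_message: int, interval_sec: float) -> float:
    """Framing/header bytes every user sends once per interval, expressed as bits per second."""
    return users * bytes_per_message * 8 / interval_sec


http = overhead_bits_per_sec(100_000, 373, 10)   # full request + response headers per message
wsp = overhead_bits_per_sec(100_000, 2, 10)      # 0x00 ... 0xFF text-frame wrapper
print(f"HTTP headers: {http / 1e6:.2f} Mbps")    # HTTP headers: 29.84 Mbps
print(f"WSP framing:  {wsp / 1e3:.0f} Kbps")     # WSP framing:  160 Kbps
```

The payload itself costs the same either way; only the per-message packaging differs.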

Google would be the single biggest winner if WSP became standard in browsers and in browser APIs like Javascript’s. Google’s goal is to turn the browser into an operating system and give their applications the ability to run on any machine that has a browser. Operating systems have two advantages over browsers: they have direct access to the network, and they have local file system storage. If you solve the network problem, you also solve the storage problem, because you can store files over the network.

Hickson is also working on the HTML 5 specification for Google, but the recommendation isn’t currently expected to be ratified until 2022. WSP is also going to take time to be ratified and then incorporated into Javascript (and other) APIs. But it is so strategically important for Google that I expect to see it in Chrome and in Google’s proprietary web servers in the near future.

An imaginary conversation about immigration with Glenn Beck

Update: I wrote this blog entry and then, predictably, unposted it after my more diplomatic side took over. But it got out via my RSS feed anyway and a friend enjoyed it. So here it is in all its left-wing liberal glory. I’m switching the published date to today. Enjoy.

I’m an immigrant.

“Oooh nasty! Are you here to send your dirty kids to our schools?”

No.

“Are you going to leech off our social security?”

No.

“Are you going to steal jobs from my family and my kids?”

No.

“Are you going to rip off our great health care system and then scuttle back to the dirty little hole you came from?”

Um, no. Hey I didn’t accidentally cross the Canadian border did I?

“So what are you doing in this here land of the free and home of the brave, boy?”

I’m here to create jobs for your kids. I moved here in 2003. Since then I’ve created four technology startups with the goal of building a profitable business, bringing foreign currency to the United States and creating jobs for Americans. I created one of the world’s largest job search engines to help Americans find jobs. I currently run a software business whose products are used by over 300,000 websites worldwide and that brings foreign currency to the USA.

“Oh come now. You’re just taking money away from American investors.”

Actually, most of my investors are self-made and are also immigrants. Some of them helped create Google, that great company co-founded by Sergey Brin, also an immigrant.

“So what’s your point?”

Well, my point is that I’m surprised I have to have this conversation with you at all, my little xenophobic marshmallow-faced friend. You may not realize it, but you are costing this country billions in future earnings with your crappy attitude. Immigrant entrepreneurs are feeling pretty damn unappreciated thanks to you.

“OK so what are you going to do? Move to Russia or something?”

Actually, Chile is sounding pretty good right now, and it is probably going to steal a truckload of talent that would have created millions of jobs and billions in future taxable dollars for the USA. If you invest $500,000 over 5 years, they’ll give you permanent residency, $30,000 to visit and explore Chile for due diligence, another $30,000 to launch your company in Chile, up to $1 million for rent if you’re in one of their tech centers, and up to $25,000 per year in training expenses for each local you hire from one of their excellent engineering schools. You can even bring your own talented people to the country from anywhere in the world, and Chile will pay for their training too. They’ll pay 40% of your costs, up to $2 million, if you want to build your own office. And if your talented friends want to move to Chile, they automatically get a working visa if they get a legitimate job.

“So go! Americans are a tough breed. We know how to take care of ourselves!”

Actually, you’ve been relying on us immigrant types for some time now. Albert Einstein immigrated to the United States and brought with him the physics you needed to create the first atomic bomb. Wernher von Braun and 1,600 other scientists and engineers were brought to the United States after World War II as part of Operation Paperclip, and von Braun and his team created the Saturn V rocket that took the US to the moon. The space race gave birth to Silicon Valley, much of which continues to be powered by immigrant intellects today. Over half of all Valley startups and one quarter of all American tech companies are started by immigrants.

“So what the hell do you want me to do?”

I want you to stop promoting a culture of xenophobia in this country. I want you to start thinking about what an opportunity this country has right now because, for all the America haters out there, there are still boatloads of PhDs and business creators who want to come to this country. All we have to do is open our front doors to them and make them feel welcome. We don’t even have to throw tax dollars at them. They are self-sufficient, and by fulfilling their own dreams they’ll help fulfill the dreams you have for your children.


The importance of not knowing what isn’t possible

A Microsoft quote from a New York Times article I’ve already cited has been bugging the crap out of me. It bugged me when I first blogged about this article, and it bugged me as I wandered around B&N last night doing the last of my Christmas shopping. I wound up in the management section and picked up a book on the top 10 mistakes leaders make. Staring at me as I flipped open chapter 5 was confirmation that I wasn’t nuts.

Here’s the quote that bugged me:

“I’m happy that by hiring a bunch of old hands, who have been through these wars for 10 or 20 years, we at least have a nucleus of people who kind of know what’s possible and what isn’t,”

I’ve lost count of how many times as a software developer I’ve sat down and said, “I wonder if this is possible?” When I created WorkZoo, I wondered if it was possible to aggregate all the world’s jobs into a single database, and I got pretty darn close. When I created Geojoey, I wondered if it was possible to build a rich, pure Ajax application with a client-side MVC model, and it was. When I created LineBuzz, I wondered if it was possible to post inline comments on arbitrary text on any web page. Yes, it’s possible. When I created Feedjit, I wondered if it was possible to scale to serve real-time traffic data in a widget. We’re serving almost 100 million real-time widgets per month now.

I started coding on an Apple IIe and later moved to IBM PCs, so in my youth Apple and Microsoft were symbols of innovation and I wanted to innovate the way they did. Apple’s still doing a great job, but it breaks my heart to see MS floundering like a fish out of water in the new world of broadband, browser standards, open source and dynamic web applications.

Come on guys. Get it together already!! Fire those know-it-alls, hire some new blood and pretend for a moment that the past doesn’t matter and that anything is possible.

Saving server costs with Javascript using distributed processing

I run two consumer web businesses: LineBuzz.com and Geojoey.com. Both have more than 50% of the app implemented in Javascript, executing in the browser.

Something that occurred to me a while ago is that, because most of the execution happens inside the browser and uses our visitors’ CPU and memory, my servers don’t have to provide that CPU and memory.

I found myself moving processing to the client side where possible.

[Don’t worry, we torture our QA guru with a slow machine on purpose so she will catch any browser slowness we cause]

One down side is that everyone can see the Javascript source code – although it’s compressed which makes it a little harder to reverse engineer. Usually the most CPU intensive code is also the most interesting.

Another disadvantage is that I’m using a bit more bandwidth. But if the app isn’t shoveling vast amounts of data around to do its processing, and if I’m OK with exposing parts of my source to competitors, then these issues go away.

Moving execution to the client side opens up some interesting opportunities for distributed processing.

Let’s say you have 1 million page views a day on your home page. That’s 365 million views per year. Let’s say each user spends an average of 1 minute on your page because they’re reading something interesting.

So that’s 365 million minutes of processing time you have available per year.

Converted to years, that’s 694 server years. One server working for 694 years or 694 servers working for 1 year.

But let’s halve it, because we haven’t taken into account load times or the fact that Javascript is slower than other languages. So we have 347 server years.

Or put another way, it’s like having 347 additional servers running all year.

The cheapest server at ServerBeach.com costs $75 per month or $900 per year. [It’s a 1.7Ghz Celeron with 512Megs RAM – we’re working on minimums here!]

At $900 per server per year, those 347 servers translate into $312,300 per year.
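The same back-of-the-envelope calculation as a Python sketch; the 50% efficiency factor and the $900/year server price are the assumptions stated above:

```python
MINUTES_PER_YEAR = 60 * 24 * 365   # 525,600


def donated_server_years(daily_pageviews: int, minutes_per_view: float, efficiency: float) -> float:
    """Visitor CPU time per year, in 'server years', discounted by an efficiency factor."""
    minutes = daily_pageviews * 365 * minutes_per_view   # 365 million minutes here
    return minutes / MINUTES_PER_YEAR * efficiency


years = donated_server_years(1_000_000, 1, 0.5)   # halved for load time and slower Javascript
savings = int(years) * 900                         # $75/month entry-level dedicated box
print(f"{int(years)} server-years, about ${savings:,} per year")   # 347 server-years, about $312,300 per year
```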

My method isn’t very scientific, and if you go around slowing down people’s machines, you’re not going to have 1 million page views per day for very long. But it gives you a general indication of how much money you can save if you can move parts of a CPU-intensive web application to the client side.

So going beyond saving server costs, a high-traffic website could do something similar to SETI@home and essentially turn the millions of workstations that spend a few minutes on the site each day into a giant distributed-processing Beowulf cluster using little old Javascript.
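To make the SETI@home comparison concrete, here is a minimal Python sketch of the server side: a hypothetical coordinator that hands work units to visiting browsers and reissues any unit whose lease expires because the visitor left. The class names, lease length, and payload format are all assumptions. A real deployment would also need to verify results (for example by sending each unit to more than one browser), since visitors’ machines are untrusted.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    unit_id: int
    payload: list[int]                 # hypothetical: numbers for a browser to crunch
    leased_at: float | None = None     # when a browser last checked this unit out
    result: int | None = None          # answer posted back by the browser


@dataclass
class Coordinator:
    """Hands work units to visiting browsers and reissues any a visitor abandons."""
    lease_seconds: float = 60.0        # visitors stay ~1 minute, so leases should be short
    units: dict[int, WorkUnit] = field(default_factory=dict)

    def add(self, unit_id: int, payload: list[int]) -> None:
        self.units[unit_id] = WorkUnit(unit_id, payload)

    def checkout(self, now: float | None = None) -> WorkUnit | None:
        """Return the first unfinished unit that is unleased or whose lease has expired."""
        now = time.time() if now is None else now
        for unit in self.units.values():
            available = unit.leased_at is None or now - unit.leased_at > self.lease_seconds
            if unit.result is None and available:
                unit.leased_at = now
                return unit
        return None                    # everything is either finished or currently leased

    def submit(self, unit_id: int, result: int) -> None:
        """Record the result a browser sends back (e.g. via an XMLHttpRequest POST)."""
        self.units[unit_id].result = result

    def done(self) -> bool:
        return all(u.result is not None for u in self.units.values())


# Usage: the page's Javascript would fetch a unit, process it, and POST the answer back.
coordinator = Coordinator(lease_seconds=60)
for i in range(3):
    coordinator.add(i, list(range(i * 10, i * 10 + 10)))
unit = coordinator.checkout()
if unit is not None:
    coordinator.submit(unit.unit_id, sum(unit.payload))   # stand-in for the browser's work
print(coordinator.done())   # False: two units are still outstanding
```

Short leases matter here because, unlike SETI@home’s installed clients, a browser tab disappears the moment the visitor navigates away.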

Business innovation for developers

Many entrepreneurs, particularly the MBA set, start with competitive analysis. Sure, it’s a valid approach, and you might find a gap in the market that you can easily fill or a product or service that could do with some improvement. But if Larry and Sergey had done that before they started playing with the PageRank algorithm, they might not have gotten as far as the first keystroke.

Here’s a list of 98 social networking websites on Wikipedia – in case you’re looking at getting into that space.

Many of Einstein’s most original ideas occurred to him outside of academia, while he worked at the Swiss patent office from 1902 to 1909, including his 1905 paper on the electrodynamics of moving bodies, which proposed the idea of special relativity.

When I chat with friends and fellow entrepreneurs, I’ll throw out an idea and the reaction is often a comparison to other ideas. “So and so is working on something similar” or “you should take a look at such and such”. So existing ideas and products are the departure point for our conversation.

If you’re a developer, your strength is in your ability to write original code, not in your ability to analyze the marketplace.

If you have an idea and you have the ability to implement it yourself, I recommend developing it somewhat before doing any competitive analysis or exposing it to your friends and family. Just take an extra week to play with it. You might come up with a completely original idea.