Month: May 2011

HN is about to overtake Slashdot.

The Alexa graph above only shows the ycombinator.com domain, but most of the traffic to the domain is HN. Also HN is visible on several other domains like hackerne.ws, which isn’t counted in the above graph, so it’s probably already passed Slashdot.

Footnote: This post was originally something more political re comment scores. But I decided I don’t have the stomach for that fight, so I’m editing it and leaving just the data.

May 28, 2011
What would a self-launching Space Shuttle look like?

This is the OK-GLI, part of the Soviet Shuttle Buran program, the largest and most expensive space program in the history of the Soviet Union.

The OK-GLI completed 25 test flights between 1985 and 1988 before being retired. The OK-GLI was powered by four AL-31 jet engines with the fuel tank in the cargo bay. The highest altitude it achieved was 6000 meters or 19,000 ft. It never reached space.

A sister ship in the Buran program, the Buran spacecraft, did reach orbit and completed two unmanned Earth orbits. It was the only orbital flight in the Buran program.

May 28, 2011
What I love about the HN community

The smartest, most helpful people hang out there. Thanks for the awesome theme Lucian!

May 28, 2011
Insulin may be a steroid masquerading as a hormone.
Computer-generated image of six insulin molecules assembled in a hexamer.

At the 1998 Winter Olympic Games in Nagano, a Russian medical officer asked the Olympic Committee whether the use of insulin was restricted to athletes who are insulin dependent diabetics. The incident drew attention to insulin and the IOC were swift to ban it as a performance enhancing drug.

I recently posted a question on Quora asking what the best nutrition book is, and Nutrient Timing by Ivy and Portman came up. The book is excellent and has a huge amount of physiology data on how the human body makes and uses energy. The core concept is this:

In the 45 minutes post exercise, your body has very high insulin sensitivity. This period is referred to as the Anabolic phase. By consuming a drink of protein and carbs in a 1:3 or 1:4 ratio, you can significantly boost your insulin level during this period. You can also prolong this period and increase recovery and growth by consuming the same drink again at 2 hours and at 4 hours post exercise.

Boosting your insulin levels post exercise reduces protein loss from muscles and improves protein retention. It also speeds recovery by replenishing glycogen and creatine stores. Ivy and Portman spend much of the book citing supporting research from many studies including the Marine Corps.

The book also recommends taking an anti-oxidant post exercise to reduce muscle oxidation.

According to Ivy and Portman and many other nutritionists, the best source of protein is Whey Protein Isolate (links to the one I bought recently) which is rich in branched chain amino acids (BCAA’s). The best source of carbs in their recipe is good old Sucrose (table sugar).

Dara Torres at the 2008 summer Olympics.

I was chatting to my wife about the book and she mentioned that Dara Torres (three silvers in the previous summer Olympics and the oldest swimmer to ever be on the US Olympic team) drinks chocolate milk as her favorite recovery drink. Chocolate is rich in anti-oxidants, it contains sucrose and milk has some protein, but not enough to bring the carb-to-protein ratio to 4:1. (The sucrose-to-protein ratio is probably more like 16:1.) So I’m guessing that Dara adds a source of protein like whey protein isolate to the drink.

I’m training this year for either a half or full Ironman next year and doing a half and full marathon this year to build up to it. I’m currently doing two 5 mile runs and one long run (currently 10 miles) each week. I also swim 2000m two to three times a week and I do the occasional core strength workout. As I built up to my current volume my energy level collapsed, both mental and physical. Once I started looking at my nutrition and using a post workout recovery nutrition plan I came back with a vengeance. Two weeks after starting the plan I ran the fastest 5 mile pace I’ve ever run and felt great afterwards.

After doing further reading online I’ve modified my recipe to have a 1:1 ratio of protein to carbohydrates post workout. A 3:1 or 4:1 ratio of carbs to protein seems to build a lot of muscle and my goal is to stay lean but recover fast.

My current post workout nutrition plan is:
- 1 whole raw egg, 56 grams whey protein (two scoops), two tablespoons of molasses (high in phosphorus), a tablespoon of brown sugar, two cups of skim milk, a heaped spoon of cocoa powder. Blend and drink two thirds.
- Drink the remaining third 1.5 to two hours after workout.
May 28, 2011
Where's the Disruption from the Change in Startup Economics?

It’s been a year-long break from blogging, and getting back to writing and getting so many new visitors this soon is cool. [Thanks HN!]

This blog runs on the smallest available Linode 512 instance for $20/month. It runs several sites including family blogs and hobby sites. I run nginx on the front end and reverse proxy to 5 Apache children which saves me having to run roughly 100 Apache children to handle the brief spikes of around 20 hits per second I saw yesterday.

Technologies like event-servers (Nginx, node.js, etc) and cheap and reliable virtualization may seem like old hat, but in 2005 Linode was charging $40/month for a 128Meg instance (it’s now $20/month for 512Megs, 88% cheaper per megabyte) and Nginx was only going to hit mainstream use two years later. In fact Nginx only hit version 1.0 last month.

Five years ago many companies or bloggers would have used a physical box with 3.5 Gigabytes of memory to handle 100 Apache instances and the database for this kind of traffic. That costs about $300/month based on current pricing for physical dedicated servers from ServerBeach, which hasn’t changed much since 2005.

With the move from hardware and multiprocess servers to virtualization and event-servers, hosting costs have dropped to roughly 7% of what they were 5 years ago ($20 versus $300 per month). A drop of about 93% in a variable cost for any sector changes the economics in a way that usually causes disruption and innovation.
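
The percentages in this post are easy to check. Here is a minimal sketch that uses only the prices quoted above; the helper name `percent_drop` is made up for illustration.

```python
def percent_drop(old: float, new: float) -> float:
    """Return how much smaller `new` is than `old`, as a percentage of `old`."""
    return (old - new) / old * 100


# Linode price per megabyte of RAM: $40 for 128 MB (2005) vs $20 for 512 MB (2011).
per_mb_2005 = 40 / 128
per_mb_2011 = 20 / 512
ram_drop = percent_drop(per_mb_2005, per_mb_2011)

# Monthly hosting bill: a ~$300 dedicated box vs the $20 Linode.
hosting_drop = percent_drop(300, 20)

print(f"RAM price per MB: {ram_drop:.1f}% cheaper")
print(f"Monthly hosting bill: {hosting_drop:.1f}% cheaper")
```

The per-megabyte figure works out to 87.5% and the hosting bill to a little over 93%, so the rounded numbers in the text are in the right range.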

So where is the disruption and innovation happening now that anyone can afford a million-hits-a-month server?

Footnotes: An unstable version of Nginx was available in 2005/2006 and Lighttpd was also an alternative back then for reverse proxying. But they were for hardcore hackers who didn’t mind relatively unstable and bleeding-edge configurations. The mainstream configuration in 2005 was running big memory servers on dedicated machines with a huge number of Apache children. Sadly, much of the web is still run this way. I shudder to think of the environmental impact of all those front-end web boxes.

I also don’t address the subject of Keep-Alive on Apache. Disabling Keep-Alive is a way to get a lot more bang for your hardware (specifically, you need less memory because you run fewer Apache children) while sacrificing some browser performance. The norm in 2005 was to leave Keep-Alive enabled, but set to a short timeout. With Keep-Alive set to 15 seconds, my estimate of 100 Apache instances for 20 hits per second is probably way too optimistic. Even with Keep-Alive disabled you would barely handle 20 requests per second with 100 children when taking into account latency per request for slower connections.

Bandwidth cost is also a consideration, but gzip and running compressed code, using CDN versions of libs like jQuery that someone else hosts, and running a stripped down site with few images all help. [Think Craigslist] With a page size of 100K, Linode’s 400GB bandwidth allowance gives you 4,194,304 pageviews.
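
A rough way to sanity-check the capacity numbers in these footnotes is Little's law: the number of busy Apache children is about the request rate multiplied by how long each request holds a child. The sketch below is a back-of-the-envelope model, not a benchmark. The 5 second hold time for a slow client is an assumption picked to match the "100 children for 20 hits per second" estimate, and the function names are made up for illustration.

```python
import math


def children_needed(requests_per_second: float, seconds_held_per_request: float) -> int:
    """Little's law: busy workers = arrival rate x time each request holds a worker."""
    return math.ceil(requests_per_second * seconds_held_per_request)


def pageviews_within_allowance(allowance_gb: int, page_kb: int) -> int:
    """Pageviews that fit in a bandwidth allowance, taking 1 GB = 1024 * 1024 KB."""
    return allowance_gb * 1024 * 1024 // page_kb


# Assumed hold times, NOT measurements from this blog.
slow_client_seconds = 5.0         # Keep-Alive off, slow connection
keepalive_timeout_seconds = 15.0  # worst case: the child idles for the full timeout

no_keepalive = children_needed(20, slow_client_seconds)
with_keepalive = children_needed(20, slow_client_seconds + keepalive_timeout_seconds)
pageviews = pageviews_within_allowance(400, 100)

print(no_keepalive, with_keepalive, pageviews)
```

Under these assumptions, 20 requests per second needs about 100 children with Keep-Alive off and several times that if each child can sit idle for the full 15 second timeout. The bandwidth function reproduces the 4,194,304 pageview figure.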

May 28, 2011
Domain name search tools

Clarence from Panabee pinged me a few minutes ago mentioning Panabee.com. I hadn’t heard of it and along with nxdom.com I’m going to add it to my toolkit to brainstorm available domain names.

My attitude re names these days fluctuates between the-name-is-everything and back to sanity.

A week ago I was obsessed with the domain name WordPrice.com which a friendly cybersquatter wanted to sell me for $700. I even contacted the owner of a very similar mark and kindly got the OK to use it for what I intended. Then backed off at the last minute because a) I refuse to support cybersquatting and b) names are more about creating a well loved and well remembered brand than pretty words.

Keep in mind the relative strength of different types of trademarks when you’re thinking about future brands. Make sure you do a USPTO search and at some point spend $500 with a TM attorney to get your use of your new mark on record and start the trademark clock. I also tend to screenshot a few 100-result google searches for any new potentially strong mark I’m going to use. I date them and file them. [Once you’ve had your ass handed to you in a trademark lawsuit like I have, you get paranoid]

May 27, 2011
It's OK to make an extra $2k per month if you're a programmer. Here's how.
This quote, which went viral 2 months ago and that Steinbeck probably never said, has stuck with me:

“Socialism never took root in America because the poor see themselves not as an exploited proletariat but as temporarily embarrassed millionaires.” ~Maybe not Steinbeck, but it’s cool and it’s true.

As temporarily embarrassed millionaire programmers I feel we sometimes don’t pursue projects that could be buying awesome toys every month, making up for that underwater mortgage or adding valuable incremental income. Projects in this space aren’t the next Facebook or Twitter so they don’t pass the knock-it-out-the-park test.

There are so many ideas in this neglected space that have worked and continue to work. Here’s a start:
1. Do a site:.gov search on Google for downloadable government data.
2. Come up with a range of data that you can republish in directory form. Spend a good few hours doing this and create a healthy collection of options.
3. You might try a site:.edu search too and see if universities have anything interesting.
4. site:.ac.uk site:.ac.za – you get the idea.
5. Experiment with Google’s Keyword Tool.
6. Make sure you’re signed in.
7. Click Traffic Estimator on the left.
8. Enter keywords that describe the data sets you’ve come up with. Enter a few to get a good indication of each category or sector’s potential.
9. Look at search volume to find sectors that are getting high search volumes.
10. Look at CPC to find busy sectors that also have advertisers that are paying top dollar for clicks.
11. Finally, look at the Competition column to get an idea of how many advertisers are competing in the sector.
12. First prize is high search volume, high CPC, high competition. Sometimes you can’t have it all, but get as close as you can.
13. Now that you’ve chosen a lucrative sector with lots of spendy advertisers and have government or academic data you can republish, figure out a way to generate thousands of pages of content out of that data and solve someone’s problem. The problem could be “Why can’t I find a good site about XYZ when I google for such-and-such.”
14. Give the site a good solid SEO link structure with breadcrumbs and cross-linking. Emphasize relevant keywords with the correct html tags and avoid duplicate content. Make sure the site performance is wicked fast or you’ll get penalized. Nginx reverse-proxying Apache is always a good bet.
15. Tell the right people about your site and tell them regularly via great blog entries, insightful tweets, and networking in your site’s category.
16. Keep monitoring Googlebot crawl activity, how your site is being indexed and tweak it for 6 months until it’s all indexed, ranking and getting around 50K visits per month (1666 visits per day).
17. That’s 150,000 page views per month at 3 pages per visit average.
18. At a 1.6% CTR with $0.85 CPC from Adsense you’re earning $2040 per month.
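
The arithmetic in the last three steps can be sketched as a small function, which makes it easy to plug in the search volume and CPC numbers you find for your own sector. This assumes the CPC is $0.85 (85 cents) per click, which is what the $2040 total implies, and that the CTR applies per page view. The function name is made up for illustration.

```python
def monthly_ad_revenue(visits: int, pages_per_visit: float, ctr: float, cpc_dollars: float) -> float:
    """Estimate monthly ad revenue from traffic, click-through rate and cost per click."""
    page_views = visits * pages_per_visit
    clicks = page_views * ctr
    return clicks * cpc_dollars


# Numbers from the steps above: 50K visits, 3 pages per visit, 1.6% CTR, $0.85 CPC.
revenue = monthly_ad_revenue(50_000, 3, 0.016, 0.85)
print(f"${revenue:,.2f} per month")
```

Halve the CTR or the CPC and the income halves with it, so treat the result as a rough target and not a forecast.
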
Update: To clarify, “competition” above refers to competition among advertisers paying for clicks in a sector. More competition is a good thing for publishers because it means higher CPC and more ad inventory i.e. a higher likelihood an ad will be available for a specific page with specific subject matter in your space. [Thanks Bill!]

Update2: My very good mate Joe Heitzeberg runs MediaPiston which is a great way to connect with high quality authors of original content. If you do have a moderate budget and are looking for useful and unique content to get started, give Joe and his crew a shout! They have great authors and have really nailed the QA and feedback process with their platform.
May 27, 2011
SEO: Don't use private registration

This one is short and sweet. A new domain recently wasn’t getting any SEO traffic after 2 months. As soon as the registration was made non-private i.e. we removed the domainsByProxy mask on who owns the domain, it started getting traffic and has been growing ever since.

Correlation does not equal causation, but it does give me pause.

While ICANN has made it clear that the whois database has one purpose only, Google publicly stated they became a registrar to “increase the quality of our search results”.

May 26, 2011
SEO: Google may treat blogs differently

A hobby site I have has around 300,000 pages indexed and good pagerank. It gets a fair amount of SEO traffic which has been growing. The rate at which Google indexes the site has been steadily climbing and is now indexing at around 2 to 3 pages per second.

I added a new page on the site that was linked to from most other pages about a week ago. The page had a query string variable called “ref”. The instant it went live, Googlebot went crazy indexing the page and considering every permutation of “ref” to be a different page, even though the page generated was identical every time. The page quickly appeared in Google’s index. I solved it by telling Googlebot to ignore “ref” through Webmaster Tools and temporarily disallowing indexing using robots.txt.
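
To see why every value of “ref” looks like a new page to a crawler, here is a minimal sketch that strips ignored query parameters from URLs and counts what is left. It is not how Googlebot or Webmaster Tools works internally, only an illustration of the idea. The example.com URLs and the `strip_params` helper are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url: str, ignored=frozenset({"ref"})) -> str:
    """Return the URL with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Three URLs a crawler might discover; all of them serve the same page.
urls = [
    "http://example.com/page?ref=home",
    "http://example.com/page?ref=sidebar",
    "http://example.com/page",
]
unique_pages = {strip_params(u) for u in urls}
print(len(urls), "crawled URLs,", len(unique_pages), "distinct page")
```

Without the stripping step a crawler has three URLs to fetch and index. With it there is one.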

A week later I added another new page. This time I used WordPress.org as a CMS and created a URL, let’s call it “/suburl/”, and published the new page as “/suburl/blog-entry-name.html”. Again I linked to it from every page on the site.

Googlebot took a sniff at “/suburl/” and at “/suburl/?feed=rss2” and then a day later it grabbed “/suburl/author/authorname” but it never put the page in its search index and hasn’t visited since. The bot continues to crawl the rest of the site aggressively.

Back in 2009, Matt Cutts (Google search quality team) mentioned that “WordPress takes care of 80-90% of (the mechanics of) Search Engine Optimization (SEO)”.

A different interpretation is that “WordPress gives Google a machine readable platform with many heuristics that can be used to more accurately assess page quality”.

One of those heuristics is age of the blog and number of blog entries. Creating a fresh blog on a fresh domain or subdomain and publishing a handful of affiliate targeted pages is a common splog (spam blog) tactic. So it’s possible that Google saw my one-page-blog and decided the page doesn’t get put in the index until the blog has credibility.

So from now on when I have content to put online, I’m going to consider carefully whether I’m going to publish it using WordPress as a CMS with just a handful of blog entries, or if I’m going to hand-publish it (which has worked well for me so far).

Let me know if your mileage varies.

May 25, 2011
What an Instant-Edu machine might do to Education

The last two scifi novels I’ve read coincidentally both had a machine that can upload several years of education to your brain in a matter of hours. I was ruminating on what the effect would be on education if we invented the instant-edu machine today.

Imagine you could instant-edu the Harvard Business School syllabus in a few hours. HBS’s 2010 revenue was $467 million. The 2011 MBA program has 937 students. My HBS graduate friends tell me that it’s not about the education, it’s about the networking opportunities. So in the case of HBS, the instant-edu machine would not replace the experience, because really the HBS MBA program is quite possibly the most expensive and time consuming business networking program in the world.

So how would HBS adapt to the instant-edu machine? They might revise the $102,000 tuition fees down slightly since all data contained in textbooks will simply be uploaded in a matter of hours.

Since all documented parts of the syllabus will be instantly absorbed by all students, networking will be the core activity. But students won’t spend the time helping each other retain knowledge because it will already be retained. Instead they would focus on innovating using the knowledge they’ve gained. Throughout the 2 year period, they could innovate in different settings. One class might drop LSD and see if a new interpretation arises. Another might use debate to provoke innovative arguments or solutions.

Or perhaps institutions like Harvard will disappear over time and we will revert to the 17th century Persian coffee house scene where thinkers are free to gather for the price of a cup of coffee and share and debate ideas and come up with new ones. Perhaps each coffee shop could have their own football team…

May 25, 2011