Blog

  • Domain name search tools

Clarence from Panabee pinged me a few minutes ago mentioning Panabee.com. I hadn’t heard of it, and I’m going to add it, along with nxdom.com, to my toolkit for brainstorming available domain names.

My attitude to names these days fluctuates between the-name-is-everything and back to sanity.

A week ago I was obsessed with the domain name WordPrice.com, which a friendly cybersquatter wanted to sell me for $700. I even contacted the owner of a very similar mark and kindly got the OK to use it for what I intended. Then I backed off at the last minute because a) I refuse to support cybersquatting and b) names are more about creating a well-loved and well-remembered brand than about pretty words.

Keep in mind the relative strength of different types of trademarks when you’re thinking about future brands. Make sure you do a USPTO search and, at some point, spend $500 with a TM attorney to get your use of your new mark on record and start the trademark clock. I also tend to screenshot a few 100-result Google searches for any new potentially strong mark I’m going to use. I date them and file them. [Once you’ve had your ass handed to you in a trademark lawsuit like I have, you get paranoid.]

     

  • It's OK to make an extra $2k per month if you're a programmer. Here's how.

This quote, which went viral two months ago and which Steinbeck probably never said, has stuck with me:

    “Socialism never took root in America because the poor see themselves not as an exploited proletariat but as temporarily embarrassed millionaires.” ~Maybe not Steinbeck, but it’s cool and it’s true.

As temporarily embarrassed millionaire programmers, I feel we sometimes don’t pursue projects that could be buying awesome toys every month, making up for that underwater mortgage or adding valuable incremental income. Projects in this space aren’t the next Facebook or Twitter, so they don’t pass the knock-it-out-of-the-park test.

    There are so many ideas in this neglected space that have worked and continue to work. Here’s a start:

    1. Do a site:.gov search on Google for downloadable government data.
    2. Come up with a range of data that you can republish in directory form. Spend a good few hours doing this and create a healthy collection of options.
    3. You might try a site:.edu search too and see if universities have anything interesting.
    4. site:.ac.uk site:.ac.za – you get the idea.
    5. Experiment with Google’s Keyword Tool.
    6. Make sure you’re signed in.
    7. Click Traffic Estimator on the left.
8. Enter keywords that describe the data sets you’ve come up with. Enter a few to get a good indication of each category’s or sector’s potential.
    9. Look at search volume to find sectors that are getting high search volumes.
    10. Look at CPC to find busy sectors that also have advertisers that are paying top dollar for clicks.
    11. Finally, look at the Competition column to get an idea of how many advertisers are competing in the sector.
    12. First prize is high search volume, high CPC, high competition. Sometimes you can’t have it all, but get as close as you can.
    13. Now that you’ve chosen a lucrative sector with lots of spendy advertisers and have government or academic data you can republish, figure out a way to generate thousands of pages of content out of that data and solve someone’s problem. The problem could be “Why can’t I find a good site about XYZ when I google for such-and-such.”
    14. Give the site a good solid SEO link structure with breadcrumbs and cross-linking. Emphasize relevant keywords with the correct html tags and avoid duplicate content. Make sure the site performance is wicked fast or you’ll get penalized. Nginx reverse-proxying Apache is always a good bet.
    15. Tell the right people about your site and tell them regularly via great blog entries, insightful tweets, and networking in your site’s category.
16. Keep monitoring Googlebot crawl activity and how your site is being indexed, and tweak it for 6 months until it’s all indexed, ranking and getting around 50K visits per month (about 1,666 visits per day).
    17. That’s 150,000 page views per month at 3 pages per visit average.
18. At a 1.6% CTR and a $0.85 CPC from AdSense, you’re earning $2,040 per month (worked through in the sketch below).
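If you want to play with those assumptions, here’s a minimal back-of-the-envelope sketch of the arithmetic. The visit count, pages per visit, CTR and CPC are just the example figures from the list above, not guarantees:

#!/usr/bin/perl
# Back-of-the-envelope AdSense revenue estimate using the example figures above.
my $visits_per_month = 50_000;  # step 16: SEO traffic target
my $pages_per_visit  = 3;       # step 17: average pages viewed per visit
my $ctr              = 0.016;   # step 18: 1.6% of page views produce an ad click
my $cpc              = 0.85;    # step 18: $0.85 average earnings per click

my $page_views = $visits_per_month * $pages_per_visit;  # 150,000
my $clicks     = $page_views * $ctr;                    # 2,400
my $revenue    = $clicks * $cpc;                        # $2,040

printf "%.0f page views, %.0f clicks, \$%.0f per month\n",
    $page_views, $clicks, $revenue;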

Update: To clarify, “competition” above refers to competition among advertisers paying for clicks in a sector. More competition is a good thing for publishers because it means higher CPC and more ad inventory, i.e. a higher likelihood that an ad will be available for a specific page with specific subject matter in your space. [Thanks Bill!]

Update 2: My very good mate Joe Heitzeberg runs MediaPiston, which is a great way to connect with high-quality authors of original content. If you have a moderate budget and are looking for useful and unique content to get started, give Joe and his crew a shout! They have great authors and have really nailed the QA and feedback process with their platform.

  • SEO: Don't use private registration

This one is short and sweet. A new domain of mine wasn’t getting any SEO traffic after 2 months. As soon as the registration was made non-private, i.e. we removed the Domains By Proxy mask on who owns the domain, it started getting traffic and has been growing ever since.

    Correlation does not equal causation, but it does give me pause.

While ICANN has made it clear that the whois database has one purpose only, Google publicly stated they became a registrar to “increase the quality of our search results”.

     

  • SEO: Google may treat blogs differently

    A hobby site I have has around 300,000 pages indexed and good pagerank. It gets a fair amount of SEO traffic which has been growing. The rate at which Google indexes the site has been steadily climbing and is now indexing at around 2 to 3 pages per second.

I added a new page to the site about a week ago that was linked to from most other pages. The page had a query string variable called “ref”. The instant it went live, Googlebot went crazy indexing the page, considering every permutation of “ref” to be a different page, even though the generated page was identical every time. The page quickly appeared in Google’s index. I solved it by telling Googlebot to ignore “ref” through Webmaster Tools and by temporarily disallowing indexing via robots.txt.
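For reference, Googlebot supports wildcard patterns in robots.txt, so a temporary rule along these lines (a rough sketch, not the exact rule I used) keeps the “ref” permutations out of the crawl while you sort out the parameter handling in Webmaster Tools:

# Temporary: keep Googlebot away from the "ref" permutations of the page
User-agent: Googlebot
Disallow: /*?ref=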

A week later I added another new page. This time I used WordPress.org as a CMS and created a URL, let’s call it “/suburl/”, and published the new page as “/suburl/blog-entry-name.html”. Again I linked to it from every page on the site.

Googlebot took a sniff at “/suburl/” and at “/suburl/?feed=rss2”, and then a day later it grabbed “/suburl/author/authorname”, but it never put the page in its search index and hasn’t visited since. The bot continues to crawl the rest of the site aggressively.

    Back in 2009, Matt Cutts (Google search quality team) mentioned that “WordPress takes care of 80-90% of (the mechanics of) Search Engine Optimization (SEO)”.

    A different interpretation is that “WordPress gives Google a machine readable platform with many heuristics that can be used to more accurately assess page quality”.

One of those heuristics is the age of the blog and the number of blog entries. Creating a fresh blog on a fresh domain or subdomain and publishing a handful of affiliate-targeted pages is a common splog (spam blog) tactic. So it’s possible that Google saw my one-page blog and decided the page doesn’t get put in the index until the blog has credibility.

    So from now on when I have content to put online, I’m going to consider carefully whether I’m going to publish it using WordPress as a CMS with just a handful of blog entries, or if I’m going to hand-publish it (which has worked well for me so far).

    Let me know if your mileage varies.

  • What an Instant-Edu machine might do to Education

    The last two scifi novels I’ve read coincidentally both had a machine that can upload several years of education to your brain in a matter of hours. I was ruminating on what the effect would be on education if we invented the instant-edu machine today.

    Imagine you could instant-edu the Harvard Business School syllabus in a few hours. HBS’s 2010 revenue was $467 million. The 2011 MBA program has 937 students.  My HBS graduate friends tell me that it’s not about the education, it’s about the networking opportunities. So in the case of HBS, the instant-edu machine would not replace the experience, because really the HBS MBA program is quite possibly the most expensive and time consuming business networking program in the world.

    So how would HBS adapt to the instant-edu machine? They might revise the $102,000 tuition fees down slightly since all data contained in textbooks will simply be uploaded in a matter of hours.

    Since all documented parts of the syllabus will be instantly absorbed by all students, networking will be the core activity. But students won’t spend the time helping each other retain knowledge because it will already be retained. Instead they would focus on innovating using the knowledge they’ve gained. Throughout the 2 year period, they could innovate in different settings. One class might drop LSD and see if a new interpretation arises. Another might use debate to provoke innovative arguments or solutions.

    Or perhaps institutions like Harvard will disappear over time and we will revert to the 17th century Persian coffee house scene where thinkers are free to gather for the price of a cup of coffee and share and debate ideas and come up with new ones. Perhaps each coffee shop could have their own football team…

     

  • Back blogging

After a year without feeling the need to hold forth on issues I know very little about, I’m back blogging. The spammers got hold of my blog and I deleted thousands of garbage comments that managed to get through my spam filter. If I accidentally deleted yours, or you’re unable to post a comment because you’re flagged as a spammer, email me and I’ll fix it.

     

How to reliably limit the amount of bandwidth your roommate or bandwidth-hogging office colleague uses

Update: It seems I’ve created a monster. I’ve had my first two Google searchers arrive on this blog entry searching for “limit roomate downloading” and “netgear limit roomate”. Well, after years of experimenting with QoS, this is the best method I’ve found to do exactly that, so enjoy.

For part of the year I’m on a rural wifi network that, on a good day, gives me 3 megabits per second download speed and 700kbps upload speed. I’ve tried multiple rural providers and had them rip out their equipment because of the packet loss (that means you, Skybeam). I’ve shouted at Qwest to upgrade the local exchange so we can get DSL, but for now I’m completely and utterly stuck on a 3 megabit downlink with Mile High Internet.

I have an occasional roommate, my nephew, who downloads movies on iTunes, which uses about 1.5 to 3 megabits. I’ve tried configuring quality of service (QoS) on various routers, including Netgear and Linksys/Cisco, and the problem is that I need a zero-latency connection for my SSH sessions to my servers. So while QoS might be great when everyone is using non-realtime services like iTunes downloads and web browsing, when you are using SSH or a VoIP product like Skype, it really sucks when someone is hogging the bandwidth.

    The problem arises because of the way most streaming movie players download movies. They don’t just do it using a smooth 1 megabit stream. They’ll suck down as much as your connection allows, buffer it and then use very little bandwidth for a few seconds, and then hog the entire connection again. If you are using SSH and you hit a key, it takes a while for the router to say: “Oh, you wanted some bandwidth, ok fine let me put this guy on hold. There. Now what did you want from me again? Hey you still there? Oh you just wanted one real-time keystroke. And now you’re gone. OK I guess I’ll let the other guy with a lower priority hog the bandwidth again until you hit another keystroke.”

So the trick, if you want to deal effectively with the movie-downloading roommate, is to limit the amount of bandwidth they can use. That way Netflix, iTunes, YouTube, Amazon Unbox or any other streaming service has to use a constant 1 megabit rather than bursting to 3 megabits and then dropping to zero, and you always have some bandwidth available without having to wait for the router to do its QoS thing.

    Here’s how you do it.

First install DD-WRT firmware on your router. I use a Netgear WNDR3300 router and, after using various Linksys/Cisco routers, I swear by this one. It has two built-in radios, so you can create two wireless networks, one on 2.4GHz and one on 5GHz. It’s also fast and works 100% reliably.

Then look up your router on DD-WRT’s site, download the DD-WRT build for your router and install it. I use version “DD-WRT v24-sp2 (10/10/09) std – build 13064”. There are newer builds available, but this was the recommended version when I wrote this.

Once you’re all set up and have your basic wireless network running with DD-WRT, make sure QoS is disabled (it’s disabled by default).

Then configure SSH on DD-WRT. It’s a two-step process: first click the “Services” tab and enable SSHd, then click the Administration tab and enable SSH remote management.

Only the paid version of DD-WRT supports per-user bandwidth limits, but I’m going to show you how to do it for free with a few shell commands. I actually tried to buy the paid version of DD-WRT to do this, but their site is confusing and I couldn’t get confirmation that they actually support this feature. So perhaps the author can clarify in a comment.

    Because you’re going to enter shell commands, I recommend adding a public key for password-less authentication when you log in to DD-WRT. It’s on the same DD-WRT page where you enabled  the SSHd.

    Tip: Remember that with DD-WRT, you have to “Save” any config changes you make and then “Apply settings”. Also DD-WRT gets confused sometimes when you make a lot of changes, so just reboot after saving and it’ll unconfuse itself.

    Now that you have SSHd set up, remote ssh login enabled and hopefully your public ssh keys all set up, here’s what you do.

    SSH to your router IP address:

    ssh root@192.168.1.1

    Enter password.

    Type “ifconfig” and check which interface your router has configured as your internal default gateway. The IP address is often 192.168.1.1. The interface is usually “br0”.

Let’s assume it’s br0.

    Enter the following command which clears all traffic control settings on interface br0:

    tc qdisc del dev br0 root

    Then enter the following:


    tc qdisc add dev br0 root handle 1: cbq \
    avpkt 1000 bandwidth 2mbit

    tc class add dev br0 parent 1: classid 1:1 cbq \
    rate 700kbit allot 1500 prio 5 bounded isolated

    tc filter add dev br0 parent 1: protocol ip \
    prio 16 u32 match ip dst 192.168.1.133 flowid 1:1

    tc filter add dev br0 parent 1: protocol ip \
    prio 16 u32 match ip src 192.168.1.133 flowid 1:1

    These commands will rate limit the IP address 192.168.1.133 to 700 kilobits per second.
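If you want to sanity-check that the limit took, you can ask tc for class statistics on the same interface and watch the byte and packet counters climb while the other machine is downloading (a quick check, not a required step):

tc -s class show dev br0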

If you’ve set up automatic authentication and you’re running OS X, here’s a Perl script that will do all this for you:

#!/usr/bin/perl
# ratelimit.pl - rate limit a single IP address on a DD-WRT router via 'tc'.

my $ip = $ARGV[0];
my $rate = $ARGV[1];

# Expect a dotted-quad IP address and a numeric rate in kilobits per second.
$ip =~ m/^\d+\.\d+\.\d+\.\d+$/ &&
$rate =~ m/^\d+$/ ||
die "Usage: ratelimit.pl <ip> <rate-in-kbit>\n";

$rate = $rate . 'kbit';

# Clear any existing traffic control settings on br0 (harmless if none exist).
print `ssh root\@192.168.1.1 "tc qdisc del dev br0 root"`;

# Recreate the CBQ qdisc, the rate-limited class and the filters matching this IP.
print `ssh root\@192.168.1.1 "tc qdisc add dev br0 root handle 1: cbq avpkt 1000 bandwidth 2mbit ; tc class add dev br0 parent 1: classid 1:1 cbq rate $rate allot 1500 prio 5 bounded isolated ; tc filter add dev br0 parent 1: protocol ip prio 16 u32 match ip dst $ip flowid 1:1 ; tc filter add dev br0 parent 1: protocol ip prio 16 u32 match ip src $ip flowid 1:1"`;
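For example, with the script saved as ratelimit.pl, capping the same IP as above at 700 kilobits per second is just:

perl ratelimit.pl 192.168.1.133 700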

You’ll see a few responses from DD-WRT when you run the script, and you might see an error about a missing file, but that’s just because the script tried to delete a rule on interface br0 that didn’t exist when the script started.

These rules put a hard limit on how much bandwidth an IP address can use. What you’ll find is that even if you rate limit your roommate to 1 megabit, as long as you have 500 kbit all to yourself, your SSH sessions will have absolutely no latency, Skype will not stutter, and life will be good again. I’ve tried many different configurations with various QoS products and have never achieved results as good as I get with these rules.

Notes: I’ve configured the rules on the internal interface, even though QoS rules are generally configured on an external interface, because it’s the only thing that really, really seems to work. The Cisco engineers among you may disagree, but go try it yourself before you comment. I’m using the Linux ‘tc’ command and the man page is here.

PS: If you are looking for a great router to install DD-WRT on, try the Cisco-Linksys E3200. It has a ton of RAM, and its 500 MHz CPU is actually faster than the more expensive E4200’s 480 MHz CPU. It’s also the cheapest Gigabit Ethernet E-series router that Cisco-Linksys offers. The Cisco-Linksys E3200’s full specs are on DD-WRT’s site. The E3200 is fully DD-WRT compatible, but if you are lazy and don’t want to mess with DD-WRT, check out the E3200’s built-in QoS (Quality of Service) in this video.

  • The relative non-risk of startups

    Based on recent events I suspect an investment axiom might exist that says: The further an investor is abstracted away from the underlying asset they’re investing in, the greater the risk.

This has recently been shown to be true with mortgage-backed securities, credit default swaps and the black box that is the hedge fund industry; even sovereign debt may qualify.

    When you are shielded from your investment by layers of structure, marketing, repackaging and sales teams, you are too far away to hear the alarm bells when they’re ringing.

    That got me thinking about the relative risk of being an angel investor in young companies. Angel investors meet with the founders, use the product and in many cases craft the investment terms themselves. Spending a few weeks negotiating a deal with an entrepreneur is itself a revealing process. The investor is exposed to a mountain of data on the underlying asset they’re investing in.

The recent excellent Bloomberg article on the underperformance of commodity ETFs brought this difference home for me. Suited-and-booted bankers sell commodity ETFs daily with a prospectus that tells you you’re investing in gold or oil or copper. The impression created is that you’re investing in the underlying asset when in fact you’re investing in a fund that is trading monthly futures contracts for the commodity. Two years later you’re left wondering why your investment has lost 20% while the underlying commodity has gained.

The complexity of financial products and the distance between the average investor and the underlying assets they’re investing in have, I believe, peaked. As the financial crisis that started in 2008 continues to play out, I strongly suspect that over the next decade there will be a return to less complexity and a desire to know, touch and meet the assets that underlie each investment.

    While the likelihood of failure in young businesses is high, as an angel investor you know exactly what you’re getting and you have the ability to influence the performance of your asset. Try finding that on Wall Street.

Are you building an R&D lab or a business?

    Take Twitter in a parallel universe. The team builds a great useful and viral product. They start growing like crazy and hit their first million members. The growth machine keeps pumping and everyone is watching the hot Alexa and Compete graphs cranking away.

They start getting their first acquisition offers. But the smart folks know the second derivative of their growth curve is still wildly positive (it’s curving upward). They decide to hold off on a sale because they figure that even though they have to raise another round to buy infrastructure, their equity will still be worth more net-net.

They keep growing and that second derivative gets a little smaller as the curve starts flattening out into a line. Then, right before the line turns into the other half of an S, they hire Allen and Company, line up all the acquirors and sell to Google for $3Bn.

What just happened is that a kick-ass group of product guys teamed up with a kick-ass group of financiers to create an R&D lab. The lab came up with a hit product and was acquired. Make no mistake, this is a very, very good thing! In this parallel universe the amazing product that is Twitter is combined with a company with the business infrastructure and knowledge to turn it into a money-printing machine. That creates jobs, brings foreign currency back into the US through exported services and, of course, the wealth-creation event for the founders has a trickle-down effect if you’re a fan of supply-side economics.

Now let’s step back into our Universe (capital U because I don’t really believe in this parallel universe stuff). Another group of kick-ass product guys, called Larry and Sergey, teamed up with a group of kick-ass financiers called Sequoia in 1999. A guy called Eric Schmidt, a battle-hardened CEO from a profit-making company that got its ass handed to it by Microsoft, joined the party.

In 2000 Google launched AdWords and the rest is business-model history. A history that you will never hear, because once the company started printing money they went dark. There are tales of Bill Gross having invented AdWords, legal action, a possible out-of-court settlement, but no one will ever know the full details of those early days, and we have almost zero visibility into the later story of how Google turned that product into a money-printing business.

The stories of successful transitions from product to business are never told. Even if they were, they would bore most of us because they are not fun garage-to-zillionaire stories. They are stories where the star actors are cash-flow plans, old guys with experience and teams of suit-wearing salespeople.

The thing that attracts most geeks (also called Product Guys) to startups is the garage-to-zillionaire story through an exit. And that’s OK, provided you get your head screwed on straight and understand that you are an R&D lab whose goal is to get acquired. So go and make yourself a credible threat. Make yourself strategically interesting. Go and build the kinds of relationships that demonstrate your worth to potential acquirors, get them addicted to your data and result in an exit.

    [Quick aside: I spent the day skiing a while back with a great guy who heads up a certain lab at Stanford. They came up with an amazing product that you now use every day. They teamed up with an A list VC with the specific intent of selling to Google. That’s exactly what they did and it has improved our lives and Google’s business model. So again, the R&D lab approach is very very OK.]

The other, smaller group of founders are business geeks. I’m friends with a handful of company founders and CEOs in Seattle who absolutely personify this group. Every one of them was a VP in a larger company. They all have MBAs from top schools. And every one of them is focused on generating cash in their business. The road they’ve chosen is a longer, harder road with a lower chance of success but a much higher reward (think Michael Dell, Bill Gates, Larry Ellison) if they succeed.

    Both paths are morally and strategically OK. You just need to know which you’re on and make sure your investors and the rest of the team are using the same playbook.

    temet nosce (“thine own self thou must know”)

  • Bandwidth providers: Please follow Google's lead in helping startups, the environment and yourselves

There’s a post on Hacker News today pointing to a few open source JavaScript libraries that Google is hosting on their content distribution network. ScriptSrc.net has a great UI that gives you an easy way to link to the libs from your web pages. Developers and companies can link to these scripts from their own websites and gain the following benefits:

• Your visitor may have already cached the script on another website, so your page will load faster.
• The script is hosted on a different domain, which allows your browser to open more concurrent connections while fetching your content – another speed increase.
• It saves you the bandwidth of having to serve that content yourself, which can mean massive cost savings if you run a high-traffic site.
• Just as your visitor may have already cached the content, their workstation or local DNS server may also have the CDN’s IP address cached, which further speeds load time.

While providing a service like this does cost Google or the providing company more in hosting, it provides an overall efficiency gain. Less bandwidth and CPU are used on the Web as a whole when Google provides this service. That means less cooling is required in data centers, less networking hardware needs to be manufactured to support the traffic on the web, and so on.

    The environment benefits as a whole by Google or another large provider hosting these frequently loaded scripts for us.

The savings are passed on to lone developers and startups who are using the scripts. For smaller companies who are trying to minimize costs while dealing with massive growth, this can mean huge cost savings that help them continue to innovate.

The savings are also passed on to bandwidth providers like NTT, AT&T, Comcast, Time Warner and Qwest, whose customers consume less bandwidth as a result.

So my suggestion is that Google and bandwidth providers collaborate to come up with a package of the most used open source components online and keep the list up to date. Then provide local mirrors of each of these packages, with a fallback mechanism if a package isn’t available. Google should define an IP address, similar to their easy-to-remember DNS IP address 8.8.8.8, that hosts these scripts. Participating ISPs route traffic destined for that IP address to a local mirror using a system similar to IP Anycast. An alternative URL is provided via a query string, e.g.

    http://9.9.9.9/js/prototype.1.5.0.js?fallback=http://mysite.com/myjs/myprototype.1.5.0.js

If the local ISP isn’t participating, the request is simply routed to Google’s 9.9.9.9 server as normal.

If the local ISP (or Google) doesn’t have a copy of the script in their mirror, it just returns a 302 redirect to the fallback URL, which the webmaster has provided and which usually points to the webmaster’s own site. A mechanism for multiple fallbacks can easily be created, e.g. fallback1, fallback2, etc.
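To make the fallback idea concrete, here’s a rough sketch of what a mirror’s request handler might look like, written as a tiny Perl CGI script. The /var/mirror path and the exact behaviour are purely illustrative; nothing like this exists today:

#!/usr/bin/perl
# Hypothetical mirror handler: serve a locally mirrored component if we have it,
# otherwise 302 to the webmaster-supplied fallback URL.
use strict;
use warnings;
use CGI;

my $q        = CGI->new;
my $path     = $q->path_info;          # e.g. "/js/prototype.1.5.0.js"
my $fallback = $q->param('fallback');  # e.g. "http://mysite.com/myjs/myprototype.1.5.0.js"
my $mirror   = "/var/mirror$path";     # local copy, if this ISP carries it

if (-f $mirror) {
    # Mirrored locally: serve it with a long cache lifetime.
    print $q->header(-type => 'application/javascript', -expires => '+1y');
    open my $fh, '<', $mirror or die "open: $!";
    print while <$fh>;
} elsif ($fallback) {
    # Not mirrored here: redirect the browser to the webmaster's own copy.
    print $q->redirect(-uri => $fallback, -status => 302);
} else {
    print $q->header(-status => '404 Not Found');
}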

Common scripts, icon libraries and Flash components can be hosted this way. There may even be scenarios where a company (like Google) is used by such a large percentage of the Net population that it makes sense to put some of its assets on the 9.9.9.9 mirror system, so that local bandwidth providers can serve up commonly used components rather than fetch them via their upstream providers. Google’s logo, for example.