Blog

  • Domain name search tools

Clarence from Panabee pinged me a few minutes ago mentioning Panabee.com. I hadn’t heard of it, and I’m going to add it, along with nxdom.com, to my toolkit for brainstorming available domain names.

My attitude to names these days fluctuates between the-name-is-everything and back to sanity.

A week ago I was obsessed with the domain name WordPrice.com, which a friendly cybersquatter wanted to sell me for $700. I even contacted the owner of a very similar mark and kindly got the OK to use it for what I intended. Then I backed off at the last minute because a) I refuse to support cybersquatting and b) names are more about creating a well-loved and well-remembered brand than about pretty words.

Keep in mind the relative strength of different types of trademarks when you’re thinking about future brands. Make sure you do a USPTO search and, at some point, spend $500 with a TM attorney to get your use of your new mark on record and start the trademark clock. I also tend to screenshot a few 100-result Google searches for any new potentially strong mark I’m going to use. I date them and file them. [Once you’ve had your ass handed to you in a trademark lawsuit like I have, you get paranoid.]

     

  • It's OK to make an extra $2k per month if you're a programmer. Here's how.

This quote, which went viral two months ago and which Steinbeck probably never said, has stuck with me:

    “Socialism never took root in America because the poor see themselves not as an exploited proletariat but as temporarily embarrassed millionaires.” ~Maybe not Steinbeck, but it’s cool and it’s true.

As temporarily embarrassed millionaire programmers, I feel we sometimes don’t pursue projects that could be buying awesome toys every month, making up for that underwater mortgage or adding valuable incremental income. Projects in this space aren’t the next Facebook or Twitter, so they don’t pass the knock-it-out-of-the-park test.

    There are so many ideas in this neglected space that have worked and continue to work. Here’s a start:

    1. Do a site:.gov search on Google for downloadable government data.
    2. Come up with a range of data that you can republish in directory form. Spend a good few hours doing this and create a healthy collection of options.
    3. You might try a site:.edu search too and see if universities have anything interesting.
    4. site:.ac.uk site:.ac.za – you get the idea.
    5. Experiment with Google’s Keyword Tool.
    6. Make sure you’re signed in.
    7. Click Traffic Estimator on the left.
8. Enter keywords that describe the data sets you’ve come up with. Enter a few to get a good indication of each category’s or sector’s potential.
    9. Look at search volume to find sectors that are getting high search volumes.
    10. Look at CPC to find busy sectors that also have advertisers that are paying top dollar for clicks.
    11. Finally, look at the Competition column to get an idea of how many advertisers are competing in the sector.
    12. First prize is high search volume, high CPC, high competition. Sometimes you can’t have it all, but get as close as you can.
    13. Now that you’ve chosen a lucrative sector with lots of spendy advertisers and have government or academic data you can republish, figure out a way to generate thousands of pages of content out of that data and solve someone’s problem. The problem could be “Why can’t I find a good site about XYZ when I google for such-and-such.”
    14. Give the site a good solid SEO link structure with breadcrumbs and cross-linking. Emphasize relevant keywords with the correct html tags and avoid duplicate content. Make sure the site performance is wicked fast or you’ll get penalized. Nginx reverse-proxying Apache is always a good bet.
    15. Tell the right people about your site and tell them regularly via great blog entries, insightful tweets, and networking in your site’s category.
16. Keep monitoring Googlebot crawl activity and how your site is being indexed, and tweak it for 6 months until it’s all indexed, ranking and getting around 50K visits per month (about 1,666 visits per day).
    17. That’s 150,000 page views per month at 3 pages per visit average.
18. At a 1.6% CTR and a $0.85 CPC from AdSense, you’re earning $2,040 per month (worked through in the sketch below).
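If you want to play with those assumptions, here’s a minimal back-of-the-envelope sketch of the arithmetic. The visit count, pages per visit, CTR and CPC are just the example figures from the list above, not guarantees:

#!/usr/bin/perl
# Back-of-the-envelope AdSense revenue estimate using the example figures above.
my $visits_per_month = 50_000;  # step 16: SEO traffic target
my $pages_per_visit  = 3;       # step 17: average pages viewed per visit
my $ctr              = 0.016;   # step 18: 1.6% of page views produce an ad click
my $cpc              = 0.85;    # step 18: $0.85 average earnings per click

my $page_views = $visits_per_month * $pages_per_visit;  # 150,000
my $clicks     = $page_views * $ctr;                    # 2,400
my $revenue    = $clicks * $cpc;                        # $2,040

printf "%.0f page views, %.0f clicks, \$%.0f per month\n",
    $page_views, $clicks, $revenue;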

Update: To clarify, “competition” above refers to competition among advertisers paying for clicks in a sector. More competition is a good thing for publishers because it means higher CPC and more ad inventory, i.e. a higher likelihood that an ad will be available for a specific page with specific subject matter in your space. [Thanks Bill!]

Update 2: My very good mate Joe Heitzeberg runs MediaPiston, which is a great way to connect with high-quality authors of original content. If you have a moderate budget and are looking for useful and unique content to get started, give Joe and his crew a shout! They have great authors and have really nailed the QA and feedback process with their platform.

  • SEO: Don't use private registration

This one is short and sweet. A new domain of mine wasn’t getting any SEO traffic after 2 months. As soon as the registration was made non-private, i.e. we removed the Domains By Proxy mask on who owns the domain, it started getting traffic and has been growing ever since.

    Correlation does not equal causation, but it does give me pause.

While ICANN has made it clear that the whois database has one purpose only, Google publicly stated they became a registrar to “increase the quality of our search results”.

     

  • SEO: Google may treat blogs differently

    A hobby site I have has around 300,000 pages indexed and good pagerank. It gets a fair amount of SEO traffic which has been growing. The rate at which Google indexes the site has been steadily climbing and is now indexing at around 2 to 3 pages per second.

I added a new page to the site about a week ago that was linked to from most other pages. The page had a query string variable called “ref”. The instant it went live, Googlebot went crazy indexing the page, considering every permutation of “ref” to be a different page, even though the generated page was identical every time. The page quickly appeared in Google’s index. I solved it by telling Googlebot to ignore “ref” through Webmaster Tools and by temporarily disallowing indexing via robots.txt.
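For reference, Googlebot supports wildcard patterns in robots.txt, so a temporary rule along these lines (a rough sketch, not the exact rule I used) keeps the “ref” permutations out of the crawl while you sort out the parameter handling in Webmaster Tools:

# Temporary: keep Googlebot away from the "ref" permutations of the page
User-agent: Googlebot
Disallow: /*?ref=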

A week later I added another new page. This time I used WordPress.org as a CMS and created a URL, let’s call it “/suburl/”, and published the new page as “/suburl/blog-entry-name.html”. Again I linked to it from every page on the site.

Googlebot took a sniff at “/suburl/” and at “/suburl/?feed=rss2”, and then a day later it grabbed “/suburl/author/authorname”, but it never put the page in its search index and hasn’t visited since. The bot continues to crawl the rest of the site aggressively.

    Back in 2009, Matt Cutts (Google search quality team) mentioned that “WordPress takes care of 80-90% of (the mechanics of) Search Engine Optimization (SEO)”.

    A different interpretation is that “WordPress gives Google a machine readable platform with many heuristics that can be used to more accurately assess page quality”.

One of those heuristics is the age of the blog and the number of blog entries. Creating a fresh blog on a fresh domain or subdomain and publishing a handful of affiliate-targeted pages is a common splog (spam blog) tactic. So it’s possible that Google saw my one-page blog and decided the page doesn’t get put in the index until the blog has credibility.

    So from now on when I have content to put online, I’m going to consider carefully whether I’m going to publish it using WordPress as a CMS with just a handful of blog entries, or if I’m going to hand-publish it (which has worked well for me so far).

    Let me know if your mileage varies.

  • What an Instant-Edu machine might do to Education

    The last two scifi novels I’ve read coincidentally both had a machine that can upload several years of education to your brain in a matter of hours. I was ruminating on what the effect would be on education if we invented the instant-edu machine today.

    Imagine you could instant-edu the Harvard Business School syllabus in a few hours. HBS’s 2010 revenue was $467 million. The 2011 MBA program has 937 students.  My HBS graduate friends tell me that it’s not about the education, it’s about the networking opportunities. So in the case of HBS, the instant-edu machine would not replace the experience, because really the HBS MBA program is quite possibly the most expensive and time consuming business networking program in the world.

    So how would HBS adapt to the instant-edu machine? They might revise the $102,000 tuition fees down slightly since all data contained in textbooks will simply be uploaded in a matter of hours.

    Since all documented parts of the syllabus will be instantly absorbed by all students, networking will be the core activity. But students won’t spend the time helping each other retain knowledge because it will already be retained. Instead they would focus on innovating using the knowledge they’ve gained. Throughout the 2 year period, they could innovate in different settings. One class might drop LSD and see if a new interpretation arises. Another might use debate to provoke innovative arguments or solutions.

    Or perhaps institutions like Harvard will disappear over time and we will revert to the 17th century Persian coffee house scene where thinkers are free to gather for the price of a cup of coffee and share and debate ideas and come up with new ones. Perhaps each coffee shop could have their own football team…

     

  • Back blogging

After a year without feeling the need to hold forth on issues I know very little about, I’m back blogging. The spammers got hold of my blog and I deleted thousands of garbage comments that managed to get through my spam filter. If I accidentally deleted yours, or you’re unable to post a comment because you’re flagged as a spammer, email me and I’ll fix it.

     

How to reliably limit the amount of bandwidth your roommate or bandwidth-hogging office colleague uses

Update: It seems I’ve created a monster. I’ve had my first two Google searchers arrive on this blog entry searching for “limit roomate downloading” and “netgear limit roomate”. Well, after years of experimenting with QoS, this is the best method I’ve found to do exactly that, so enjoy.

For part of the year I’m on a rural wifi network that, on a good day, gives me 3 megabits per second download speed and 700kbps upload speed. I’ve tried multiple rural providers and had them rip out their equipment because of the packet loss (that means you, Skybeam). I’ve shouted at Qwest to upgrade the local exchange so we can get DSL, but for now I’m completely and utterly stuck on a 3 megabit downlink with Mile High Internet.

I have an occasional roommate, my nephew, who downloads movies on iTunes, which uses about 1.5 to 3 megabits. I’ve tried configuring quality of service (QoS) on various routers, including Netgear and Linksys/Cisco, and the problem is that I need a zero-latency connection for my SSH sessions to my servers. So while QoS might be great when everyone is using non-realtime services like iTunes downloads and web browsing, when you are using SSH or a VoIP product like Skype, it really sucks when someone is hogging the bandwidth.

    The problem arises because of the way most streaming movie players download movies. They don’t just do it using a smooth 1 megabit stream. They’ll suck down as much as your connection allows, buffer it and then use very little bandwidth for a few seconds, and then hog the entire connection again. If you are using SSH and you hit a key, it takes a while for the router to say: “Oh, you wanted some bandwidth, ok fine let me put this guy on hold. There. Now what did you want from me again? Hey you still there? Oh you just wanted one real-time keystroke. And now you’re gone. OK I guess I’ll let the other guy with a lower priority hog the bandwidth again until you hit another keystroke.”

So the trick, if you want to deal effectively with the movie-downloading roommate, is to limit the amount of bandwidth they can use. That way Netflix, iTunes, YouTube, Amazon Unbox or any other streaming service has to use a constant 1 megabit rather than bursting to 3 megabits and then dropping to zero, and you always have some bandwidth available without having to wait for the router to do its QoS thing.

    Here’s how you do it.

First install DD-WRT firmware on your router. I use a Netgear WNDR3300 router and, after using various Linksys/Cisco routers, I swear by this one. It has two built-in radios, so you can create two wireless networks, one on 2.4GHz and one on 5GHz. It’s also fast and works 100% reliably.

Then look up your router on DD-WRT’s site, download the DD-WRT build for your router and install it. I use version “DD-WRT v24-sp2 (10/10/09) std – build 13064”. There are newer builds available, but this was the recommended version when I wrote this.

Once you’re all set up and have your basic wireless network running with DD-WRT, make sure QoS is disabled (it’s disabled by default).

Then configure SSH on DD-WRT. It’s a two-step process: first click the “Services” tab and enable SSHd, then click the Administration tab and enable SSH remote management.

Only the paid version of DD-WRT supports per-user bandwidth limits, but I’m going to show you how to do it for free with a few shell commands. I actually tried to buy the paid version of DD-WRT to do this, but their site is confusing and I couldn’t get confirmation that they actually support this feature. So perhaps the author can clarify in a comment.

    Because you’re going to enter shell commands, I recommend adding a public key for password-less authentication when you log in to DD-WRT. It’s on the same DD-WRT page where you enabled  the SSHd.

    Tip: Remember that with DD-WRT, you have to “Save” any config changes you make and then “Apply settings”. Also DD-WRT gets confused sometimes when you make a lot of changes, so just reboot after saving and it’ll unconfuse itself.

    Now that you have SSHd set up, remote ssh login enabled and hopefully your public ssh keys all set up, here’s what you do.

    SSH to your router IP address:

    ssh root@192.168.1.1

    Enter password.

    Type “ifconfig” and check which interface your router has configured as your internal default gateway. The IP address is often 192.168.1.1. The interface is usually “br0”.

Let’s assume it’s br0.

    Enter the following command which clears all traffic control settings on interface br0:

    tc qdisc del dev br0 root

    Then enter the following:


    tc qdisc add dev br0 root handle 1: cbq \
    avpkt 1000 bandwidth 2mbit

    tc class add dev br0 parent 1: classid 1:1 cbq \
    rate 700kbit allot 1500 prio 5 bounded isolated

    tc filter add dev br0 parent 1: protocol ip \
    prio 16 u32 match ip dst 192.168.1.133 flowid 1:1

    tc filter add dev br0 parent 1: protocol ip \
    prio 16 u32 match ip src 192.168.1.133 flowid 1:1

    These commands will rate limit the IP address 192.168.1.133 to 700 kilobits per second.
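If you want to sanity-check that the limit took, you can ask tc for class statistics on the same interface and watch the byte and packet counters climb while the other machine is downloading (a quick check, not a required step):

tc -s class show dev br0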

If you’ve set up automatic authentication and you’re running OS X, here’s a Perl script that will do all this for you:

#!/usr/bin/perl
# ratelimit.pl - rate limit a single IP address on a DD-WRT router via 'tc'.

my $ip = $ARGV[0];
my $rate = $ARGV[1];

# Expect a dotted-quad IP address and a numeric rate in kilobits per second.
$ip =~ m/^\d+\.\d+\.\d+\.\d+$/ &&
$rate =~ m/^\d+$/ ||
die "Usage: ratelimit.pl <ip> <rate-in-kbit>\n";

$rate = $rate . 'kbit';

# Clear any existing traffic control settings on br0 (harmless if none exist).
print `ssh root\@192.168.1.1 "tc qdisc del dev br0 root"`;

# Recreate the CBQ qdisc, the rate-limited class and the filters matching this IP.
print `ssh root\@192.168.1.1 "tc qdisc add dev br0 root handle 1: cbq avpkt 1000 bandwidth 2mbit ; tc class add dev br0 parent 1: classid 1:1 cbq rate $rate allot 1500 prio 5 bounded isolated ; tc filter add dev br0 parent 1: protocol ip prio 16 u32 match ip dst $ip flowid 1:1 ; tc filter add dev br0 parent 1: protocol ip prio 16 u32 match ip src $ip flowid 1:1"`;
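For example, with the script saved as ratelimit.pl, capping the same IP as above at 700 kilobits per second is just:

perl ratelimit.pl 192.168.1.133 700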

You’ll see a few responses from DD-WRT when you run the script, and you might see an error about a missing file, but that’s just because the script tried to delete a rule on interface br0 that didn’t exist when the script started.

These rules put a hard limit on how much bandwidth an IP address can use. What you’ll find is that even if you rate limit your roommate to 1 megabit, as long as you have 500 kbit all to yourself, your SSH sessions will have absolutely no latency, Skype will not stutter, and life will be good again. I’ve tried many different configurations with various QoS products and have never achieved results as good as I get with these rules.

Notes: I’ve configured the rules on the internal interface, even though QoS rules are generally configured on an external interface, because it’s the only thing that really, really seems to work. The Cisco engineers among you may disagree, but go try it yourself before you comment. I’m using the Linux ‘tc’ command and the man page is here.

PS: If you are looking for a great router to install DD-WRT on, try the Cisco-Linksys E3200. It has a ton of RAM, and its 500 MHz CPU is actually faster than the more expensive E4200’s 480 MHz CPU. It’s also the cheapest Gigabit Ethernet E-series router that Cisco-Linksys offers. The Cisco-Linksys E3200’s full specs are on DD-WRT’s site. The E3200 is fully DD-WRT compatible, but if you are lazy and don’t want to mess with DD-WRT, check out the E3200’s built-in QoS (Quality of Service) in this video.

  • The relative non-risk of startups

    Based on recent events I suspect an investment axiom might exist that says: The further an investor is abstracted away from the underlying asset they’re investing in, the greater the risk.

This has recently been shown to be true with mortgage-backed securities, credit default swaps and the black box that is the hedge fund industry; even sovereign debt may qualify.

    When you are shielded from your investment by layers of structure, marketing, repackaging and sales teams, you are too far away to hear the alarm bells when they’re ringing.

    That got me thinking about the relative risk of being an angel investor in young companies. Angel investors meet with the founders, use the product and in many cases craft the investment terms themselves. Spending a few weeks negotiating a deal with an entrepreneur is itself a revealing process. The investor is exposed to a mountain of data on the underlying asset they’re investing in.

The recent excellent Bloomberg article on the underperformance of commodity ETFs brought this difference home for me. Suited-and-booted bankers sell commodity ETFs daily with a prospectus that tells you you’re investing in gold or oil or copper. The impression created is that you’re investing in the underlying asset when in fact you’re investing in a fund that is trading monthly futures contracts for the commodity. Two years later you’re left wondering why your investment has lost 20% while the underlying commodity has gained.

The complexity of financial products and the distance between the average investor and the underlying assets they’re investing in have, I believe, peaked. As the financial crisis that started in 2008 continues to play out, I strongly suspect that over the next decade there will be a return to less complexity and a desire to know, touch and meet the assets that underlie each investment.

    While the likelihood of failure in young businesses is high, as an angel investor you know exactly what you’re getting and you have the ability to influence the performance of your asset. Try finding that on Wall Street.

Are you building an R&D lab or a business?

    Take Twitter in a parallel universe. The team builds a great useful and viral product. They start growing like crazy and hit their first million members. The growth machine keeps pumping and everyone is watching the hot Alexa and Compete graphs cranking away.

They start getting their first acquisition offers. But the smart folks know the second derivative of their growth curve is still wildly positive (it’s curving upward). They decide to hold off on a sale because they figure that even though they have to raise another round to buy infrastructure, their equity will still be worth more net-net.

They keep growing and that second derivative gets a little smaller as the curve starts flattening out into a line. Then, right before the line turns into the other half of an S, they hire Allen and Company, line up all the acquirors and sell to Google for $3Bn.

What just happened is that a kick-ass group of product guys teamed up with a kick-ass group of financiers to create an R&D lab. The lab came up with a hit product and was acquired. Make no mistake, this is a very, very good thing! In this parallel universe the amazing product that is Twitter is combined with a company with the business infrastructure and knowledge to turn it into a money-printing machine. That creates jobs, brings foreign currency back into the US through exported services and, of course, the wealth-creation event for the founders has a trickle-down effect if you’re a fan of supply-side economics.

Now let’s step back into our Universe (capital U because I don’t really believe in this parallel universe stuff). Another group of kick-ass product guys, called Larry and Sergey, teamed up with a group of kick-ass financiers called Sequoia in 1999. A guy called Eric Schmidt, a battle-hardened CEO from a profit-making company that got its ass handed to it by Microsoft, joined the party.

In 2000 Google launched AdWords and the rest is business-model history. A history that you will never hear, because once the company started printing money they went dark. There are tales of Bill Gross having invented AdWords, legal action, a possible out-of-court settlement, but no one will ever know the full details of those early days, and we have almost zero visibility into the later story of how Google turned that product into a money-printing business.

The stories of successful transitions from product to business are never told. Even if they were, they would bore most of us because they are not fun garage-to-zillionaire stories. They are stories where the star actors are cash-flow plans, old guys with experience and teams of suit-wearing salespeople.

The thing that attracts most geeks (also called Product Guys) to startups is the garage-to-zillionaire story through an exit. And that’s OK, provided you get your head screwed on straight and understand that you are an R&D lab whose goal is to get acquired. So go and make yourself a credible threat. Make yourself strategically interesting. Go and build the kinds of relationships that demonstrate your worth to potential acquirors, get them addicted to your data and result in an exit.

    [Quick aside: I spent the day skiing a while back with a great guy who heads up a certain lab at Stanford. They came up with an amazing product that you now use every day. They teamed up with an A list VC with the specific intent of selling to Google. That’s exactly what they did and it has improved our lives and Google’s business model. So again, the R&D lab approach is very very OK.]

The other, smaller group of founders are business geeks. I’m friends with a handful of company founders and CEOs in Seattle who absolutely personify this group. Every one of them was a VP in a larger company. They all have MBAs from top schools. And every one of them is focused on generating cash in their business. The road they’ve chosen is a longer, harder road with a lower chance of success but a much higher reward (think Michael Dell, Bill Gates, Larry Ellison) if they succeed.

    Both paths are morally and strategically OK. You just need to know which you’re on and make sure your investors and the rest of the team are using the same playbook.

    temet nosce (“thine own self thou must know”)

  • Bandwidth providers: Please follow Google's lead in helping startups, the environment and yourselves

There’s a post on Hacker News today pointing to a few open source JavaScript libraries that Google is hosting on their content distribution network. ScriptSrc.net has a great UI that gives you an easy way to link to the libs from your web pages. Developers and companies can link to these scripts from their own websites and gain the following benefits:

• Your visitor may have already cached the script on another website, so your page will load faster.
• The script is hosted on a different domain, which allows your browser to open more concurrent connections while fetching your content – another speed increase.
• It saves you the bandwidth of having to serve that content yourself, which can mean massive cost savings if you run a high-traffic site.
• Just as your visitor may have already cached the content, their workstation or local DNS server may also have the CDN’s IP address cached, which further speeds load time.

While providing a service like this does cost Google or the providing company more in hosting, it provides an overall efficiency gain. Less bandwidth and CPU are used on the Web as a whole when Google provides this service. That means less cooling is required in data centers, less networking hardware needs to be manufactured to support the traffic on the web, and so on.

    The environment benefits as a whole by Google or another large provider hosting these frequently loaded scripts for us.

The savings are passed on to lone developers and startups who are using the scripts. For smaller companies who are trying to minimize costs while dealing with massive growth, this can mean huge cost savings that help them continue to innovate.

The savings are also passed on to bandwidth providers like NTT, AT&T, Comcast, Time Warner and Qwest, whose customers consume less bandwidth as a result.

So my suggestion is that Google and bandwidth providers collaborate to come up with a package of the most used open source components online and keep the list up to date. Then provide local mirrors of each of these packages, with a fallback mechanism if a package isn’t available. Google should define an IP address, similar to their easy-to-remember DNS IP address 8.8.8.8, that hosts these scripts. Participating ISPs route traffic destined for that IP address to a local mirror using a system similar to IP Anycast. An alternative URL is provided via a query string, e.g.

    http://9.9.9.9/js/prototype.1.5.0.js?fallback=http://mysite.com/myjs/myprototype.1.5.0.js

If the local ISP isn’t participating, the request is simply routed to Google’s 9.9.9.9 server as normal.

If the local ISP (or Google) doesn’t have a copy of the script in their mirror, it just returns a 302 redirect to the fallback URL, which the webmaster has provided and which usually points to the webmaster’s own site. A mechanism for multiple fallbacks can easily be created, e.g. fallback1, fallback2, etc.
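To make the fallback idea concrete, here’s a rough sketch of what a mirror’s request handler might look like, written as a tiny Perl CGI script. The /var/mirror path and the exact behaviour are purely illustrative; nothing like this exists today:

#!/usr/bin/perl
# Hypothetical mirror handler: serve a locally mirrored component if we have it,
# otherwise 302 to the webmaster-supplied fallback URL.
use strict;
use warnings;
use CGI;

my $q        = CGI->new;
my $path     = $q->path_info;          # e.g. "/js/prototype.1.5.0.js"
my $fallback = $q->param('fallback');  # e.g. "http://mysite.com/myjs/myprototype.1.5.0.js"
my $mirror   = "/var/mirror$path";     # local copy, if this ISP carries it

if (-f $mirror) {
    # Mirrored locally: serve it with a long cache lifetime.
    print $q->header(-type => 'application/javascript', -expires => '+1y');
    open my $fh, '<', $mirror or die "open: $!";
    print while <$fh>;
} elsif ($fallback) {
    # Not mirrored here: redirect the browser to the webmaster's own copy.
    print $q->redirect(-uri => $fallback, -status => 302);
} else {
    print $q->header(-status => '404 Not Found');
}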

Common scripts, icon libraries and Flash components can be hosted this way. There may even be scenarios where a company (like Google) is used by such a large percentage of the Net population that it makes sense to put some of its assets on the 9.9.9.9 mirror system, so that local bandwidth providers can serve up commonly used components rather than fetch them via their upstream providers. Google’s logo, for example.