Finding Cheap Fast Internet in South Africa

I’ve been in Cape Town for a little over two months now and will be here for a few more weeks. I’ve hunted around for fast Internet and tried a few options. Here’s what I’ve found and maybe it’ll help you.

I’m specifically interested in international bandwidth to the USA. My benchmarks are based on buying 1.5 to 2 gigabyte movies from the iTunes store and downloading them, or transferring big chunks of data from our Seattle data center via SCP (similar to what you might think of as SFTP).

  • Mweb home ADSL is generally slow for international bandwidth. You’re lucky if you get 200 kbps on the 1 megabit line. It’s what I have at the place I’m staying, and it is so slow that I treat it as my absolute in-case-of-emergency option.
  • The 10 megabit business ADSL option that Mweb provides is nice and fast and you’ll get 3 to 6 megabits per second international bandwidth but it’s quite expensive. A friend has this at a building where I rent office space in Cape Town city bowl. As a side note: When the Seacom cable went down recently they didn’t slow down at all even though Mweb home subscribers were horribly slow because Mweb prioritizes their business customers much higher than home.
  • Vodacom’s little USB 3G pay as you go modem is very nice and fast at around 3 to 5 megabits international bandwidth, but it’s quite expensive. They charge per gig transferred and it’s something like $20 per gigabyte. I’ve run through my Vodacom little red USB modem and won’t be refilling it because it’s too pricey, although very reliable.
  • Vodacom’s portable hotspot option if you have a pay as you go sim card and a cellphone that supports portable hotspot also performs well and is also expensive for data transfer. This is currently my backup option to my Cell C modem. Whenever I use it, it’s wicked fast but I can see the dollar signs racking up.
  • The real winner in my opinion is Cell C’s 100 Gig USB pay as you go modem. It’s horribly unreliable but I get 6 megabits per second international bandwidth at times. More below:

Cell C has a package called Giga100 which is R2499 or $270 for 100 gigabytes of transfer which is not limited to off-peak hours. You have to go into a Cell C store and they might not have stock, so call ahead. This option gives you a little white USB modem but you need to know how to use it to get fast speeds. Here’s how:

  • Get a USB extension cable as long as you can get. I use a 5 meter extension. 
  • Put the modem at the end of the extension preferably outside and make sure it isn’t raining.
  • Try to put the modem on a ledge so it’s hanging off with space underneath it for better signal. What also works is hanging it from the top of an umbrella.
  • Another trick that works is putting it into a small metal pot with the lid off. Believe it or not this can boost signal. I think some Russian posted a video proving this a while back on YouTube.
  • Even if your software is telling you you’re getting 5 bars of HSPA signal inside or outside, you’ll still notice a better transfer rate when it’s outside.
  • When connecting, here’s the process: connect, start the transfer, and if it’s slow, disconnect, reconnect and start the transfer again. Repeat until you’re getting a fast transfer speed. Cell C seems to have 3 subnets they allocate IP addresses from, starting with 10.*.*.*, 41.*.*.* and 197.*.*.*, and you’ll randomly get assigned an IP address from one of those. Sometimes I’ll connect and an entire subnet will be down and I’ll have no connectivity. So I’ll reconnect, get a different IP address, and get wicked fast international transfer. Just keep trying.
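The reconnect-until-fast routine above can be sketched as a little script. Note that `connect()` and `measureMbps()` below are hypothetical stand-ins for whatever your modem software and a short test download give you, and the three subnet prefixes are just what I’ve observed, not documented Cell C behavior:

```javascript
// Which of the (observed) Cell C address pools did we land in?
function subnetOf(ip) {
  const first = ip.split(".")[0];
  return ["10", "41", "197"].includes(first) ? first + ".*.*.*" : "unknown";
}

// Keep reconnecting until a short test transfer beats minMbps.
// connect() returns the IP you were assigned; measureMbps(ip) times
// a small download over that connection. Both are stand-ins here.
function reconnectUntilFast(connect, measureMbps, minMbps = 1, maxTries = 10) {
  for (let tries = 1; tries <= maxTries; tries++) {
    const ip = connect();                 // may land on a dead subnet
    const rate = measureMbps(ip);
    if (rate >= minMbps) {
      return { ip, subnet: subnetOf(ip), rate, tries };
    }
  }
  return null;                            // give up: every attempt was slow
}
```

The point is simply to automate the “disconnect, reconnect, test again” loop rather than doing it by hand.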

It’s 1:20pm on Wednesday and here’s my current transfer rate downloading a movie from iTunes:

[Screenshot: iTunes movie download transfer rate, taken 2013-04-03 at 12:52 PM]

My theory is that Cell C has bought a large international pipe, but their engineers are wildly incompetent and their cellphone network is spotty. The result is that unless you know how to get a kick ass signal and land on a working subnet, you are not going to get a working connection. So the fat pipe that Cell C has is underutilized and those who manage to actually get a working connection enjoy an empty international super-highway.

To summarize: If money is no object, just buy a Vodacom USB modem and pay an extra $20 to $30 in bandwidth charges for every movie you rent from Apple. If you want a deal and don’t mind hacking the system a little and putting in some effort, get a Cell C modem and pay $2.70 per gigabyte with (when it works) a kick ass connection.

Disclaimer: If you do get a Cell C modem and it’s awful, don’t blame me.


Time for a Linode downgrade

My credit card number was stolen a few days ago by someone in Palo Alto right after my site was on Hacker News’s home page. I’m going to choose to believe they are unrelated. Interesting though since I don’t live or work in California and this card has never visited there. On the positive side, Visa Signature customer service is worth every penny and 2 new cards arrived on my doorstep in France in 48 hours.

But moving on to the point of this blog entry… it forced me to look at all the recurring fees I’m paying for and either update the card number to my new card or ditch the service.

I discovered my Linode fees had crept up to $115 a month for three servers and one getting backed up. So I ditched the two dev servers and was still paying $60 for a Linode 1536 instance with backup fees.

  • So I deleted log files and brought the disk space down to 12 gigs from 50 gigs.
  • Added more aggressive log rotation to protect from running out of space.
  • Optimized Apache to only have 5 children.
  • Optimized nginx as a reverse proxy so slow clients won’t hog the apache children by setting a shorter proxy timeout.
  • Added mod_status to do real-time checks on how many apache children are busy and what site they’re serving. (This server actually runs 3 sites including skipthepie.org and the website for my sister’s amazing Cape Town restaurant.)
  • Set MaxRequestsPerChild for Apache to 100 so each child gets recycled after 100 requests and can’t keep growing if there’s a memory leak.

This of course assumes you’re running nginx in front of Apache as a reverse proxy; without that, you simply can’t run a medium traffic website on this little hardware.
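For reference, a minimal sketch of that front-end arrangement. The port, timeouts and values here are illustrative, not the exact settings from my server:

```nginx
# nginx listens on 80 and proxies to Apache on a local port.
# Short proxy timeouts stop slow clients from tying up the small
# pool of Apache children.
server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_connect_timeout 5s;
        proxy_read_timeout    30s;
        proxy_send_timeout    30s;
    }
}
```

On the Apache side, the matching prefork settings from the list above would be something like `MaxClients 5` and `MaxRequestsPerChild 100`.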

Once it was all done, I shrunk the disk down to 20 gigs, rebuilt the server as a 512, got my $36 prorated refund from Linode (thanks guys, very nice policy!) and I’m now paying $25 a month for hosting instead of $115, a saving of $1,080 per year.

Not exactly rocket science or earth shattering, but it’s always nice to keep things lean and mean.


The Net will not be bound or gagged

I remember seeing Napster in 2000 when I worked for eToys.com and thinking “This isn’t going away. It has too much momentum and we always move forward.” I was wrong. Today I’m wondering about the free Web and whether it will ever go away. Our intuition tells us we always move forward and things will become better, faster, cheaper and more free. But the brief history of the Net has shown that is not always true.

In 1990 the Internet was completely free. It was an academic network, run by universities with almost no commercial involvement. The Web hadn’t been invented yet; Archie, FTP, Gopher, IRC and network news (NNTP) were how we got around. Piracy was of course alive and well in the form of files uuencoded, broken into parts and posted on NNTP servers. If you wanted porn, it was really, really hard work just to reassemble a GIF.

When the Web came along, it was just another app layer protocol, like Archie or Gopher. But hyperlinks, and the eventual embedding of images into HTML pages, are what made it far better than any other app protocol.

There is nothing that prevents us from creating as many protocols riding on TCP/IP as we would like. Gnutella has spent 10 years showing us that distributed content is feasible. Tor has shown us that online anonymity is there for the taking. The Web is just another app layer protocol. DNS is just a phonebook for IP addresses and the Net survived the first 13 years of its life without it.

If governments ever decide to take control of basic Internet infrastructure like DNS, the Net will simply change form. The way we get content may stop being the Web and it may start being a new democratic protocol that provides client and server anonymity as well as massive redundancy against government or institutional interference.

What we think of as the free and open Web today may become a place like CompuServe used to be. A place you go to access large incumbents like Facebook and Google. Then there will be that other place where only tech geeks and people in academia go to interact freely with the rest of the world. Initially bandwidth may be slow and connections may be few, but soon the new protocol will mature, become easier to use and will gradually become mainstream, sparking a firestorm of innovation in a new environment that allows truly free communication.

DARPA built TCP/IP to survive a nuclear war. It may yet survive a worse attack by its creator.

Footnote: This post was inspired by the South African government passing the “Protection of State Information” act today. It restricts the press from publishing anything the government deems a state secret, with a penalty of 25 years in jail for violating the law. Many journalists in my birth country will now have to choose between a lengthy jail term and doing what is in the public interest.

Your Vision May be Clouded

I took a lot of crap when I decided to vertically integrate our business four years ago and invested around $40,000 with Dell to buy our own server cluster. Right then THE CLOUD was the hot new thing (it still is) and I was not getting on board. I leased a rack at a respectable Seattle-based hosting facility and my wife added the ability to unbox and rack Dell 2950 servers to her long list of talents. The hosting facility team would have done it for us, but we like to get our hands dirty.

That was the most work we did to set up our own server cluster. Four years later we have a 99.9% uptime record and we run a profitable company with an ad network, real-time analytics product and a free virally distributed service off our cluster of 20 machines. When we mail our customers we send over half a million emails in less than 24 hours off our own email server. We serve between 400 and 800 application requests per second all day long.

During the last four years I’ve watched friends and acquaintances get burned by the cloud either due to down time or cost. We pay $3400 per month to host our 20 dedicated machines in a single rack. We have a gigabit connection to the Net and our average bandwidth throughput is around 125 megabits per second constantly.

I’m tired of the Wired Magazine crowd giving me crap for not “being in the cloud” or “getting with the cloud” or whatever. So I’m throwing this down: during the last 4 years I’ve had 99.9% uptime and I’ve spent a total of $190,000 on hosting, which includes the capital investment in the servers. We’ve had a constant throughput of 80 to 120 megabits per second (increasing over time) and roughly 40% average CPU usage on 20 dual-CPU machines (with dedicated Intel E5410 CPUs, each with 4 cores). As I mentioned, we do 400 to 800 app requests per second and we also have an average of 25,000 concurrent connections on our front-end server. I’ll bet anyone who reads this a beer that you won’t find a cloud provider who can do this for you for less than 3X what I’ve paid. [That works out to $3,958 per month.]
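The bracketed figure is just the total spend divided by the number of months, as a quick sanity check:

```javascript
// Total hosting spend over four years, from the figures above,
// amortized to a monthly cost.
const totalSpend = 190000;   // dollars, including server capital cost
const months = 4 * 12;
const perMonth = totalSpend / months;
console.log(Math.round(perMonth)); // 3958
```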

If you think having your own dedicated servers in a colocation facility ties you geographically to one place, it doesn’t. I work wherever I want. For 3 out of the last 4 years I was in Seattle. The last year I’ve been in Colorado. I spent 3 weeks in France this month and while I was there I diagnosed a failing drive in one of our servers, ordered the replacement from Dell which will arrive today and be racked by the support team at our hosting facility. We’ve done hardware replacements or upgrades like this many times, including ordering new servers, upgrading memory, upgrading Ubuntu versions and it’s no big deal. A local support person with an anti-static strap and a basic knowledge of Linux shell commands can resolve 99% of issues that come up.

I encourage everyone reading this to challenge the marketing hype around THE CLOUD. Go to Dell’s site, get a feel for price/performance, call your local colo provider and get prices on a full rack with a gigabit connection. You will almost certainly be surprised at the bang you’ll get for your buck and how easy it is to manage your own physical machines.

Understand that THE CLOUD exists as a buzzword to help software companies sell more software as a service. It’s sad when software startups who should be using the buzzword to sell more service get taken in by the marketing and outsource their core infrastructure.

A peek into our Space Intelligence Community

I spent the day in a secure area on Buckley Air Force Base called ADF-C or Air Force Data Facility, Colorado. A relative of mine works there and I got an invite to a family day, which I thought was impressive so I thought I’d share some of what I saw.

Walking into ADF-C we had to leave all cellphones, cameras and electronic devices behind and produce two forms of ID to get in. Once inside, there was a wide variety of military personnel mixing with civilian contractors. What has surprised me about Buckley on previous visits, and again today, is the international presence, including Canadian and Australian military personnel.

The base colonel gave an impressive speech on opsec and the importance of the work done at Buckley, including the sacrifice families in secure jobs make. “Hi Honey, what did you do at work today?” “Oh, nothing.” Most families I know, including my own immediate family, talk passionately about our jobs among each other, debate decisions we made, discuss colleagues and work events and so on. Families in secure jobs, including many in my extended family, can never discuss things they work on now or worked on decades ago. This includes military contractors. Maintaining that discipline is an impressive sacrifice that I don’t think many people appreciate.

Walking into the base, there were many areas we could not access. But they had put together an impressive display for us. The first desk absolutely blew me away. The National Geospatial-Intelligence Agency is based at Buckley. I’ve been using their data for years and recommending it to others, and I walked up to the young sergeant behind the desk, literally shook the guy’s hand and thanked him for the awesome data they make available to the public. Any online business, world-wide, that provides a city or point of interest radius search uses the NGA’s data and probably doesn’t even realize it.

Next up was AGI, which makes software to track objects in orbit. The demo they had up was impressive, tracking items in low and medium Earth orbits in real-time. The guy was telling me they provide APIs in .NET and Java for developers, and as I was listening I looked over my shoulder and totally lost interest because…


The National Security Agency had a booth there. My wife and I immediately headed over and the three people behind the desk were incredibly friendly and forthcoming about their work. But the real treat was that they had a working original Enigma encryption machine from WWII. The Enigma created the strong awareness of the importance of cryptography we have today, and it’s one of the main reasons the NSA exists. Most of the folks behind the desk were mathematicians, or worked with, or are married to mathematicians. They have a presence on Buckley and they told us that post 9/11 they diversified beyond Fort Meade (Maryland).

Next up was the National Reconnaissance Office or NRO. These are the guys who actually launch and operate the spy satellites that the NSA and other agencies use. I picked up these cool postcards of a few of the 2010 and 2011 launches they’ve done:

I also chatted to folks from a software division in Lockheed that have designed a 3D walkthrough app that uses real-world photography taken from a reconnaissance aircraft to create a model of an environment. Imagine a Quake walkthrough game of Vegas with actual footage taken at an instant in time of the city. That’s what they had on a demo system. It’s designed to take battlefield intel and provide a walkthrough for folks planning an operation.

We went back to the NSA booth later to play with that Enigma some more. It has 3 sets of numbers that are synchronized when two machines are together. Then, before a transmission is sent, the sending station broadcasts how much the receiver needs to increment their machine’s numbers by in order to receive the code. The NSA person I spoke to told me this was one of the weaknesses that helped the Polish cryptanalysts (and then Bletchley Park) crack the code: the transmission containing the increments never contained any actual message data.
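To make the increment idea concrete, here is a toy sketch of that scheme, and emphatically not a real Enigma (which used rotors and a plugboard, not simple shifts). Both sides share three base numbers; the sender transmits three increments, and both then encipher with the shifted settings:

```javascript
// Toy rotor-offset cipher illustrating the synchronized-numbers idea.
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

function shiftChar(ch, offset) {
  const i = ALPHABET.indexOf(ch);
  return i < 0 ? ch : ALPHABET[(i + offset + 26) % 26];
}

// Each letter position cycles through the three offsets, loosely
// like the machine's three number sets.
function encipher(text, base, increments) {
  const offsets = base.map((b, i) => b + increments[i]);
  return [...text].map((ch, i) => shiftChar(ch, offsets[i % 3])).join("");
}

function decipher(text, base, increments) {
  const offsets = base.map((b, i) => -(b + increments[i]));
  return [...text].map((ch, i) => shiftChar(ch, offsets[i % 3])).join("");
}
```

The weakness described above maps onto this sketch directly: the increments themselves go out as a message carrying no payload, which gives a cryptanalyst a predictable, structured transmission to attack.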

Next up, we took a tour of one of the base radomes, but on the way I spotted an interesting plaque on a wall in the hallway. It said “Echelon” with a coat of arms and the slogan “Acta Non Verba”. I went back and did a double-take. On the way back I did a triple take. There’s some amazing history there if you know anything about signals intelligence.

I always thought those radomes contained radar systems for local aircraft, considering it’s an Air Force base. But they contain 85 ft diameter satellite dishes that weigh almost 200 tons and rotate at 2 degrees per second when they’re moving. The domes are constructed out of a material that seems similar to Mylar (mainsail material) and are kept at positive pressure to strengthen them. They can handle winds up to 125 mph. If you live in Colorado you’ll know they dot the landscape for hundreds of miles in the Denver and Colorado Springs areas.

I chatted to a bomb tech for way too long about a display they had. Did you know you can fire a rifle into C4 and it won’t detonate? Or that the most time-consuming explosive to dispose of is sweaty dynamite? My wife chatted to a hostage negotiator. They had a glider and pilot from the Civil Air Patrol and we chatted to him for ages about local gliding conditions and riding thermals into Wyoming and back.

They had a cool karate demo at the end of the day – a full contact style I did briefly some time ago called Ken Po. The acrobatics were matrix-like and the base commander broke a pile of 8 bricks and didn’t even flinch when I shook his hand as we were leaving. Cool guy and he seems to be an inspirational leader.

Thanks to all the volunteers at Buckley for spending your Saturday morning letting us civilians peek behind the curtain.


Which programming language should I learn?

I’ve been asked this question twice in the last 2 weeks by people wanting to write their first Web application. So I’m going to answer it here for anyone else interested:

If you want to write Web applications you need to learn the following languages: Javascript, PHP, HTML, CSS and SQL. It sounds like a lot, but it really is not. You can learn enough of each of these languages to write a basic Web application within a week. Trust me. It’s easy!

PHP is the guts of your Web application. It is the language that runs on your web server. It is also the only language on this list where you have a choice: you must learn HTML, CSS and Javascript, and 99% of web programmers learn SQL to talk to a database, but there are many other languages to choose from that can do the same thing PHP does.

However, if you are starting out writing web applications, PHP is the first server language you should learn and here is why:

  1. PHP is used by a huge number of websites, both big and small. Most of Facebook is written in PHP. Wikipedia powered by Mediawiki is written in PHP.
  2. WordPress, the world’s most popular open source blog platform, is written in PHP. If you know PHP you can change it any way you like or even contribute to the community. WordPress is used by eBay, Yahoo, Digg, The Wall Street Journal, Techcrunch, TMZ, Mashable and of course the whole of WordPress.com is powered by WordPress written in PHP.
  3. Most of the world’s best content management systems are written in PHP.
  4. The PHP community is massive and supportive, unlike the Ruby on Rails community for example.
  5. 99% of web programmers can understand PHP, even though some don’t realize it. (like Perl Developers)
  6. PHP is a mature language which means the bugs have all been ironed out and it runs fast!
  7. If you Google a question you have about PHP, you have a much higher likelihood of finding an answer than any other server programming language.
  8. Don’t learn Perl because even though it’s a mature, fast and popular language, it’s harder to learn than PHP.
  9. Don’t learn Java because Java is better suited to launching spacecraft and running systems that control oil rigs or banking software than Web applications. It is strongly typed, which means you need to write more lines of code to get the same thing done. It’s also harder to learn because it’s a purer object oriented language. It is also owned by Oracle, which means it’s a commercial language, and that means Oracle will continually be trying to sell you stuff by making things seem harder than they are and claiming they have the answer to the problem they created in your mind.
  10. Don’t learn .NET because it’s also a commercial language and pretty much everything made by Microsoft either will cost you money or will break a lot.
  11. Don’t learn Ruby because the guys who run the community are total a-holes who will insult you for asking beginner questions. Ruby is also way less popular than PHP or Perl even though it’s used to power Twitter. It’s also the reason Twitter is down so often.
  12. Don’t learn Brainf*ck, Cobol, D, Erlang, Fortran, Go, Haskell, Lisp, OCaml, Python or Smalltalk because these are languages that people tell you they know to show off. Some of them have specific advantages like parallelism, being a pure object oriented language or being compact. But they are not for you if you are starting out. In fact, the combination of PHP and Javascript will give you 99.99999% of what all these languages offer.
You also need to learn two presentation languages: HTML and CSS. They really belong together, because HTML is not much use without CSS and vice versa.

HTML tells the browser the structure and content of a page. e.g. Put a form after a paragraph and have one field for email and one for full name.

CSS tells the browser how to make that page look e.g. Which fonts to use, what size they should be, what colors, how wide or tall things on the page should be, how thick borders should be, how much padding to use and how thick to make margins.
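Putting those two descriptions together, here is a tiny sketch of the email/full-name form example above. The form target `/signup.php` is made up for illustration:

```html
<!-- HTML: structure. A paragraph, then a form with two fields. -->
<p>Sign up for updates:</p>
<form action="/signup.php" method="post">
  Full name: <input type="text" name="fullname">
  Email: <input type="text" name="email">
  <input type="submit" value="Sign up">
</form>

<!-- CSS: presentation. Fonts, colors and spacing for the same page. -->
<style>
  p     { font-family: Arial, sans-serif; font-size: 14px; color: #333; }
  input { padding: 4px; margin: 2px 0; border: 1px solid #999; }
</style>
```

Delete the `<style>` block and the page still works; it just looks plain. That is the division of labor between the two languages.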

Then you also need to learn a data storage language called SQL, which lets you talk to a database to store things like visitor names, email addresses and so on. For example, using SQL you can tell a database to store an email address and full name by saying “INSERT INTO visitors (name, email) VALUES (‘Mark Maunder’, ‘mark@example.com’);”. There are other ways to store data, and a popular terrorist movement calling itself NoSQL has formed in the last 4 years; they spend their time sowing fear and doubt about SQL and confusing beginners like you. The reality is that 99% of web applications use SQL and continue to use SQL. It works, it’s fast, it’s easy to learn and everyone understands it. It’s used by WordPress, Wikipedia, Facebook and everyone else who counts, whether they like it or not. Just learn SQL!! I also recommend you use MySQL to store your web application’s data (even though it’s owned by Oracle) because it’s the most popular open source database out there. PHP applications use MySQL more than any other database engine on the web.

To summarize, so far you need to learn:

  • Javascript (a programming language that runs on inside your visitor’s browser)
  • PHP (a programming language that runs on the server)
  • HTML (a presentation language that tells the browser the structure of a page)
  • CSS (a presentation language that tells the browser how to make a page look once it knows the structure)
  • SQL (a data access language that lets you store and retrieve data from a database)
Each of these languages runs or executes in a certain place or environment:
  • Javascript runs inside the browser of someone who has visited your website. What’s cool about this is that it uses your visitor’s CPU and memory instead of the resources on your server.
  • PHP runs on your own web server. Most websites use a kind of “container” or application server to run PHP, called Apache Web Server, with something called mod_php installed. Apache handles all the web server stuff like receiving the request for the document and making sure it’s formatted correctly. It then passes the request to mod_php, which executes your PHP code. Your web application written in PHP runs, and your program sends the response back to Apache, which sends it back to your visitor.
  • HTML is interpreted by a visitor’s browser and tells the browser how to structure the page as it loads.
  • CSS is also interpreted by a visitor’s browser and tells the browser how to make the HTML look.
  • SQL is a language that you use inside your PHP application to talk to a database like MySQL. You will actually write SQL in your PHP code but it will be sent to the database engine which is where it is interpreted and executed. The database then sends your PHP code whatever it asked for, if anything. (sometimes you’re just inserting data and not asking a question)
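As a tiny taste of the browser-side work described in the first bullet (a made-up example, not from any site mentioned here): you can validate the visitor’s email address in Javascript, on their machine, before the form is ever submitted to your PHP server.

```javascript
// Runs in the visitor's browser, using their CPU, before the form
// is sent to the server. Deliberately simple: it just checks for
// the shape something@something.something.
function looksLikeEmail(value) {
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(value);
}
```

You would still re-check the value on the server in PHP, because a visitor can bypass anything that runs in their own browser.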
As you progress you will get familiar with the platforms you run each of these languages on. They include:
  • Linux is the operating system you will run on your server. Everything else on your server runs on top of Linux. Linux lets your web application talk to the server’s hardware.
  • Apache Web Server running mod_php. This is the application server you will use to run your PHP code. It will receive the web requests, forward them to your PHP code, and receive the response which is forwarded to your visitor.
  • MySQL database engine. You will talk to MySQL using SQL, which is written inside your PHP code.

One last note to help you in your language decision making. It’s important that you understand there are a few phenomena that may confuse you in your language research:

The first is that some software developers have little life beyond writing software and have large egos. One of the few things they have to impress you with is their own intelligence. They will try to make programming sound harder than it actually is. It’s not hard. It’s easy.

Secondly, an arrogant programmer may regale you with a list of programming languages to choose from and tell you that he or she knows them all. They may make the choice sound complicated. It’s not. They’re just showing off. Choose PHP and the set of tools listed above and you’ll be fine.

Third, remember that there is always something new and shiny coming out that will get a lot of attention and is advertised to “change the way we…” or will “make everything you know about programming irrelevant”. Ignore the noise and stay focused on the basics. Until a new language, operating system, application or piece of hardware has been around for a while (usually at least 5 years), it’s going to be full of bugs, run slow, break often and it will be hard to get help by Googling because few people are using it and have had the problem you’re having.

Lastly, many companies like Google and Facebook spend a lot of time and energy trying to attract the best software engineers in the world. Google associated themselves with NASA purely for this reason, even though they’re in completely different businesses. To draw attention to themselves as thought leaders in software they talk a lot about languages like Erlang, Haskell and so on. The reality is that their bread and butter languages are pretty ordinary – languages like C++ and PHP. So don’t get confused when you see Facebook talking about using Erlang for real-time chat. They’re just showing off. Their bread and butter is PHP, HTML, CSS, SQL and Javascript, like most of the rest of the Web.

Who am I and how dare I express an opinion on this? I’ve been programming web applications since 2 years after the Web was invented. I’m the CEO and CTO of a company whose web apps are seen by over 200 million unique people every month. I also own the company. I’ve seen languages and platforms come and go, including Netscape Commerce Server, Java Applets, Visual Basic, XML, NetWare, Windows NT, Microsoft IIS, thin clients, network computers, etc.

The Web is Simple. Programming is Easy. Now go have fun!!


MI6 to Rest of World: Cyber War is On. Anyone, Anywhere is Fair Game. Arm yourselves.

This incredibly disturbing story was posted on Hacker News 26 minutes ago.

Summary: The London Daily Telegraph (via TheAge.com.au) is reporting that British intelligence agents from MI6 and GCHQ hacked into an al-Qaeda online magazine and removed instructions for making a pipe bomb, replacing the article with a cupcake recipe. A Pentagon operation was blocked by the CIA because the website was seen as an important source of intelligence. Furthermore, both British and US intelligence have developed “a variety of cyber-weapons such as computer viruses, to use against enemy states and terrorists”.

There is no reporting on where the servers of the magazine are based, who owns the lease on them (a US or British citizen?) and under what jurisdiction these attacks were made.

The message this attack sends to the rest of the world is: “Cyber war is on. Anyone, anywhere is fair game. Arm yourselves.”

As an Internet entrepreneur this is incredibly disturbing because it makes it OK for any government agency to target our servers and the tone of the article suggests moral impunity for government agencies engaging in these attacks. If it’s OK for British intelligence to hack (most likely) US based servers then it’s OK for Chinese officials to attack an ad network based in the USA if they run an ad for a dissident website.

At first glance this looks like a cute prank. But this attack may spark the beginning of a global cyber war fought by government agencies and private contractors, the logical conclusion of which is an Iron Curtain descending on what was once an open and peaceful communication medium.

Bandwidth providers: Please follow Google’s lead in helping startups, the environment and yourselves

There’s a post on Hacker News today pointing to a few open source javascript libraries that Google is hosting on their content distribution network. ScriptSrc.net has a great UI that gives you an easy way to link to the libs from your web pages. Developers and companies can link to these scripts from their own websites and gain the following benefits:

  • Your visitor may have already cached the script on another website so your page will load faster
  • The script is hosted on a different domain which allows your browser to create more concurrent connections while fetching your content – another speed increase.
  • It saves you the bandwidth of having to serve that content up yourself which can result in massive cost savings if you’re a high traffic site.
  • Just like your visitor already cached the content, their workstation or local DNS server may also have the CDN’s IP address cached which further speeds load time.

While providing a service like this does cost Google or the providing company more in hosting, it provides an overall efficiency gain. Less bandwidth and CPU is used on the Web as a whole by Google providing this service. That means less cooling is required in data centers, less networking hardware needs to be manufactured to support the traffic on the web and so on.

The environment benefits as a whole by Google or another large provider hosting these frequently loaded scripts for us.

The savings are passed on to the lone developers and startups using the scripts. For smaller companies trying to minimize costs while dealing with massive growth, this can mean huge cost savings that help them continue to innovate.

The savings are also passed on to bandwidth providers like NTT, AT&T, Comcast, Time Warner and Qwest, whose customers consume less bandwidth as a result.

So my suggestion is that Google and the bandwidth providers collaborate on a package of the most-used open source components online and keep the list up to date, then provide local mirrors of each of these packages with a fallback mechanism for when a package isn’t available. Google would define an IP address, similar to their easy-to-remember 8.8.8.8 DNS address, that hosts these scripts. Participating ISPs route traffic destined for that IP address to a local mirror using a system similar to IP anycast. An alternative URL is provided via a query string, e.g.

http://9.9.9.9/js/prototype.1.5.0.js?fallback=http://mysite.com/myjs/myprototype.1.5.0.js

If the local ISP isn’t participating the request is simply routed to Google’s 9.9.9.9 server as per normal.

If the local ISP (or Google) doesn’t have a copy of the script in its mirror, it simply returns a 302 redirect to the fallback URL, which the webmaster has provided and which usually points to the webmaster’s own site. A mechanism for multiple fallbacks is easy to add, e.g. fallback1, fallback2, etc.
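The mirror’s serve-or-redirect decision is simple enough to sketch. Here’s a minimal Python version of that logic; the mirrored file list and the fallback parameter names are my own assumptions for illustration, not part of any real deployment:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical set of packages this mirror holds a local copy of
MIRRORED = {"/js/prototype.1.5.0.js", "/js/jquery.1.4.2.min.js"}

def resolve(request_url):
    """Decide how the mirror answers a request.

    Returns (200, local_path) when we have a copy, (302, fallback_url)
    when we don't but the webmaster supplied a fallback, and (404, None)
    when we have neither.
    """
    parsed = urlparse(request_url)
    if parsed.path in MIRRORED:
        return (200, parsed.path)
    params = parse_qs(parsed.query)
    # Check fallback, then fallback1, fallback2 in order
    for key in ("fallback", "fallback1", "fallback2"):
        if key in params:
            return (302, params[key][0])
    return (404, None)
```

For the example URL above, `resolve` would return a 200 with the local path, since prototype.1.5.0.js is in the mirror; requests for anything unmirrored would bounce to the webmaster’s own copy.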

Common scripts, icon libraries and Flash components could all be hosted this way. There may even be scenarios where a company (like Google) serves such a large percentage of the Net population that it makes sense to put its assets on the 9.9.9.9 mirror system, so that local bandwidth providers can serve up commonly used components rather than fetch them via their upstream providers. Google’s logo, for example.

The Coming Social Advertising Revolution

Facebook has over 400 million active users, who collectively spend over 951 man-years on the site each month. Facebook is passing Google this year as the most visited site in the US, and will earn somewhere between $710M and $1.1B in revenue.

Google, on the other hand, have a $27B revenue run rate for 2010 [based on Q1 2010 earnings]. With similar on-site traffic, they are doing 25 times Facebook’s revenue. Google have had a long time to learn about printing money efficiently, but even so that’s a blush-worthy statistic for the Facebook executive team. So why the difference in performance?

Facebook has a crisis of intent. When a visitor signs in to Facebook their intent is to socialize. They don’t want to buy anything and they certainly don’t want to click on ads that lead them to buying something. Facebook has the best data on the web about the people using their service. But all that wonderful data is useless without intent.

When a visitor hits Google their intent is to see something, learn something, do something etc and these can be cajoled into buying decisions. If Google guides the user to the right vendor, they make a vendor money and can share in some of the revenue. Google’s data on each visitor pales in comparison to Facebook. But Google catches each visitor at the moment they have intent. And that is the power of the search business.

Facebook needs to solve their crisis of intent. Intent is the missing ingredient that stands between Facebook and $27 Billion in revenue multiplied by the social graph and profile data that Google doesn’t have.

Changing Facebook.com to capture visitor attention when they have buying intent risks destroying a valuable asset. So instead Facebook have decided to take their data to the places where visitors have intent: The rest of the web.

“If intent won’t come to Facebook, we’ll take Facebook to intent.” ~Mark Zuckerberg [may have said this]

In the next 3 to 12 months Facebook are going to roll out their own ad network for publishers – a direct competitor to Google AdSense.

If Facebook can use my interests, sex, age, location, and who I’m friends with (and their ages, locations and interests) to infer that when I’m searching for a ‘bobbin’ it’s probably because I want to tie steelhead flies with it, then it makes more sense for every publisher on the web to use Facebook’s ad network than Google’s or anyone else’s, because they will simply make more money.

Facebook’s Ad Network will make publishers more money and increase engagement.

Facebook Connect was phase 1: “Let’s see if a distributed Facebook gets traction and doesn’t raise privacy flags.” It was a resounding success.

The Social Web and Open Graph is phase 2: “Let’s see if we can share some user data using an opt-out model.” From the Facebook blog: “For example, now if you’re logged into Facebook and go to Pandora for the first time, it can immediately start playing songs from bands you’ve liked across the web.”

There have been the usual privacy rumblings, but so far the Facebook community seems to be OK with an opt-out model of distributed data sharing.

The significance of this is staggering: Facebook have positioned themselves for the perfect AdSense kill-shot. Six to 12 months from now, publishers will be able to integrate Facebook’s applications and ad network on their blog or website and get:

  • Better revenue than Google AdSense or any other ad network due to better targeting
  • Increased user engagement through social features
  • Increased virality through recruiting other Facebook members
  • Increased data on each visitor from their very first pageview reducing bounce.

Advertisers will get:

  • Less click fraud because you’re no longer just an IP address and a cookie.
  • Better targeting including the holy grail of demographics: Age, Sex, Location.
  • Ability to show your ad at the moment a user has buying intent on a search engine, a blog about visiting Egypt, etc.

A significant portion of Google’s $27 Billion in revenue this year will come from their publisher ad network. Google knows what’s at stake. That is why they are willing to bet GMail on products like Google Buzz.

Facebook is the most serious threat to Google’s business that they have faced. If Facebook plays this perfectly, they will kill the bear and 5 to 10 years from now will be the largest and most profitable ad network on Earth.

Anyone who plans to compete with them will have to do better than textual ad targeting.

How to limit website visitor bandwidth by country

This technique is great if you have no customers from country X but are being targeted by a DoS attack, unwanted crawlers, bots, scrapers and other baddies. Please don’t use this to discriminate against less profitable countries. The web should be open to all. Thanks.

If you’re not already using Nginx, you should get it, even if you already have a great web server: put Nginx in front of it as a reverse proxy.

First, grab this Perl script, which you will use to convert Maxmind’s GeoIP country database into a format usable by Nginx.

Then download Maxmind’s latest GeoLite Country database in CSV format from this page.

Then run:

geo2nginx.pl < maxmind.csv > nginxGeo.txt

Copy nginxGeo.txt into your nginx config directory.

Then add the following text in the ‘http’ section of your nginx.conf file:

geo $country {
    default no;
    include nginxGeo.txt;
}

Then add the following in the ‘server’ section of your nginx.conf file:

if ($country ~ ^(?:US|CA|ES)$) {
    set $limit_rate 10k;
}
if ($country ~ ^(?:BR|ZA)$) {
    set $limit_rate 20k;
}

This limits anyone from the USA, Canada and Spain to a maximum of 10 kilobytes per second (nginx’s limit_rate is specified in bytes per second, so 10k means 10 KB/s). It gives anyone from Brazil and South Africa 20 KB/s. Every other country gets the maximum.

You could use an exclamation mark before the tilde (!~) to do the opposite. In other words, anyone NOT from the US, Canada or Spain gets 10 KB/s, although I strongly advise against this policy.

Remember that $limit_rate applies per connection, so the total bandwidth each visitor gets is $limit_rate × number_of_connections. See below for limiting connections.

Another interesting directive is limit_rate_after. The documentation on it is very sparse, but it is size based rather than time based: the first limit_rate_after bytes of a connection are sent at full speed, and the limiting starts after that. Great for streaming sites, I would think.

There are two other great modules in Nginx, but neither of them works inside ‘if’ directives, which means you can’t use them to limit by country. They are the Limit Zone module, which lets you limit the number of concurrent connections, and the Limit Requests module, which lets you limit the number of requests over a period of time. The Limit Requests module also has a burst parameter which is very useful. Once again the documentation is sparse, but this comment from Igor (the Nginx author) sheds some light on how bursting works.
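For reference, here’s a minimal sketch of what those two modules look like in nginx.conf. The zone names, sizes and limits are my own illustrative values, and the directive names match the older nginx releases this post is based on:

```nginx
http {
    # Limit Zone module: track concurrent connections per client IP
    limit_zone      one  $binary_remote_addr  10m;

    # Limit Requests module: track request rate per client IP
    limit_req_zone  $binary_remote_addr  zone=two:10m  rate=30r/m;

    server {
        location / {
            limit_conn  one  10;            # at most 10 concurrent connections per IP
            limit_req   zone=two  burst=5;  # queue short bursts of up to 5 extra requests
        }
    }
}
```

The 10m zone sizes set aside shared memory for tracking client state; requests beyond the burst queue are rejected rather than delayed further.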

I’ve enabled all three features on our site: bandwidth limiting by country, limiting concurrent connections and limiting requests over a time period. I serve around 20 to 40 million requests a day on a single nginx box and I haven’t noticed much performance degradation with the new config. It has quadrupled the size of each nginx process, though, to about 46MB, but that’s still a lot smaller than most web server processes.