Author: mark

  • Can WordPress Developers survive without InnoDB? (with MyISAM vs InnoDB benchmarks)

    Update: Thanks Matt for the mention and Joseph for the excellent point in the comments that WordPress in fact uses whatever MySQL’s default table handler is, and from 5.5 onwards, that’s InnoDB – and for his comments on InnoDB durability.

    My development energy has been focused on WordPress.org a lot during the past few months for the simple reason that I love the publishing platform and it’s hard to resist getting my grubby paws stuck into the awesome plugin API.

    To my horror I discovered that WordPress installs using the MyISAM table engine. [Update: via Joseph Scott on the Automattic Akismet team: WordPress actually specifies no table type in the create statements, so it uses MySQL’s default table engine which is InnoDB from version 5.5 onwards. See the rest of his comment below.] I absolutely loathe MyISAM because it has burned me badly in the past when table locking killed performance in very high traffic apps I’ve built. Converting to InnoDB saved the day and so I have a lot of love for the InnoDB storage engine.

    Many WordPress hosts like Hostgator don’t support anything but the default MyISAM table type that WordPress uses and they’ve made it clear they don’t plan to change.

    While WordPress is a mostly read-only application that doesn’t suffer too badly with read/write locking issues, the plugin I’m developing will be doing a fair amount of writing, so the prospect of being stuck with MyISAM is a little horrifying.

    So I set out to figure out exactly how bad my situation is being stuck with MyISAM.

    I created a little PHP script (hey, it’s WordPress so gotta go with PHP) to benchmark MySQL with both table types. Here’s what the script does:

    • Create a table using the MyISAM or InnoDB table type (depending on what we’re benching)
    • The table has two integer columns (one is a primary key) and a 255 character varchar.
    • Insert 10,000 records with randomly generated strings of 255 chars in length.
    • Fork X number of processes (I start with one and gradually increase to 56 processes)
    • Each process has a loop that iterates a number of times so that we end up with 1000 iterations evenly spaced across all processes.
    • In each iteration we do an “insert IGNORE” that may or may not insert a record, a “select *” that selects a random record that may or may not exist, and then a “delete from table order by id asc limit 3” and every second delete orders “desc” instead.
    • Because the delete is the most resource intensive operation I decided to bench it without the delete too.
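    The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the PHP benchmark script itself: the table name `bench`, the column names, the key range and the helper names are all assumptions made for the example, and the statements are only built as strings rather than sent to MySQL.

```python
import random
import string

TOTAL_ITERATIONS = 1000      # from the post: 1000 iterations across all processes
SEEDED_ROWS = 10_000         # from the post: rows inserted before the run starts
KEY_SPACE = 2 * SEEDED_ROWS  # assumption: ids drawn from a range wider than the seeded
                             # rows, so inserts and selects may or may not hit a row


def split_iterations(total, workers):
    """Divide `total` loop iterations as evenly as possible across `workers`."""
    base, extra = divmod(total, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def statements_for_iteration(i, rng, with_delete=True):
    """Return the (sql, params) pairs one worker would send on iteration `i`."""
    payload = "".join(rng.choices(string.ascii_letters, k=255))
    stmts = [
        # May or may not insert, depending on whether the id already exists.
        ("INSERT IGNORE INTO bench (id, n, txt) VALUES (%s, %s, %s)",
         (rng.randint(1, KEY_SPACE), i, payload)),
        # Primary-key lookup of a random id that may or may not exist.
        ("SELECT * FROM bench WHERE id = %s", (rng.randint(1, KEY_SPACE),)),
    ]
    if with_delete:
        # Every second delete orders "desc" instead of "asc".
        direction = "DESC" if i % 2 else "ASC"
        stmts.append((f"DELETE FROM bench ORDER BY id {direction} LIMIT 3", ()))
    return stmts


if __name__ == "__main__":
    shares = split_iterations(TOTAL_ITERATIONS, 56)
    print(len(shares), "workers,", sum(shares), "iterations in total")
    for sql, _params in statements_for_iteration(1, random.Random(0)):
        print(sql)
```

    Passing `with_delete=False` gives the lighter variant of the loop that skips the delete.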

    The goal is not to figure out if WordPress needs InnoDB. The answer is that it doesn’t. The goal is to figure out if plugin developers like me should be frustrated or worried that we don’t have access to InnoDB on many WordPress hosting provider platforms.

     

    NOTE: A lower graph is better in the benchmarks below.

     

    MyISAM vs InnoDB doing Insert IGNORE and Selects up to 56 processes

    The X axis shows the number of threads and the Y axis shows the total time for each thread to complete its share of the loop iterations. Each loop does an “insert ignore” and a “select” from the primary key id.

    The benchmark below is for a typical write intensive application where multiple threads (up to 56) are selecting and inserting into the same table. I was expecting InnoDB to murder MyISAM with this benchmark, but as you can see they are extremely close (look at the Y axis on the left) and are both very fast. Not only that but they are both very stable as concurrency increases.

     

     

    MyISAM vs InnoDB doing Insert IGNORE, Selects and delete with order by

    The X axis shows the number of threads and the Y axis shows the total time for each thread to complete its share of the loop iterations. Each loop does an “insert ignore”, a “select” from the primary key id AND a “delete from table order by id desc limit 3” (or asc every second iteration).

    This is a very intensive test in terms of locking because you have both the write operation of the insert combined with a small ordered range of records being deleted each iteration. I was expecting MyISAM to basically stall as threads increased. Instead I saw the strangest thing…


    Rather than MyISAM stalling and InnoDB getting better under a highly concurrent load, InnoDB gave me two spikes as concurrency increased. So I did the benchmark again because I thought maybe a cron job fired on my test machine…

    While doing the second test I kept a close eye on memory and the machine had plenty to spare. The only explanation I can come up with is that InnoDB did a buffer flush or log flush at the same point in each benchmark which killed performance.

     

    Conclusions

    Firstly, I’m blown away by the performance and level of concurrency that MyISAM delivers under heavy writes. It may be benefitting from concurrent inserts, but even so I would have expected it to get killed with my “delete order by” query.

    I don’t think the test I threw at InnoDB gives a complete picture of how amazing the InnoDB storage engine actually is. I use it in extremely large-scale, highly concurrent environments, with features like clustered indexes and cascading deletes via relational constraints, and the performance and reliability are spectacular. But as a basic “does MyISAM completely suck vs InnoDB?” test I think this is a useful comparison, if somewhat anecdotal.

    Resources and Footnotes

    You can find my benchmark script here. Rename it to a .php extension once you download it.

    I’m running MySQL Server version: 5.1.49-1ubuntu8.1 (Ubuntu)

    I’m running PHP:
    PHP 5.3.3-1ubuntu9.5 with Suhosin-Patch
    Zend Engine v2.3.0

    Both InnoDB and MyISAM engines were tuned using the following parameters:

    InnoDB:
    innodb_flush_log_at_trx_commit = 0
    innodb_buffer_pool_size = 256M
    innodb_additional_mem_pool_size = 20M
    innodb_log_buffer_size = 8M
    innodb_max_dirty_pages_pct = 90
    innodb_thread_concurrency = 4
    innodb_commit_concurrency = 4
    innodb_flush_method = O_DIRECT

    MyISAM:
    key_buffer = 100M
    query_cache_limit = 1M
    query_cache_size = 16M

    Binary logging was disabled.
    The Query log was disabled.

    The machine I used is a stock Linode 512 instance with 512 megs of memory.
    The virtual CPU shows up as four cores:
    Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
    with bogomips : 4533.49

    I’m running Ubuntu 10.10 Maverick Meerkat
    I’m running: Linux dev 2.6.32.16-linode28 #1 SMP

  • Running form

    I’ve started taking my running a bit more seriously this year, feeling the need for speed, so I’ve been looking at running form. My two favorite videos so far:

    My favorite video – Ryan Hall in super slow motion with pretty much god-like form at the Boston 2010 Marathon. This video has caused me to completely adjust my form. Initially I’m focusing on landing without heel strike i.e. on a more flat foot and more forward lean with more pronounced kick as my foot leaves the ground. My shins, achilles and calves aren’t thanking me for the change, but they’re getting used to it quickly. The next goal is to focus on higher butt kicks which makes my leg a more efficient lever arm on the forward swing.

    And Robert Cheruiyot, who won the 2010 Boston (with an unfortunate slip on the finish line that resulted in a concussion), also showing perfect form.

  • MI6 to Rest of World: Cyber War is On. Anyone, Anywhere is Fair Game. Arm yourselves.

    This incredibly disturbing story was posted on Hacker News 26 minutes ago.

    Summary: The London Daily Telegraph (via TheAge.com.au) is reporting that British Intelligence agents from MI6 and GCHQ hacked into an Al-Qaeda online magazine and removed instructions for making a pipe bomb. They replaced the article with a cupcake recipe. A Pentagon operation was blocked by the CIA because the website was seen as an important source of intelligence. Furthermore, both British and US intelligence have developed “a variety of cyber-weapons such as computer viruses, to use against enemy states and terrorists”.

    There is no reporting on where the servers of the magazine are based, who owns the lease on them (a US or British citizen?) and under what jurisdiction these attacks were made.

    The message this attack sends to the rest of the world is “Cyber war is on. Anyone, anywhere is fair game. Arm yourselves.”

    As an Internet entrepreneur this is incredibly disturbing because it makes it OK for any government agency to target our servers and the tone of the article suggests moral impunity for government agencies engaging in these attacks. If it’s OK for British intelligence to hack (most likely) US based servers then it’s OK for Chinese officials to attack an ad network based in the USA if they run an ad for a dissident website.

    At first glance this looks like a cute prank. But this attack may spark the beginning of a global cyber war fought by government agencies and private contractors, the logical conclusion of which is an Iron Curtain descending on what was once an open and peaceful communication medium.

  • Money Doesn't Talk

    Money talks. Or, in this case it doesn’t.

    Have you noticed that the vast majority of published ideas will not increase your business or personal revenue? If someone has a truly great idea for increasing earnings or creating new revenue out of thin air, they will implement or trade it themselves and will never share.

    At the point a great (tech sector) business concept is shared, it enters the highly efficient ideas market that is the Tech Echo Chamber (HN, Reddit, Slashdot, TC, etc.) – which efficiently propagates it out to the rest of the world’s population of innovators. At this point the idea is undifferentiated, rapidly being implemented by all, and you’re in a price or other kind of efficiency war.

    This, combined with the truism that it’s not a bad idea to completely ignore your competitors and focus on your customers, makes it a pretty darn good idea to avoid spending too much time on tech publications and social media outlets. You will learn nothing new and what you will learn loses much of its value the moment it’s published. The temptation to imitate will probably harm your business as you’re bounced along in the current of swarming incompetents.

    The main (possibly only) thing I use blogs and social media reporting on tech news for is to keep track of landscape changes. Changes in the economics of a sector or changes in technology. Either of these almost always signal the start of a firestorm of innovation.

    Focus on your customers, find the truly brilliant ideas that solve customer problems and beware of sharing them too early.

    Footnote: The concept I’m describing relates to the Efficient Market Hypothesis and Information Asymmetry if you’d like to read more.

     

  • BitCoin, Chastened

    The wannabe economist in me has been following the BitCoin phenomenon with great interest during the last few months. The algorithmic side of bitcoin is fascinating, but a few things bugged me about the system. One of them was that the maximum number of bitcoins that can ever exist is limited to 21 million.
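    For the curious, the 21 million limit falls out of simple arithmetic. The sketch below is illustrative only and uses figures from Bitcoin's published design rather than from this post: a starting block reward of 50 coins that halves every 210,000 blocks, counted in whole units of one hundred-millionth of a coin. The function names are made up for the example.

```python
SATOSHIS_PER_BTC = 100_000_000           # smallest unit: one hundred-millionth of a coin
INITIAL_SUBSIDY = 50 * SATOSHIS_PER_BTC  # block reward at launch (Bitcoin design figure)
HALVING_INTERVAL = 210_000               # blocks between halvings (Bitcoin design figure)


def idealized_cap_btc():
    """Geometric series: 50 * 210,000 * (1 + 1/2 + 1/4 + ...) = 50 * 210,000 * 2."""
    return 50 * HALVING_INTERVAL * 2


def total_supply_satoshis():
    """Sum the rewards era by era, halving with integer division until nothing is left."""
    total = 0
    subsidy = INITIAL_SUBSIDY
    while subsidy > 0:
        total += subsidy * HALVING_INTERVAL
        subsidy //= 2  # rounding down each halving keeps the sum just under the ideal cap
    return total


if __name__ == "__main__":
    print("idealized cap:", idealized_cap_btc(), "coins")
    print("integer-rounded total:", total_supply_satoshis() / SATOSHIS_PER_BTC, "coins")
```

    Because each halving rounds down to a whole unit, the summed total lands slightly below the round 21 million figure.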

    Most of the coverage on bitcoin has been bubbly-positive even though it’s not certain you can reliably convert bitcoins into real currency.

    Adam Cohen took a wonderfully lucid stab at bitcoin on Quora recently, focusing on the built in deflation that is a result of the hard limit on the number of coins that can exist. He makes the point that early adopters holding bitcoins will automatically get richer and it smacks of a scam.

    While a scam is clearly not the intention of the creators, deflation is any economist’s worst nightmare and built-in deflation will probably result in bitcoin being stillborn.

  • HN is about to overtake Slashdot.

    The Alexa graph above only shows the ycombinator.com domain, but most of the traffic to the domain is HN. Also HN is visible on several other domains like hackerne.ws, which isn’t counted in the above graph, so it’s probably already passed Slashdot.

    Footnote: This post was something more political re comment scores. But I decided I don’t have the stomach for that fight, so editing and leaving just the data.

     

  • What would a self-launching Space Shuttle look like?

    This is the OK-GLI, part of the Soviet Shuttle Buran program, the largest and most expensive space program in the history of the Soviet Union.

    The OK-GLI completed 25 test flights between 1985 and 1988 before being retired. The OK-GLI was powered by four AL-31 jet engines with the fuel tank in the cargo bay. The highest altitude it achieved was 6000 meters or 19,000 ft. It never reached space.

    A sister ship in the Buran program, the Buran Spacecraft, did reach orbit and completed two unmanned Earth orbits. It was the only orbital flight in the Buran program.

  • What I love about the HN community

    The smartest, most helpful people hang out there. Thanks for the awesome theme Lucian!

     

     

     

     

  • Insulin may be a steroid masquerading as a hormone.

    Insulin
    Computer-generated image of six insulin molecules assembled in a hexamer.

    At the 1998 Winter Olympic Games in Nagano, a Russian medical officer asked the Olympic Committee whether the use of insulin was restricted to athletes who are insulin dependent diabetics. The incident drew attention to insulin and the IOC were swift to ban it as a performance enhancing drug.

    I recently posted a question on Quora asking what the best nutrition book is, and Nutrient Timing by Ivy and Portman came up. The book is excellent and has a huge amount of physiology data on how the human body makes and uses energy. The core concept is this:

    In the 45 minutes post exercise, your body has very high insulin sensitivity. This period is referred to as the Anabolic phase. By consuming a drink of protein and carbs in a 1:3 or 1:4 ratio, you can significantly boost your insulin level during this period. You can also prolong this period and increase recovery and growth by continuing to consume said drink 2 hours and again at 4 hours post exercise.

    Boosting your insulin levels post exercise reduces protein loss from muscles and improves protein retention. It also speeds recovery by replenishing glycogen and creatine stores. Ivy and Portman spend much of the book citing supporting research from many studies including the Marine Corps.

    The book also recommends taking an anti-oxidant post exercise to reduce muscle oxidation.

    According to Ivy and Portman and many other nutritionists, the best source of protein is Whey Protein Isolate (links to the one I bought recently) which is rich in branched chain amino acids (BCAAs). The best source of carbs in their recipe is good old Sucrose (table sugar).

    Dara Torres
    Dara Torres at the 2008 summer Olympics.

    I was chatting to my wife about the book and she mentioned that Dara Torres (three silvers in the previous summer Olympics and the oldest swimmer to ever be on the US Olympic team) drinks chocolate milk as her favorite recovery drink. Chocolate is rich in anti-oxidants, it contains sucrose and milk has some protein, but not enough to make the ratio 4:1. (The sucrose to protein ratio is probably more like 16:1). So I’m guessing that Dara adds a source of protein like whey protein isolate to the drink.

    I’m training this year for either a half or full Ironman next year and doing a half and full marathon this year to build up to it. I’m currently doing two 5 mile runs and one long run (currently 10 miles) each week. I also swim 2000m two to three times a week and I do the occasional core strength workout. As I built up to my current volume my energy level collapsed – both mental and physical. Once I started looking at my nutrition and using a post workout recovery nutrition plan I came back with a vengeance. Two weeks after starting the plan I ran the fastest 5 mile pace I’ve ever run and felt great afterwards.

    After doing further reading online I’ve modified my recipe to have a 1:1 ratio of protein to carbohydrates post workout. A 3:1 or 4:1 ratio of carbohydrates to protein seems to build a lot of muscle and my goal is to stay lean but recover fast.

    My current post workout nutrition plan is:

    • 1 whole raw egg, 56 grams whey protein (two scoops), two tablespoons of molasses (high in phosphorus), a tablespoon of brown sugar, two cups of skim milk, a heaped spoon of cocoa powder. Blend and drink two thirds.
    • Drink the remaining third 1.5 to two hours after workout.

     

  • Where's the Disruption from the Change in Startup Economics?

    It’s been a year-long break from blogging, and getting back to writing and getting so many new visitors this soon is cool. [Thanks HN!]

     

    This blog runs on the smallest available Linode 512 instance for $20/month. It runs several sites including family blogs and hobby sites. I run nginx on the front end and reverse proxy to 5 Apache children which saves me having to run roughly 100 Apache children to handle the brief spikes of around 20 hits per second I saw yesterday.

     

    Technologies like event-servers (Nginx, node.js, etc.) and cheap and reliable virtualization may seem like old hat, but in 2005 Linode was charging $40/month for a 128Meg instance (it’s now $20/month for 512Megs, 88% cheaper per megabyte) and Nginx was only going to hit mainstream use two years later. In fact Nginx only hit version 1.0 last month.

    Five years ago many companies or bloggers would have used a physical box with 3.5 Gigabytes of memory to handle 100 Apache instances and the database for this kind of traffic. That costs about $300/month based on current pricing for physical dedicated servers from ServerBeach, which hasn’t changed much since 2005.

    With the move from hardware and multiprocess servers to virtualization and event-servers, hosting costs have dropped to roughly 7% of what they were 5 years ago ($20/month versus about $300/month). A drop of about 93% in a variable cost for any sector changes the economics in a way that usually causes disruption and innovation.
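    The percentages in this post are quick back-of-the-envelope sums, and a small sketch makes them easy to re-check. It uses only the prices quoted above ($40 for 128 megs in 2005, $20 for 512 megs now, and about $300 for a dedicated box); the function names are just for illustration.

```python
def price_per_mb(monthly_usd, memory_mb):
    """Monthly price divided by memory, in dollars per megabyte."""
    return monthly_usd / memory_mb


def percent_drop(old, new):
    """How much cheaper `new` is than `old`, as a percentage of `old`."""
    return (old - new) / old * 100


# Linode, price per megabyte of memory: $40 for 128MB (2005) vs $20 for 512MB (now).
per_mb_drop = percent_drop(price_per_mb(40, 128), price_per_mb(20, 512))

# Whole-server hosting bill: roughly $300/month dedicated vs $20/month virtual.
hosting_drop = percent_drop(300, 20)
remaining_share = 100 - hosting_drop

if __name__ == "__main__":
    print(f"per-megabyte price drop: {per_mb_drop:.1f}%")
    print(f"hosting bill drop: {hosting_drop:.1f}% (now {remaining_share:.1f}% of the old cost)")
```

    The per-megabyte figure comes out at 87.5%, and the monthly bill lands between 93% and 94% lower, so the new bill is a little under 7% of the old one.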

    So where is the disruption and innovation happening now that anyone can afford a million-hits-a-month server?

     

    Footnotes: An unstable version of Nginx was available in 2005/2006 and Lighttpd was also an alternative back then for reverse proxying. But it was for hardcore hackers who didn’t mind relatively unstable and bleeding-edge configurations. Mainstream configuration in 2005 was running big memory servers on dedicated machines with a huge number of Apache children. Sadly, much of the web is still run this way. I shudder to think of the environmental impact of all those front-end web boxes.

    I also don’t address the subject of Keep-Alive on Apache. Disabling Keep-Alive is a way to get a lot more bang for your hardware (specifically you need less memory because you run fewer Apache children) while sacrificing some browser performance. The norm in 2005 was to leave Keep-Alive enabled, but set to a short timeout. With Keep-Alive set to 15 seconds, my estimate of 100 Apache instances for 20 hits per second is probably way too optimistic. Even with Keep-Alive disabled you would barely handle 20 requests per second with 100 children when taking into account latency per request for slower connections.

    Bandwidth cost is also a consideration, but gzip and running compressed code, using CDN versions of libs like jQuery that someone else hosts and running a stripped down site with few images all help. [Think Craigslist] With a page size of 100K, Linode’s 400GB bandwidth allowance gives you 4,194,304 pageviews.
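    The footnote arithmetic can be sketched in a few lines. The bandwidth sum uses the 400GB and 100K figures from the post with binary units. The worker estimate uses Little's law (busy workers = arrival rate times the time each request holds a worker), which is my framing rather than anything measured here, and the hold times in the example are assumptions.

```python
import math


def pageviews_for_allowance(allowance_gb, page_kb):
    """Whole pageviews that fit in the allowance, using 1GB = 1024 * 1024 KB."""
    return allowance_gb * 1024 * 1024 // page_kb


def workers_needed(requests_per_second, seconds_held_per_request):
    """Little's law: average busy workers = arrival rate * time each request holds one."""
    return math.ceil(requests_per_second * seconds_held_per_request)


if __name__ == "__main__":
    # 400GB allowance, 100K pages, as in the footnote.
    print(pageviews_for_allowance(400, 100))  # 4194304

    # Assumption: a slow client ties up an Apache child for 5 seconds per request.
    print(workers_needed(20, 5))

    # Assumption: every hit is a new connection that idles for a 15 second Keep-Alive.
    print(workers_needed(20, 15))
```

    Read the other way round, 100 children serving 20 requests per second only works if each request holds a child for five seconds or less on average, which is why slow connections and Keep-Alive timeouts matter so much.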