Category: Code

  • Saving server costs with Javascript using distributed processing

    I run two consumer web businesses, LineBuzz.com and Geojoey.com. Both have more than 50% of the app implemented in JavaScript that executes in the browser.

    Something that occurred to me a while ago is that, because most of the execution happens inside the browser and uses our visitors’ CPU and memory, my servers don’t have to provide that CPU and memory.

    I found myself moving processing to the client side where possible.

    [Don’t worry, we torture our QA guru with a slow machine on purpose so she will catch any browser slowness we cause]

    One downside is that everyone can see the JavaScript source code, although it’s compressed, which makes it a little harder to reverse engineer. And usually the most CPU-intensive code is also the most interesting.

    Another disadvantage is that I’m using a bit more bandwidth. But if the app is not shoveling vast amounts of data to do its processing and if I’m OK with exposing parts of my source to competitors, then these issues go away.

    Moving execution to the client side opens up some interesting opportunities for distributed processing.

    Let’s say you have 1 million page views a day on your home page. That’s 365 million views per year. Let’s say each user spends an average of 1 minute on your page because they’re reading something interesting.

    So that’s 365 million minutes of processing time you have available per year.

    Converted to years, that’s 694 server years: 365 million minutes divided by the 525,600 minutes in a year. That is one server working for 694 years or 694 servers working for 1 year.

    But let’s halve it because we haven’t taken into account load times or the fact that JavaScript is slower than other languages. So we have 347 server years.

    Or put another way, it’s like having 347 additional servers per year.

    The cheapest server at ServerBeach.com costs $75 per month or $900 per year. [It’s a 1.7GHz Celeron with 512MB of RAM – we’re working on minimums here!]

    At $900 each, those 347 servers translate into $312,300 per year.
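
    The back-of-the-envelope numbers above can be reproduced with a few lines of JavaScript. This is only a sketch: the function name estimateSavings and its option names are made up for illustration, and the inputs are the same rough guesses used in this post.

```javascript
// Hypothetical helper that repeats the estimate from the post.
// All inputs are rough guesses, not measurements.
function estimateSavings(opts) {
  const MINUTES_PER_YEAR = 60 * 24 * 365; // 525,600

  // Total minutes visitors spend on the page in a year.
  const clientMinutes = opts.viewsPerDay * 365 * opts.minutesPerView;

  // Whole server years that this many minutes adds up to.
  const serverYears = Math.floor(clientMinutes / MINUTES_PER_YEAR);

  // Discount for load times and slower JavaScript execution.
  const usableServerYears = Math.floor(serverYears * opts.efficiency);

  return {
    serverYears: serverYears,
    usableServerYears: usableServerYears,
    dollarsPerYear: usableServerYears * opts.serverCostPerYear
  };
}

const estimate = estimateSavings({
  viewsPerDay: 1000000,
  minutesPerView: 1,
  efficiency: 0.5,
  serverCostPerYear: 900
});
console.log(estimate);
```

    Changing the efficiency or the minutes per view shows quickly how sensitive the final dollar figure is to those guesses.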

    My method isn’t very scientific, and if you go around slowing down people’s machines, you’re not going to have 1 million page views per day for very long. But it gives you a general indication of how much money you can save if you can move parts of a CPU-intensive web application to the client side.

    So going beyond saving server costs, it’s possible for a high-traffic website to do something similar to SETI@home and essentially turn the millions of workstations that spend a few minutes on the site each day into a giant distributed-processing Beowulf cluster using little old JavaScript.
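
    To make the idea a bit more concrete, here is a minimal sketch of how a page might chew through a queue of small work units without hogging the visitor’s CPU. Everything in it is an assumption for illustration: runSlice, the work-unit queue and the time budget are hypothetical, and a real system would also need to fetch units from the server and send the results back.

```javascript
// Hypothetical helper: process as many queued work units as fit into a
// small time budget, then return so the browser can get on with its day.
// `now` is injectable so the function can be exercised with a fake clock.
function runSlice(queue, work, budgetMs, now) {
  now = now || Date.now;
  const results = [];
  const deadline = now() + budgetMs;

  // Stop when the queue is empty or the time budget is used up.
  while (queue.length > 0 && now() < deadline) {
    const unit = queue.shift();
    results.push(work(unit));
  }
  return results;
}

// Example: square a few numbers with a 5 ms budget.
const pending = [1, 2, 3, 4];
const done = runSlice(pending, function (n) { return n * n; }, 5);
console.log(done, "left in queue:", pending.length);
```

    In a browser you might call runSlice from a setTimeout callback every so often with a budget of a few milliseconds, so the page stays responsive between slices.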

  • Lessons from three weeks of intensive I18N

    When we launched LineBuzz on May 10, we had no idea that most of our press coverage was going to be Japanese. A site called 100Shiki.com put us up as dot-com of the day. All of a sudden we had lots of Japanese users. A few days later, a very popular blogger in China gave us a mention and we had lots of Chinese users too. Within a week we had over 15 languages on the site.

    Three intense weeks later we launched an I18N version of the site.

    Here’s a brief summary of some of the key issues we had to deal with when i18n’ing an app that has 50/50 client-server code and lots of communication between the two.

    LineBuzz’s code is very text intensive by the nature of the application. We provide inline comments without a browser plugin. One of the unique things about LineBuzz is that it doesn’t matter which page you post an inline comment on. The comment will appear anywhere on the website where the text and its surrounding paragraph appear.

    So as you can imagine, we use a lot of regular expressions, character code conversions and text-length calculations.

    Safari – not the world’s best browser

    The first thing that broke was Safari. Safari’s JavaScript regex engine is seriously busted. It doesn’t support Unicode characters at all. IIRC it simply returns true for any regex containing Unicode. So their claim that it’s the world’s best browser really irks me. I had to write a fix-Safari layer for anything that involved processing Unicode.

    No round-trip for jp charsets

    The next thing that bit me was Japanese character set support. The Japanese use two main character sets: EUC-JP and Shift_JIS. The latter is a product of Windows and the former is from Unix. Both caused a major headache because they don’t round-trip convert to Unicode. In other words, you can’t convert these characters to a Unicode encoding like UTF-8, convert them back to their native character set, and expect the result to equal the original. The solution: store the raw character data for all character sets as binary and only convert to Unicode if I absolutely must. I use UTF-8 on linebuzz.com, so that’s one scenario where I convert from binary to UTF-8.
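
    Here is a minimal sketch of the store-raw-bytes idea, assuming a JavaScript environment that provides the standard TextDecoder API. The names storeRaw and toUnicode are hypothetical, and which charset labels a given runtime can actually decode (shift_jis or euc-jp, for example) depends on that runtime, so the example sticks to UTF-8 input.

```javascript
// Hypothetical record: keep the original bytes untouched and remember
// which charset they arrived in. Nothing is converted at storage time.
function storeRaw(bytes, charset) {
  return { bytes: Uint8Array.from(bytes), charset: charset };
}

// Convert to a JavaScript (Unicode) string only when it is really needed,
// for example to render on a UTF-8 page. The stored bytes are not modified.
function toUnicode(stored) {
  return new TextDecoder(stored.charset).decode(stored.bytes);
}

// Example: the UTF-8 bytes for two Japanese characters (U+65E5, U+672C).
const stored = storeRaw([0xE6, 0x97, 0xA5, 0xE6, 0x9C, 0xAC], "utf-8");
console.log(toUnicode(stored), "raw length:", stored.bytes.length);
```

    Because the original bytes are never overwritten, a lossy conversion only affects what is displayed, not what is stored.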

    When is a space not a space

    Another thing that bit me was space character codes and spaces in regex. In Unicode there are about 20 different space characters. Some regex engines are smart and recognize them all. Others only recognize the traditional ASCII space character. So routines that, for example, removed spaces had to be hand-tailored to deal with every Unicode space.

    String.charCodeAt() == lies lies lies!!

    Character codes differ across operating systems. Some character sets contain characters that have a different character code on OS X than on Unix. Yes, even in the same browser using the same JavaScript engine (Firefox, for example), the character codes are different. So any routines that rely on consistent character codes across platforms have to deal with this little nightmare.
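
    This post doesn’t pin down what causes the differing codes. One well-known way the same visible text ends up with different character codes is Unicode normalization: an accented letter can arrive as a single precomposed code point or as a base letter plus a combining mark. Assuming that is at least part of the problem (an assumption, not a diagnosis), normalizing the string before calling charCodeAt gives consistent results in a JavaScript engine that supports String.prototype.normalize. The helper name stableCharCodes is hypothetical.

```javascript
// Hypothetical helper: normalize to NFC (precomposed form) first, then
// collect the UTF-16 code units, so that equivalent spellings of the same
// text produce the same list of character codes.
function stableCharCodes(text) {
  const normalized = text.normalize("NFC");
  const codes = [];
  for (let i = 0; i < normalized.length; i++) {
    codes.push(normalized.charCodeAt(i));
  }
  return codes;
}

// "e" + combining acute accent versus the precomposed character.
const decomposed = "e\u0301";
const precomposed = "\u00E9";
console.log(decomposed.charCodeAt(0) === precomposed.charCodeAt(0)); // raw codes differ
console.log(stableCharCodes(decomposed), stableCharCodes(precomposed));
```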

    All this is behind us now and the LineBuzz code handles any character set in any language beautifully.