Code


Code | 01 Aug 2007 03:21 pm

I couldn’t find any docs on compiling mod_perl2 alongside php5 with apache 2.2, so hopefully this helps someone.

I’ve always statically compiled mod_perl into apache, but the easiest way to get mod_perl to play nice with PHP under apache2 is to compile both as DSOs (dynamic shared objects), modules that are loaded at runtime. I’ve tested this under Ubuntu 7 and CentOS 5.

At the time of this writing the server hosting this page is running with this config and handles a not-insignificant amount of traffic.

NOTE: I use a worker MPM with Apache to get the best possible performance. The worker MPM is a hybrid thread/process model. It requires that PHP be threadsafe when compiled.

Here are the commands I use. I’m assuming you’ve downloaded the latest apache httpd 2.2 source code, php’s source code, mod_perl’s source code and libapreq2’s source code. I’m also assuming you know when to cd into each app’s source directory before compiling and installing it, so I’ve left out basic steps like that.

First compile apache with DSO support. Enable the worker MPM, enable mod_rewrite, enable mod_expires, and add a little magic to make libapreq work:

./configure --prefix=/usr/local/apache2 --with-mpm=worker --enable-so --enable-rewrite --enable-expires --with-included-apr

make

make install

Now that apache is installed, compile and install a thread-safe PHP DSO. Note that --enable-maintainer-zts is what compiles a thread-safe PHP. I’ve also added mysql support.

./configure --with-mysql --enable-maintainer-zts --with-apxs2=/usr/local/apache2/bin/apxs

make

make install

Now you compile and install a mod_perl DSO.

perl Makefile.PL --with-apache2-apxs=/usr/local/apache2/bin/apxs

make

make install

Next you compile and install libapreq as a DSO:

perl Makefile.PL --with-apache2-apxs=/usr/local/apache2/bin/apxs

make

make install

Make sure your httpd.conf contains the following to enable mod_perl, php and libapreq:

LoadModule apreq_module /usr/local/apache2/modules/mod_apreq2.so
LoadModule perl_module modules/mod_perl.so
LoadModule php5_module modules/libphp5.so
DirectoryIndex index.html index.htm index.php
AddType application/x-httpd-php .php
AddType application/x-httpd-php-source .phps
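
As a quick sanity check before restarting apache, something like the following Python sketch can confirm that all three modules are really declared in httpd.conf. The function name missing_modules is made up for this illustration, and the parser is deliberately naive: it only understands plain, uncommented LoadModule lines and does not follow Include directives.

```python
def missing_modules(conf_text, required):
    """Return the required module identifiers that no LoadModule line declares."""
    loaded = set()
    for line in conf_text.splitlines():
        parts = line.split()
        # A LoadModule directive looks like: LoadModule <identifier> <path>
        # Commented-out lines start with '#', so their first token never matches.
        if len(parts) >= 3 and parts[0].lower() == "loadmodule":
            loaded.add(parts[1])
    return [name for name in required if name not in loaded]


REQUIRED = ["apreq_module", "perl_module", "php5_module"]

# Sample text only; in practice read /usr/local/apache2/conf/httpd.conf instead.
sample_conf = """\
LoadModule apreq_module /usr/local/apache2/modules/mod_apreq2.so
LoadModule perl_module modules/mod_perl.so
#LoadModule php5_module modules/libphp5.so
"""

print(missing_modules(sample_conf, REQUIRED))
```

In the sample, the PHP line is commented out, so php5_module is the one reported as missing.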

Startups and Code | 23 Jul 2007 09:40 am

Starting a software business? Looking for a software engineering process? You can spend a month getting your head around one of these:

Agile software development
Crystal Clear
Extreme programming
Lean software development
ISO 12207
Rational Unified Process
CMM
ISO 15504

Or... 2 seconds learning the Nike method:

Code | 20 Jul 2007 09:37 pm

A while back, Jobster CTO Phil Bogle blogged about some of the tricks I’ve used to do fast location queries in SQL. The link to my SQL query to generate the zip lookup table for radius searches is now dead (a cybersquatter stole my domain name and I don’t want to discuss it!). So here’s the original blog post:

If you need to build a radius search for something on your website, then creating a zip code distance lookup table performs much better than calculating distance for every search. This monster chunk of SQL will create that table for you. It takes about 12 hours to run on a fast machine.

insert into zip_dist2 (fromzip, tozip, dist)
select z1.zip, z2.zip,
  ROUND((3956 * 2 * atan2(
    sqrt((POW(sin(((z1.lat * (atan2(1,1) * 4) / 180) - (z2.lat * (atan2(1,1) * 4) / 180))/2.0),2)
      + cos((z2.lat * (atan2(1,1) * 4) / 180)) * cos((z1.lat * (atan2(1,1) * 4) / 180))
      * POW(sin(((z1.lon * (atan2(1,1) * 4) / 180) - (z2.lon * (atan2(1,1) * 4) / 180))/2.0),2))),
    sqrt(1-(POW(sin(((z1.lat * (atan2(1,1) * 4) / 180) - (z2.lat * (atan2(1,1) * 4) / 180))/2.0),2)
      + cos((z2.lat * (atan2(1,1) * 4) / 180)) * cos((z1.lat * (atan2(1,1) * 4) / 180))
      * POW(sin(((z1.lon * (atan2(1,1) * 4) / 180) - (z2.lon * (atan2(1,1) * 4) / 180))/2.0),2))))),2) as distance
from zip_data as z1, zip_data as z2
where (3956 * 2 * atan2(
    sqrt((POW(sin(((z1.lat * (atan2(1,1) * 4) / 180) - (z2.lat * (atan2(1,1) * 4) / 180))/2.0),2)
      + cos((z2.lat * (atan2(1,1) * 4) / 180)) * cos((z1.lat * (atan2(1,1) * 4) / 180))
      * POW(sin(((z1.lon * (atan2(1,1) * 4) / 180) - (z2.lon * (atan2(1,1) * 4) / 180))/2.0),2))),
    sqrt(1-(POW(sin(((z1.lat * (atan2(1,1) * 4) / 180) - (z2.lat * (atan2(1,1) * 4) / 180))/2.0),2)
      + cos((z2.lat * (atan2(1,1) * 4) / 180)) * cos((z1.lat * (atan2(1,1) * 4) / 180))
      * POW(sin(((z1.lon * (atan2(1,1) * 4) / 180) - (z2.lon * (atan2(1,1) * 4) / 180))/2.0),2))))) < 100
  and z2.zip != z1.zip

This generates a lookup table with one row for every pair of zip codes in the USA that are within 100 miles of each other: the two zip codes and the distance between them in miles. It uses the Haversine formula to calculate the distance between two points on the surface of a sphere. In the SQL, 3956 is the earth’s radius in miles and (atan2(1,1) * 4) is just pi, used to convert degrees to radians.
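
For anyone who finds the SQL hard to read, here is the same Haversine calculation as a small Python sketch. The function name haversine_miles and the sample coordinates are made up for this illustration; the 3956-mile radius is the constant from the SQL. The only deliberate difference is the clamp on a, which guards against floating-point rounding pushing it slightly above 1.

```python
import math

EARTH_RADIUS_MILES = 3956  # same constant the SQL uses


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi1 - phi2
    dlam = math.radians(lon1) - math.radians(lon2)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    a = min(1.0, a)  # guard against rounding error before taking sqrt(1 - a)
    return EARTH_RADIUS_MILES * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


# Two illustrative points one degree of latitude apart on the same meridian.
# The expected distance is 3956 * pi / 180, roughly 69 miles.
print(round(haversine_miles(47.0, -122.0, 48.0, -122.0), 2))
```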

I used this when designing our vertical search engine to create the lookup tables we use for our radius search. I use MySQL for this. You’re also going to need a zip_data table that contains zip codes and their respective latitudes and longitudes. You can buy this data for about $50 from the many retailers online.
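
Once the table exists, a radius search becomes a plain indexed lookup instead of a trig calculation per row. The sketch below shows the idea in Python with an in-memory SQLite database standing in for MySQL. The zip_dist2 table and its columns come from the insert statement above; the index name, the zips_within helper and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("create table zip_dist2 (fromzip text, tozip text, dist real)")
# An index on (fromzip, dist) lets the database answer a radius query directly.
conn.execute("create index idx_from_dist on zip_dist2 (fromzip, dist)")

# Made-up rows for illustration only; real rows come from the big insert statement.
conn.executemany(
    "insert into zip_dist2 (fromzip, tozip, dist) values (?, ?, ?)",
    [
        ("00001", "00002", 4.2),
        ("00001", "00003", 37.5),
        ("00001", "00004", 88.1),
        ("00002", "00001", 4.2),
    ],
)


def zips_within(conn, zip_code, radius_miles):
    """Return (zip, distance) pairs within radius_miles of zip_code, nearest first."""
    rows = conn.execute(
        "select tozip, dist from zip_dist2 where fromzip = ? and dist <= ? order by dist",
        (zip_code, radius_miles),
    )
    return rows.fetchall()


print(zips_within(conn, "00001", 50))
```

Because the insert statement skips pairs where both zips are the same, a real search would also need to include the starting zip itself in its results.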

Technology and Code | 20 Jul 2007 09:27 pm

The National Geospatial-Intelligence Agency is one of my favorite data sources - it’s also one of my favorite names for any government agency. The agency provides a database of worldwide features which I use as a data source for Geojoey.com’s landmark search feature (top right of the screen).

These guys are selling the equivalent data for over $300.

For an up-to-date ZIP code database, you should contact USPS.gov and order it from them - which may take a while - government companies grumble grumble. Or just buy it from one of the many online sellers for about $50. Census.gov doesn’t do ZIP codes anymore, but if you don’t care about it being current, there’s an old ZIP code database available for download.

Census.gov’s TIGER database is definitely THE source for US geospatial data. My favorite page is the cartographic boundary files they’ve extracted from the database. It has things like ZCTAs (ZIP borders), county boundaries, etc. If you’re handy with a graphics app, you can do all kinds of fun stuff with this data.

Technology and Startups and Code | 17 Jul 2007 06:00 am

I’ll often find myself chatting about choice of technology with fellow entrepreneurs and invariably it’s assumed the new web app is going to be developed in Rails.

I don’t know enough about Rails to judge its worth. I do know that you can develop applications in Rails very quickly and that it scales complexity better than Perl. Rails may have problems scaling performance. I also know that you can’t hire a Rails developer in Seattle for love or money.

So here are some things to think about when choosing a programming language and platform for your next consumer web business. They are in chronological order - the order you’re going to encounter each issue:

  1. Are you going to be able to hire great talent in languageX for a reasonable price?
  2. Can you code it quickly in languageX?
  3. Is languageX going to scale to handle your traffic?
  4. Is languageX going to scale to handle your complexity?
  5. Is languageX going to be around tomorrow?

If you answered yes to all 5 of these, then you’ve made the right choice.

I use Perl for my projects, and it does fairly well on most criteria. Its weakest point is scaling to handle complexity. Perl lets you invent your own style of coding, so it can become very hard to read someone else’s code. Usually that’s solved through coding by convention. Damian Conway’s Object Oriented Perl is the bible of Perl convention in case you’re considering going that route.

Technology and Innovation and Startups and Code | 16 Jul 2007 05:55 pm

I run two consumer web businesses: LineBuzz.com and Geojoey.com. Both have more than 50% of the app implemented in JavaScript and executing in the browser environment.

Something that occurred to me a while ago is that, because most of the execution happens inside the browser and uses our visitors’ CPU and memory, I don’t have to worry about my servers having to provide that CPU and memory.

I found myself moving processing to the client side where possible.

[Don’t worry, we torture our QA guru with a slow machine on purpose so she will catch any browser slowness we cause]

One downside is that everyone can see the JavaScript source code - although it’s compressed, which makes it a little harder to reverse engineer. Usually the most CPU-intensive code is also the most interesting.

Another disadvantage is that I’m using a bit more bandwidth. But if the app is not shoveling vast amounts of data to do its processing and if I’m OK with exposing parts of my source to competitors, then these issues go away.

Moving execution to the client side opens up some interesting opportunities for distributed processing.

Let’s say you have 1 million page views a day on your home page. That’s 365 million views per year. Let’s say each user spends an average of 1 minute on your page because they’re reading something interesting.

So that’s 365 million minutes of processing time you have available per year.

Converted to years (there are 525,600 minutes in a year), that’s 694 server years. One server working for 694 years or 694 servers working for 1 year.

But let’s halve it because we haven’t taken into account load times or the fact that JavaScript is slower than other languages. So we have 347 server years.

Or put another way, it’s like having 347 additional servers per year.

The cheapest server at ServerBeach.com costs $75 per month or $900 per year. [It’s a 1.7GHz Celeron with 512MB RAM - we’re working on minimums here!]

So that translates 347 servers per year into $312,300 per year.
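
Here is the same back-of-the-envelope arithmetic as a few lines of Python, so you can plug in your own traffic numbers. The variable names are made up for this sketch; the inputs are the figures used above, and the integer division rounds down to whole server years the same way the text does.

```python
PAGE_VIEWS_PER_DAY = 1_000_000
MINUTES_PER_VIEW = 1
COST_PER_SERVER_YEAR = 75 * 12  # $75 per month

# Total minutes of visitor CPU time available over a year.
visitor_minutes_per_year = PAGE_VIEWS_PER_DAY * 365 * MINUTES_PER_VIEW
# Minutes one server runs in a year.
minutes_in_a_year = 60 * 24 * 365

server_years = visitor_minutes_per_year // minutes_in_a_year  # whole server years
usable_server_years = server_years // 2  # halved for load time and slower JavaScript
savings = usable_server_years * COST_PER_SERVER_YEAR

print(server_years, usable_server_years, savings)
```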

My method isn’t very scientific - and if you go around slowing down people’s machines, you’re not going to have 1 million page views per day for very long. But it gives you a general indication of how much money you can save if you can move parts of a CPU-intensive web application to the client side.

So going beyond saving server costs, it’s possible for a high-traffic website to do something similar to SETI@home and essentially turn the millions of workstations that spend a few minutes on the site each day into a giant distributed processing Beowulf cluster using little old JavaScript.

I18N and Code and LineBuzz | 14 Jul 2007 04:36 pm

When we launched LineBuzz on May 10, we had no idea that most of our press coverage was going to be Japanese. A site called 100Shiki.com put us up as dot-com of the day. All of a sudden we had lots of Japanese users. A few days later, a very popular blogger in China gave us a mention and we had lots of Chinese users too. Within a week we had over 15 languages on the site.

Three intense weeks later we launched an I18N version of the site.

Here’s a brief summary of some of the key issues we had to deal with when i18n’ing an app that has 50/50 client-server code and lots of communication between the two.

The code that is LineBuzz is very text intensive by the nature of the application. We provide inline comments without a browser plugin. One of the unique things about LineBuzz is that it doesn’t matter which page you post an inline comment on. The comment will appear anywhere on the website where the text and its surrounding paragraph appear.

So as you can imagine, we use a lot of regular expressions, character code conversions and text lengths.

Safari - not the world’s best browser

The first thing that broke was Safari. Safari’s regex engine in JavaScript is seriously busted. It doesn’t support Unicode characters at all. IIRC it simply returns true for any regex with Unicode. So their claim that it’s the world’s best browser really irks me. So I had to write a fix-safari layer for anything that involved processing Unicode.

No round-trip for jp charsets

The next thing that bit me was Japanese character set support. The Japanese use two main character sets: EUC-JP and Shift_JIS. The latter is a product of Windows and the former is from Unix. These both caused a major headache because they don’t round-trip convert to Unicode. Translated, that means that you can’t convert these characters to a Unicode encoding like UTF-8, convert them back to their native character set, and expect the result to equal the original. The solution: Store the raw character data for all character sets as binary and only convert to Unicode if I absolutely must. I use UTF-8 on linebuzz.com, so that’s a scenario where I convert from binary to UTF-8.
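
To make the store-as-binary idea concrete, here is a minimal Python sketch. The names round_trips and StoredText are made up for illustration and this is not the LineBuzz code. Which byte sequences fail the round trip depends on the conversion tables of the codec in use, so the example only shows how to test for it and how to keep the original bytes as the source of truth.

```python
def round_trips(raw: bytes, charset: str) -> bool:
    """True if decoding raw to Unicode and encoding it back reproduces raw exactly."""
    try:
        return raw.decode(charset).encode(charset) == raw
    except UnicodeError:
        # Bytes the codec cannot decode or re-encode certainly do not round-trip.
        return False


class StoredText:
    """Keeps the original bytes and their charset; converts to UTF-8 only on request."""

    def __init__(self, raw: bytes, charset: str):
        self.raw = raw          # what gets stored and compared, byte for byte
        self.charset = charset  # needed later to interpret the bytes

    def as_utf8(self) -> bytes:
        # Conversion happens only at display time; undecodable bytes are replaced.
        return self.raw.decode(self.charset, errors="replace").encode("utf-8")


comment = StoredText("こんにちは".encode("shift_jis"), "shift_jis")
print(round_trips(comment.raw, comment.charset))
print(comment.as_utf8().decode("utf-8"))
```

Matching and comparison are done on the raw bytes, so a lossy conversion can only ever affect what is displayed, never what is stored.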

When is a space not a space?

Another thing that bit me was space character codes and spaces in regex. In Unicode there are about 20 different space characters. Some regex engines are smart and recognize them all. Others only recognize the traditional ASCII space character. So routines that, for example, removed spaces had to be hand-tailored to deal with every Unicode space.
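
The same trap is easy to show outside the browser. This Python sketch (the function name remove_spaces is made up for illustration) drops every character that Unicode classifies as a space separator, category Zs, plus the ASCII whitespace controls, and compares that with a naive ASCII-only removal.

```python
import unicodedata


def remove_spaces(text: str) -> str:
    """Drop Unicode space separators (category Zs) and ASCII whitespace controls."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Zs" and ch not in "\t\n\r\f\v"
    )


# ASCII space, no-break space, em space and ideographic space, in that order.
sample = "a b\u00a0c\u2003d\u3000e"

print(remove_spaces(sample))
print(len(sample.replace(" ", "")))  # ASCII-only removal leaves the other three spaces in
```

The ideographic space (U+3000) is the one to watch with Japanese text, since it is what a Japanese input method normally produces for a full-width space.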

String.charCodeAt() == lies lies lies!!

Character codes differ across operating systems. Some character sets contain characters that have a different character code on OS X than on Unix. Yes, even in the same browser using the same JavaScript engine (Firefox for example), the character codes are different. So any routines that rely on consistent character codes across platforms have to deal with this little nightmare.

All this is behind us now and the LineBuzz code handles any character set in any language beautifully.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.