Perl: Kicking your language’s ass since 1988

The video below is Perl’s development history since Larry Wall and a small team started out in January 1988. It’s visualized using gource. Notice how dev activity has continued to increase all the way to 2010.

Perl is a powerful language. It’s also fast, and nearly everything you need has already been implemented, debugged, refactored, reimplemented and made rock solid. If you have a problem, chances are someone else has already solved it. When I was in my 20s I was a big fan of the new-new thing. Now, as a startup owner taking strategic risks in some areas of the business and trying to reduce risk in others, I love Perl because I know it will do right by me, and I deploy code knowing I’m not betting on a language that might one day grow up to be what Perl already is.

Watch the video in 720p HD on full screen (YouTube link) to see all the labels. It’s awesome realizing how much work and evolution has gone into some of the core libs that I use.

How to integrate PHP, Perl and other languages on Apache

I have this module that a great group of guys in Malaysia have put together. But their language of choice is PHP and mine is Perl. I need to modify it slightly to integrate it. For example, I need to add my own session code so that their code knows if my user is logged in or not and who they are.

I started writing PHP but quickly started duplicating code I’d already written in Perl. Fetch the session from the database, de-serialize the session data, that sort of thing. I also ran into issues trying to recreate my Perl decryption routines in PHP. [I use non-mainstream ciphers]

Then I found ways to run Perl inside PHP and vice-versa. But I quickly realized that’s a very bad idea. Not only are you creating a new Perl or PHP interpreter for every request, but you’re still duplicating code, and you’re using a lot more memory to run interpreters in addition to what mod_php and mod_perl already run.

Eventually I settled on creating a very lightweight wrapper function in PHP called doPerl. It looks like this:

$associativeArrayResult = doPerl('functionName', $associativeArrayWithParameters);

function doPerl($func, $arrayData){
 $ch = curl_init();
 $ip = '127.0.0.1';
 // Array keys must be quoted strings; bare words are undefined constants.
 $postData = array(
 'json' => json_encode($arrayData),
 'auth' => 'myPassword',
 );
 curl_setopt($ch, CURLOPT_POST, TRUE);
 curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
 // The function name becomes part of the URL, so escape it.
 curl_setopt($ch, CURLOPT_URL, "http://" . $ip . "/webService/" . rawurlencode($func) . "/");
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
 $output = curl_exec($ch);
 curl_close($ch);
 // curl_exec() returns FALSE on failure; don't try to decode that.
 if ($output === FALSE) {
 return NULL;
 }
 $data = json_decode($output, TRUE);
 return $data;
}

On the other side I have a very fast mod_perl handler that only allows connections from 127.0.0.1 (the local machine). I deserialize the incoming JSON data using Perl’s JSON::from_json(). I use eval() to execute the function name that is, as you can see above, part of the URL. I reserialize the result using Perl’s JSON::to_json($result) and send it back to the PHP app as the HTTP response body.

This is very fast because all PHP and Perl code that executes is already in memory under mod_perl or mod_php. The only overhead is the connection setup, sending of packet data across the network connection and connection teardown. Some of this is handled by your server’s hardware. [And of course the serialization/deserialization of the JSON data on both ends.]

The connection creation is a three-way handshake, but because there’s almost no latency on the link it’s nearly instantaneous. The transferring of data is faster than over a real network because the MTU on your lo interface (the 127.0.0.1 interface) is 16436 bytes instead of the normal 1500 bytes. That means a typical request or response fits inside a single packet. And connection termination is again just two packets from each side, and with near-zero latency it’s super fast.
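To put rough numbers on the MTU point, here is a small sketch that counts how many TCP segments a payload needs at each MTU. It assumes IPv4 and TCP headers of 20 bytes each with no options, which is a simplification (real connections often carry TCP options that shrink the payload per segment a little). The function name is mine.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Number of TCP segments needed to carry $bytes of payload over a link
# with the given MTU. Assumption: 20-byte IPv4 header plus 20-byte TCP
# header and no options, so each segment carries MTU - 40 bytes.
sub segments_needed {
    my ($bytes, $mtu) = @_;
    my $mss = $mtu - 40;
    return ceil($bytes / $mss);
}

for my $payload (1_000, 8_000, 40_000) {
    printf "%6d bytes: %2d segment(s) at MTU 1500, %d at MTU 16436\n",
        $payload,
        segments_needed($payload, 1500),
        segments_needed($payload, 16436);
}
```

An 8,000 byte JSON response needs six segments at MTU 1500 but only one on the loopback interface.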

I use JSON because it’s less bulky than XML and on average it’s faster to parse across all languages. Both PHP and Perl’s JSON routines are ridiculously fast.

My final implementation on the PHP side is a set of wrapper classes that use the doPerl() function to do their work. Inside the classes I use caching liberally, either in instance variables, or if the data needs to persist across requests I use PHP’s excellent APC cache to store the data in shared memory.

Update: On request I’ve posted the Perl web service handler for this here. The Perl code allows you to send parameters via POST using either a parameter called ‘json’ containing escaped JSON that will get deserialized and passed to your function, or regular POST-style name/value pairs that will be sent as a hashref to your function. I’ve included one test function called hello() in the code. Please note this web service module lets you execute arbitrary Perl code in the web service module’s namespace and doesn’t filter out double colons, so really you can just do whatever the hell you want. So I’ve included two very simple security mechanisms that I strongly recommend you don’t remove. It only allows requests from localhost, and you must include an ‘auth’ POST parameter containing a password (currently set to ‘password’). You’re going to have to implement the MM::Util::getIP() routine to make this work and it’s really just a one liner:

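sub getIP {
 my $r = shift @_;
 # Only trust X-Forwarded-For if a proxy you control sets it;
 # a client connecting directly can send any value it likes.
 return $r->headers_in->{'X-Forwarded-For'} ?
    $r->headers_in->{'X-Forwarded-For'} :
    $r->connection->get_remote_host();
}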
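If executing arbitrary function names makes you nervous, one alternative is a dispatch table that only exposes functions you have listed. The sketch below is not the handler I posted. It swaps eval() for a hash lookup, leaves out the mod_perl request handling, the localhost check and the auth check, and uses the core JSON::PP module so it runs on a stock Perl. The %DISPATCH table and dispatch_request() are names I made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json encode_json);

# Hypothetical functions exposed to the PHP side. Only names listed
# here can be called, unlike the eval() approach.
my %DISPATCH = (
    hello => sub {
        my ($args) = @_;
        return { greeting => "Hello, $args->{name}" };
    },
);

# Takes the request path and the raw 'json' POST field and returns the
# JSON text to send back as the response body.
sub dispatch_request {
    my ($path, $json_text) = @_;

    # Pull the function name out of a path such as /webService/hello/
    my ($func) = $path =~ m{^/webService/(\w+)/?$}
        or return encode_json({ error => 'bad path' });

    my $code = $DISPATCH{$func}
        or return encode_json({ error => "unknown function: $func" });

    # decode_json dies on malformed input, so trap that and report it.
    my $args = eval { decode_json($json_text) };
    return encode_json({ error => 'bad json' })
        unless ref($args) eq 'HASH';

    return encode_json($code->($args));
}

print dispatch_request('/webService/hello/', '{"name":"Mark"}'), "\n";
```

Because \w+ cannot match a colon, a request for something like Some::Other::function never reaches the lookup.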

MySQL GIS Extensions Quick Start

A friend is busy putting together a kick ass startup with a strong geographic component. He’s using Google Maps API version 3 which is a vast improvement (and total rewrite) from previous versions. But he needs to store and query his geographic data in a fast, efficient way. There are many options out there, but if you want massive speed and scalability you really want to use MySQL. So I’m writing this quickstart guide for his (and your) benefit.

Most websites don’t need to do complicated things like store polygon data. They just need to store points on a map and then retrieve those points. They also need to be able to ask the database for all points within a rectangle. So I’m going to run you through schema creation, inserting data, getting your lat/lon data out of the database again, and querying the database for all points within a rectangle. We’re also going to deal with the nasty little issue of asking the database for points in a rectangle that crosses the 180 degree boundary (or the International Date Line).

Why use MySQL’s GIS extensions?

The main, and possibly only, reason is speed. You could store lat/lon coordinates as decimal degrees in MySQL. Then when you query the database you’d say “Give me all records where the lat is > X and < Y and the lon is > A and < B.” But MySQL (and many other databases) is slow at two-dimensional range queries like that, because a regular B-Tree index can generally only narrow the range on one column (lat or lon) and every row in that band still has to be checked against the other.

So instead you create a column called a geometry. Then you create an index on that column called a spatial index. This is really an R-Tree index that is very fast when you’re doing range queries.

It’s really that simple. You want to use MySQL’s GIS because spatial indexes are faster for lat/lon range queries than regular indexes. I honestly can’t think of another reason I’d go through the effort of storing/retrieving my data using GIS functions.

How do I create a table to store lat/lon points?

I’m assuming you know how to create a regular table in MySQL. You create a basic table containing coordinates like so:

CREATE TABLE geom (
  -- DECIMAL stores the digits exactly; FLOAT is single precision and rounds them
  lat decimal(10,7) NOT NULL,
  lon decimal(10,7) NOT NULL,
  g GEOMETRY NOT NULL,
  SPATIAL INDEX(g)
) ENGINE=MyISAM;

Firstly note that I’m storing the coordinates as decimal degrees AND in a spatial column (g). I’m a little paranoid and I like to store the source data.

Note that I’ve specified the table type to be MyISAM. You currently can only create a spatial index on MyISAM tables. You CAN create geometry columns on other table types like InnoDB, but without a spatial index it’s going to be slow, which defeats the whole point of using GIS extensions because the only reason you’re using them is SPEED. The downside of using MyISAM is that if you’re going to be doing a lot of writes to your table (by a lot I mean more than 10 per second) then MyISAM is going to slow down because it doesn’t support row-level locking; every write locks the whole table. But if you’re just going to be adding a few hundred thousand records a day and doing a lot more reads than writes, then this will work just fine for you. And remember that you can always replicate this table to a small cluster of slaves and have your web servers query the slaves when you want to scale your website.

How do I insert data into my fancy new spatial table?

I’m going to assume you’ve figured out how to get the lat/lon coordinates you need from the Google Maps API or whatever your source is. Here’s how you insert the data:

INSERT INTO geom (lat, lon, g) VALUES
    (47.37, -122.21, GeomFromText('POINT(47.37 -122.21)'));

Some things to note here: The value inside the GeomFromText function is a string. So in your application you’re going to have to create that string by concatenating together ‘POINT(’, your lat, a space, your lon and ‘)’. Then you’re probably going to prepare a statement that looks like:

insert into geom (lat, lon, g) values
   (?, ?, GeomFromText(?))

When you execute it you’ll pass in the decimal degrees and the string you created.
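Here is a minimal sketch of building that string in Perl, with a sanity check so junk never reaches the database. point_wkt() is a name I made up, and the $dbh in the trailing comment is assumed to be a connected DBI handle that this snippet does not create.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Build the WKT string for GeomFromText(), using the same "lat lon"
# order as the INSERT above. Dies on anything that is not a plausible
# coordinate.
sub point_wkt {
    my ($lat, $lon) = @_;
    die "latitude out of range\n"
        unless looks_like_number($lat) && $lat >= -90 && $lat <= 90;
    die "longitude out of range\n"
        unless looks_like_number($lon) && $lon >= -180 && $lon <= 180;
    return "POINT($lat $lon)";
}

my ($lat, $lon) = (47.37, -122.21);
my @bind = ($lat, $lon, point_wkt($lat, $lon));
print "$bind[2]\n";

# With a connected DBI handle (assumed, not created here):
# $dbh->do('insert into geom (lat, lon, g) values (?, ?, GeomFromText(?))',
#          undef, @bind);
```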

Great, so how do I get the data back out again?

MySQL will tell you to use the AsText function to turn that geometry point back into something your application can use or pass to the Google Maps API. But because you also stored it as decimal degrees you can just do:

select lat, lon from geom;
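If you only stored the geometry column, you would select AsText(g) and parse the string it returns. A minimal sketch, assuming the text comes back in the form POINT(47.37 -122.21) and that the points were stored in the “lat lon” order used in this guide. parse_point_wkt() is my own name.

```perl
use strict;
use warnings;

# Parse the text AsText() returns for a point, e.g.
# "POINT(47.37 -122.21)", into a (lat, lon) list.
sub parse_point_wkt {
    my ($wkt) = @_;
    # A signed decimal number with an optional exponent.
    my $num = qr/[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?/;
    my ($lat, $lon) =
        $wkt =~ /^\s*POINT\s*\(\s*($num)\s+($num)\s*\)\s*$/i
        or die "not a WKT point: $wkt\n";
    return ($lat + 0, $lon + 0);   # force numeric values
}

my ($lat, $lon) = parse_point_wkt('POINT(47.37 -122.21)');
print "lat=$lat lon=$lon\n";
```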

But what you really care about is getting it back out FAST! When you ask the database for all points inside a rectangle you need to define that rectangle. So if you imagine a map you need to give the database two points on that map. So we’ll use a point in the South West and a point in the North East.

Let’s say you want all points inside the rectangle where the South West point is latitude 46, longitude -123 and the North East point is latitude 48 and longitude -121. NOTE: -121 is further east than -123 degrees of longitude. Just to make it clearer:

SW Lat: 46
SW Lon: -123
NE Lat: 48
NE Lon: -121

You’ll do the following query:

select lat, lon from geom where
   MBRContains(
    GeomFromText('Polygon((46 -123, 48 -123, 48 -121, 46 -121, 46 -123))'),
    g
   );

If all those numbers look a little confusing, what you’re actually doing is drawing a rectangle (polygon) starting at the south west corner and ending back at the south west corner. A rectangle has four corners, but you have to close the box for MySQL, so you repeat the first coordinate at the end. ‘g’ is the second parameter to MBRContains and it specifies which column in the table must be contained in the box you’ve created. Let’s replace the coordinates with variables to make it easier to read:

select lat, lon from geom where
   MBRContains(
    GeomFromText('Polygon((swLat swLon, neLat swLon, neLat neLon,
      swLat neLon, swLat swLon))'),
   g );

NOTE: I had to break the above lines up for readability. You may want to have this all as a single line in your code.

Well that’s just peachy, but what if my rectangle is in the middle of the Pacific ocean and crosses the International Date Line?

The Earth is, unfortunately, round. If it were flat we could end the conversation here, I could get on with some work and you could do whatever it is you do on a Saturday at 11:46am mountain standard time. But the Earth is round, so you and I are stuck with each other for another few minutes.

Roundness matters because if the rectangle you are painting on Earth crosses 180 (or -180) degrees of longitude, then you need to change your logic a little. Normally your south-west longitude will be less than your north-east longitude. West is less than east. But if your square crosses the dreaded 180 boundary, then your western longitude will be greater than your eastern longitude. For example you might have a western longitude of 170 and an eastern longitude of 10 degrees.

If you don’t deal with this little hiccup, then when you ask the database for points that are inside a square that crosses the 180 boundary you’re going to get everything to the left and right of that square and nothing inside it. So you have to draw two squares, one on either side of the 180 boundary.
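The wrap-around rule is easy to check in plain Perl without a database, which is handy for unit tests. This is a sketch of the same logic, not something MySQL runs. in_box() is my own name.

```perl
use strict;
use warnings;

# True (1) if the point lies inside the box given by its south-west and
# north-east corners. When swLon > neLon the box wraps across the 180
# boundary, so the longitude test becomes
# "east of swLon OR west of neLon".
sub in_box {
    my ($lat, $lon, $swLat, $swLon, $neLat, $neLon) = @_;
    return 0 if $lat < $swLat || $lat > $neLat;
    if ($swLon <= $neLon) {
        return ($lon >= $swLon && $lon <= $neLon) ? 1 : 0;
    }
    return ($lon >= $swLon || $lon <= $neLon) ? 1 : 0;
}

# A regular box, then a box from 170 east across the boundary to -170.
print in_box(47.37, -122.21, 46, -123, 48, -121), "\n";
print in_box(0, 175, -10, 170, 10, -170), "\n";
print in_box(0, 0, -10, 170, 10, -170), "\n";
```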

You build those two squares in your application logic. I’m going to throw a little code at you. Here goes:

sub makeMBR {
 my ($swLat, $swLon, $neLat, $neLon) = @_;

 if($swLon > $neLon) {
 return (' (' .
 'MBRContains(GeomFromText(\'Polygon((' .
 $swLat . ' ' . $swLon . ',' . $neLat . ' ' . $swLon . ',' .
 $neLat . ' 180,' . $swLat . ' 180,' .
 $swLat . ' ' . $swLon .
 '))\'), g) ' .
 ' OR ' .
 ' MBRContains(GeomFromText(\'Polygon((' .
 $swLat . ' -180,' . $neLat . ' -180,' .
 $neLat . ' ' . $neLon . ',' . $swLat . ' ' . $neLon . ',' .
 $swLat . ' -180' .
 '))\'), g) ' .
 ') ', 1);

 } else {
 return (' MBRContains(GeomFromText(\'Polygon((' .
 $swLat . ' ' . $swLon . ',' . $neLat . ' ' . $swLon . ',' .
 $neLat . ' ' . $neLon . ',' . $swLat . ' ' . $neLon . ',' .
 $swLat . ' ' . $swLon .
 '))\'), g) ', 0);
 }
}

The code above is Perl. It looks horrendous but it’s actually quite simple. It creates the MBRContains() part of the SQL statement for you automatically. It simply says: If the swLon is greater than the neLon then create two boxes and ask the database for all points in both those boxes. Otherwise just create one box as per normal. The two boxes that are created are on either side of the dreaded 180 boundary.

The function actually returns two values. The first is the MBRContains string that you can combine with the SQL in your application and feed to the database. The second value is either a 1 or a zero. A 1 indicates that the box has crossed the dreaded 180 boundary and you’re actually asking for points in 2 boxes. A zero indicates that it’s a regular single box. You may want to use this value in your application to determine how things are displayed. Generally when a box crosses the 180 boundary I tend to zoom out a little more so the user can see what’s going on.

You’ll use this code like so:

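my ($geoSQL, $crossesIDL) = makeMBR(46, -123, 48, -121);
#Then you'll run this query on the database.
#selectall_arrayref returns every matching row; selectrow_array would stop after the first.
my $rows = $dbh->selectall_arrayref("select lat, lon from geom where $geoSQL");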
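makeMBR() pastes the coordinates straight into the SQL, which is fine as long as you are certain they are numbers. If they come from user input you may prefer placeholders. Here is a sketch of a variant that returns the SQL fragment with ? markers plus the polygon strings to bind. box_wkt() and make_mbr_bind() are names I made up; the boxes are the same ones makeMBR() builds.

```perl
use strict;
use warnings;

# WKT for a closed rectangle, in the same "lat lon" order used above.
sub box_wkt {
    my ($swLat, $swLon, $neLat, $neLon) = @_;
    return "Polygon(($swLat $swLon,$neLat $swLon,$neLat $neLon,"
         . "$swLat $neLon,$swLat $swLon))";
}

# Returns (sql_fragment, crosses_idl, @bind_values).
sub make_mbr_bind {
    my ($swLat, $swLon, $neLat, $neLon) = @_;
    my $test = 'MBRContains(GeomFromText(?), g)';
    if ($swLon > $neLon) {
        # Two boxes, one on each side of the 180 boundary.
        return ("($test OR $test)", 1,
                box_wkt($swLat, $swLon, $neLat, 180),
                box_wkt($swLat, -180, $neLat, $neLon));
    }
    return ($test, 0, box_wkt($swLat, $swLon, $neLat, $neLon));
}

my ($sql, $crosses, @bind) = make_mbr_bind(-10, 170, 10, -170);
print "$sql\n";
print "$_\n" for @bind;
```

With a connected DBI handle you would then pass the bind values along, for example $dbh->selectall_arrayref("select lat, lon from geom where $sql", undef, @bind).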

Conclusion

If this helped you and you’ve discovered a tip that could help others or have something to add, please post a comment. Muchas gracias, baie dankie, dis mos lekker by die see and have a spectacular day!

Mark.