A friend is busy putting together a kick ass startup with a strong geographic component. He’s using Google Maps API version 3 which is a vast improvement on (and total rewrite of) previous versions. But he needs to store and query his geographic data in a fast, efficient way. There are many options out there, but if you want massive speed and scalability you really want to use MySQL. So I’m writing this quickstart guide for his (and your) benefit.
Most websites don’t need to do complicated things like store polygon data. They just need to store points on a map and then retrieve those points. They also need to be able to ask the database for all points within a rectangle. So I’m going to run you through schema creation, inserting data, getting your lat/lon data out of the database again, and querying the database for all points within a rectangle. We’re also going to deal with the nasty little issue of asking the database for points in a rectangle that crosses the 180 degree boundary (or the International Date Line).
Why use MySQL’s GIS extensions?
The main, and possibly only, reason is speed. You could store lat/lon coordinates as decimal degrees in MySQL. Then when you query the database you’d say “Give me all records where the lat is > X and < Y and the lon is > A and < B.” But MySQL (and many other databases) is slow when you’re doing range queries like that because it can’t use its regular B-Tree indexes effectively for a range query on two columns at once.
So instead you create a column called a geometry. Then you create an index on that column called a spatial index. This is really an R-Tree index that is very fast when you’re doing range queries.
It’s really that simple. You want to use MySQL’s GIS because spatial indexes are faster for lat/lon range queries than regular indexes. I honestly can’t think of another reason I’d go through the effort of storing/retrieving my data using GIS functions.
How do I create a table to store lat/lon points?
I’m assuming you know how to create a regular table in MySQL. You create a basic table containing coordinates like so:
CREATE TABLE geom (
  lat float(10,7) NOT NULL,
  lon float(10,7) NOT NULL,
  g GEOMETRY NOT NULL,
  SPATIAL INDEX(g)
) ENGINE=MyISAM;
Firstly note that I’m storing the coordinates as decimal degrees AND in a spatial column (g). I’m a little paranoid and I like to store the source data.
Note that I’ve specified the table type to be MyISAM. You currently can only create a spatial index on MyISAM tables. You CAN create geometry columns on other table types like InnoDB, but without a spatial index it’s going to be slow, which defeats the whole point of using GIS extensions because the only reason you’re using them is SPEED. The downside of using MyISAM is that if you’re going to be doing a lot of writes to your table (by a lot I mean more than 10 per second) then MyISAM is going to slow down because it doesn’t support row level locking. But if you’re just going to be adding a few hundred thousand records a day and doing a lot more reads than writes, then this will work just fine for you. And remember that you can always replicate this table to a small cluster of slaves and have your web servers query the slaves when you want to scale your website.
How do I insert data into my fancy new spatial table?
I’m going to assume you’ve figured out how to get the lat/lon coordinates you need from the Google Maps API or whatever your source is. Here’s how you insert the data:
INSERT INTO geom (lat, lon, g) VALUES (47.37, -122.21, GeomFromText('POINT(47.37 -122.21)'));
Some things to note here: The value inside the GeomFromText function is a string. So in your application you’re going to have to create that string by concatenating together ‘POINT(‘, your lat, a space, your lon and ‘)’. Then you’re probably going to prepare a statement that looks like:
insert into geom (lat, lon, g) values (?, ?, GeomFromText(?))
When you execute it you’ll pass in the decimal degrees and the string you created.
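Here is a minimal sketch of that string building in Perl. The helper name point_wkt and its range checks are my own additions for illustration, not something MySQL or DBI gives you, and it keeps the same "lat lon" order used in the insert above.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper (not part of MySQL or DBI): builds the string that
# gets bound to the GeomFromText(?) placeholder, in "lat lon" order.
sub point_wkt {
    my ($lat, $lon) = @_;
    for my $v ($lat, $lon) {
        die "coordinate is not a number\n"
            unless defined $v && looks_like_number($v);
    }
    # Basic sanity checks only; adjust to taste.
    die "latitude out of range: $lat\n"  if $lat < -90  || $lat > 90;
    die "longitude out of range: $lon\n" if $lon < -180 || $lon > 180;
    return "POINT($lat $lon)";
}

my ($lat, $lon) = (47.37, -122.21);

# One value per placeholder in the prepared statement, in order.
my @bind = ($lat, $lon, point_wkt($lat, $lon));
# With a prepared DBI statement handle this would be: $sth->execute(@bind);
```

Doing the check in one place means a swapped or garbage coordinate dies in your application instead of ending up as a strange point in the table.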
Great, so how do I get the data back out again?
MySQL will tell you to use the AsText function to turn that geometry point back into something your application can use or pass to the Google Maps API. But because you also stored it as decimal degrees you can just do:
select lat, lon from geom;
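If you only stored the geometry column, a query like select AsText(g) from geom; hands you text such as 'POINT(47.37 -122.21)' and your application has to pick it apart. A rough sketch of that in Perl follows; parse_point_wkt is a made-up name, and it assumes the same "lat lon" order used when the point was inserted.

```perl
use strict;
use warnings;

# Hypothetical helper: turns the text AsText(g) returns for a point,
# e.g. 'POINT(47.37 -122.21)', back into a (lat, lon) pair.
sub parse_point_wkt {
    my ($wkt) = @_;
    # A plain decimal number with optional sign and exponent.
    my $num = qr/[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?/;
    if (defined $wkt && $wkt =~ /^\s*POINT\s*\(\s*($num)\s+($num)\s*\)\s*$/i) {
        return ($1 + 0, $2 + 0);    # force numeric values
    }
    return;    # empty list when the text is not a single point
}

my ($lat, $lon) = parse_point_wkt('POINT(47.37 -122.21)');
print "lat=$lat lon=$lon\n";
```

Which is exactly the kind of fiddling you avoid by keeping the lat and lon columns around.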
But what you really care about is getting it back out FAST! When you ask the database for all points inside a rectangle you need to define that rectangle. So if you imagine a map you need to give the database two points on that map. So we’ll use a point in the South West and a point in the North East.
Let’s say you want all points inside the rectangle where the South West point is latitude 46, longitude -123 and the North East point is latitude 48 and longitude -121. NOTE: -121 is further east than -123 degrees of longitude. Just to make it clearer:
SW Lat: 46
SW Lon: -123
NE Lat: 48
NE Lon: -121
You’ll do the following query:
select lat, lon from geom where
  MBRContains(
    GeomFromText('Polygon((46 -123, 48 -123, 48 -121, 46 -121, 46 -123))'),
    g
  );
If all those numbers look a little confusing, what you’re actually doing is drawing a square (polygon) starting at the south west corner and ending back at the south west corner. A square has four corners, but you have to close the box for MySQL, so you repeat the first coordinate at the end. ‘g’ is the second parameter to MBRContains and it specifies which column in the table must be contained in the box you’ve created. Let’s replace the coordinates with variables to make it easier to read:
select lat, lon from geom where
  MBRContains(
    GeomFromText('Polygon((swLat swLon, neLat swLon, neLat neLon, swLat neLon, swLat swLon))'),
    g
  );
NOTE: I had to break the above lines up for readability. You may want to have this all as a single line in your code.
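To save you from typing those five corners by hand, here is a small Perl sketch that builds the polygon string from the two corner points. The name bbox_polygon_wkt is hypothetical, and the query uses the g column from the table we created earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the closed five-point ring for a box in
# "lat lon" order, starting and ending at the south-west corner.
sub bbox_polygon_wkt {
    my ($swLat, $swLon, $neLat, $neLon) = @_;
    my @ring = (
        [$swLat, $swLon],    # south-west
        [$neLat, $swLon],    # north-west
        [$neLat, $neLon],    # north-east
        [$swLat, $neLon],    # south-east
        [$swLat, $swLon],    # south-west again, to close the ring
    );
    return 'Polygon((' . join(', ', map { "$_->[0] $_->[1]" } @ring) . '))';
}

my $wkt = bbox_polygon_wkt(46, -123, 48, -121);
my $sql = 'select lat, lon from geom where MBRContains(GeomFromText(?), g)';
# With DBI this could be run as:
#   my $rows = $dbh->selectall_arrayref($sql, undef, $wkt);
```

Passing the polygon through a placeholder keeps the SQL itself constant, so you only ever build the one string.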
Well that’s just peachy, but what if my rectangle is in the middle of the Pacific ocean and crosses the International Date Line?
The Earth is, unfortunately, round. If it were flat we could end the conversation here, I could get on with some work and you could do whatever it is you do on a Saturday at 11:46am mountain standard time. But the Earth is round, so you and I are stuck with each other for another few minutes.
Roundness matters because if the rectangle you are painting on Earth crosses 180 (or -180) degrees of longitude, then you need to change your logic a little. Normally your south-west longitude will be less than your north-east longitude. West is less than east. But if your square crosses the dreaded 180 boundary, then your western longitude will be greater than your eastern longitude. For example you might have a western longitude of 170 and an eastern longitude of -170 degrees.
If you don’t deal with this little hiccup, then when you ask the database for points inside a square that crosses the 180 boundary you’re going to get everything to the left and right of that square and nothing inside it. So you have to draw two squares, one on either side of the 180 boundary.
You do this in your application logic. I’m going to throw a little code at you. Here goes:
sub makeMBR {
    my ($swLat, $swLon, $neLat, $neLon) = @_;
    if($swLon > $neLon) {
        # The box crosses the 180 boundary: one box on each side of it.
        return (' (' .
            'MBRContains(GeomFromText(\'Polygon((' .
            $swLat . ' ' . $swLon . ',' . $neLat . ' ' . $swLon . ',' .
            $neLat . ' 180,' . $swLat . ' 180,' . $swLat . ' ' . $swLon .
            '))\'), g) ' .
            ' OR ' .
            ' MBRContains(GeomFromText(\'Polygon((' .
            $swLat . ' -180,' . $neLat . ' -180,' . $neLat . ' ' . $neLon . ',' .
            $swLat . ' ' . $neLon . ',' . $swLat . ' -180' .
            '))\'), g) ' .
            ') ', 1);
    } else {
        # A regular single box.
        return (' MBRContains(GeomFromText(\'Polygon((' .
            $swLat . ' ' . $swLon . ',' . $neLat . ' ' . $swLon . ',' .
            $neLat . ' ' . $neLon . ',' . $swLat . ' ' . $neLon . ',' .
            $swLat . ' ' . $swLon .
            '))\'), g) ', 0);
    }
}
The code above is Perl. It looks horrendous but it’s actually quite simple. It creates the MBRContains() part of the SQL statement for you automatically. It simply says: If the swLon is greater than the neLon then create two boxes and ask the database for all points in both those boxes. Otherwise just create one box as per normal. The two boxes that are created are on either side of the dreaded 180 boundary.
The function actually returns two values. The first is the MBRContains string that you can combine with the SQL in your application and feed to the database. The second value is either a 1 or a zero. A 1 indicates that the box has crossed the dreaded 180 boundary and you’re actually asking for points in 2 boxes. A zero indicates that it’s a regular single box. You may want to use this value in your application to determine how things are displayed. Generally when a box crosses the 180 boundary I tend to zoom out a little more so the user can see what’s going on.
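If you want to sanity-check what comes back from the database, the same "west greater than east means the box wraps" rule can be written as a plain Perl test on a single point. This is only a sketch: point_in_box is a name I made up, and how MySQL treats points sitting exactly on the edge of the box may differ from the inclusive comparison used here.

```perl
use strict;
use warnings;

# Hypothetical helper for tests: applies the same wrap-around rule as
# makeMBR to one point, without touching the database.
sub point_in_box {
    my ($lat, $lon, $swLat, $swLon, $neLat, $neLon) = @_;
    return 0 if $lat < $swLat || $lat > $neLat;
    if ($swLon > $neLon) {
        # Box crosses the 180 boundary: accept either side of it.
        return ($lon >= $swLon || $lon <= $neLon) ? 1 : 0;
    }
    return ($lon >= $swLon && $lon <= $neLon) ? 1 : 0;
}

# A box from longitude 170 eastwards across the boundary to -170.
print point_in_box(0,  175, -10, 170, 10, -170), "\n";   # between 170 and 180
print point_in_box(0, -175, -10, 170, 10, -170), "\n";   # between -180 and -170
print point_in_box(0,    0, -10, 170, 10, -170), "\n";   # nowhere near the box
```

The first two points are inside the wrapped box and the third is not, which is the behaviour the two-box query is meant to reproduce.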
You’ll use this code like so:
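my ($geoSQL, $crossesIDL) = makeMBR(46, -123, 48, -121);
# Then you'll run this query on the database.
# selectall_arrayref returns every matching row (selectrow_array would only return the first one).
my $rows = $dbh->selectall_arrayref("select lat, lon from geom where $geoSQL");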
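If the corner coordinates come from a browser, you may not want to paste them straight into the SQL string. Here is a sketch of a variant that returns placeholders plus the values to bind. makeMBRBind and box_wkt are hypothetical names of mine; the polygons are built in the same corner order as makeMBR.

```perl
use strict;
use warnings;

# Hypothetical helper: one closed box in "lat lon" order.
sub box_wkt {
    my ($sLat, $wLon, $nLat, $eLon) = @_;
    return "Polygon(($sLat $wLon,$nLat $wLon,$nLat $eLon,$sLat $eLon,$sLat $wLon))";
}

# Hypothetical variant of makeMBR: returns a SQL fragment with ?
# placeholders, an array ref of strings to bind, and the crosses-180 flag.
sub makeMBRBind {
    my ($swLat, $swLon, $neLat, $neLon) = @_;
    my $clause = 'MBRContains(GeomFromText(?), g)';
    if ($swLon > $neLon) {
        # Two boxes, one on each side of the 180 boundary.
        return ("($clause OR $clause)",
                [ box_wkt($swLat, $swLon, $neLat, 180),
                  box_wkt($swLat, -180, $neLat, $neLon) ],
                1);
    }
    return ($clause, [ box_wkt($swLat, $swLon, $neLat, $neLon) ], 0);
}

my ($where, $binds, $crossesIDL) = makeMBRBind(-10, 170, 10, -170);
# With DBI this could be run as:
#   my $rows = $dbh->selectall_arrayref(
#       "select lat, lon from geom where $where", undef, @$binds);
```

The flag works the same way as before, so you can still zoom out when the box crosses the boundary.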
Conclusion
If this helped you and you’ve discovered a tip that could help others or have something to add, please post a comment. Muchas gracias, baie dankie, dis mos lekker by die see and have a spectacular day!
Mark.