A while back, Jobster CTO Phil Bogle blogged about some of the tricks I’ve used to do fast location queries in SQL. The link to my SQL query to generate the zip lookup table for radius searches is now dead (a cybersquatter stole my domain name and I don’t want to discuss it!). So here’s the original blog post:
If you need to build a radius search for something on your website, then creating a zip code distance lookup table performs much better than calculating distance for every search. This monster chunk of SQL will create that table for you. It takes about 12 hours to run on a fast machine.
insert into zip_dist2 (fromzip, tozip, dist) select z1.zip, z2.zip, ROUND((3956 * 2 * atan2(sqrt((POW(sin(((z1.lat * (atan2(1,1) * 4) / 180) – (z2.lat * (atan2(1,1) * 4) / 180))/2.0),2) + cos((z2.lat * (atan2(1,1) * 4) / 180)) * cos((z1.lat * (atan2(1,1) * 4) / 180)) * POW(sin(((z1.lon * (atan2(1,1) * 4) / 180) – (z2.lon * (atan2(1,1) * 4) / 180))/2.0),2))),sqrt(1-(POW(sin(((z1.lat * (atan2(1,1) * 4) / 180) – (z2.lat * (atan2(1,1) * 4) / 180))/2.0),2) + cos((z2.lat * (atan2(1,1) * 4) / 180)) * cos((z1.lat * (atan2(1,1) * 4) / 180)) * POW(sin(((z1.lon * (atan2(1,1) * 4) / 180) – (z2.lon * (atan2(1,1) * 4) / 180))/2.0),2))))),2) as distance from zip_data as z1, zip_data as z2 where (3956 * 2 * atan2(sqrt((POW(sin(((z1.lat * (atan2(1,1) * 4) / 180) – (z2.lat * (atan2(1,1) * 4) / 180))/2.0),2) + cos((z2.lat * (atan2(1,1) * 4) / 180)) * cos((z1.lat * (atan2(1,1) * 4) / 180)) * POW(sin(((z1.lon * (atan2(1,1) * 4) / 180) – (z2.lon * (atan2(1,1) * 4) / 180))/2.0),2))),sqrt(1-(POW(sin(((z1.lat * (atan2(1,1) * 4) / 180) – (z2.lat * (atan2(1,1) * 4) / 180))/2.0),2) + cos((z2.lat * (atan2(1,1) * 4) / 180)) * cos((z1.lat * (atan2(1,1) * 4) / 180)) * POW(sin(((z1.lon * (atan2(1,1) * 4) / 180) – (z2.lon * (atan2(1,1) * 4) / 180))/2.0),2))))) < 100 and z2.zip != z1.zip
This generates a lookup table that contains two zip codes and their distance from each other for every zip in the USA within 100 miles of each other. It uses the Haversine formula to calculate the distance between two points on the surface of a sphere.
I used this when designing our vertical search engine to create the lookup tables we use for our radius search. I use MySQL for this. You’re also going to need a zip_data table that contains zip codes and their respective latitudes and longitudes. You can buy this data for about $50 from the many retailers online.
Leave a Reply