Avoiding cross-site request forgery in your web apps

Google recently fixed a glaring vulnerability in Gmail that allowed an attacker to forward copies of all or some of your email to themselves by adding a filter to your Gmail account. But not before someone lost their domain name to an attacker, who then proceeded to try to sell it back to them for cash.

The Gmail bug was a cross-site request forgery exploit. The attack is incredibly simple: if a user is authenticated to a website, an attacker simply gets that user to load a URL that causes them to take some sort of action on that website. So by clicking a link in an email or on a website, or by merely loading a malicious web page that contains an image URL with the correct query string parameters, an unsuspecting user can be made to “do something” on a website they’re a member of.

Wikipedia has a good summary of CSRF and I recommend you read it if you haven’t already. Avoiding CSRF vulnerabilities in your web apps is easy: in all forms that require a user to be authenticated, simply reauthenticate them using some user-specific transient data. You could, for example, include a user’s session ID in a hidden form field, and when the user submits the form, check that the session ID in the form POST matches the session ID in the user’s cookie.

If your session IDs change every time a user authenticates to your website, this effectively defeats the attack. For extra security you may want to either encrypt the session ID in the form’s hidden field, or set the hidden field’s value to an MD5 hash of the real session ID.
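As a concrete sketch of that idea (the function names and the Python setting are mine, not from any particular framework), the hidden-field value can be derived by hashing the session ID together with a server-side secret, so the raw session ID never appears in the rendered HTML:

```python
import hashlib
import hmac

def csrf_token(session_id, secret):
    # Hash the session ID (MD5, as suggested above) together with a
    # server-side secret so the raw ID is never exposed in the page.
    return hashlib.md5((secret + session_id).encode()).hexdigest()

def form_post_is_valid(posted_token, cookie_session_id, secret):
    # On submit, recompute the token from the session cookie and compare
    # it (in constant time) to the hidden-field value in the POST.
    expected = csrf_token(cookie_session_id, secret)
    return hmac.compare_digest(posted_token, expected)
```

The token is embedded in the hidden field when the form is rendered; any POST whose token doesn’t match the token recomputed from the session cookie gets rejected.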

The Google CSRF required a form POST, which was only slightly more complex for an attacker to implement. But many CSRF attacks don’t require a POST, and the parameters can therefore appear in a URL query string. The effect is that your website can be exploited by one of your users simply loading an image on a malicious web page or in a malicious email.
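To illustrate both defenses together (this is my own sketch, not code from the original post): a state-changing endpoint should refuse GET outright, since an image tag can only issue GETs, and should still demand the user-specific token on POST:

```python
def action_allowed(method, posted_token, session_token):
    # An <img src="..."> on a malicious page can only trigger a GET,
    # so never let a GET perform a state-changing action.
    if method.upper() != "POST":
        return False
    # A forged POST (e.g. an auto-submitted form) won't know the
    # user-specific token, so require a match with the session's token.
    return posted_token is not None and posted_token == session_token
```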

The importance of not knowing what isn’t possible

A Microsoft quote from an NY Times article I’ve already cited has been bugging the crap out of me. It bugged me when I first blogged about the article, and it bugged me as I wandered around B&N last night doing the last of my Xmas shopping. I wound up in the management section and picked up a book on the top 10 mistakes leaders make. Staring at me as I flipped open chapter 5 was confirmation that I wasn’t nuts.

Here’s the quote that bugged me:

“I’m happy that by hiring a bunch of old hands, who have been through these wars for 10 or 20 years, we at least have a nucleus of people who kind of know what’s possible and what isn’t,”

I’ve lost count of how many times as a software developer I’ve sat down and said “I wonder if this is possible?” When I created WorkZoo I wondered if it was possible to aggregate all the world’s jobs into a single database – and I got pretty darn close. When I created Geojoey I wondered if it was possible to have a rich pure Ajax application with a client-side MVC model – and it was. When I created LineBuzz I wondered if it was possible to post inline comments on arbitrary text on any web page – yes, it’s possible. When I created Feedjit I wondered if it was possible to scale to serve real-time traffic data in a widget. We’re serving almost 100 million real-time widgets per month now.

I started coding on an Apple IIe and later moved to IBM PCs, so in my youth Apple and Microsoft were symbols of innovation and I wanted to innovate the way they did. Apple’s still doing a great job, but it breaks my heart to see MS floundering like a fish out of water in the new world of broadband, browser standards, open source and dynamic web applications.

Come on guys. Get it together already!! Fire those know-it-alls, hire some new blood and pretend for a moment that the past doesn’t matter and that anything is possible.

Smart Image Resizing – Liquid Rescale

Phil Bogle wrote recently about an awesome image resizing algorithm. I found out via A Welsh View what happened to it. It’s been launched as a website called RSizr.com and is also available as a GIMP plugin called Liquid Rescale. It’s really cool to see this amazing algo take the open source route.

It’s an incredibly smart algo – I tried it on a Google Analytics graph and it shrunk the graph without breaking the line while maintaining the text scale.

It’d be awesome to see this as a feature in ImageMagick so we can put more web front-ends on it.

The perils of high traffic

I have this little 64-bit dual-core Opteron that I’m busy torturing with way more traffic than its creators intended. I sometimes edit code on my live servers – only when I’m sure it’s not going to break anything and only when I’m wide awake and fully caffeinated. Today I tried to edit a file on a live box. In the time it took Vim to delete the file and rewrite it to disk during the save operation (about 1/50th of a second), the webserver threw out 20 “file not found” errors.

Note to self: no more editing-code-on-live cowboy crap.

Configuring Apache 2.2.4 + mod_perl 2.0 + PHP 5.2.3 + libapreq with a worker MPM

I couldn’t find any docs on compiling mod_perl2 alongside PHP 5 with Apache 2.2, so hopefully this helps someone.

I’ve always statically compiled mod_perl into Apache, but the easiest way to get mod_perl to play nice with PHP under Apache 2 is to compile them as DSOs – dynamic shared objects that are loaded at runtime. I’ve tested this under Ubuntu 7 and CentOS 5.

At the time of this writing the server hosting this page is running with this config and handles a not-insignificant amount of traffic.

NOTE: I use a worker MPM with Apache to get the best possible performance. The worker MPM is a hybrid thread/process model, and it requires that PHP be compiled thread-safe.

Here are the commands I use. I’m assuming you’ve downloaded the latest Apache httpd 2.2 source code, PHP’s source code, mod_perl’s source code and libapreq2’s source code. I’m also assuming you’re smart enough to know when to cd into the directory of each app to compile and install it, so I’ve left out basic steps like that.

First compile apache with DSO support. Enable the worker MPM, enable mod_rewrite, enable mod_expires, and add a little magic to make libapreq work:

./configure --prefix=/usr/local/apache2 --with-mpm=worker --enable-so --enable-rewrite --enable-expires --with-included-apr

make

make install

Now that Apache is installed, compile and install a thread-safe PHP DSO. Note that --enable-maintainer-zts compiles a thread-safe PHP. I’ve also added MySQL support.

./configure --with-mysql --enable-maintainer-zts --with-apxs2=/usr/local/apache2/bin/apxs

make

make install

Now compile and install a mod_perl DSO:

perl Makefile.PL MP_APXS=/usr/local/apache2/bin/apxs

make

make install

Next, compile and install libapreq as a DSO:

perl Makefile.PL --with-apache2-apxs=/usr/local/apache2/bin/apxs

make

make install

Make sure your httpd.conf contains the following to enable mod_perl, php and libapreq:

LoadModule apreq_module modules/mod_apreq2.so
LoadModule perl_module modules/mod_perl.so
LoadModule php5_module modules/libphp5.so
DirectoryIndex index.html index.htm index.php
AddType application/x-httpd-php .php
AddType application/x-httpd-php-source .phps

How to create a ZIP code distance lookup table with 1 line of SQL

A while back, Jobster CTO Phil Bogle blogged about some of the tricks I’ve used to do fast location queries in SQL. The link to my SQL query that generates the zip lookup table for radius searches is now dead (a cybersquatter stole my domain name and I don’t want to discuss it!), so here’s the original blog post:

If you need to build a radius search for something on your website, creating a zip code distance lookup table performs much better than calculating the distance for every search. This monster chunk of SQL will create that table for you. It takes about 12 hours to run on a fast machine.

insert into zip_dist2 (fromzip, tozip, dist)
select z1.zip, z2.zip,
  ROUND((3956 * 2 * atan2(
    sqrt((POW(sin(((z1.lat * (atan2(1,1) * 4) / 180) - (z2.lat * (atan2(1,1) * 4) / 180))/2.0),2)
      + cos((z2.lat * (atan2(1,1) * 4) / 180)) * cos((z1.lat * (atan2(1,1) * 4) / 180))
      * POW(sin(((z1.lon * (atan2(1,1) * 4) / 180) - (z2.lon * (atan2(1,1) * 4) / 180))/2.0),2))),
    sqrt(1-(POW(sin(((z1.lat * (atan2(1,1) * 4) / 180) - (z2.lat * (atan2(1,1) * 4) / 180))/2.0),2)
      + cos((z2.lat * (atan2(1,1) * 4) / 180)) * cos((z1.lat * (atan2(1,1) * 4) / 180))
      * POW(sin(((z1.lon * (atan2(1,1) * 4) / 180) - (z2.lon * (atan2(1,1) * 4) / 180))/2.0),2))))),2) as distance
from zip_data as z1, zip_data as z2
where (3956 * 2 * atan2(
    sqrt((POW(sin(((z1.lat * (atan2(1,1) * 4) / 180) - (z2.lat * (atan2(1,1) * 4) / 180))/2.0),2)
      + cos((z2.lat * (atan2(1,1) * 4) / 180)) * cos((z1.lat * (atan2(1,1) * 4) / 180))
      * POW(sin(((z1.lon * (atan2(1,1) * 4) / 180) - (z2.lon * (atan2(1,1) * 4) / 180))/2.0),2))),
    sqrt(1-(POW(sin(((z1.lat * (atan2(1,1) * 4) / 180) - (z2.lat * (atan2(1,1) * 4) / 180))/2.0),2)
      + cos((z2.lat * (atan2(1,1) * 4) / 180)) * cos((z1.lat * (atan2(1,1) * 4) / 180))
      * POW(sin(((z1.lon * (atan2(1,1) * 4) / 180) - (z2.lon * (atan2(1,1) * 4) / 180))/2.0),2))))) < 100
  and z2.zip != z1.zip

This generates a lookup table that contains two zip codes and their distance from each other for every zip in the USA within 100 miles of each other. It uses the Haversine formula to calculate the distance between two points on the surface of a sphere.

I used this when designing our vertical search engine to create the lookup tables we use for our radius search. I use MySQL for this. You’re also going to need a zip_data table that contains zip codes and their respective latitudes and longitudes. You can buy this data for about $50 from the many retailers online.
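For reference, here is the same Haversine calculation as a standalone Python function (my own transcription of the SQL above, not code from the original post; 3956 is the earth’s radius in miles):

```python
import math

def zip_distance_miles(lat1, lon1, lat2, lon2):
    # Haversine formula: great-circle distance in miles between two
    # latitude/longitude points (3956 = earth's radius in miles).
    rlat1, rlon1 = math.radians(lat1), math.radians(lon1)
    rlat2, rlon2 = math.radians(lat2), math.radians(lon2)
    a = (math.sin((rlat1 - rlat2) / 2) ** 2
         + math.cos(rlat2) * math.cos(rlat1)
         * math.sin((rlon1 - rlon2) / 2) ** 2)
    return 3956 * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
```

Precomputing every pair of zips under 100 miles apart with this formula is exactly what the SQL does; after that, a radius search is just a join against the lookup table instead of a distance calculation per row.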

World-wide city database and other (free) geospatial data

The National Geospatial-Intelligence Agency is one of my favorite data sources – it’s also one of my favorite names for a government agency. The agency provides a database of worldwide features, which I use as the data source for Geojoey.com’s landmark search feature (top right of the screen).

These guys are selling the equivalent data for over $300.

For an up-to-date ZIP code database, you can contact USPS.gov and order it from them – which may take a while – government agencies, grumble grumble. Or just buy it from one of the many online sellers for about $50. Census.gov doesn’t do ZIP codes anymore, but if you don’t care about it being current, there’s an old ZIP code database available for download.

Census.gov’s TIGER database is definitely THE source for US geospatial data. My favorite page is the cartographic boundary files they’ve extracted from the database. It has things like ZCTAs (ZIP code borders), county boundaries, etc. If you’re handy with a graphics app, you can do all kinds of fun stuff with this data.

Programming language choices for entrepreneurs

I’ll often find myself chatting about choice of technology with fellow entrepreneurs and invariably it’s assumed the new web app is going to be developed in Rails.

I don’t know enough about Rails to judge its worth. I do know that you can develop applications in Rails very quickly and that it scales with complexity better than Perl. Rails may have problems scaling performance. I also know that you can’t hire a Rails developer in Seattle for love or money.

So here are some things to think about when choosing a programming language and platform for your next consumer web business. They are in chronological order – the order you’re going to encounter each issue:

  1. Are you going to be able to hire great talent in languageX for a reasonable price?
  2. Can you code it quickly in languageX?
  3. Is languageX going to scale to handle your traffic?
  4. Is languageX going to scale to handle your complexity?
  5. Is languageX going to be around tomorrow?

If you answered yes to all five of these, then you’ve made the right choice.

I use Perl for my projects, and it does fairly well on most criteria. Its weakest point is scaling to handle complexity. Perl lets you invent your own style of coding, so it can become very hard to read someone else’s code. Usually that’s solved by coding by convention. Damian Conway’s Object Oriented Perl is the bible of Perl convention, in case you’re considering going that route.