Friday, September 14, 2007

Optimizing geolocation queries

Problem scope: Finding geolocation information based on an ip address.

Geolocation databases typically have columns like – IPFROM, IPTO , CITY, STATE, COUNTRY, LATITUDE, LONGITUDE, etc. Here IPTO and IPFROM are long versions of IpAddresses.

Before we begin, here is an example on how to compute the long version of IPAddress.

long LongIP = 76.102.89.30= 30 + (89*256) + (102*256*256)+(79*256*256*256)= 1281775902

In my experiments, the GeoLocation_Table has (IPFROM, IPTO) set as the primary key. I’m running tests on my Pentium P4 desktop with 2GB RAM that has MySQL running as a service.

Here are some experiments I ran to determine the Geographical Location, given an ipAddress.

1. Common version/Range queries: This is what most people have been doing – this is also the prescribed way from most geolocation database vendors

SELECT LATITUDE, LONGITUDE, CITY, COUNTRY FROM GEOLOCATION_TABLE WHERE IPFROM <= LongIP and IPTO >= LongIP LIMIT 1;

The issue with this query is that the time taken depends on how many rows the query has to go through to figure out


2. My version

SELECT LATITUDE, LONGITUDE, CITY, COUNTRY FROM GEOLOCATION_TABLE WHERE IPFROM <=LongIP order by IPFROM desc limit 1

Here are some comparisons ofQuery times for #1 Vs. #2

I)
IP Address: 76.102.89.30
Long IP Address: 1281775902
Time taken (Query 1): 3.5s
Time taken (Query 2): 0.10s

II)
IP Address: 216.100.241.130
Long IP Address: 3630494082
Time taken (Query 1): 1.3s
Time taken (Query 2): 0.11s


Since the ranges are not overlapping, the second version works and performs much better than first (prescribed) version.

http://freeipservices.com