Geospatial (location based) Searches in MongoDB – Part 2 – simple searching

April 2, 2012

This is the 2nd post of a multi-part series on performing geospatial (location based) searches on large data sets in MongoDB.

In this part we will focus on using simple queries to perform geo searches on the location-tagged data that we loaded into MongoDB (see part 1 for details).

MongoDB supports two-dimensional geospatial indexes. These indexes are designed with location-based queries in mind, such as “find me the closest N items to my location.” They can also efficiently apply additional constraints, such as “find the items that are within some distance of the centroid of my search”.
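To make the "closest N items within some distance" idea concrete, here is a minimal brute-force sketch of what such a query computes. It is only a conceptual illustration in plain Java: it scans every point, whereas MongoDB answers the query from the 2d index. The class, the Place type and its field names are hypothetical, and distance is assumed to be flat (Euclidean) in the same units as the stored coordinates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NearSearchSketch {

    /** A stored point; the type and field names are hypothetical. */
    static final class Place {
        final String name;
        final double x;
        final double y;

        Place(String name, double x, double y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }
    }

    /** Flat (Euclidean) distance, in the same units as the coordinates. */
    static double distance(Place p, double x, double y) {
        return Math.hypot(p.x - x, p.y - y);
    }

    /** Returns up to n places within maxDistance of (x, y), nearest first. */
    static List<Place> near(List<Place> places, double x, double y,
                            double maxDistance, int n) {
        List<Place> hits = new ArrayList<>();
        for (Place p : places) {
            if (distance(p, x, y) <= maxDistance) {
                hits.add(p);
            }
        }
        hits.sort(Comparator.comparingDouble(p -> distance(p, x, y)));
        return hits.size() > n ? new ArrayList<>(hits.subList(0, n)) : hits;
    }

    public static void main(String[] args) {
        List<Place> places = new ArrayList<>();
        places.add(new Place("a", 15.0, 150.0));
        places.add(new Place("b", 18.0, 154.0));
        places.add(new Place("c", 80.0, 10.0)); // far outside the radius
        for (Place p : near(places, 15.0, 150.0, 40.0, 10)) {
            System.out.println(p.name);
        }
    }
}
```

The maxDistance and n parameters play the roles that $maxDistance and limit() play in the queries later in this post.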

Assuming that the data has been properly loaded, the first step is to index the data so that it can be geo searched by Mongo. This can be done from Java, but for this exercise we will set it up from the command line:
db.name_of_collection.ensureIndex( { loc : "2d" } )

Once you have created the index, it is best to check and make sure that it was created properly:
db.name_of_collection.getIndexes()
You should see something like:
{ "v" : 1, "key" :
{ "loc" : "2d" },
"ns" : "geo1.um",
"name" : "loc_"
}
Note: The key part is that “loc” should be “2d”

Now for the ‘fun part’: building and running the geospatial queries in Java. We will start simple and build more complex queries, using more advanced Java concepts (i.e. classes) as we go along. If you are not completely comfortable with building and running queries with the BasicDBObject, please review that first.

A simple geo search from the command line would look like: db.um.find( { loc : { $near : [15,150] }})
In Java it would be:

// loc was stored as [lat, lng] in part 1, so the query point uses the same order
Double locationLatitude = Double.valueOf(15.0);
Double locationLongitude = Double.valueOf(150.0);
BasicDBObject locQuery = new BasicDBObject();
locQuery.put("loc", new BasicDBObject("$near", new Double[]{locationLatitude, locationLongitude}));
// run the query
DBCursor locCursor = collectionUM.find( locQuery );
// use cursor to view results

Note: Use a Double array in this approach; otherwise it can lead to precision issues.

A more complex (useful) search from the command line could be: db.um.find( { loc : { $near : [15.,-150.11] , $maxDistance : 40 } } ).limit(10)
In Java, using a JSON document:

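// loc was stored as [lat, lng] in part 1, so latitude goes first
String sLat = "15.5"; Double dLat = Double.valueOf(sLat);
String sLng = "150.11"; Double dLng = Double.valueOf(sLng);
String sDistance = "40";
// build the geo sub-document from a JSON string and cap the result at 10 documents
DBCursor cur = collectionUM.find(new BasicDBObject("loc", JSON.parse("{$near : [ " + dLat + "," + dLng + " ] , $maxDistance : " + sDistance + "}"))).limit(10);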

Note: Even with this relatively simple query, it does not appear that you can easily wrap the geo query parameters in a Java BasicDBObject. There have been a number of postings on Stack Overflow and the Mongo Google Groups on this issue. However, I have not yet seen an example of a BasicDBObject implementation that does not resort to parsing a JSON document. Also, there is some mention of implementing this query with the BasicDBObjectBuilder. I am currently looking into that.
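One possible direction for the open question above is to build the geo sub-document as a nested, insertion-ordered map instead of a JSON string. The sketch below uses only java.util maps so that it runs without the driver; the class and method names are hypothetical. It rests on two assumptions that have not been verified against a server here: that the legacy driver's BasicDBObject behaves like a LinkedHashMap (so the same put calls would apply), and that $near should remain the first key in the sub-document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NearQuerySketch {

    /**
     * Builds { field : { $near : [first, second], $maxDistance : d } }
     * as nested maps that preserve insertion order.
     */
    static Map<String, Object> nearQuery(String field, double first,
                                         double second, double maxDistance) {
        Map<String, Object> geo = new LinkedHashMap<>();
        // Assumption: $near is inserted first and stays the first key.
        geo.put("$near", new Double[] { first, second });
        geo.put("$maxDistance", maxDistance);

        Map<String, Object> query = new LinkedHashMap<>();
        query.put(field, geo);
        return query;
    }

    public static void main(String[] args) {
        // Same values as the JSON-based example: [lat, lng] and a max distance of 40.
        Map<String, Object> query = nearQuery("loc", 15.5, 150.11, 40.0);
        Map<?, ?> geo = (Map<?, ?>) query.get("loc");
        System.out.println(geo.keySet());
    }
}
```

If the assumptions hold, replacing LinkedHashMap with BasicDBObject would give an object that can be passed to find() directly, which avoids string concatenation and the quoting mistakes that come with it.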

In the next installment I plan to use more advanced Java concepts (i.e. classes) to implement the searches.


Geospatial (location based) Searches in MongoDB – Part 1 – data acquisition and loading

March 29, 2012

This is the first post of a multi-part series on performing geospatial (location based) searches on large data sets in MongoDB.

In this part, we will focus on getting large sets of geo-data into MongoDB using the Mongo data drivers. This will be a code-focused approach; the programming language is Java (Mongo supports a wide variety of programming languages). I will be using the Mongo Java driver to parse the geo data, format the data, and insert it into Mongo. You will need to download the Java driver (jar file) from mongodb.org.

First, we will need some geo-data. One of the best sources for ‘reasonably’ well formatted and consistent data is GeoNames.org. There you can download files for individual countries or the allCountries.zip file containing over 7 million records (~200 MB). To start, I recommend that you download a small, country-specific zip file, as it will be much easier to work with (I used um.zip; it contains 230 records). Once you have things working, you can download larger countries or the allCountries file.

The following code segments perform four basic steps: getting a connection to the Mongo db, getting the collection object (used to store the data in the db), reading the data from the country-specific file, and writing the data to the database.

(1) Get a connection to the Mongo database

System.out.println("mongo");
Mongo mongo = new Mongo("localhost", 27017);
System.out.println("getting db geo1");
DB db = mongo.getDB("geo1");

(2) Create your collection and data store object

// get a single collection
System.out.println("collection UM");
DBCollection collectionUM = db.getCollection("um");
DBObject dbObject = null;
String jsonObj = "";

(3) Read the data

There is nothing really fancy here. The data is in a text file, one record per line. Just read the line, tokenize the string, and extract the data. The only tricky part is that the data is not consistently delimited, so you have to search for the lat/lng data fields. I use a regular expression to find the floating-point data fields (token.matches("-?\\d+(\\.\\d+)?")); they are the only floating-point fields in the record. To keep things simple I only retained four pieces of data: the geonameID, the location information (the text info between the geonameID and the latitude data field), the latitude data, and the longitude data.
Note: As per good programming practice, you do need to check the lat/lng data to ensure that each value is a floating-point number in a valid range (latitude within +/- 90, longitude within +/- 180). Also, make sure that you do not lose precision of the data. This should not be a problem in Java, but this sort of thing can be a bit of a headache in PHP.
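The reading step can be sketched roughly as follows. This is a minimal illustration, not the loader used for this post: the class, the Record type and the sample line are made up, tab-separated tokens are assumed, and the pattern deliberately requires a fractional part so that integer fields such as the geonameID are not mistaken for coordinates.

```java
import java.util.regex.Pattern;

final class GeoNamesLineSketch {

    // Signed decimals such as 16.75 or -169.5 (note the escaped dot).
    private static final Pattern DECIMAL = Pattern.compile("-?\\d+\\.\\d+");

    /** The four values kept from each line; the type itself is hypothetical. */
    static final class Record {
        final String geonameId;
        final String geoInfo;
        final double lat;
        final double lng;

        Record(String geonameId, String geoInfo, double lat, double lng) {
            this.geonameId = geonameId;
            this.geoInfo = geoInfo;
            this.lat = lat;
            this.lng = lng;
        }
    }

    /** Finds the first pair of adjacent decimal tokens and treats it as lat, lng. */
    static Record parse(String line) {
        String[] tokens = line.split("\t"); // assumption: tab-separated fields
        for (int i = 1; i + 1 < tokens.length; i++) {
            if (DECIMAL.matcher(tokens[i]).matches()
                    && DECIMAL.matcher(tokens[i + 1]).matches()) {
                double lat = Double.parseDouble(tokens[i]);
                double lng = Double.parseDouble(tokens[i + 1]);
                if (Math.abs(lat) > 90.0 || Math.abs(lng) > 180.0) {
                    throw new IllegalArgumentException("coordinate out of range: " + line);
                }
                // Everything between the id and the latitude becomes the location info.
                StringBuilder info = new StringBuilder();
                for (int j = 1; j < i; j++) {
                    if (tokens[j].isEmpty()) {
                        continue;
                    }
                    if (info.length() > 0) {
                        info.append(' ');
                    }
                    info.append(tokens[j]);
                }
                return new Record(tokens[0], info.toString(), lat, lng);
            }
        }
        throw new IllegalArgumentException("no lat/lng pair found: " + line);
    }

    public static void main(String[] args) {
        // Made-up sample line, not copied from um.zip.
        Record r = parse("1234567\tExample Island\tExample Isle\t\t16.75\t-169.5\tT");
        System.out.println(r.geonameId + " | " + r.geoInfo + " | " + r.lat + " | " + r.lng);
    }
}
```

Using Double.parseDouble keeps the full double precision of the text value, and the range check rejects a record before it ever reaches the database.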

(4) Write the data to Mongo

We will write the data to the database using the DBCollection.insert() method. In step (2) you created the collection object for the collection that your documents will be written to. We will use its insert() method to write a JSON object to the collection.
Writing the data is fairly straightforward; the only tricky part is properly formatting the JSON document to include a location array that can be indexed and used in a geospatial search. The ‘loc’ field is an array. It stores the lat and lng data that you will index and use to perform the location based searches (described in part 2 of this series).
The format for the ‘json’ string is:
-> jsonObj = "{geonameID:" + geonameID + ",geoInfo:" + geoInfo + ",loc: [ " + lat + ", " + lng + "] }" ;
Remember, the lat and lng fields must be placed inside an array element (the name loc is arbitrary), and queries must later supply their coordinates in the same [lat, lng] order. If geoInfo is free text, it also needs to be quoted and escaped inside the JSON string.
The ‘json’ string is loaded into a DBObject:
-> dbObject = (DBObject)JSON.parse(jsonObj);
And the dbObject is written to the database:
-> collectionUM.insert(dbObject);
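As a rough sketch of the formatting step, the helper below builds the JSON string with the text field quoted and escaped. The class and method names are hypothetical, the keys are written as strictly quoted JSON rather than the unquoted form shown above, and only backslashes and double quotes are escaped; control characters are not handled.

```java
final class GeoJsonSketch {

    /** Escapes backslashes and double quotes so the text is safe inside a JSON string. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the document string with loc as a [lat, lng] array. */
    static String toJson(long geonameId, String geoInfo, double lat, double lng) {
        // Concatenating a double uses Double.toString, which is locale independent
        // and always emits a '.' as the decimal separator.
        return "{\"geonameID\": " + geonameId
                + ", \"geoInfo\": \"" + escape(geoInfo) + "\""
                + ", \"loc\": [" + lat + ", " + lng + "]}";
    }

    public static void main(String[] args) {
        // Made-up values for illustration only.
        System.out.println(toJson(1234567L, "Example Island", 16.75, -169.5));
    }
}
```

The resulting string would then be handed to JSON.parse and insert() as in the steps above.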

Using this approach, you can write hundreds or millions of records into the data store.

In the next part of this series, I will cover how to perform the geospatial (location based) radial and polygon searches of the geo-coded documents.