Mapping Shortest Routes Using a Graph Database

Lou Mauget Databases, Development Technologies, Java 11 Comments

Attention: The following article was published over 11 years ago, and the information provided may be aged or outdated. Please keep that in mind as you read the post.

We often model interconnected data by cramming it in and out of table structures. Why don’t we simply model interconnected data as … interconnected data?

I recently wrote that there are several kinds of NoSQL database stores: key-value, column family, document-oriented, and graph database.  This article targets Neo4j, a Java-based graph DBMS. The open-ended problem domain of graph databases includes access control lists (ACLs), social networks, recommendation systems, and geo-spatial modeling.

Failure

In 2007, a colleague and I used Java with Oracle 9i to implement Dijkstra’s Algorithm. Our “MapQuest for Trains” application would route a rail train over various right-of-ways while minimizing cost. The cost was a function of distance, fuel surcharge, and obstacles. The task to route a train from Los Angeles to Chicago had a grotesquely long response time. Nobody wanted their applications deployed on our nodes because we spiked the servers!

Where did we go wrong? The USA rail map is a living instance of graph theory. We modeled an extensive network of nodes (stations) and connecting links (tracks) as node tables joined through link tables. Navigation through this model involved n-level joins. An RDMS hits the wall as it proceeds through 4th and 5th level joins. See the Neo4j in Action book in the references for benchmarks of a graph DBMS Neo4j against RDBMS MySQL. We tried sidestepping this deep join problem by working with the network in memory for the life of the application instance. The North American rail system is quite large. The application thrashed.

Graph Data Store

Fast forward to summer, 2012.  I stumbled onto Neo4j, a graph database. The literature described suitability of Neo4j for modeling social networks. It could rapidly answer queries such as, “How many of Judy’s friends, their friends, and their friends’ friends have read Little Women?” I could use Neo4j to create my own Facebook or Twitter. Think of a social network as an instance of graph theory. I wasn’t interested, but how about a do-over at routing trains through the graph that comprises the USA rail network?

Raw Data

I wanted to publish my work with no strings attached. I found an open 1996 National Transportation Atlas Database ZIP file of rail data. I found a flat node file and a link file within. Each record of the node file had an integer ID plus a pair of fields for latitude and longitude. Each record of the link file had a distance field along with the node IDs for the node pair it connected. Perfect!

Bulk Load

This data could directly map to a Neo4j database. See Figure 1. I conceived Bizarro World, where railroad stations would have graph nodes with integer “names.” Hey, it’s only a demo! Each station node would have a latitude property and a longitude property. A station-to-station connecting track becomes a graph link having a single node-to-node distance property. Any number of links could fan out from a given station node.

I wrote a command-line Eclipse Maven project to bulk-insert a file of nodes, index them, and insert the corresponding file of links – all into an embedded Neo4j database. Run time was 20 minutes on my MacBook Air! I discovered it has a fan! Neo4j is ACID transactional. I discovered a non-transactional bulk insert API that speeded the insertion by an order-of-magnitude.

Figure 1: Neo4j models graph theory structure directly

Google Earth

My goal was to implement a Dijkstra algorithm, or the faster A* algorithm to produce a shortest distance node-by-node path between any two of my integer-named rail stations.  See Figure 2 for the notion of a shortest or least-cost path between N1 and N7.

Figure 2: Shortest path from N1 to N7

Each node would contain the node’s latitude / longitude properties. I could walk the path to emit Google KML – meaning Keyhole Markup Language – not related to Keyhole Software. Google Earth will render a KML layer atop a map. The visual effect would be striking.

At this point I received a gift from Neo4j. The heart of my code is able to pass the buck to Neo4j to do the heavy lifting. See the gist, or a subset below:

Transaction tx = graphDb.beginTx();
try {
    Expander relExpander = Traversal.expanderForTypes(
                DomainConstants.RelTypes.DOMAIN_LINK, Direction.BOTH);

    relExpander.add(DomainConstants.RelTypes.DOMAIN_LINK, Direction.BOTH);

    PathFinder<WeightedPath> shortestPath = GraphAlgoFactory.aStar(relExpander,
                costEval, estimateEval);

    ps.println(KMLConstants.KML_LINE_START);
    emitCoordinate(ps, shortestPath, nodeA, nodeB);
    ps.println(KMLConstants.KML_LINE_END);

    tx.success();

} finally {
        tx.finish();
}

The built-in A* algorithm picks a shortest route through the maze of nodes – ‘stations’ – to return an iterator, shortestPath, over those latitude / longitude points that form a shortest path from beginning to end.  I ran that iterator to create a KML stream of points suitable for display on Google Maps or Google Earth. Each node is connected to another by a link that has a distance property.

Additionally, I had to supply a “cost” callback function. The A* algorithm multiplies each examined link cost property – miles – by this functional value. My surrealistic Bizarro World implementation simply returns the value one. In a realistic railroad world, this function could factor in variables such as fuel cost, track usage charges, train type restrictions, or even regional embargos against travel, caused by events such as a Hurricane Sandy obliterating tracks. A cost factor for the latter route would be a huge number.

The algorithm produces “a” shortest route. There could be ties in total cost or distance.  Presumably, a smarter cost function would impose enough variance to eliminate those rare ties.

The user passes a “to” and “from” station integer on the command line.  The application quickly produces the route as a KML stream sent to Google Earth in a per-OS-dependent manner. The visual effect is a smooth zoom into a display, such as that of Figure 3. The zoom actually takes longer than its triggering route calculation. One can zoom down farther, usually finding satellite shots of trains in the cities.

Figure 3: Command-based route displayed on Google Earth

You can clone the project from GitHub here. Import it as a Maven project to Eclipse 3.7 Juno. Compiler level is Java 1.6. Use the REAME.md as your guide.

Mobile Web

That was easy, but I prefer web apps, especially mobile web apps. How about a KML layer atop Google Maps?  I created another Maven project in Eclipse composed of the following function points:

  1. An initialization servlet that copies the Neo4j rail graph database from the class path to a writeable directory – so that Neo4j can write a lock value.
  2. A controller servlet that expects “from” and “to” integer station parameters, producing a KML file of the shortest from-to route.
  3. A single JQuery Mobile page that accepts “from” and “to” slider values, to pass the controller’s KML result stream to the Google Map API. It adds the KML as a layer, onto an embedded Google Map displayed on the single page. A reset button clears the route, unless you want to stack a series of KML layers.

See Figure 4 for an iPhone 4 screen capture. Try it now at http://myjavaneo4j.herokuapp.com/. I deployed its war file to a $0.00/hr Heroku dyno that idles after a period of disuse. You may suffer a one-minute wakeup response time at first. It’s snappy thereafter, until the site falls into disuse for an hour or more.

Figure 4: Mobile version of app embeds Google Maps

You can exercise, step 2, the controller servlet, here. Try other integer pairs from the range 1 … 133752. It will download a KML file to you. Double-click the file to invoke a Google Earth route display. I tested this part alone before I wrote the single HTML page.

You may clone the project from GitHub here. Carefully read the README.md file. Your clone will not draw a route without blessing it with your own Google API key and the URL of your own public deployment.

Afterword

Bizarro World data is outdated and incomplete, making it bizarre. US stations are not really named by integers. Real railroad data includes track ownership, junctions between rail company’s tracks, and industry standard station identifiers.  A production cost function would have access to extra link properties besides distance. I wanted this demo to be freely public. Association of American Railroads data from its Railinc subsidiary isn’t free. Neo4j could accommodate that real data.

I’ve barely touched Neo4j. It did the job without much learning by me. Of course Google brings impressive mapping to the table, but I needed a dynamic shortest path KML layer also.  Neo4j has a remote service REST API, but I used an embedded database. It has several graph traversal languages available, but I needed only the built-in Lucene index. I used it to locate “from” and “to” nodes to feed to the A* algorithm. The main use of API was to load and index the raw data. Even so, I used a non-transactional bulk insert to accelerate the insertion. At run-time I wrapped my meager API calls in a requisite transaction.

I’ve only exercised the CR of CRUD. I’ve certainly not tried to scale this demo.  This sounds negative, but I’m excited. I have a graph data store available in my architectural toolbox. It would be interesting to create an ACL authorization framework with Neo4j, or even relent, and dip a toe into modeling social networks. And did you ever wonder how Amazon seems to know exactly what to sell to you?

— Lou Mauget, [email protected]


References

0 0 votes
Article Rating
Subscribe
Notify of
guest

11 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments