Mapping Shortest Routes Using a Graph Database

by on January 28, 2013 12:00 am

We often model interconnected data by cramming it in and out of table structures. Why don’t we simply model interconnected data as … interconnected data?

I recently wrote that there are several kinds of NoSQL database stores: key-value, column family, document-oriented, and graph database.  This article targets Neo4j, a Java-based graph DBMS. The open-ended problem domain of graph databases includes access control lists (ACLs), social networks, recommendation systems, and geo-spatial modeling.


In 2007, a colleague and I used Java with Oracle 9i to implement Dijkstra’s Algorithm. Our “MapQuest for Trains” application would route a rail train over various right-of-ways while minimizing cost. The cost was a function of distance, fuel surcharge, and obstacles. The task to route a train from Los Angeles to Chicago had a grotesquely long response time. Nobody wanted their applications deployed on our nodes because we spiked the servers!

Where did we go wrong? The USA rail map is a living instance of graph theory. We modeled an extensive network of nodes (stations) and connecting links (tracks) as node tables joined through link tables. Navigation through this model involved n-level joins. An RDMS hits the wall as it proceeds through 4th and 5th level joins. See the Neo4j in Action book in the references for benchmarks of a graph DBMS Neo4j against RDBMS MySQL. We tried sidestepping this deep join problem by working with the network in memory for the life of the application instance. The North American rail system is quite large. The application thrashed.

Graph Data Store

Fast forward to summer, 2012.  I stumbled onto Neo4j, a graph database. The literature described suitability of Neo4j for modeling social networks. It could rapidly answer queries such as, “How many of Judy’s friends, their friends, and their friends’ friends have read Little Women?” I could use Neo4j to create my own Facebook or Twitter. Think of a social network as an instance of graph theory. I wasn’t interested, but how about a do-over at routing trains through the graph that comprises the USA rail network?

Raw Data

I wanted to publish my work with no strings attached. I found an open 1996 National Transportation Atlas Database ZIP file of rail data. I found a flat node file and a link file within. Each record of the node file had an integer ID plus a pair of fields for latitude and longitude. Each record of the link file had a distance field along with the node IDs for the node pair it connected. Perfect!

Bulk Load

This data could directly map to a Neo4j database. See Figure 1. I conceived Bizarro World, where railroad stations would have graph nodes with integer “names.” Hey, it’s only a demo! Each station node would have a latitude property and a longitude property. A station-to-station connecting track becomes a graph link having a single node-to-node distance property. Any number of links could fan out from a given station node.

I wrote a command-line Eclipse Maven project to bulk-insert a file of nodes, index them, and insert the corresponding file of links – all into an embedded Neo4j database. Run time was 20 minutes on my MacBook Air! I discovered it has a fan! Neo4j is ACID transactional. I discovered a non-transactional bulk insert API that speeded the insertion by an order-of-magnitude.

Figure 1: Neo4j models graph theory structure directly

Google Earth

My goal was to implement a Dijkstra algorithm, or the faster A* algorithm to produce a shortest distance node-by-node path between any two of my integer-named rail stations.  See Figure 2 for the notion of a shortest or least-cost path between N1 and N7.

Figure 2: Shortest path from N1 to N7

Each node would contain the node’s latitude / longitude properties. I could walk the path to emit Google KML – meaning Keyhole Markup Language – not related to Keyhole Software. Google Earth will render a KML layer atop a map. The visual effect would be striking.

At this point I received a gift from Neo4j. The heart of my code is able to pass the buck to Neo4j to do the heavy lifting. See the gist, or a subset below:

Transaction tx = graphDb.beginTx();
try {
    Expander relExpander = Traversal.expanderForTypes(
                DomainConstants.RelTypes.DOMAIN_LINK, Direction.BOTH);

    relExpander.add(DomainConstants.RelTypes.DOMAIN_LINK, Direction.BOTH);

    PathFinder<WeightedPath> shortestPath = GraphAlgoFactory.aStar(relExpander,
                costEval, estimateEval);

    emitCoordinate(ps, shortestPath, nodeA, nodeB);


} finally {

The built-in A* algorithm picks a shortest route through the maze of nodes – ‘stations’ – to return an iterator, shortestPath, over those latitude / longitude points that form a shortest path from beginning to end.  I ran that iterator to create a KML stream of points suitable for display on Google Maps or Google Earth. Each node is connected to another by a link that has a distance property.

Additionally, I had to supply a “cost” callback function. The A* algorithm multiplies each examined link cost property – miles – by this functional value. My surrealistic Bizarro World implementation simply returns the value one. In a realistic railroad world, this function could factor in variables such as fuel cost, track usage charges, train type restrictions, or even regional embargos against travel, caused by events such as a Hurricane Sandy obliterating tracks. A cost factor for the latter route would be a huge number.

The algorithm produces “a” shortest route. There could be ties in total cost or distance.  Presumably, a smarter cost function would impose enough variance to eliminate those rare ties.

The user passes a “to” and “from” station integer on the command line.  The application quickly produces the route as a KML stream sent to Google Earth in a per-OS-dependent manner. The visual effect is a smooth zoom into a display, such as that of Figure 3. The zoom actually takes longer than its triggering route calculation. One can zoom down farther, usually finding satellite shots of trains in the cities.

Figure 3: Command-based route displayed on Google Earth

You can clone the project from GitHub here. Import it as a Maven project to Eclipse 3.7 Juno. Compiler level is Java 1.6. Use the as your guide.

Mobile Web

That was easy, but I prefer web apps, especially mobile web apps. How about a KML layer atop Google Maps?  I created another Maven project in Eclipse composed of the following function points:

  1. An initialization servlet that copies the Neo4j rail graph database from the class path to a writeable directory – so that Neo4j can write a lock value.
  2. A controller servlet that expects “from” and “to” integer station parameters, producing a KML file of the shortest from-to route.
  3. A single JQuery Mobile page that accepts “from” and “to” slider values, to pass the controller’s KML result stream to the Google Map API. It adds the KML as a layer, onto an embedded Google Map displayed on the single page. A reset button clears the route, unless you want to stack a series of KML layers.

See Figure 4 for an iPhone 4 screen capture. Try it now at I deployed its war file to a $0.00/hr Heroku dyno that idles after a period of disuse. You may suffer a one-minute wakeup response time at first. It’s snappy thereafter, until the site falls into disuse for an hour or more.

Figure 4: Mobile version of app embeds Google Maps

You can exercise, step 2, the controller servlet, here. Try other integer pairs from the range 1 … 133752. It will download a KML file to you. Double-click the file to invoke a Google Earth route display. I tested this part alone before I wrote the single HTML page.

You may clone the project from GitHub here. Carefully read the file. Your clone will not draw a route without blessing it with your own Google API key and the URL of your own public deployment.


Bizarro World data is outdated and incomplete, making it bizarre. US stations are not really named by integers. Real railroad data includes track ownership, junctions between rail company’s tracks, and industry standard station identifiers.  A production cost function would have access to extra link properties besides distance. I wanted this demo to be freely public. Association of American Railroads data from its Railinc subsidiary isn’t free. Neo4j could accommodate that real data.

I’ve barely touched Neo4j. It did the job without much learning by me. Of course Google brings impressive mapping to the table, but I needed a dynamic shortest path KML layer also.  Neo4j has a remote service REST API, but I used an embedded database. It has several graph traversal languages available, but I needed only the built-in Lucene index. I used it to locate “from” and “to” nodes to feed to the A* algorithm. The main use of API was to load and index the raw data. Even so, I used a non-transactional bulk insert to accelerate the insertion. At run-time I wrapped my meager API calls in a requisite transaction.

I’ve only exercised the CR of CRUD. I’ve certainly not tried to scale this demo.  This sounds negative, but I’m excited. I have a graph data store available in my architectural toolbox. It would be interesting to create an ACL authorization framework with Neo4j, or even relent, and dip a toe into modeling social networks. And did you ever wonder how Amazon seems to know exactly what to sell to you?

— Lou Mauget,


  • Share:

11 Responses to “Mapping Shortest Routes Using a Graph Database”

  1. John McGinn says:

    Dude that is pretty awesome :)

  2. Michael Silverstein says:

    Fascinating stuff – Nice job!

  3. chiru says:

    nice post great algorithm included for graph database..
    Thanks for posting..

  4. Lars Bendixen says:

    Hi guys,

    Great posting, really interesting subject.

    I have imported this into Eclipse and have all the dependencies in place. However, in the “RouterService” class this method:

    public void findShortestPath(PrintStream ps, long startNode, long endNode) {
    try {
    RouterService router = Initializer.getApplicationContext().getBean(RouterService.class);
    router.emitShortestPathKML(ps, startNode, endNode);
    } catch (Exception e) {

    Is giving me problems.



    Here is the error message:

    “The method getBean(String) in the type BeanFactory is not applicable for the arguments (Class)”

    I am not sure how to resolve this. Any ideas? I’d appreciate it.



    • Lou Mauget says:


      That malfunctioning statement is redundant. I missed removing it after refactoring the logic. The RouterService is already instantiated or the failing method could not be called. Spring failed doing a getBean, but it’s a moot point. In the following snippet I’ve included “this” in place of “router” to emphasize that flow is within the instance that contains method emitShortestPath. I’ll push an update to GitHub if you can verify that all operates for you after this change. Note that the first access is slow because it loads the database.

      public void findShortestPath(PrintStream ps, long startNode, long endNode) {
      try {
      this.emitShortestPathKML(ps, startNode, endNode);
      } catch (Exception e) {

      Thanks for your interest, Lars.

      Lou Mauget

  5. […] like geometry and maps []. I mentioned that MongoDB supports 2d geospatial indexes. Each entry can associate a document with […]

  6. Great post. Are you on Twitter?

Leave a Reply

Things Twitter is Talking About
  • New to #JavaScript prototypal inheritance? Here are some notes to help you along the way -
    July 5, 2015 at 7:55 AM
  • ICYMI: we've released a demo version of #GrokOla which is open to the public. Try out its features & capabilities -
    July 4, 2015 at 3:05 PM
  • Happy 4th of July from the Keyhole team! We hope that you have a happy and safe holiday with your family and friends.
    July 4, 2015 at 9:55 AM
  • Let's talk testing. Here are common challenges #Agile teams face when writing automated tests & how to overcome them:
    July 3, 2015 at 11:06 AM
  • #GrokOla users get free educational tutorials. But lucky you, we've released some to the public. #JavaScript primer -
    July 3, 2015 at 10:55 AM
  • Being able to isolate debugging techniques can help make you a better debugger. Here's Time-Oriented #Debugging
    July 2, 2015 at 10:50 AM
  • RT @zachagardner: @zachagardner has declared it is @ChipotleTweets day at @KeyholeSoftware . You have been warned 🐓🐖🐄
    July 2, 2015 at 10:09 AM
  • Current state of random number generation & the differences in how #Java & #JavaScript approach it - #security
    July 1, 2015 at 2:45 PM
  • Woohoo - 600 followers! Thanks, everyone. We'd love to ask you - what type of tweets / dev content would you like to see more of from us?
    July 1, 2015 at 10:38 AM
  • We would like to welcome Dallas Monson to the team today! Dallas is a Senior Architect focused on UI/UX and #JavaScript. Welcome, Dallas!
    July 1, 2015 at 8:35 AM
  • Good introduction to TypeScript - Plus, how to approach modularization in #TypeScript -
    June 30, 2015 at 3:25 PM
  • .@mrbristopher just delivered a new S911 Night Drone to James Hayes, winner of our #kcdc15 giveaway! Congrats, James!
    June 30, 2015 at 11:35 AM
  • It feels like primitives could have been left out of the initial implementation of #Java. See why -
    June 29, 2015 at 4:05 PM
  • Developers in a bounce house! I repeat, developers in a bounce house! We had a blast at our 1st company picnic. Pics:
    June 29, 2015 at 1:40 PM
  • New #SpringBatch tutorial from @jhackett01: Spring Batch – Replacing XML Job Configuration With JavaConfig #java
    June 29, 2015 at 11:46 AM
  • We had such a fun time at the Keyhole company picnic! Pictures to come, including some of our developers in the bounce house. #loveourteam
    June 29, 2015 at 8:41 AM
  • In #JavaScript, how do we harness the power of callbacks without the confusing mess of nested functions? Promises -
    June 29, 2015 at 8:40 AM
  • .@zachagardner We are so happy that your family attended! This will definitely need to be repeated every year!
    June 28, 2015 at 8:14 PM
  • Thank you to all on the Keyhole team who came to our first inaugural company picnic! Wonderful food, family and bounce house fun!
    June 28, 2015 at 7:50 PM
  • Debugging is a challenging part of being a programmer. We have a tutorial series to help, with a #JavaScript focus -
    June 27, 2015 at 1:45 PM