Mapping Shortest Routes Using a Graph Database

by on January 28, 2013 12:00 am

We often model interconnected data by cramming it in and out of table structures. Why don’t we simply model interconnected data as … interconnected data?

I recently wrote that there are several kinds of NoSQL database stores: key-value, column family, document-oriented, and graph database.  This article targets Neo4j, a Java-based graph DBMS. The open-ended problem domain of graph databases includes access control lists (ACLs), social networks, recommendation systems, and geo-spatial modeling.

Failure

In 2007, a colleague and I used Java with Oracle 9i to implement Dijkstra’s Algorithm. Our “MapQuest for Trains” application would route a rail train over various right-of-ways while minimizing cost. The cost was a function of distance, fuel surcharge, and obstacles. The task to route a train from Los Angeles to Chicago had a grotesquely long response time. Nobody wanted their applications deployed on our nodes because we spiked the servers!

Where did we go wrong? The USA rail map is a living instance of graph theory. We modeled an extensive network of nodes (stations) and connecting links (tracks) as node tables joined through link tables. Navigation through this model involved n-level joins. An RDMS hits the wall as it proceeds through 4th and 5th level joins. See the Neo4j in Action book in the references for benchmarks of a graph DBMS Neo4j against RDBMS MySQL. We tried sidestepping this deep join problem by working with the network in memory for the life of the application instance. The North American rail system is quite large. The application thrashed.

Graph Data Store

Fast forward to summer, 2012.  I stumbled onto Neo4j, a graph database. The literature described suitability of Neo4j for modeling social networks. It could rapidly answer queries such as, “How many of Judy’s friends, their friends, and their friends’ friends have read Little Women?” I could use Neo4j to create my own Facebook or Twitter. Think of a social network as an instance of graph theory. I wasn’t interested, but how about a do-over at routing trains through the graph that comprises the USA rail network?

Raw Data

I wanted to publish my work with no strings attached. I found an open 1996 National Transportation Atlas Database ZIP file of rail data. I found a flat node file and a link file within. Each record of the node file had an integer ID plus a pair of fields for latitude and longitude. Each record of the link file had a distance field along with the node IDs for the node pair it connected. Perfect!

Bulk Load

This data could directly map to a Neo4j database. See Figure 1. I conceived Bizarro World, where railroad stations would have graph nodes with integer “names.” Hey, it’s only a demo! Each station node would have a latitude property and a longitude property. A station-to-station connecting track becomes a graph link having a single node-to-node distance property. Any number of links could fan out from a given station node.

I wrote a command-line Eclipse Maven project to bulk-insert a file of nodes, index them, and insert the corresponding file of links – all into an embedded Neo4j database. Run time was 20 minutes on my MacBook Air! I discovered it has a fan! Neo4j is ACID transactional. I discovered a non-transactional bulk insert API that speeded the insertion by an order-of-magnitude.

Figure 1: Neo4j models graph theory structure directly

Google Earth

My goal was to implement a Dijkstra algorithm, or the faster A* algorithm to produce a shortest distance node-by-node path between any two of my integer-named rail stations.  See Figure 2 for the notion of a shortest or least-cost path between N1 and N7.

Figure 2: Shortest path from N1 to N7

Each node would contain the node’s latitude / longitude properties. I could walk the path to emit Google KML – meaning Keyhole Markup Language – not related to Keyhole Software. Google Earth will render a KML layer atop a map. The visual effect would be striking.

At this point I received a gift from Neo4j. The heart of my code is able to pass the buck to Neo4j to do the heavy lifting. See the gist, or a subset below:

Transaction tx = graphDb.beginTx();
try {
    Expander relExpander = Traversal.expanderForTypes(
                DomainConstants.RelTypes.DOMAIN_LINK, Direction.BOTH);

    relExpander.add(DomainConstants.RelTypes.DOMAIN_LINK, Direction.BOTH);

    PathFinder<WeightedPath> shortestPath = GraphAlgoFactory.aStar(relExpander,
                costEval, estimateEval);

    ps.println(KMLConstants.KML_LINE_START);
    emitCoordinate(ps, shortestPath, nodeA, nodeB);
    ps.println(KMLConstants.KML_LINE_END);

    tx.success();

} finally {
        tx.finish();
}

The built-in A* algorithm picks a shortest route through the maze of nodes – ‘stations’ – to return an iterator, shortestPath, over those latitude / longitude points that form a shortest path from beginning to end.  I ran that iterator to create a KML stream of points suitable for display on Google Maps or Google Earth. Each node is connected to another by a link that has a distance property.

Additionally, I had to supply a “cost” callback function. The A* algorithm multiplies each examined link cost property – miles – by this functional value. My surrealistic Bizarro World implementation simply returns the value one. In a realistic railroad world, this function could factor in variables such as fuel cost, track usage charges, train type restrictions, or even regional embargos against travel, caused by events such as a Hurricane Sandy obliterating tracks. A cost factor for the latter route would be a huge number.

The algorithm produces “a” shortest route. There could be ties in total cost or distance.  Presumably, a smarter cost function would impose enough variance to eliminate those rare ties.

The user passes a “to” and “from” station integer on the command line.  The application quickly produces the route as a KML stream sent to Google Earth in a per-OS-dependent manner. The visual effect is a smooth zoom into a display, such as that of Figure 3. The zoom actually takes longer than its triggering route calculation. One can zoom down farther, usually finding satellite shots of trains in the cities.

Figure 3: Command-based route displayed on Google Earth

You can clone the project from GitHub here. Import it as a Maven project to Eclipse 3.7 Juno. Compiler level is Java 1.6. Use the REAME.md as your guide.

Mobile Web

That was easy, but I prefer web apps, especially mobile web apps. How about a KML layer atop Google Maps?  I created another Maven project in Eclipse composed of the following function points:

  1. An initialization servlet that copies the Neo4j rail graph database from the class path to a writeable directory – so that Neo4j can write a lock value.
  2. A controller servlet that expects “from” and “to” integer station parameters, producing a KML file of the shortest from-to route.
  3. A single JQuery Mobile page that accepts “from” and “to” slider values, to pass the controller’s KML result stream to the Google Map API. It adds the KML as a layer, onto an embedded Google Map displayed on the single page. A reset button clears the route, unless you want to stack a series of KML layers.

See Figure 4 for an iPhone 4 screen capture. Try it now at http://myjavaneo4j.herokuapp.com/. I deployed its war file to a $0.00/hr Heroku dyno that idles after a period of disuse. You may suffer a one-minute wakeup response time at first. It’s snappy thereafter, until the site falls into disuse for an hour or more.

Figure 4: Mobile version of app embeds Google Maps

You can exercise, step 2, the controller servlet, here. Try other integer pairs from the range 1 … 133752. It will download a KML file to you. Double-click the file to invoke a Google Earth route display. I tested this part alone before I wrote the single HTML page.

You may clone the project from GitHub here. Carefully read the README.md file. Your clone will not draw a route without blessing it with your own Google API key and the URL of your own public deployment.

Afterword

Bizarro World data is outdated and incomplete, making it bizarre. US stations are not really named by integers. Real railroad data includes track ownership, junctions between rail company’s tracks, and industry standard station identifiers.  A production cost function would have access to extra link properties besides distance. I wanted this demo to be freely public. Association of American Railroads data from its Railinc subsidiary isn’t free. Neo4j could accommodate that real data.

I’ve barely touched Neo4j. It did the job without much learning by me. Of course Google brings impressive mapping to the table, but I needed a dynamic shortest path KML layer also.  Neo4j has a remote service REST API, but I used an embedded database. It has several graph traversal languages available, but I needed only the built-in Lucene index. I used it to locate “from” and “to” nodes to feed to the A* algorithm. The main use of API was to load and index the raw data. Even so, I used a non-transactional bulk insert to accelerate the insertion. At run-time I wrapped my meager API calls in a requisite transaction.

I’ve only exercised the CR of CRUD. I’ve certainly not tried to scale this demo.  This sounds negative, but I’m excited. I have a graph data store available in my architectural toolbox. It would be interesting to create an ACL authorization framework with Neo4j, or even relent, and dip a toe into modeling social networks. And did you ever wonder how Amazon seems to know exactly what to sell to you?

– Lou Mauget, asktheteam@keyholesoftware.com


References

  • Share:

11 Responses to “Mapping Shortest Routes Using a Graph Database”

  1. John McGinn says:

    Dude that is pretty awesome :)

  2. Michael Silverstein says:

    Fascinating stuff – Nice job!

  3. chiru says:

    nice post great algorithm included for graph database..
    Thanks for posting..

  4. Lars Bendixen says:

    Hi guys,

    Great posting, really interesting subject.

    I have imported this into Eclipse and have all the dependencies in place. However, in the “RouterService” class this method:

    public void findShortestPath(PrintStream ps, long startNode, long endNode) {
    try {
    RouterService router = Initializer.getApplicationContext().getBean(RouterService.class);
    router.emitShortestPathKML(ps, startNode, endNode);
    } catch (Exception e) {
    e.printStackTrace();
    }
    }

    Is giving me problems.

    Namely

    Initializer.getApplicationContext().getBean(RouterService.class)

    Here is the error message:

    “The method getBean(String) in the type BeanFactory is not applicable for the arguments (Class)”

    I am not sure how to resolve this. Any ideas? I’d appreciate it.

    Thanks,

    Lars

    • Lou Mauget says:

      Lars,

      That malfunctioning statement is redundant. I missed removing it after refactoring the logic. The RouterService is already instantiated or the failing method could not be called. Spring failed doing a getBean, but it’s a moot point. In the following snippet I’ve included “this” in place of “router” to emphasize that flow is within the instance that contains method emitShortestPath. I’ll push an update to GitHub if you can verify that all operates for you after this change. Note that the first access is slow because it loads the database.

      public void findShortestPath(PrintStream ps, long startNode, long endNode) {
      try {
      this.emitShortestPathKML(ps, startNode, endNode);
      } catch (Exception e) {
      e.printStackTrace();
      }
      }

      Thanks for your interest, Lars.

      Lou Mauget

  5. [...] like geometry and maps [http://keyholesoftware.com/2013/01/28/mapping-shortest-routes-using-a-graph-database/]. I mentioned that MongoDB supports 2d geospatial indexes. Each entry can associate a document with [...]

  6. Great post. Are you on Twitter?

Leave a Reply

Things Twitter is Talking About
  • Thank your #Sysadmin - today is System Administrator Appreciation Day. http://t.co/LcvDNa9kPg
    July 25, 2014 at 8:05 AM
  • @rickincanada Thx for your tweet! Shoot us an email at asktheteam@keyholesoftware.com so we can set up a time to talk. Have a good day.
    July 24, 2014 at 3:33 PM
  • Never used JAXB? Check out a simple usage pattern that pairs #JAXB’s data binding capabilities with JPA - http://t.co/Ki9G04HV5e
    July 24, 2014 at 9:53 AM
  • Guess what today is? Tell An Old Joke Day - http://t.co/835ORWMX6N! So, why do programmers always confuse Halloween & Xmas? 31 Oct = 25 Dec
    July 24, 2014 at 8:45 AM
  • MT @midwestio: Posted another #midwestio talk recording to our YouTube channel: @MinaMarkham on modular CSS. Watch: http://t.co/aU3LpfUoi4
    July 24, 2014 at 8:25 AM
  • We just posted pictures from our National Hot Dog Day Lunch Cookout. Check them out - http://t.co/To06plaw1C
    July 23, 2014 at 4:14 PM
  • Good free cheat sheet - #Java Performance Optimization Refcard from @DZone: http://t.co/7vBgsmqy08
    July 23, 2014 at 10:48 AM
  • Did you know today is a holiday? It's National Hot Dog Day! We're gearing up for our team lunch hot dog cookout & can't wait to celebrate.
    July 23, 2014 at 9:43 AM
  • Check out our newest blog: #JAXB – A Newcomer’s Perspective, Part 1 http://t.co/Ki9G04HV5e
    July 22, 2014 at 1:22 PM
  • New post on the Keyhole blog by Mark Adelsberger: #JAXB – A Newcomer’s Perspective, Part 1 http://t.co/Ki9G04HV5e
    July 21, 2014 at 2:27 PM
  • If you're a Java dev, you're likely familiar with Annotations. But have you created your own #Java Annotations? Ex - http://t.co/BgCsYjxZKF
    July 18, 2014 at 12:10 PM
  • RT @gamasutra: Don't Miss: Unconventional Tips for Improving your Programming Skills http://t.co/6TFox7CKHU
    July 16, 2014 at 3:20 PM
  • We're about to send out our free monthly tech newsletter. Dev tips/articles via email. Not on the list yet? Sign up - http://t.co/F8h0NSiicZ
    July 15, 2014 at 11:57 AM
  • Have you ever tried creating your own #Java annotations? See a situation where it was beneficial - http://t.co/BgCsYjxZKF
    July 15, 2014 at 8:36 AM
  • There's a new post on the Keyhole blog by @jhackett01: Creating Your Own #Java Annotations - http://t.co/BgCsYjxZKF
    July 14, 2014 at 1:43 PM
  • We love development! Have you seen our weekly team blog? We show how to be successful with the tech we use. See it - http://t.co/nlRtb1XNQH
    July 12, 2014 at 2:35 PM
  • Rapid appdev has a bad rep, but there are ways to bring development time down the right way. Don't Fear the Rapid - http://t.co/aTPcAKOj0r
    July 11, 2014 at 3:10 PM
  • Automated Testing is great for dev, but does bring a set of challenges (especially for #agile teams). Success tips: http://t.co/1acl1ngO7i
    July 11, 2014 at 12:16 PM
  • This is fantastic - One small step for Google, one giant leap for empowering girls to code: http://t.co/R90V5DBkv1
    July 10, 2014 at 2:52 PM
  • #RabbitMQ: messaging software built on AMQP protocol. Learn relevant concepts & how to avoid common "gotchas" here: http://t.co/ZwMXlhKyX8
    July 10, 2014 at 9:31 AM
Keyhole Software
8900 State Line Road, Suite 455
Leawood, KS 66206
ph: 877-521-7769
© 2014 Keyhole Software, LLC. All rights reserved.