Picking A Graph Database: ArangoDB, Neo4j, or OrientDB

John Hoestje Databases, Opinion, Problem Solving, Programming Leave a Comment

TL;DR

  • Spoiler alert! Graph databases are a great option for storing complex and highly connected data.
  • In this post, I compare the benefits and risks of graph databases ArangoDB, Neo4j, and OrientDB for a client project.
  • Due to the combination of performance and cost, I chose ArangoDB for my client’s needs.

Our Data Needs In A Nutshell

A recent project for a client required storing highly connected data and supporting queries to mine the complex data relationships. While this was an internal project, performance was still needed. The client’s approved database base technology still surrounded relational databases.

While there is an industry need for relational databases, using a relational database for highly connected and complex data is problematic. The industry has solved this data relationship storage issue by adopting polyglot database environments. Companies select database technologies to store data and relationships based on how IT needs it and the strengths of the database technology.

Companies are increasingly storing data for relationship values. The business value placed on data relationships has greatly increased and also strained many former data design strategies.

Using a relational database, or other previous generation technologies, to store highly connected and complex data leads to poor performance, high change and maintenance costs, and excessively complex queries. Current generation database technologies like NoSQL are designed around the idea of storing complex data relationships.

Since the client’s project was an internal project and starting a new space, data definitions were not well known and could drastically change over time. The first release would primarily be a release to start collecting data. The data model needs to be flexible and easily accept new properties and relationships over time without requiring the modification of older data.

While there are many database implementations that fall under the NoSQL database umbrella, I ended up focusing on the Graph database subset. I compared several NoSQL technologies and decided on a graph database implementation based on the project’s data needs.

Requirements

Some of the graph database strengths the project needed were:

  • Storing complex relationships
  • Querying relationships based on highly connected data
  • High performance and low cost of change

Adopting a graph database technology for use cases fitting this profile will reduce complexity and overhead compared to using a relational database. The capabilities of a graph database will provide (at a minimum) the following benefits:

  • Support a query language designed for traversing data relationships
  • A data persistence structure specializing in natively storing relationships
  • Lowering maintenance costs for rapidly changing data structures, which lowers total cost of ownership
  • Accommodating semi-structured data, which increases flexibility
See Also:  .NET Memory Management with dotMemory

With the high-level requirements, I needed to filter the graph candidates down to just a few. Some additional non-functional requirements that made the filtering easier were:

  • Quick integration with Spring
  • Flexible licensing
  • Cost
  • A low learning curve with the query language and database runtime

Based on Graph database online research with the evaluation criteria, I filtered the results to compare ArangoDB, Neo4j, and OrientDB. The benefits and risks of each implementation are below.

The Candidates

ArangoDB

Graph databases have emerged as a best of breed for storing highly connected data, therefore reducing the cost for data structure changes, and handling semi-structured data. Large projects like Twitter and Facebook use Graph databases to handle their highly connected datasets.

The industry trend is to now house multiple data store types to solve complex projects. Each data store type has its own strengths and weaknesses. Multi data store companies can choose a data store based on its strengths matching the need instead of forcing all needs onto one solution.

A graph data store is a type of NoSQL or non-relational data store.

Benefits

  • Data relationships are natively stored as first-class citizens as a graph edge
  • Graph edges can also store data values similar to data Nodes
  • Optimized to ask questions based on the relationships
    • O(1) performance or constant performance
  • Flexibility for evolving data models
  • Can add new relationships and data properties without disturbing existing queries
  • Uses business rule asserts for data governance instead of enforcement by schema
  • The data model does not need to be translated
    • The graph fits naturally with Object-Oriented data models
  • Open-source with available commercial support
    • Apache 2 licensed
  • Multi-Model store
    • Key-Value, Document, and Graph database models available in a single instance
  • Drivers for Java, JavaScript, .NET, and other popular development languages
  • Role-based access control

Risks

  • Proprietary query language
    • No agreed-upon query standards make switching graph data stores more difficult
  • Write scaling depending on the graph implementation
  • Graph databases are not a good fit for data versioning

Neo4j

The Benefits and Risks are similar to ArangoDB except where listed below:

See Also:  The Jury is Still Out: Blockchain in Healthcare

Benefits

  • Robust graph visualization tools
  • Less Java infrastructure code needed for implementation than ArangoDB
    • Integrates very well with Spring
  • Neo4j was the first graph database implementation and is, therefore, a more well-known name
  • Graph edges can store labels

Risks

  • Commercial license only for closed-source applications
  • Single model store
    • Graph model store only
  • One graph per Neo4j instance

OrientDB

The Benefits and Risks are similar to ArangoDB except where listed below:

Benefits

  • SQL-like query language
  • Options for different levels of schema enforcement
    • Schema-less, schema-full, or mixed
  • Supports Object-Oriented concept of class inheritance
    • Attributes are inherited from one or more parent classes

Risks

  • Minimal documentation and Java examples
    • Examples are for Document model types and not for Graph models
  • Class inheritance can add unneeded complexity
    • While this is a beneficial feature, it could be misused and add unneeded complexity to the database model
    • An entity must inherit from an OrientDB reserved class before the entity is traversable on a graph as a Vertex or Edge

Our Recommendation

After creating a basic proof-of-concept and weighing the benefits and risks for each database, I chose to use ArangoDB as the project’s database. ArangoDB was chosen for its combination of features, performance, and cost. ArangoDB receives regular updates and is available with an Apache 2.0 license. It has integration with Spring Data and provides drivers for multiple development languages.

Some key features that differentiated ArangoDB from the others are:

  • Performance with highly connected datasets
  • An Apache 2 open source license option with commercial support options
  • Support for storing data on Edges
  • Support for multi-models on the same instance
  • Amount of available documentation and examples

I hope that this provides some insight as to why I chose ArangoDB and helps you to determine which graph database works best for you. I would certainly encourage you to give ArangoDB a look if you’re seeking a graph database.

What Do You Think?