Is NoSQL The SQL Sequel?

by Louis Mauget on October 1, 2012 12:00 am

“Can’t we all just get along?” 

I assert that the explosion of so-called NoSQL database management systems (DBMS) is not displacing the well-known relational DBMS (RDBMS) that we love and admire. There is room for each, sometimes within one application. Why? Visits by three spirits could enlighten us …

1. Spirit of DBMS Past

The DBMS was invented before the mid-1960s. Those early DBMS had no SQL. They were not even relational. Oddly, nobody considers them NoSQL.

In 1970, E.F. Codd proposed the relational model at IBM. That company was slow to adopt the technology, probably due to investment in its IMS hierarchical DBMS product. IMS contained no SQL. IBM researchers devised a query language, SEQUEL, in the mid-seventies; its name was later shortened to SQL. In 1979, Larry Ellison’s company shipped a clean-room implementation of SQL with its RDBMS, Oracle Database. IBM eventually released DB2. The ACID RDBMS was off to the races.

2. Spirit of DBMS Present

Today’s RDBMS technology contains tons of sand. I’ve seen the following: “The number of data items in relational databases matches the number of grains of sand on all the beaches.” The point is that it is unimaginable to migrate all RDBMS data to another home.

RDBMS databases are often integration databases for multiple applications. Such databases have rigid schemas. Schema governance can make individual application change painful. The recent decade saw a mitigating move to “data as a service” web service integration points. Here, each application may conform to its unique application data transfer schema, avoiding conformance to the central database schema. That gives wiggle-room for easier service client application change. Services don’t solve all schema dependency issues. One application may require a fast query index on a column, where another application may need fast insertions, inhibited by that very index.

Scaling RDBMS

The 21st century has spawned newcomers such as Amazon and Google. “Cloud,” no longer means “vapor.” The buzz phrases “big data,” “relaxed consistency,” and “eventual consistency” are in our faces.

Argh! Here be demons! RDBMS technology hits the wall under big data conditions. I’m talking BIG data. No, I mean BIG DATA – entire beaches of sand processed daily. Scaling out implies nodes and clusters. Alas, clusters of RDBMS nodes must share a single disk space to fulfill instantaneous ACID requirements. Even a failure-resistant RAID is a geographic single point of failure.

What about master-slave remote node replication? That means relaxing instantaneous slave consistency across a replication event. Many RDBMS shops have long used multi-version concurrency control (MVCC) for update performance. They assume a given item will not be updated often. This buys a chance at blazing read performance across large sets of data. That kind of “C” in ACID also implies detecting conflict caused by a stale read for an update … eventually.
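The stale-read detection described above can be sketched as a version check at write time. This is an optimistic version check, a simplification of what MVCC engines do internally, and not the mechanism of any particular RDBMS. The `VersionedStore` class and its method names are hypothetical, invented for this illustration.

```python
class StaleWriteError(Exception):
    """Raised when an update is based on an outdated read."""


class VersionedStore:
    """Toy in-memory store; each key holds a (version, value) pair."""

    def __init__(self):
        self._rows = {}

    def read(self, key):
        # Readers never block; they get the version they saw plus the value.
        return self._rows.get(key, (0, None))

    def write(self, key, value, seen_version):
        current_version, _ = self._rows.get(key, (0, None))
        if seen_version != current_version:
            # Someone else updated the row after our read: conflict detected.
            raise StaleWriteError(
                f"{key}: read v{seen_version}, now v{current_version}"
            )
        self._rows[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()
store.write("balance", 100, seen_version=0)

version_a, _ = store.read("balance")    # client A reads version 1
version_b, _ = store.read("balance")    # client B reads version 1
store.write("balance", 90, version_a)   # A updates first; row is now version 2

try:
    store.write("balance", 80, version_b)   # B's read is stale
    conflict_detected = False
except StaleWriteError:
    conflict_detected = True
```

Reads stay cheap because nothing is locked. The cost surfaces only when two writers collide, and the loser has to re-read and retry.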

Consistency through MVCC opens a door toward acceptance of the word “eventual.” There are more mundane demons in scaling a relational database. Many RDBMS products are licensed per-server or per-processor. Do the math. Another scaling tactic is to insert queue-driven workers that combine multiple requests into single requests to the database. This resilient approach works until it, too, cannot keep up with growth. The next technique is to shard keys across nodes, but sharding needs participation from the application. Additionally, when one shard node for a range of key values inevitably goes belly up, that portion of the data becomes unavailable.
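Here is a minimal sketch of range sharding as the application sees it, assuming three made-up nodes that split keys by first letter. The node names, the range boundaries, and the `shard_for` and `lookup` helpers are all hypothetical. Plain dictionaries stand in for the database on each node.

```python
import bisect

# Hypothetical layout: each node owns keys whose first letter is at or below
# its upper bound (and above the previous node's bound).
UPPER_BOUNDS = ["h", "p", "z"]
NODES = ["node-a", "node-b", "node-c"]


def shard_for(key):
    """Pick the node that owns the range containing this key."""
    index = bisect.bisect_left(UPPER_BOUNDS, key[0].lower())
    return NODES[min(index, len(NODES) - 1)]


def lookup(key, stores, down=frozenset()):
    """Route a read to the owning shard; fail if that shard is down."""
    node = shard_for(key)
    if node in down:
        raise ConnectionError(f"{node} is down; key {key!r} is unavailable")
    return stores[node].get(key)


# The application, not the database, decides where each row lives.
stores = {node: {} for node in NODES}
for user in ["alice", "mallory", "trent"]:
    stores[shard_for(user)][user] = {"name": user}

alice = lookup("alice", stores, down={"node-b"})   # node-a still answers

try:
    lookup("mallory", stores, down={"node-b"})     # her range lives on node-b
    mallory_available = True
except ConnectionError:
    mallory_available = False
```

Note that the routing logic lives in application code. That is the “participation from the application” that makes sharding a chore.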

These are RDBMS Band-Aid scaling techniques. Fault-tolerance, tolerance of human error, and maintenance complexity eventually overrun realistic limits. Additionally, we have the well-known impedance mismatch between tables and objects. ORMs manage this to a degree, but they introduce an additional mapping layer while often not performing efficiently in high throughput conditions. The spirit paints a bleak picture of the huge data RDBMS, but there’s a lot of life left in that old dog. Let’s look at NoSQL before we try to put the RDBMS on the shelf.

NoSQL

New database systems emerged during this millennium. Startups run by bright young helmsmen have sprung to life. Here, open source is not as scary as it is to legal departments of some entrenched enterprises. Many startups must deal with big data, or new computational problems having unique data requirements. These applications need application data, not integration data.

Who knew the term, NoSQL, prior to 2009? It seems to correctly mean a “non-relational DBMS.” Some say NoSQL means “Not only SQL.” That smacks of acronym damage control. Some NoSQL products even provide a SQL subset. Let’s grant that the tag, #NoSQL, tweets well. These new DBMS appeared after the “turn of the century,” meaning 2000, not 1900. Apparently, only software written subsequent to the birth of its users qualifies for nutty acronyms.

A NoSQL database generally stores data as an aggregate, not a set of flat tuples or rows. A write to the aggregate is atomic. Outlier NoSQL paradigms exist, such as a graph DBMS, or a column-family DBMS. A NoSQL schema for an auction would be enforced by the auction application, not by the auction’s private application database. Two applications using a single NoSQL database as an integration point need to have solid agreement on its schema. An RDBMS is aggregate-ignorant, meaning that it has no clue how its data is used by an application.
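To make the aggregate idea concrete, here is a sketch of the auction example in both shapes. The field names, the `validate_auction` check, and the dictionary that stands in for a key-value database are assumptions made for illustration, not the API of any real product.

```python
import json

# Relational view: flat rows in separate tables, joined at query time.
auction_rows = [(1, "Vintage clock", "open")]    # (id, title, status)
bid_rows = [(1, "ann", 40), (1, "raj", 55)]      # (auction_id, bidder, amount)

# Aggregate view: the auction and its bids travel together as one unit.
auction_aggregate = {
    "id": 1,
    "title": "Vintage clock",
    "status": "open",
    "bids": [
        {"bidder": "ann", "amount": 40},
        {"bidder": "raj", "amount": 55},
    ],
}


def validate_auction(doc):
    """The application, not the database, enforces the schema."""
    missing = {"id", "title", "status", "bids"} - doc.keys()
    if missing:
        raise ValueError(f"auction is missing fields: {sorted(missing)}")
    return doc


kv_store = {}   # stand-in for a key-value NoSQL database


def save_auction(doc):
    # One key, one value: the whole aggregate is replaced in a single write.
    kv_store[f"auction:{doc['id']}"] = json.dumps(validate_auction(doc))


save_auction(auction_aggregate)
```

A second application sharing this store would need its own copy of `validate_auction`, or at least the same idea of what it checks. That is the schema agreement problem in miniature.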

A NoSQL DBMS may use document-oriented values, as opposed to opaque values. Document-oriented NoSQL can query and update a value based on introspection of items within the document. Think of JSON or XML aggregates here. Aggregate orientation grants enough information to the DBMS to enable it to organize data items to reside together on a given node. Scaling out to clusters is the fruit of the vine of NoSQL. The ascendance of NoSQL follows the rise of young organizations that have mind-boggling huge data requirements. Sometimes eventual consistency is good enough. Scaling involves tradeoffs.
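The difference between opaque and document-oriented values can be sketched with a toy store that looks inside its values. The `DocumentStore` class, its `find` and `update_where` methods, and the sample user records are hypothetical. Real document databases have their own, much richer query languages.

```python
class DocumentStore:
    """Toy document store: values are dictionaries the store can inspect."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        self._docs[key] = dict(doc)

    def get(self, key):
        # A pure key-value store with opaque values could offer only this.
        return self._docs[key]

    def find(self, **criteria):
        """Return keys of documents whose top-level fields match all criteria."""
        return sorted(
            key
            for key, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        )

    def update_where(self, changes, **criteria):
        """Apply field changes to every matching document; return the count."""
        keys = self.find(**criteria)
        for key in keys:
            self._docs[key].update(changes)
        return len(keys)


users = DocumentStore()
users.put("u1", {"name": "Ann", "city": "Leawood", "plan": "free"})
users.put("u2", {"name": "Raj", "city": "Omaha", "plan": "free"})
users.put("u3", {"name": "Lee", "city": "Leawood", "plan": "paid"})

leawood_users = users.find(city="Leawood")
upgraded = users.update_where({"plan": "paid"}, city="Omaha")
```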

CAP

We see a prospect of relaxing consistency to get increased scaling. There’s a theorem for that: The CAP Theorem. Of the three properties of data (1) Consistency, (2) Availability, and (3) Partition tolerance, we’re limited to choosing any two.

We covered consistency. Availability means that a server must always answer a request in some fashion, in order to be deemed available. A partition in CAP is a section of the DBMS that has no communication with any other section. A network breakage within a cluster severs it into parts – partitions – that cannot communicate with one another. Partition tolerance is the measure of the ability to survive partitioning. CAP properties are not discrete. We can trade some of one to get more of another.
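A toy model may help show the choice a partition forces. The sketch below assumes a cluster of just two replicas and a `partitioned` flag that simulates the severed network. The `TwoNodeCluster` class is hypothetical and ignores everything a real system has to handle, such as healing and conflict resolution.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeCluster:
    """Toy cluster of two replicas that normally copy every write to each other."""

    def __init__(self, prefer):
        if prefer not in ("consistency", "availability"):
            raise ValueError("prefer must be 'consistency' or 'availability'")
        self.prefer = prefer
        self.left, self.right = Replica("left"), Replica("right")
        self.partitioned = False

    def write(self, replica, key, value):
        other = self.right if replica is self.left else self.left
        if self.partitioned:
            if self.prefer == "consistency":
                # Refuse the request rather than let the replicas diverge.
                raise RuntimeError("unavailable: cannot reach the other replica")
            replica.data[key] = value   # answer anyway; the replicas now disagree
            return
        replica.data[key] = value
        other.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)


ap = TwoNodeCluster(prefer="availability")
ap.write(ap.left, "stock", 5)       # replicated to both sides
ap.partitioned = True               # the network link breaks
ap.write(ap.left, "stock", 4)       # still answers, but only left sees it
stale_read = ap.read(ap.right, "stock")

cp = TwoNodeCluster(prefer="consistency")
cp.write(cp.left, "stock", 5)
cp.partitioned = True
try:
    cp.write(cp.left, "stock", 4)
    refused = False
except RuntimeError:
    refused = True
```

The first cluster keeps answering and serves a stale value from the right-hand replica. The second stays consistent by turning the write away.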

There is no time property associated with availability. Does a five-minute response time from a server mean that it is available? This is latency. We’re usually interested in the trade-off between latency and consistency. Many NoSQL implementations process distributed big data at blazing speed – low latency – at the expense of infrequent detectable failed updates.

Alternatively, some trade consistency with durability. Really? Consider a data logging application where the trend of the data is more important than logging the last few items before a server failure. Or, consider a DBMS that maintains session data in a responsive application that has a high number of simultaneous users. Updates have to be instantly consistent, but if the DBMS server crashes, the end-users simply lose sessions, to their minor annoyance, at worst. We may categorize a DBMS as to which two legs of the CAP Theorem it provides. For example, “CouchDB is AP,” while “PostgreSQL is CA.”

NoSQL Flavors

There is no consistent definition for what constitutes NoSQL. It’s a wild bunch ranging from embedded systems to huge HUGE distributed systems. A general list of the characteristics of NoSQL follows:

  1. Generally open source
  2. No relational model
  3. No schema – schema in the mind of the app programmer
  4. Distributed, fault-tolerant
  5. No full ACID guarantee
  6. Atomically updates a given value

Common kinds of NoSQL DBMS:

| Kind | Description | Example |
|---|---|---|
| Key-Value | A unique key identifies a value aggregate that is meaningful to the application. The DBMS does not care or understand what is inside the value. | Riak, Redis, MemCached |
| Document | A unique key identifies a value that the DBMS can understand at some level, so as to provide a value query capability. Values are stored as JSON, XML, or another well-known structured data format enforced by the DBMS. The DBMS can query items by key or by document content. | CouchDB; CouchBase; MongoDB; Lotus Notes (old, MVCC) |
| Column-family | Stores data as columns in a column family defined at creation time. Adds new columns without upsetting the application or the DBMS. Efficient for computing an aggregate over a subset of rows, or for all values of a column. | Cassandra; BigTable; HBase |
| Graph | Data stored as nodes and links. Each node or link may have arbitrary properties attached. Useful for modeling deeply nested relationships such as networks or geographic data. Usually transactional across multiple operations. | Neo4J, OrientDB (has SQL) |

Advantages of NoSQL

  1. Scales out
  2. Deals with explosions of data outstripping RDBMS capability
  3. Fewer expensive DBAs required
  4. Uses clusters of low-cost servers
  5. Promotes application evolution without external schema change
  6. Is cheaper or free to obtain

Disadvantages of NoSQL

  1. Lacks maturity
  2. No big-name enterprise support providers
  3. Difficult business analytics – these databases are not integration points
  4. No zero-admin to ease installation and maintenance
  5. Lower expertise – NoSQL is young
  6. The application governs its own schema

3. Spirit of DBMS to Come

Humans prefer an “all or nothing” answer to binary questions such as “Is this the end of SQL?” The answer is another question: “Why can’t architects choose from a DBMS palette that contains both RDBMS and NoSQL?” This is the notion of polyglot persistence.

There are examples of an RDBMS coexisting with NoSQL within enterprise applications. A trivial example is Memcached, a NoSQL embedded in-memory key-value store often used to accelerate applications through keyed value caching of RDBMS results. The majority of applications need extremely low-latency reads. Some applications also need immediate, consistent propagation of updates. Think of an inventory update after clicking “order now” or a financial transfer from savings to checking. Other applications are happy enough with hours of update propagation latency. Think of adding a user review to a movie database where that user originates in a separate social network.
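The caching arrangement mentioned above is often called cache-aside. Here is a minimal sketch, assuming an in-memory SQLite database as the RDBMS and a plain dictionary standing in for a Memcached client. The `product` table and the `get_product` and `update_price` functions are made up for the example.

```python
import sqlite3

# The RDBMS side: the system of record.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
db.execute("INSERT INTO product VALUES (1, 'Widget', 9.99)")
db.commit()

cache = {}                  # stand-in for a Memcached client
stats = {"db_reads": 0}     # counts how often we actually hit the database


def get_product(product_id):
    """Serve from the cache when possible; fall back to the RDBMS on a miss."""
    key = f"product:{product_id}"
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    row = db.execute(
        "SELECT id, name, price FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    if row is not None:
        cache[key] = row
    return row


def update_price(product_id, price):
    """Write to the RDBMS, then drop the cached copy so readers see the change."""
    db.execute("UPDATE product SET price = ? WHERE id = ?", (price, product_id))
    db.commit()
    cache.pop(f"product:{product_id}", None)


first = get_product(1)    # miss: goes to the database
second = get_product(1)   # hit: served from the cache
```

The invalidation in `update_price` is the part that matters for the “order now” kind of update. Skip it and readers see the old price until the cached entry goes away.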

The Spirit and I agree that the RDBMS has no end-of-life in sight. Certainly, it holds beaches of sand that nobody wants to relocate, but it has future relevance. We’ll continue to need lightning-fast coordinated consistent financial and e-commerce transactions. The governance imposed by the RDBMS schema and DBAs enhances the maintainability and safety of such applications. On the other hand, I want instant access to huge amounts of data, even where I know some items may be long-in-the-tooth.

For example, I expect instant, accurate routes from an Earth-load of data fronted by Google Maps, but I am tolerant of zooming into a picture of my driveway that shows a car I sold a year ago. Beyond caching, more than one kind of DBMS may persist parts of a single application that uses polyglot persistence.

Imagine that we create a genealogy site that consolidates data purchased from other services. It integrates those slowly changing sources through data-as-a-service, periodically updating its own huge data NoSQL document store. This data store supports imaginatively varied genealogy queries based on items within NoSQL documents. Those reside in a graph-oriented NoSQL DBMS. We pick OrientDB. Imagine snappy response time for concurrent users querying information about families. One user could query for an instant list of her third cousins without the costly deep join that an RDBMS would require. Everything is responsive. Everybody is happy.

Our genealogy site is a for-profit business. It must recover development costs, data purchase fees, and operating expenses, while turning a profit. Its customers are a set of registered users that pay a fee. The application ties a NoSQL user registry to Facebook or Twitter, but associated fees, accounting, and financial reporting reside in an RDBMS.
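The third-cousin query mentioned above is a short graph walk: go up four generations, come back down four, and discard anyone who shares a closer ancestor. The sketch below is plain Python over a tiny made-up family, not OrientDB’s query language. The `PARENTS` data and the `walk` and `cousins` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical family data: child -> list of known parents.
PARENTS = {
    "a1": ["gg"], "b1": ["gg"],     # gg is the shared great-great-grandparent
    "a2": ["a1"], "b2": ["b1"],
    "a3": ["a2"], "b3": ["b2"],
    "a4": ["a3"], "a4s": ["a3"],    # a4 and a4s are siblings
    "b4": ["b3"],
}

# Reverse links: parent -> list of children.
CHILDREN = defaultdict(list)
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)


def walk(start, edges, steps):
    """Follow `edges` exactly `steps` hops from every node in `start`."""
    frontier = set(start)
    for _ in range(steps):
        frontier = {nxt for node in frontier for nxt in edges.get(node, [])}
    return frontier


def cousins(person, degree):
    """People sharing an ancestor `degree + 1` generations up, and none closer."""
    up = degree + 1
    candidates = walk(walk({person}, PARENTS, up), CHILDREN, up)
    closer = set()
    for n in range(1, up):
        # Siblings (n=1), first cousins (n=2), and so on.
        closer |= walk(walk({person}, PARENTS, n), CHILDREN, n)
    return candidates - closer - {person}


third_cousins = cousins("a4", 3)
```

Each hop follows a link directly instead of joining a table to itself. A graph DBMS exploits the same idea at much larger scale.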

That’s polyglot persistence, friends.

Final Thoughts

I expect that most people with skin in the game will not be serious NoSQL users for a while. This should not discourage developers and architects from experimenting with various NoSQL DBMS now, so as to make choices based on knowledge.

You don’t always need a download. There are clouds with free entry-level access (e.g. search for Cloudant, Heroku, MongoHQ, or NuVola). The NoSQL world is changing rapidly. In most cases you’re currently better off sticking with RDBMS, unless you are dealing with big data, or if you have a case for a polyglot application.

Beware of insanity or substance abuse caused by trying to decide on a “favorite” NoSQL package. It IS a wild bunch. I have not said much about specific NoSQL solutions, nor have I mentioned MapReduce, a conceptual friend of big data. These are fodder for subsequent posts. SQL has a good prognosis, but there’s plenty of room for the emerging NoSQL wave.

 

– Louis Mauget, asktheteam@keyholesoftware.com


