An argument against defaulting to auto-increment primary keys
The de-facto “easy” primary key for ages has been the auto-incrementing integer. Upon insertion of a new record, the database simply increments a counter for the table and uses the new value for the primary key of the inserted row. It’s simple to setup, requires no maintenance, and guarantees a predictable key.
For my own purposes, I was under the notion that I should always use auto-incrementing primary keys. They were an obvious choice for many of my databases and I never needed to consider alternatives.
Then the cloud came along, with all of its distributed fanciness.
Specifically, I started a project on top of Google App Engine (GAE), an auto-scaling application runtime hosted by everybody’s favorite search monster. App Engine provides a distributed cloud database following a NoSQL paradigm called the Datastore. You store an object (POJO in my case) using a Long or String identifier. The Datastore is a powerful facility that couples well with App Engine’s scalable application runtime architecture.
I started My Gear Pack, a cloud-backed Android application and Google Web Toolkit (GWT) client used for helping outdoor savvy folks organize their gear. I immediately, without even thinking about it, constructed all of my Entities (that’s what GAE calls them) using auto-generated Long IDs. Upon insertion of a new Entity, the Datastore would automatically assign a Long that would be guaranteed to be unique.
This was rather convenient, as I wouldn’t have to manage ID generation code and handle any conflicts.
Then a user requested a new feature. They wanted to be able to use the Android application to manage their gear lists offline, without a data connection.
No problem, I thought to myself! I’ll just sever my layers and create a replicated database on their mobile device. I wrote a lot of code and spent hours designing my grand scheme to store the local Entities (in Android) with an empty “remoteID” field that would be populated when they were eventually synchronized with the Datastore, which would create the real Long ID.
It was complicated, and I had to walk away from coding several times out of frustration. I would glare at my dogs as if they had wronged me terribly. Ultimately dropping to the floor and wrestling around with them made things better. It always makes things better.
Solution: Universally Unique Identifier
After significant research and some experiments with various possible solutions, I landed on one of the most underrated features of modern programming languages, the Universally Unique Identifier (UUID).
A UUID is a pseudo-randomly generated 128-bit number, pure and simple. Due to several parameters that go into the creation of the number, it is generally accepted as unique enough that our meager Earthling brains can’t fathom a way to generate a duplicate.
UUIDs are frequently represented as hexadecimal Strings. So instead of using the Datastore-generated Long IDs, I built a prototype version of my entire project stack (GAE Java backend, GWT client, and Android client) using Strings for Entity identifiers. This enabled my offline clients to generate IDs when a user wanted to create a new gear list or gear item without being connected to the Datastore.
The solution turned out to be better than I could have anticipated. I could still use my replicated database design to synchronize back and forth between the clients and the cloud backend, and I could now uniquely identify any Entity of any type in the Datastore.
Life was perfect! I couldn’t have crafted a better architecture if I had chugged a couple more Mountain Dews…
Then I considered deployment.
How was I going to migrate all of my Datastore Entities (roughly 10,000) to use String identifiers instead of Long?
Stay tuned for Part 2 for a simple technique to mass-migrate Datastore schemas and retain Entity relationships.
– Shannon Griswold, firstname.lastname@example.org