Every long-term project will outlive at least some of the technologies it was originally built with. For example, a project I have been involved with recently ran into this situation. The app is hosted on Heroku, and over the years, the available MongoDB add-ons have changed and dwindled until now, there is only one.
Several migrations between MongoDB add-ons have already happened because of shutdowns. So, it was decided that rather than migrating to the last one still in existence, the project would switch to using PostgreSQL, which is supported directly by the Heroku team.
Additionally, instead of coding directly to the PostgreSQL API, we decided to also make the switch to using Sequelize to abstract out the database layer. In this post, I’ll walk through the challenges that came up during this refactor, and we’ll discuss some of the “wins” as well.
Why PostgreSQL?
So you may be wondering, “why PostgreSQL?” Well, MongoDB is a JSON-based document database, while PostgreSQL is a relational database.
Why choose PostgreSQL and not some other document database? First, Heroku’s support for PostgreSQL was a big factor. Not having to worry about future migrations when inevitably yet another add-on is discontinued is a good thing.
Second, Heroku has a “hobby” free tier. This helps my client with costs because the test/staging server can use the hobby tier for their testing.
Third, PostgreSQL has come a long way in supporting JSON columns, and we took full advantage of that.
In the end, the best of both relational and document databases were incorporated into the final solution.
Why Not Just Use a Tool?
There are several existing tools that migrate data from MongoDB to PostgreSQL. I chose to roll my own to have ultimate control over what was migrated. Another motivating factor was learning how to use Sequelize migrations on a going-forward basis rather than as a one-time exercise.
Since the beginning, this project has evolved quite a bit, and it’s clear that it will continue to change going forward.
Create the PostgreSQL Database
The first step is to install PostgreSQL locally, and that’s pretty simple.
Download an installer from the PostgreSQL downloads page. The documentation for Sequelize is invaluable for setting up migrations, and I’d recommend taking a look at it.
Once you’ve set up config/config.json, you can use NODE_ENV=development npx sequelize-cli db:create to create the database for local development. Since the name of the database and the credentials are already in config.json, it’s a pretty simple process.
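For reference, the development entry in config/config.json might look something like this; the database name and credentials here are placeholders, not real values:

"development": {
  "username": "postgres",
  "password": "postgres",
  "database": "my_project_development",
  "host": "127.0.0.1",
  "dialect": "postgres"
}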
Table Generation: Those Darned Auto Plurals!
Now that we have a database, it’s time to define the tables. Since we have a legacy MongoDB database, we will want to match the collection names to the new table names in PostgreSQL. This way, the code doesn’t have to change.
This may seem a little trickier than it should be. To protect the client’s privacy, we won’t work with the actual table names. Instead, we’ll work with the examples provided by Sequelize.
Let’s say we need a table named user to mirror a user collection in MongoDB. Looking at the Sequelize example, we can run this command to create the migration, which will create the table we need:
npx sequelize-cli model:generate --name User --attributes firstName:string,lastName:string,email:string,data:json
This command will create a Sequelize model named User and a migration file. I added the data column as JSON so that you could see an example of how to do that. Let’s take a look at what happened when we ran this command.
The model Sequelize created is shown below.
'use strict';
const { Model } = require('sequelize');

module.exports = (sequelize, DataTypes) => {
  class User extends Model {
    /**
     * Helper method for defining associations.
     * This method is not a part of Sequelize lifecycle.
     * The `models/index` file will call this method automatically.
     */
    static associate(models) {
      // define association here
    }
  };
  User.init({
    firstName: DataTypes.STRING,
    lastName: DataTypes.STRING,
    email: DataTypes.STRING,
    data: DataTypes.JSON
  }, {
    sequelize,
    modelName: 'User',
  });
  return User;
};
This looks pretty good. Let’s have a look at the migration file.
'use strict';

module.exports = {
  up: async (queryInterface, Sequelize) => {
    await queryInterface.createTable('Users', {
      id: {
        allowNull: false,
        autoIncrement: true,
        primaryKey: true,
        type: Sequelize.INTEGER
      },
      firstName: {
        type: Sequelize.STRING
      },
      lastName: {
        type: Sequelize.STRING
      },
      email: {
        type: Sequelize.STRING
      },
      data: {
        type: Sequelize.JSON
      },
      createdAt: {
        allowNull: false,
        type: Sequelize.DATE
      },
      updatedAt: {
        allowNull: false,
        type: Sequelize.DATE
      }
    });
  },
  down: async (queryInterface, Sequelize) => {
    await queryInterface.dropTable('Users');
  }
};
Hey, wait a minute. We asked for a table named user, but Sequelize decided we needed a table named Users instead. That’s not going to work very well for our existing code.
After some digging, I found that Sequelize isn’t very friendly for mimicking legacy tables, but no worries. We can wrangle it into submission. I ended up manually fixing the migration file by simply changing the table name to user.
Now, the model simply needs tableName: 'user' added where modelName is located. If you don’t need timestamps, you can delete the createdAt and updatedAt columns from the migration file and add timestamps: false to the model options, right where modelName and tableName are located.
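Putting those tweaks together, the init call in the model might end up looking like this (timestamps: false assumes you also removed the timestamp columns from the migration):

User.init({
  firstName: DataTypes.STRING,
  lastName: DataTypes.STRING,
  email: DataTypes.STRING,
  data: DataTypes.JSON
}, {
  sequelize,
  modelName: 'User',
  tableName: 'user', // match the legacy MongoDB collection name
  timestamps: false  // skip createdAt/updatedAt
});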
If you make any needed changes to the model and migration before running the Sequelize migrate command (npx sequelize-cli db:migrate), then your tables will be created according to your specifications.
We won’t run any migrations just yet, but if you want to, you can undo all migrations by running npx sequelize-cli db:migrate:undo:all.
Planting the Seeds, But Don’t Use Those Seed Files!
Once you have all of your tables defined like you want them, you may have some reference data that you’d like to specify. Strictly speaking, you could use the code I’m going to show later to just migrate everything. However, in the interest of showing how to do seeding without pulling data from another database, we’ll run through this example right now.
Seed files are kind of the redheaded stepchild of Sequelize. Yes, they are there, and they kind of work, but Sequelize doesn’t keep track of them as well as it does migrations, and they can get kind of wonky.
The good news is that migration files can do everything seed files can do and more. So let’s just use the migration files and skip knowing anything about seed files.
Just after the queryInterface.createTable API call in the migration file, you can add the following:
await queryInterface.bulkInsert('user', [{
  firstName: 'John',
  lastName: 'Doe',
  email: '[email protected]',
  data: JSON.stringify([{ hobby: 'golf' }])
}], {});
This would create a user in the user table. Obviously, you can continue creating all the data you need in order to seed the database. Using PostgreSQL’s JSON column type, you can decide which MongoDB data you want to store in columns and which can be left as JSON.
In my case, I only moved data to columns that were used in queries to make migration as simple as possible. PostgreSQL does support finding data inside JSON columns, but normal columns are much faster.
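To illustrate what querying inside a JSON column looks like, PostgreSQL’s -> and ->> operators reach into the stored document. This is a hypothetical lookup matching the seed data shape above, not a query from the real project:

// hypothetical check inside a migration: find rows whose JSON data
// array has 'golf' as the hobby of its first element, using
// PostgreSQL's -> (get element) and ->> (get as text) operators
const [rows] = await queryInterface.sequelize.query(
  `SELECT * FROM "user" WHERE data -> 0 ->> 'hobby' = :hobby`,
  { replacements: { hobby: 'golf' } }
);
console.log(`${rows.length} golfers found`);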
Connecting Locally vs. Remotely … Heroku Anyone?
The Sequelize docs do a pretty good job of describing how to set up config.json for a local database with the database name and credentials. However, Heroku doesn’t work like that.
When you add the Heroku PostgreSQL add-on to your site, Heroku automatically creates an environment variable for you called DATABASE_URL with the connection info.
It took some more digging to figure out how to configure config.json to work with that, so let’s take a look.
"test": { "use_env_variable": "DATABASE_URL", "dialectOptions": { "ssl": { "require": true, "rejectUnauthorized": false } }, "dialect": "postgres" }
What is that rejectUnauthorized all about? The Heroku PostgreSQL add-on creates a self-signed certificate for SSL when the add-on is configured by Heroku. The Node.js PostgreSQL client doesn’t like those, so it will reject the SSL connection when it figures that out. We can override that behavior with this flag.
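No extra code is needed to consume use_env_variable; the models/index.js that sequelize-cli generates already handles it with logic roughly like this (paraphrased excerpt):

// excerpt (paraphrased) from the generated models/index.js
let sequelize;
if (config.use_env_variable) {
  // e.g., DATABASE_URL on Heroku
  sequelize = new Sequelize(process.env[config.use_env_variable], config);
} else {
  sequelize = new Sequelize(config.database, config.username, config.password, config);
}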
Out from Mongo, into PostgreSQL
Let’s continue our migration by adding the code to pull data from MongoDB and insert it into PostgreSQL.
I created a module to do the migrations since all of my collections had the same basic document format. This module works locally in addition to running within Heroku.
The example environment variable for connecting to the MongoDB database in my case is MONGO – again, this isn’t the real one we use.
Calling find({}) on the collection returns all documents in the collection. If your database has an extremely large number of rows, you’d want to do paging here to process the rows in chunks. In my case, though, this is a specialized database that doesn’t have enough rows for that to be a concern.
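For reference, inside the module shown below, the all-at-once toArray() call could be swapped for chunked processing along these lines. Here, insertChunk is a hypothetical helper wrapping the same bulkInsert logic, and the chunk size is arbitrary:

// process the cursor in fixed-size chunks instead of loading everything
const CHUNK_SIZE = 1000;
let chunk = [];
while (await cursor.hasNext()) {
  chunk.push(await cursor.next());
  if (chunk.length >= CHUNK_SIZE) {
    await insertChunk(queryInterface, collection, chunk); // hypothetical helper
    chunk = [];
  }
}
if (chunk.length) {
  await insertChunk(queryInterface, collection, chunk); // flush the remainder
}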
The bulkInsert call for Sequelize is just a simple example, but I wanted to point out how to insert JSON data into a PostgreSQL JSON column. Don’t forget to close the MongoDB database connection after the migration.
const mongodb = require('mongodb');
const MongoClient = mongodb.MongoClient;

module.exports = async (queryInterface, collection) => {
  console.log(`beginning MongoDB migration for environment ${process.env.NODE_ENV} using connection ${process.env.MONGO}`);
  let numMigrated = 0;
  let client = new MongoClient(process.env.MONGO, { useUnifiedTopology: true });
  let database;
  try {
    database = await client.connect();
    console.log('connected');
    const remoteDb = database.db();
    const cursor = await remoteDb.collection(collection).find({});
    if (cursor) {
      const allData = await cursor.toArray();
      if (allData && allData.length) {
        console.log(`number of records is ${allData.length}`);
        for (const data of allData) {
          await queryInterface.bulkInsert(collection, [{
            foo: data.foo,
            jsondata: JSON.stringify(data.data)
          }], {});
          ++numMigrated;
        }
      }
    }
  } catch (err) {
    console.log(err.stack);
  }
  if (database) {
    await database.close();
  }
  console.log(`finished MongoDB migration. ${numMigrated} records migrated.`);
};
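With that module in place, a migration can simply delegate to it. The require path and the down behavior here are assumptions for illustration, not the project’s actual code:

'use strict';
const migrateCollection = require('../lib/migrate-collection'); // assumed path

module.exports = {
  up: async (queryInterface, Sequelize) => {
    await migrateCollection(queryInterface, 'user');
  },
  down: async (queryInterface, Sequelize) => {
    // undo by clearing out the migrated rows
    await queryInterface.bulkDelete('user', null, {});
  }
};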
Setting Up Node.js/Express
Since the project uses Node.js plus Express, we’ll need to add some dependencies for Sequelize and PostgreSQL. Be sure to add all of these to dependencies and not devDependencies because we’ll need to have these available in production for migrations to work.
Below are the dependencies I added. Note that because of the custom migration module described in the previous section, we won’t be removing the MongoDB dependency just yet.
"sequelize": "^6.3.3", "sequelize-cli": "^6.2.0", "pg": "^8.3.0", "pg-hstore": "^2.3.3"
Like, How Do You Do That in Sequelize?
Using Sequelize with PostgreSQL in Node.js/Express is pretty straightforward. Inserts, updates, and deletes are very well covered elsewhere, including in the Sequelize documentation. The one thing that puzzled me a little bit was the like operator, so I’ll cover that here.
First, in the requires, add these two entries:
const { Op } = require("sequelize");
const sdb = require('./models');
This gets us our connection to the database and access to the operators.
To perform a find using a like, let’s set up a fake scenario and then consider the code below.
Let’s say we have a list of data stored in a collection with a column called itemName, where the values all start with a common prefix (e.g., foo1, foo2, foo3). We pass in the collection (table) name and the prefix for the list of items.
The following code will find all of the rows using the like operator. If we’ve indexed the itemName column appropriately, this will be a fast operation.
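One caveat worth knowing: for a LIKE 'prefix%' search, a plain B-tree index only helps if the database uses the C collation; otherwise PostgreSQL needs an index built with a pattern operator class. A sketch of how that could be done in a migration, with the table, column, and index names standing in for this dummy example:

// in a migration's up function: an index PostgreSQL can use
// for LIKE 'prefix%' matches on a VARCHAR column
await queryInterface.sequelize.query(
  `CREATE INDEX item_name_pattern_idx ON "user" ("itemName" varchar_pattern_ops);`
);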
Sequelize returns results in an array, with the real data in a property called dataValues. So, if we get results, we map those down to just the data to make it look more like the data we used to get from MongoDB.
This is just a dummy example since I can’t use the real situation where I need to do a like operation.
const find = async (collection, prefix) => {
  let result;
  try {
    result = await sdb[collection].findAll({
      where: {
        itemName: {
          [Op.like]: `${prefix}%`
        }
      }
    });
  } catch (error) {
    console.log(error);
    result = null;
  }
  if (!result || !result.length) {
    result = [];
  } else {
    result = result.map(element => element.dataValues);
  }
  return result;
};
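Calling it is then a one-liner; with the dummy data above, something like this would return every row whose itemName starts with 'foo':

const items = await find('user', 'foo');
// items is a plain array of row objects, e.g.:
// [{ id: 1, itemName: 'foo1', ... }, { id: 2, itemName: 'foo2', ... }]
console.log(`found ${items.length} matching rows`);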
Migrations in Production (Don’t Hurt My Data, Yo!)
Once the initial migration that creates the tables and moves the data over runs, those will not run again unless you perform the undo operation.
Going forward, you want your new migrations to run before your Node.js/Express server starts up. You can do this in code, but I prefer the method that uses package.json. Now, anytime you add new migrations, they will be executed before the server starts.
"scripts": { "start": "sequelize db:migrate && node app.js", }
Conclusion
Migrating from one database technology to another one can be a daunting task. Thankfully, PostgreSQL and Sequelize combine to make this particular migration much easier than it otherwise could have been.
I hope this blog gives at least a little bit of assistance to anyone facing a similar task. Even if it’s just moral support, I know anything helps.
A bit of a shameless plug, but if you liked this post, and you’re looking for more, go check out the rest of the Keyhole Dev Blog! Lots of good info there on migrations, data, and more.
Best of luck on your migrations!