A PostgreSQL migration library inspired by the Stack Overflow system described in Nick Craver's blog.
Migrations are defined in sequential SQL files, for example:
migrations
├ 1_create-table.sql
├ 2_alter-table.sql
└ 3_add-index.sql
Requires Node 10.17.0+
Supports PostgreSQL 9.4+
There are two ways to use the API.
Either, pass a database connection config object:
import {migrate} from "postgres-migrations"

async function runMigrations() {
  const dbConfig = {
    database: "database-name",
    user: "postgres",
    password: "password",
    host: "localhost",
    port: 5432,

    // Default: false for backwards-compatibility
    // This might change!
    ensureDatabaseExists: true,

    // Default: "postgres"
    // Used when checking/creating "database-name"
    defaultDatabase: "postgres"
  }

  await migrate(dbConfig, "path/to/migration/files")
}
Or, pass a pg client:
import pg from "pg"
import {migrate} from "postgres-migrations"

async function runMigrations() {
  const dbConfig = {
    database: "database-name",
    user: "postgres",
    password: "password",
    host: "localhost",
    port: 5432,
  }

  // Note: when passing a client, it is assumed that the database already exists
  const client = new pg.Client(dbConfig) // or a Pool, or a PoolClient
  await client.connect()
  try {
    await migrate({client}, "path/to/migration/files")
  } finally {
    await client.end()
  }
}
Occasionally, if two people are working on the same codebase independently, they might both create a migration at the same time. For example, 5_add-table.sql and 5_add-column.sql. If these both get pushed, there will be a conflict.
While the migration system will notice this and refuse to apply the migrations, it can be useful to catch this as early as possible.
The loadMigrationFiles function can be used to check if the migration files satisfy the rules.
Alternatively, use the pg-validate-migrations bin script: pg-validate-migrations "path/to/migration/files".
There is deliberately no concept of a 'down' migration. In the words of Nick Craver:
If we needed to reverse something, we could just push another migration negating whatever we did that went boom ... Why roll back when you can roll forward?
Migrations are guaranteed to run in the same order every time, on every system.
Some migration systems use timestamps for ordering migrations, where the timestamp represents when the migration file was created. This doesn't guarantee that the migrations will be run in the same order on every system.
For example, imagine Developer A creates a migration file in a branch. The next day, Developer B creates a migration in master, and deploys it to production. On day three Developer A merges in their branch and deploys to production.
The production database sees the migrations applied out of order with respect to their creation time. Any new development database will run the migrations in the timestamp order.
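To make the ordering problem concrete, here is a small sketch of the scenario above. The migration names and the createdDay/deployedDay fields are hypothetical, chosen only to mirror the Developer A / Developer B story; this is not how any particular migration tool stores its data.

```javascript
// Hypothetical timestamp-ordered migrations from the scenario above.
const migrations = [
  { name: "add-users-table", createdDay: 1, deployedDay: 3 }, // Developer A (branch)
  { name: "add-orders-table", createdDay: 2, deployedDay: 2 }, // Developer B (master)
]

// Sort a copy of the list by the given numeric field and return the names.
const orderBy = (key) =>
  [...migrations].sort((a, b) => a[key] - b[key]).map((m) => m.name)

// Order in which production actually applied the migrations.
const productionOrder = orderBy("deployedDay")

// Order in which a brand-new database would apply them (by timestamp).
const freshDatabaseOrder = orderBy("createdDay")

console.log("production:", productionOrder.join(", "))
console.log("fresh database:", freshDatabaseOrder.join(", "))
```

The two lists come out in opposite orders, so production and a fresh development database end up with different histories. With consecutive integer ids, Developer A's file would instead clash with Developer B's id and would have to be renumbered before it could be applied.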
A migrations table is created as the first migration, before any user-supplied migrations. This keeps track of all the migrations which have already been run.
Previously run migration scripts shouldn't be modified, since we want the process to be repeated in the same way for every new environment.
This is enforced by hashing the file contents of a migration script and storing the hash in the migrations table. Before running a migration, the previously run scripts are hashed and checked against the database to ensure they haven't changed.
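The hash check can be sketched roughly as follows. This is a minimal illustration, not the library's code: the function names, the row shape ({id, hash}), the choice of SHA-1, and hashing the file name together with the contents are all assumptions made for the example.

```javascript
const crypto = require("crypto")

// Hash a migration script. Algorithm and input are assumptions for this sketch.
function hashMigration(fileName, sql) {
  return crypto.createHash("sha1").update(fileName + sql, "utf8").digest("hex")
}

// Return the file names of already-applied migrations whose contents no longer
// match the hash recorded in the (hypothetical) migrations table rows.
function findChangedMigrations(files, appliedRows) {
  const recorded = new Map(appliedRows.map((row) => [row.id, row.hash]))
  return files
    .filter((file) => recorded.has(file.id))
    .filter((file) => hashMigration(file.fileName, file.sql) !== recorded.get(file.id))
    .map((file) => file.fileName)
}
```

If findChangedMigrations returns a non-empty list, a runner built this way would abort before applying anything new.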
Running in a transaction ensures each migration is atomic. Either it completes successfully, or it is rolled back and the process is aborted.
An exception is made when -- postgres-migrations disable-transaction is included at the top of the migration file. This allows migrations such as CREATE INDEX CONCURRENTLY, which cannot be run inside a transaction.
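As an illustration of how such a directive could be detected, here is a small sketch. The helper name shouldUseTransaction is hypothetical, and treating only the first non-empty line as "the top of the file" is an assumption; the library's own check may be more lenient.

```javascript
const DISABLE_TRANSACTION = "-- postgres-migrations disable-transaction"

// Returns false when the first non-empty line of the script is the directive.
function shouldUseTransaction(sql) {
  const firstLine = sql
    .split(/\r?\n/)
    .map((line) => line.trim())
    .find((line) => line !== "")
  return firstLine !== DISABLE_TRANSACTION
}

const indexMigration = `-- postgres-migrations disable-transaction
CREATE INDEX CONCURRENTLY IF NOT EXISTS name_of_idx
ON table_name (column_name);`

console.log(shouldUseTransaction(indexMigration)) // false
console.log(shouldUseTransaction("CREATE TABLE main (id int primary key);")) // true
```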
If anything fails, the migration in progress is rolled back and an exception is thrown.
As of v4, advisory locks are used to control concurrency. If two migration runs are kicked off concurrently, one will wait for the other to finish before starting. Once a process has acquired a lock, it will run each of the pending migrations before releasing the lock again.
Logs from two processes A and B running concurrently should look something like the following.
B Connected to database
B Acquiring advisory lock...
A Connected to database
A Acquiring advisory lock...
B ... acquired advisory lock
B Starting migrations
B Starting migration: 2 migration-name
B Finished migration: 2 migration-name
B Starting migration: 3 another-migration-name
B Finished migration: 3 another-migration-name
B Successfully applied migrations: migration-name, another-migration-name
B Finished migrations
B Releasing advisory lock...
B ... released advisory lock
A ... acquired advisory lock
A Starting migrations
A No migrations applied
A Finished migrations
A Releasing advisory lock...
A ... released advisory lock
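The sequence in these logs can be reproduced with a purely in-memory simulation. Everything here is a stand-in: FakeAdvisoryLock, runMigrations and simulate are hypothetical names, the "database" is a plain object, and no PostgreSQL advisory lock is involved. It only illustrates why the second process waits and then finds nothing left to apply.

```javascript
// In-memory stand-in for an advisory lock: holders are served in request order.
class FakeAdvisoryLock {
  constructor() {
    this.tail = Promise.resolve()
  }
  // Resolves with a release function once every earlier holder has released.
  acquire() {
    let release
    const previous = this.tail
    this.tail = new Promise((resolve) => { release = resolve })
    return previous.then(() => release)
  }
}

async function runMigrations(name, lock, db, allMigrations, log) {
  log.push(`${name} Acquiring advisory lock...`)
  const release = await lock.acquire()
  log.push(`${name} ... acquired advisory lock`)
  try {
    // Only migrations not yet recorded as applied are run.
    const pending = allMigrations.filter((m) => !db.applied.includes(m))
    for (const migration of pending) {
      await new Promise((resolve) => setTimeout(resolve, 5)) // pretend to do work
      db.applied.push(migration)
      log.push(`${name} Finished migration: ${migration}`)
    }
    if (pending.length === 0) log.push(`${name} No migrations applied`)
  } finally {
    release()
    log.push(`${name} ... released advisory lock`)
  }
}

async function simulate() {
  const lock = new FakeAdvisoryLock()
  const db = { applied: [] }
  const log = []
  const all = ["2 migration-name", "3 another-migration-name"]
  await Promise.all([
    runMigrations("B", lock, db, all, log),
    runMigrations("A", lock, db, all, log),
  ])
  return log
}

simulate().then((log) => console.log(log.join("\n")))
```

Because B requests the lock first, A's work only begins after B has released it, by which point both migrations are already recorded.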
Warning: the use of advisory locks will cause problems when using transaction pooling or statement pooling in PgBouncer. A similar system is used in Rails, see this for an explanation of the problem.
Make migrations idempotent. Migrations should only be run once, but idempotency is a good principle to follow regardless.
Once applied (to production), a migration cannot be changed.
This is enforced by storing a hash of the file contents for each migration in the migrations table.
These hashes are checked when running migrations.
Backwards incompatible changes can usually be made in a few stages.
For an example, see this blog post.
A migration file must match the following pattern:
[id][separator][name][extension]
Section | Accepted Values | Description |
---|---|---|
id | Any integer, optionally left-padded with zeros | Consecutive integer ID. Must start from 1 and be consecutive, e.g. if you have migrations 1-4, the next one must be 5. |
separator | _ or - or nothing | |
name | Text of any length | |
extension | .sql or .js | File extensions supported. Case insensitive. |
Example:
migrations
├ 1_create-initial-tables.sql
├ 1_create-initial-tables.md # Docs can go here
├ 2-alter-initial-tables.SQL
└ 3-alter-initial-tables-again.js
Or, if you want better ordering in your filesystem:
migrations
├ 00001_create-initial-tables.sql
├ 00001_create-initial-tables.md # Docs can go here
├ 00002-alter-initial-tables.sql
└ 00003_alter-initial-tables-again.js
Migrations will be performed in the order of the ids. If ids are not consecutive or if multiple migrations have the same id, the migration run will fail.
Note that file names cannot be changed later.
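The naming rules above can be checked with a few lines of Node. This is a sketch under stated assumptions, not the library's validator: the regular expression and the function names parseMigrationFileName and validateMigrationFileNames are made up for the example, and files that don't match the pattern (such as .md docs) are simply ignored.

```javascript
// [id][separator][name][extension], extension matched case-insensitively.
const MIGRATION_FILE = /^(\d+)[-_]?(.*)\.(sql|js)$/i

// Returns {id, name, fileName}, or null if the file is not a migration file.
function parseMigrationFileName(fileName) {
  const match = MIGRATION_FILE.exec(fileName)
  if (match === null) return null
  return { id: parseInt(match[1], 10), name: match[2], fileName }
}

// Throws if ids do not run 1, 2, 3, ... with no gaps and no duplicates.
function validateMigrationFileNames(fileNames) {
  const migrations = fileNames
    .map(parseMigrationFileName)
    .filter((migration) => migration !== null)
    .sort((a, b) => a.id - b.id)

  migrations.forEach((migration, index) => {
    const expectedId = index + 1
    if (migration.id !== expectedId) {
      throw new Error(
        `Expected migration id ${expectedId} but found ${migration.id} (${migration.fileName})`
      )
    }
  })
  return migrations
}
```

For example, passing ["1_a.sql", "2_b.sql", "2_c.sql"] throws because the third file should have id 3, which is the same kind of conflict that arises when two developers both add a migration numbered 5.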
By using the .js extension on your migration file, you gain access to all Node.js features. The file only needs to export a function called generateSql that returns the SQL as a string, for example:
// ./migrations/helpers/create-main-table.js
module.exports = `
CREATE TABLE main (
  id int primary key
);`

// ./migrations/helpers/create-secondary-table.js
module.exports = `
CREATE TABLE secondary (
  id int primary key
);`

// ./migrations/1-init.js
// The helpers live in ./migrations/helpers, so require them relative to this file
const createMainTable = require("./helpers/create-main-table")
const createSecondaryTable = require("./helpers/create-secondary-table")

module.exports.generateSql = () => `${createMainTable}
${createSecondaryTable}`
If you want sane date handling, it is recommended you use the following code snippet to fix a node-postgres bug:
const pg = require("pg")
const moment = require("moment")

// Return DATE columns as plain "YYYY-MM-DD" strings
const parseDate = (val) =>
  val === null ? null : moment(val).format("YYYY-MM-DD")

const DATATYPE_DATE = 1082 // OID of the PostgreSQL date type
pg.types.setTypeParser(DATATYPE_DATE, (val) => {
  return val === null ? null : parseDate(val)
})
General rule: only change schemas and other static data in database migrations.
When writing a migration which affects data, consider whether the migration needs to be run for all possible environments or just some specific environment. Schema changes and static data need changing for all environments. Often, data changes only need to happen in dev or prod (to fix some data), and might be better off run as one-off jobs (manually or otherwise).
-- No no no nononono (at least for big tables)
ALTER TABLE my_table ALTER COLUMN currently_nullable SET NOT NULL;
TL;DR don't do the above without reading this. It can be slow for big tables, and will lock out all writes to the table until it completes.
When creating indexes, there are a few important considerations.
Creating an index should probably look like this:
-- postgres-migrations disable-transaction
CREATE INDEX CONCURRENTLY IF NOT EXISTS name_of_idx
ON table_name (column_name);
- CONCURRENTLY: without this, writes on the table will block until the index has finished being created. However, it can't be run inside a transaction.
- -- postgres-migrations disable-transaction: migrations are run inside a transaction by default. This disables that.
- IF NOT EXISTS: since the transaction is disabled, it's possible to end up in a partially applied state where the index exists but the migration wasn't recorded. In this case, the migration will probably get run again. This ensures that the rerun will succeed.
See the Postgres docs on creating indexes.
Most of the time using IF NOT EXISTS is not necessary (see above for an exception). In most cases, we would be better off with a failing migration script that tells us that we tried to create a table with a duplicate name.
A comment that is added to a migration script can never be changed once the migration script has been deployed. For complex migration scripts, consider documenting them in a separate markdown file with the same file name as the migration script. This documentation can then be updated later if a better explanation becomes apparent.
Your file structure might look something like this:
- migrations
  - 0001_complex_migration.md <--- Contains documentation that can be updated.
  - 0001_complex_migration.sql
  - 0002_simple_migration.sql
Rather than this:
- migrations
  - 0001_complex_migration.sql <--- Contains documentation that can never be updated.
  - 0002_simple_migration.sql
Stack Overflow: How We Do Deployment - 2016 Edition (Database Migrations)
Database Migrations Done Right
Database versioning best practices
The tests require Docker to be installed. It probably helps to run docker pull postgres:9.4 beforehand.