Thursday, April 25, 2013

Schema versioning in Postgres

Edit long after writing this post (4/2016): These days I use Liquibase! But let's keep the post below for posterity....

For a web application our team has been working on, I wrote a set of scripts that facilitate keeping our Postgres database schema up to date. Each developer has their own database instance, but I'm the guy in charge of managing schema changes. So, whenever the schema is changed, it's nice to have a version number for each so we can tell if an upgrade is required. (A similar approach could easily be used for databases other than Postgres, by the way.)

How it works:
We have a set of .sql scripts that are named like this:

upgrade_[a.b]_to_[c.d].sql
upgrade_[c.d]_to_[e.f].sql
upgrade_[e.f]_to_[g.h].sql

Each script contains the necessary changes, typically ALTER TABLE statements and such. At the very end of each, the schema version is set in the form of two function definitions: get_schema_version_major() and get_schema_version_minor(). Each simply returns an int. I increment the major number in each  sql script that makes a change that, without which, the application would break. An example might be a dropped column. The minor version is incremented when a non-application-breaking change is made, such as a new constraint or index.

For each sql script there is a corresponding bash shell script as well as a Windows cmd script. Example: upgrade_[a.b]_to_[c.d].sh. This wraps up the call to psql so that the less-database-inclined can run it without hassle.

After a while of my coworkers using these upgrade scripts, there were complaints about how so many scripts had to be run! For example, maybe schema version 10.3 needed to be upgraded to 15.0 via several links in between. So I cooked up a new shell script that would intelligently traverse the chain of version numbers and automagically do it all for you. just run upgrade.sh and voila! Here's what the output looks like:

Beginning schema upgrades, starting with version 10.3...
Upgrading from 10.3 to 11.0...
Upgrade successful.
Upgrading from 11.0 to 12.0...
Upgrade successful.
Upgrading from 12.0 to 13.0...
Upgrade successful.
Upgrading from 13.0 to 14.0...
Upgrade successful.
Upgrading from 14.0 to 15.0...
Upgrade successful.
Number of successful upgrades: 5. Version is now 15.0.

It won’t do anything it shouldn’t if I run it again…

mwrynn@mwrynnix:~$ ./upgrade.sh

Beginning schema upgrades, starting with version 15.0...
Number of successful upgrades: 0. Version is now 15.0.

Here’s an error case. I started over from 10.3 and purposely added a syntax error to the 13.0_to_14.0 script just for the purpose of this test…

mwrynn@mwrynnix:~$ ./upgrade.sh

Beginning schema upgrades, starting with version 10.3...
Upgrading from 10.3 to 11.0...
Upgrade successful.
Upgrading from 11.0 to 12.0...
Upgrade successful.
Upgrading from 12.0 to 13.0...
Upgrade successful.
Upgrading from 13.0 to 14.0...                         
psql:sbo_upgrade_13.0_to_14.0.sql:25: ERROR:  syntax error at or near "INT"
LINE 15: ...subscription ADD COLUMNBLAH pending_status_change INT REFERE...
                                                              ^
Upgrade failed. Aborting.
Number of successful upgrades: 3. Version is now 13.0.

Finally, I’ve fixed that error and I’m resuming from where we were. Since each upgrade is an all-or-nothing transaction, it doesn’t matter that some of the statements in that bad script worked before the error occurred…

mwrynn@mwrynnix:~$ ./upgrade.sh
                                         
Beginning schema upgrades, starting with version 13.0...
Upgrading from 13.0 to 14.0...
Upgrade successful.                                
Upgrading from 14.0 to 15.0...
Upgrade successful.
Number of successful upgrades: 2. Version is now 15.0.

No comments: