A table that should exist in all projects with a database

It’s called schema_version (or migrations, or whatever suits you) and its purpose is to keep track of structural or data changes to the database.
A possible structure (example in MySQL) is:

create table schema_version (
    `when` timestamp not null default CURRENT_TIMESTAMP,
    `key` varchar(256) not null,
    `extra` varchar(256),
    primary key (`key`)
) ENGINE=InnoDB;

insert into schema_version(`key`, `extra`) values ('001', 'schema version');

Whether you add this table from the beggining of the project or just after you’ve deployed the first version to a staging or production server is up to you.

Whenever you need to execute an SQL script to change the database structure or perform a data migration you should be adding a row in that table as well. And do that via an insert statement at the begining or end of that script (which is committed to the project’s code repository).

E.g:

insert into schema_version(`key`, `extra`)
    values ('002', 'FOO-22 user profile enhancement');

another example:

insert into schema_version(`key`, `extra`)
    values ('003', 'FOO-73 contact us form stores host & user agent');

and another example:

insert into schema_version(`key`, `extra`)
    values ('004', 'FOO-62 deletion of non validated user accounts');

This is useful because now you know which database migration scripts you’ve run on each server (development, staging, production):

mysql> select * from schema_version;
+---------------------+-----+-------------------------------------------------+
| when                | key | extra                                           |
+---------------------+-----+-------------------------------------------------+
| 2010-11-22 11:21:39 | 001 | schema version                                  |
| 2010-12-02 17:02:20 | 002 | FOO-22 user profile enhancement                 |
| 2010-12-06 15:55:41 | 003 | FOO-73 contact us form stores host & user agent |
| 2010-12-06 15:58:12 | 004 | FOO-62 deletion of non validated user accounts  |
+---------------------+-----+-------------------------------------------------+

The key and extra columns could contain anything that makes sense to you. I prefer a sequential key and a very short description (with reference to the issue ID, in our case a JIRA project called FOO) of the issue that this migration relates to.

The scripts which represent each of those database migration should be kept in the code repository together with the code and their filename should be prefixed with the key of the migration. e.g:

migrations/001-schema-version.sql
migrations/002-FOO-22-user-profile-enhancement.sql
migrations/003-FOO-73-contact-form-fixes.sql
migrations/004-FOO-62-data-fix.sql
migrations/005-FOO-88-venues-management.sql
migrations/006-FOO-89-venues-management-v2.sql
migrations/007-FOO-78-private-messages-system.sql

This keeps everything tidy and all team members work is now easier:

  • If a new developer jumps in development and has been given an old database dump which only contains migrations up to 004, he can compare that to the repository and see that he needs to execute migrations 005, 006 and 007 in order to bring his database in sync with the code.
  • If a deployer (or operations person) is asked to deploy the latest version of the project he can see what scripts have been run on the target server and act accordingly.

Happy migrations.

Edit: Since I’ve been receiving tons of traffic from dzone and reddit with lots of useful comments I need to followup with this F.A.Q.

Q: The column names of the table are the worst piece of SQL I’ve seen in my life. Why did you choose those names?
A: You are free to change the table to your needs. So, although you are missing the purpose of this post (which was to provide a methodology of tracking changes) here you go:

create table schema_migrations (
    `appliedAt` timestamp not null default CURRENT_TIMESTAMP,
    `migrationCode` varchar(256) not null,
    `extraNotes` varchar(256),
    primary key (`migrationCode`)
) ENGINE=InnoDB;

Since the migrations are 1:1 to the releases of the software project I use varchar for key/migrationCode (instead of int) so I can deal with cases such as:

prod-001
prod-002
prod-003
prod-003-quickfix-1
prod-003-quickfix-2
prod-004
...

Also note that If I’ve got something that has a property best described by “key” then I’ll happily name the column “key” any day. Domain modeling is priority number #1 for me and since I use an ORM which allows me to do that I’ll go ahead and do it.

Q: This is old news, you should use Rails or liquibase or something else
A: Thanks for your input. Will definitely try those out. My proposal is just a simple convention which works on any environment right now without any learning curve. Also note that the problem I’m solving here is not tracking the difference of 2 database schemas, but tracking what scripts have been run on each server.
Q: How often do you apply changes to the database schema for this to be helpful?
A: Most real world projects which evolve in time usually require changes in database structure. Also, most real world projects need at least a staging server and a production one. This method helps me track what migrations have been run on each server. For example we’ve got around 120 migration scripts for a medium traffic (1m pageviews/day) public site which we’ve been maintaining for 2 years. No project finishes when it goes live.
Q: Shouldn’t this data go into the code repository instead of the database?
A: The database scripts which apply changes to the database are already in source control. The point is to be able to track which ones have been executed in which database servers (dev, staging, production).
Q: The developer can forget and mess this process up with unpredicted results.
A: Yes, the developer can most probably mess any aspect of a software project.

56 Responses to “A table that should exist in all projects with a database”

  1. links for 2010-12-09 Says:

    […] A table that should exist in all projects with a database (tags: database) […]

  2. Shashi Velur Says:

    Good post..!

    I wrote a blog post last year on the very same topic: http://shashivelur.com/blog/2008/07/hibernate-db-migration

  3. Stuart Grimshaw Says:

    @cody I can see how it avoids the problem of conflicting integers, but not conflicting changes. 1 branch could last a month, but during that month another branch could be created & rolled live.

    6 months down the line how do I know which order to apply those changes?

  4. Richard Dingwall » The road to automated database deployment Says:

    […] Given a live database and a stack of SQL scripts, how do you know which ones need to be run? You could start reading going through them, statement by statement, until you find a change that doesn’t appear to be applied and start there — or you could simply have each script write a row to the end of a changelog table like this: […]

  5. Future-Proof Your Database Change Log | Language Hacker | Robert Elwell's Blog Says:

    […] a big believer in using some kind of database change log for any project you’re working on. As someone firmly entrenched in the PHP world, I’m […]

  6. Gary Says:

    You may want to have a look at ‘chd version’ with ChronicDB