A table that should exist in all projects with a database

It’s called schema_version (or migrations, or whatever suits you) and its purpose is to keep track of structural or data changes to the database.
A possible structure (example in MySQL) is:

create table schema_version (
    `when` timestamp not null default CURRENT_TIMESTAMP,
    `key` varchar(256) not null,
    `extra` varchar(256),
    primary key (`key`)
) ENGINE=InnoDB;

insert into schema_version(`key`, `extra`) values ('001', 'schema version');

Whether you add this table from the beggining of the project or just after you’ve deployed the first version to a staging or production server is up to you.

Whenever you need to execute an SQL script to change the database structure or perform a data migration you should be adding a row in that table as well. And do that via an insert statement at the begining or end of that script (which is committed to the project’s code repository).

E.g:

insert into schema_version(`key`, `extra`)
    values ('002', 'FOO-22 user profile enhancement');

another example:

insert into schema_version(`key`, `extra`)
    values ('003', 'FOO-73 contact us form stores host & user agent');

and another example:

insert into schema_version(`key`, `extra`)
    values ('004', 'FOO-62 deletion of non validated user accounts');

This is useful because now you know which database migration scripts you’ve run on each server (development, staging, production):

mysql> select * from schema_version;
+---------------------+-----+-------------------------------------------------+
| when                | key | extra                                           |
+---------------------+-----+-------------------------------------------------+
| 2010-11-22 11:21:39 | 001 | schema version                                  |
| 2010-12-02 17:02:20 | 002 | FOO-22 user profile enhancement                 |
| 2010-12-06 15:55:41 | 003 | FOO-73 contact us form stores host & user agent |
| 2010-12-06 15:58:12 | 004 | FOO-62 deletion of non validated user accounts  |
+---------------------+-----+-------------------------------------------------+

The key and extra columns could contain anything that makes sense to you. I prefer a sequential key and a very short description (with reference to the issue ID, in our case a JIRA project called FOO) of the issue that this migration relates to.

The scripts which represent each of those database migration should be kept in the code repository together with the code and their filename should be prefixed with the key of the migration. e.g:

migrations/001-schema-version.sql
migrations/002-FOO-22-user-profile-enhancement.sql
migrations/003-FOO-73-contact-form-fixes.sql
migrations/004-FOO-62-data-fix.sql
migrations/005-FOO-88-venues-management.sql
migrations/006-FOO-89-venues-management-v2.sql
migrations/007-FOO-78-private-messages-system.sql

This keeps everything tidy and all team members work is now easier:

  • If a new developer jumps in development and has been given an old database dump which only contains migrations up to 004, he can compare that to the repository and see that he needs to execute migrations 005, 006 and 007 in order to bring his database in sync with the code.
  • If a deployer (or operations person) is asked to deploy the latest version of the project he can see what scripts have been run on the target server and act accordingly.

Happy migrations.

Edit: Since I’ve been receiving tons of traffic from dzone and reddit with lots of useful comments I need to followup with this F.A.Q.

Q: The column names of the table are the worst piece of SQL I’ve seen in my life. Why did you choose those names?
A: You are free to change the table to your needs. So, although you are missing the purpose of this post (which was to provide a methodology of tracking changes) here you go:

create table schema_migrations (
    `appliedAt` timestamp not null default CURRENT_TIMESTAMP,
    `migrationCode` varchar(256) not null,
    `extraNotes` varchar(256),
    primary key (`migrationCode`)
) ENGINE=InnoDB;

Since the migrations are 1:1 to the releases of the software project I use varchar for key/migrationCode (instead of int) so I can deal with cases such as:

prod-001
prod-002
prod-003
prod-003-quickfix-1
prod-003-quickfix-2
prod-004
...

Also note that If I’ve got something that has a property best described by “key” then I’ll happily name the column “key” any day. Domain modeling is priority number #1 for me and since I use an ORM which allows me to do that I’ll go ahead and do it.

Q: This is old news, you should use Rails or liquibase or something else
A: Thanks for your input. Will definitely try those out. My proposal is just a simple convention which works on any environment right now without any learning curve. Also note that the problem I’m solving here is not tracking the difference of 2 database schemas, but tracking what scripts have been run on each server.
Q: How often do you apply changes to the database schema for this to be helpful?
A: Most real world projects which evolve in time usually require changes in database structure. Also, most real world projects need at least a staging server and a production one. This method helps me track what migrations have been run on each server. For example we’ve got around 120 migration scripts for a medium traffic (1m pageviews/day) public site which we’ve been maintaining for 2 years. No project finishes when it goes live.
Q: Shouldn’t this data go into the code repository instead of the database?
A: The database scripts which apply changes to the database are already in source control. The point is to be able to track which ones have been executed in which database servers (dev, staging, production).
Q: The developer can forget and mess this process up with unpredicted results.
A: Yes, the developer can most probably mess any aspect of a software project.

56 Responses to “A table that should exist in all projects with a database”

  1. javapapo Says:

    nice, but you have to keep documenting the versioning of the DB – in your support documents bla bla, provide readme.txt for every migration blabla. In other words – migration support requires more – i bet you know it.

    Now for this specific one – good idea by the way – I would add the migration – updgrade SQL sripts in a CLOB field – just in case the folder structure is screwed!

    :D
    hehe…migration hell!

  2. cherouvim Says:

    No need for documentation or readme.txt.

    It’s all about the development and deployment lifecycle. We’ve all been in projects where every developer changes the database structure in an ad hoc and undocumented way. When it’s time for a production deployment things go really bad. This is what this method solves. Development and deployment sanity :)

  3. PanosJee Says:

    That s the way it should be but it comes as a default for Rails project :)
    We have a command that creates migrations (new tables or dropping table or altering) and every time we run rake db:migrate it would execute them and of course update the schema_migration table. Very handy! Especially when you do incremental changes to production servers

  4. cherouvim Says:

    @PanosJee: Yes, in fact I borrowed (and simplified) this facility from Rails and Refactoring Databases book to my Java and PHP projects.

  5. Maria Ioannidou Says:

    Nice idea, db migration has always been an issue… I’ll try it!

  6. javapapo Says:

    in many projects…All the related parties are concerned about migrations – so I can argue..unfortunately that YES migrations should be documented!!!! sorry for that.

    The developer side half of the answer..the other is the client.Because when you apply patches to his enterprise application they are really concerned what the patches are doing and if the migration is breaking something!!!

  7. javapapo Says:

    just to relax my previous comment – I have witness many times – migrations going wrong on production level (because usually they happen when you deliver) and then you have to explain ..what when wrong. Pre – deliver migrations – mainly related to development lifecycle are an issue yes..but can be handled with proper attituded by the devs, the pm and of course…solutions like the above!

    Thanks and respect :D

  8. Benjamin Flesch Says:

    This is very useful and should be implemented by every project which really wants to ship one day. Makes maintenance so much easier!

  9. Database Migrations | blog.abatir.se Says:

    [...] wrote in A table that should exist in all projects with a database about an approach to document the database [...]

  10. Scott S. McCoy Says:

    I use a similar thing, a common version table on every schema with a series of shell scripts which bootstrap running migrations. We create the migration script once and index it by version number, each series of DDL changes containing another migration script. We have a tool (just a pile of sh & awk scripts) then checks out a tag for a given version, and calculates the current and target deployment revision based on the schema version, and runs the delta of migration scripts in order.

    The developers perform the modifications ad-hoc in our local environments, through the tool one by one in our QA environment (to test our checked in migration scripts), then the operational staff execute the change-set bulk in our stage environment before execution in production. This leads to very smooth deployments, as well as a chronological record in the data-base of the authoritative state at any given point in time which does aid troubleshooting.

  11. Barry Coughlan Says:

    In addition, it’s good to have a folder with all your ALTER TABLE statements in it, and name the file after the ‘key’ field

  12. dude Says:

    rails really was ahead of it’s time

  13. Slava Says:

    What about concurrency. DevA adds “his” Migration7 on top of “common” Migration6, DevB adds his own Migration7 on top of the same migration. Now what is the correct order of them?

  14. cherouvim Says:

    @Scott S. McCoy: That’s great. Automation and smooth release cycles are priority number one for me as well.

    @Barry Coughlan: You are right. In fact I’ll present this in a next post, together with the 3 phases of database refactoring migrations, called: expand, release, cleanup.

    @dude: Probably yes. Part of being a good engineer is being able to assess and combine the good elements of different tools/methodologies/schools into your own process. So that’s what I did (since I do not use RoR).

    @Slava: That is a very good question. In fact you can use those migrations as a 1:1 match with the product release cycles. So when the weekly (or whatever) version of the project is to be released next friday, every dev enters their deltas into that script. In rare cases a project lead will have to reshufle or do tiny bits of editing into the migration script.

  15. Gabe da Silveira Says:

    @Slava that’s why Rails now uses timestamps rather than sequential version numbers.

  16. IGnatius T Foobar Says:

    Why an entire table full of superfluous data? The entire job can be (and often is) handled by a single integer somewhere in the data store indicating what version the schema is at. Furthermore, the application should be smart enough to recognize when its data store is not the current version and run the upgrade functions automatically.

  17. Moshe Says:

    @IGnatius: Agree.
    In Magento we use core_resource table to synchronize db structure with currently installed code, keep only the current version of the schema for each module, and automatically run PHP upgrade files whenever needed.

  18. cherouvim Says:

    @IGnatius T Foobar: A single integer is what I used to use but having all that extra data is sometimes useful, especially when the sysadmin forgot to run an intermediate script.

    Automatic upgrade of the schema via the application is out of the question in large and high traffic projects where the schema migration (with its 3 phases: expand, release, cleanup) may last days do to alters and data fixes on huge tables.

  19. salvador Says:

    We’ve been using this migrations plugin for maven to execute and keep track of all our schema migrations. It’s awesome! Check it out.

    http://code.google.com/p/c5-db-migration/

  20. Björn Nilsson Says:

    This is great, I have been using a similar but less structured way of keeping track of schema changes. For a larger team something like this is mandatory to keep track.

    Another improvement that could be done is to do a check in th ebeginning of the script that it is applying to the correct version. eg 007 refuses to run if current version is not 006.

  21. Mujtaba Hussain Says:

    Hi

    ActiveRecord in Rails and LiquiBase both do that with schema_information and DATABASECHANGELOG tables respectively. Why not utilise that instead of creating your own?

  22. Price Says:

    Thanks for the post.

    I recently took over a database which has no versioning whatsoever. All changes take place directly on production. This will help in finding a good way of stopping this madness :)

    A couple of questions:

    - Barry Coughlan mentioned keeping the ALTER TABLE scripts in a seperate folder. Why would you want to do that? I can’t think of any scenario where that would help…

    - How do you handle rollback actions in case something goes wrong and the new version must be rolled back to the previous one?

  23. cherouvim Says:

    @Mujtaba Hussain: That is correct, but I mainly use Java and PHP. I’m aware of liquibase and I will assess in the future. My proposal though is just a simple convention which works on any environment without any learning curve.

    @Price: I keep in a folder all sql scripts which take my database from point A to point B.
    Regarding rollback, yes I could have undo scripts as well but it’s an overhead. I know (from a podcast) that Amazon does that. I will usually grab a database backup just before running those on production. Of course anything that will be run on production has already been tested on staging.

  24. Alan Pinstein Says:

    Great post. I wrote MP, a generic migrations framework for PHP. It works very similarly but I would love to update it to record audit trails as well.

    If you’re interested in helping to make your process available to anyone and would like to improve MP as well, we’d love to have your help with the project!

  25. Stuart Grimshaw Says:

    How do you handle the case where, after schema change 007 I create a new branch of the code to work on my new feature & someone else creates their branch to work on their new feature.

    I create 008-create-this-table.sql and someone else creates 008-create-that-table.sql it’s fairly simple I guess to rename one of them to 009 when the first hits the master branch, but at our company we can have 3 or 4 projects on the go, not to mention branches for bug fixes too. With that much going on, it would quickly become unmanageable.

    We’ve been looking for a solution to this problem for a long time, and it’s not easy :-)

  26. cherouvim Says:

    @Alan Pinstein: Thanks for the offer. Will consider as soon as I got some free time in my hands :)

    @Stuart Grimshaw: Great question. We usually work on trunk, and db changes on branches or local dev experiments do not go into the migration script. Although I didn’t mention it in the post, the migration is a 1:1 match of a releasable project version. So if something is only exists in a branch then it’s sql delta scripts do not (yet) belong into the version migration script.

  27. Adam Żochowski Says:

    Instead of polluting your system with an additional table, please use sql extended properties. I am sure ms-sql has them:

    http://msdn.microsoft.com/en-us/library/ms180047.aspx

    Use these extended properties on the schema level, with property name matching your schema migration script that has been applied. Use the value to track when schema migration was applied.

    Cheers

    Adam Żochowski

  28. cherouvim Says:

    @Adam Żochowski: Interesting. My proposal has no learning curve and works on any database version. In fact I today use it for projects using MySQL, SQL Server and Oracle.

  29. Jonathan Soeder Says:

    It is possible to get all of this functionality for very cheap just by using ruby’s rake, and activerecord. Just because you’re not developing your app in the rails framework doesn’t mean you can’t use ruby to do development and deployment automation.

    I also like to use vlad the deployer inside non-rails projects, to handle automating ssh for deploying my app to multiple servers / environments from source control. this can run the migrations for you automatically.

  30. Cody Says:

    @Stuart Grimshaw To get around your problem of multiple developers creating an “008″ migration at the same time – dont use serial integers but timestamps of when the migration was created. Thus, you have uniqueness down to the second instead of an information-less integer. Your migrations should look like

    201012081445_CreateSomeTable
    201012081617_AddIndexToFoo

    Rails got away from integer organization a long time ago and its my preferred method on my PHP migrations framework: https://github.com/ruckus/ruckusing-migrations

  31. mikeJ Says:

    we go one more. keep a version on each table, sP etc. we then keep a script that does the migration for each object to the next version. we are able to migrate each individual customer from any release version to the current version in set of transformations. we have to put this data into the schema because extended properties are non standard or non-existant across the various backends we support – oracle, db2, msql, firebird and mysql.

  32. nirk Says:

    @dude

    We were doing API version tracking when I was working in S/390 assembler using fixed-width flat files with a header and a footer. Everything old is new again.

  33. Udi Says:

    Why not just keep this info in a text file that you update and check into source control? Why does it need to be in the db?

  34. Joshua Simeon Narins Says:

    I integrated RT (our ticket system) with SVN and came up with a similar, but different, solution.

    Each ticket (read: project) has associated files and revisions. Some of these files are in the /schema_updates/ directory. We can push, or pull, tickets from production using a script I wrote. One of the steps is to apply the schema_update. To avoid name collisions we use the format [ticket_number]_up.sql for the code to apply the schema, and [ticket_number]_down.sql if the ticket needs to be pulled from production (or QA, for that matter, which has its own database).

    The same push/pull (well, mostly push) script will also tell you the status of a ticket (is it in QA? production? not even that far? in between?) so it can tell you if any particular schema update has been applied.

  35. Stepan Says:

    Incremental update scripts are very hard to maintain. We use

    http://bitbucket.org/stepancheg/mysql-diff/

    to compare database schemas. It outputs difference as sequence of ALTER TABLE statements. It is not always reliable, but very helpful.

  36. the jackol’s den » Blog Archive » A table that should exist in all projects with a database Says:

    [...] Read more By Mikhail Esteves | December 9th, 2010 in Links [...]

  37. Dantheta Says:

    My only change to this would be to use an int field (instead of a varchar) for the version number and not to use a reserved word (‘key’ in some DBMS) as a column name.

    A sound strategy it is, though. I personally wouldn’t be comfortable with the application being able to run ALTER commands (that is what we have DBAs and DBA automation for), but that’s as much personal preference as anything else.

    Cheers!

  38. Mike Marcos Says:

    When I was working with a few people on a project, we just kept one database build script in the SVN repo. This worked really well –

    You make changes to the db? Commit that build script.

    You sit down to start working? Update your project.

    Uh, oh, merge conflict? Oh, look, we both tried to change this table, let’s figure out how to change it to both our advantages.

  39. Rob Says:

    fuck it, just GIT it

  40. Tiago Albineli Motta Says:

    you can use

    rake db:migrate

  41. The B. Says:

    All good, what I do in addition to that is also add an MD5 field to the table. The tool that we wrote to do the database migration checks the md5 for each file, and will warn you when a SQL script has been changed (which it really should not, once it’s pushed upstream).

  42. cherouvim Says:

    @Dantheta: Agreed. You can adapt it to your needs, and yes, I’d never run any alter statements via application code as well.

    @Mike Marcos: OK, that works until you release your software and you enter the next development cycle. You need a new script then.

    @Rob: Not sure how GIT solves the problem.

    @Tiago Albineli Motta: Thanks for the info, will definitelly try it out. My proposal has 0 learning curve though.

    @The B.: Interesting addition in case your environment requires it.

  43. Paul Keeble Says:

    There are two good tools for doing database versioning and reliable releasing that every developer should know about.

    1) http://www.Liquibase.org
    2) http://dbdeploy.com

    Both use more information in their table than this to spot other types of errors and don’t require you to produce a custom solution. Liquibase is especially good as it can deal with the going backwards in time problem found when doing branching and automated database releases.

  44. Paul Says:

    Previously I’ve dumped the database schema nightly and (after stripping out cruft like comments and blank lines) compared it against the latest in a svn repository. If there are differences, store the new one in the repository and create a change control record which gets written to a CC database.

    It needs the organisation to mandate and enforce proper change control where every change to a production system gets handled formally and creates a log record. Really, structural changes should only be doable from a controlled admin user and preferably on the machine hosting the database. Of course, this may not work with all RDBMSs :(

  45. anon Says:

    I simply use a PHP script which reads table columns (and even data) and applies schema changes accordingly (it’s ran automatically on update). I just add changes to the end of the script. Even if a straight-up SQL migration script failed in mid-process, you can run the update script as many times as you like.

  46. Ma Diga Says:

    I hadn’t thought of doing something like this. I keep track of DB schema changes or migrations on an internal wiki site by date and etc. Will try this in the near future as well…

  47. Jeroen Says:

    You also want to include a column to identify the changer. You’ll want to know in some point in the future who’s ass you want to kick because of some unwanted changes.

  48. Eggsy84 Says:

    Hi all,

    We have implemented this way tracking our SQL changes. Currently up to over 130 ;)

    One thing we have noticed is that it become very hard to roll back the changes after you have applied them.

    If (and lets face it when) an install goes wrong and you have already performed your SQL update what do people tend to do?

    You can of course take a DB backup before the update but for large databases this might not be feasible due to restore times.

    Any other suggestions?

    Eggsy

  49. cherouvim Says:

    @Paul Keeble: Thanks for the links, will definitely research.

    @Jeroen: Haha. Nice one :)

    @Eggsy84: If you are really serious you can include undo scripts for each migration. A database backup before applying the changes is helpful too, although that depends on the size of the database.

  50. Martin Says:

    Take a look at MyBatis Migrations application. It is incredibly simple and just fantastic.

  51. links for 2010-12-09 Says:

    [...] A table that should exist in all projects with a database (tags: database) [...]

  52. Shashi Velur Says:

    Good post..!

    I wrote a blog post last year on the very same topic: http://shashivelur.com/blog/2008/07/hibernate-db-migration

  53. Stuart Grimshaw Says:

    @cody I can see how it avoids the problem of conflicting integers, but not conflicting changes. 1 branch could last a month, but during that month another branch could be created & rolled live.

    6 months down the line how do I know which order to apply those changes?

  54. Richard Dingwall » The road to automated database deployment Says:

    [...] Given a live database and a stack of SQL scripts, how do you know which ones need to be run? You could start reading going through them, statement by statement, until you find a change that doesn’t appear to be applied and start there — or you could simply have each script write a row to the end of a changelog table like this: [...]

  55. Future-Proof Your Database Change Log | Language Hacker | Robert Elwell's Blog Says:

    [...] a big believer in using some kind of database change log for any project you’re working on. As someone firmly entrenched in the PHP world, I’m [...]

  56. Gary Says:

    You may want to have a look at ‘chd version’ with ChronicDB