Archive for the ‘subversion’ Category

A table that should exist in all projects with a database

Wednesday, December 8th, 2010

It’s called schema_version (or migrations, or whatever suits you) and its purpose is to keep track of structural or data changes to the database.
A possible structure (example in MySQL) is:

create table schema_version (
    `when` timestamp not null default CURRENT_TIMESTAMP,
    `key` varchar(256) not null,
    `extra` varchar(256),
    primary key (`key`)

insert into schema_version(`key`, `extra`) values ('001', 'schema version');

Whether you add this table from the beggining of the project or just after you’ve deployed the first version to a staging or production server is up to you.

Whenever you need to execute an SQL script to change the database structure or perform a data migration you should be adding a row in that table as well. And do that via an insert statement at the begining or end of that script (which is committed to the project’s code repository).


insert into schema_version(`key`, `extra`)
    values ('002', 'FOO-22 user profile enhancement');

another example:

insert into schema_version(`key`, `extra`)
    values ('003', 'FOO-73 contact us form stores host & user agent');

and another example:

insert into schema_version(`key`, `extra`)
    values ('004', 'FOO-62 deletion of non validated user accounts');

This is useful because now you know which database migration scripts you’ve run on each server (development, staging, production):

mysql> select * from schema_version;
| when                | key | extra                                           |
| 2010-11-22 11:21:39 | 001 | schema version                                  |
| 2010-12-02 17:02:20 | 002 | FOO-22 user profile enhancement                 |
| 2010-12-06 15:55:41 | 003 | FOO-73 contact us form stores host & user agent |
| 2010-12-06 15:58:12 | 004 | FOO-62 deletion of non validated user accounts  |

The key and extra columns could contain anything that makes sense to you. I prefer a sequential key and a very short description (with reference to the issue ID, in our case a JIRA project called FOO) of the issue that this migration relates to.

The scripts which represent each of those database migration should be kept in the code repository together with the code and their filename should be prefixed with the key of the migration. e.g:


This keeps everything tidy and all team members work is now easier:

  • If a new developer jumps in development and has been given an old database dump which only contains migrations up to 004, he can compare that to the repository and see that he needs to execute migrations 005, 006 and 007 in order to bring his database in sync with the code.
  • If a deployer (or operations person) is asked to deploy the latest version of the project he can see what scripts have been run on the target server and act accordingly.

Happy migrations.

Edit: Since I’ve been receiving tons of traffic from dzone and reddit with lots of useful comments I need to followup with this F.A.Q.

Q: The column names of the table are the worst piece of SQL I’ve seen in my life. Why did you choose those names?
A: You are free to change the table to your needs. So, although you are missing the purpose of this post (which was to provide a methodology of tracking changes) here you go:

create table schema_migrations (
    `appliedAt` timestamp not null default CURRENT_TIMESTAMP,
    `migrationCode` varchar(256) not null,
    `extraNotes` varchar(256),
    primary key (`migrationCode`)

Since the migrations are 1:1 to the releases of the software project I use varchar for key/migrationCode (instead of int) so I can deal with cases such as:


Also note that If I’ve got something that has a property best described by “key” then I’ll happily name the column “key” any day. Domain modeling is priority number #1 for me and since I use an ORM which allows me to do that I’ll go ahead and do it.

Q: This is old news, you should use Rails or liquibase or something else
A: Thanks for your input. Will definitely try those out. My proposal is just a simple convention which works on any environment right now without any learning curve. Also note that the problem I’m solving here is not tracking the difference of 2 database schemas, but tracking what scripts have been run on each server.
Q: How often do you apply changes to the database schema for this to be helpful?
A: Most real world projects which evolve in time usually require changes in database structure. Also, most real world projects need at least a staging server and a production one. This method helps me track what migrations have been run on each server. For example we’ve got around 120 migration scripts for a medium traffic (1m pageviews/day) public site which we’ve been maintaining for 2 years. No project finishes when it goes live.
Q: Shouldn’t this data go into the code repository instead of the database?
A: The database scripts which apply changes to the database are already in source control. The point is to be able to track which ones have been executed in which database servers (dev, staging, production).
Q: The developer can forget and mess this process up with unpredicted results.
A: Yes, the developer can most probably mess any aspect of a software project.

Improved svn post-commit hook for hudson

Wednesday, March 24th, 2010

Hudson’s wiki entry about the Subversion plugin explains how to setup a post-commit svn hook so commits trigger the hudson builds without the need of hudson to constantly poll the repository.

The proposed post-commit hook implementation is good, but what happens when the svn server does not respond is that the commit takes place but the hook blocks for ever. This can be confusing and annoying.

The svn server may not be able to respond because it may be down or can’t be reached. In any case, and as the Fallacies of Distributed Computing explain, the network is unreliable, so a better approach is to add a timeout and retries setting on the command which attempts to notify svn:

UUID=`svnlook uuid $REPOS`
/usr/bin/wget \
  --timeout=2 \
  --tries=2 \
  --header "Content-Type:text/plain;charset=UTF-8" \
  --post-data "`svnlook changed --revision $REV $REPOS`" \
  --output-document "-" \

Wget will now fail if after 2 seconds the svn server hasn’t responded, and will try that twice. After that the user will get an error message but the commit will have been done.

5 sins of subversion usage

Saturday, May 5th, 2007

Whether you develop alone, or with others, there are some things that you should avoid doing when using subversion (or any other version control system).

1. Break the build and/or tests

The problem

It’s very common to break the build because you removed a class or changed a method contract. This can sometimes get frustrating. Others, who continue working on the project, will either have to tell you that you broke the build (thus asking you to fix it), or fix it themselves. This is something that requires communication and time. If more than one person fixes the problems, the possibility of conflicts is high.


a) Do a clean build and run all tests before committing. That is Shift+F11 and ALT+F6 in NetBeans.
b) Use a Continuous Integration system which does this for you.

2. No comments or bad comments

The problem

Examples of bad comments (yes, the first one is an empty string):

  • uploaded
  • Deleted some files
  • Improvements

These comments are useless. Everybody can see from the log that was “uploaded”, or that some files where deleted. It is also not useful to say “improvements”, because it is very general, and also because it’s common sense that you are working on a project only to improve it (and not to make it worse).


Always write comments about changes in the software, and not about which files have been changed. Try to comment in terms of software modules, requests and bug fixes. You could even use issue numbers from your bug tracking software. Examples of good comments would include:

  • Changed authentication to use cookies instead of the http session
  • Changed from dbcp to c3p0
  • First stages of html/css layout integration
  • mock dao tests
  • fixed mail scheduler (Issue 5532)
  • Reporting engine now supports PDF export

One of the advantages of having comments such as the above is that you can easily select a range of dates and produce a change log.

3. Committing useless files

The problem

Some people commit stuff such as:

Sometimes Thumbs.db and _notes end up deployed on the production server. This is useless, bad and inappropriate.


Never commit stuff that should not be committed. Learn how to use the ignore/exclude patterns of your favourite svn client. In TortoiseSVN a the exclude/ignore pattern “thumbs.db _notes dist build nbproject” will do the job.

4. Not committing for days (or weeks)

The problem

A developer does not commit for days (or weeks) because he is lazy, always leaves from work in a rush, or hasn’t got a compiling application (for days… or weeks). Some reasons why this is very bad are:

  1. Code reviewer can’t review the code.
  2. Developer might be working on something that has already been done before, but nobody can tell him so.
  3. Developer might be trying to solve problem in a wrong way, but nobody can see that.
  4. Anxiety levels of Project Lead increases because he doesn’t know about work being done.
  5. Merging becomes really hard due to many (and difficult to resolve) conflicts.
  6. The possibility of completely loosing work increases as hard drives tend to fail when you least expect it.


Commit at least once a day. If you are working on a feature or fix that is completely unrelated with work that others do, consider branching.

5. Not following naming/structure conventions

The problem

When a repository is shared between many different people with many different projects, things can get messy:

svn:// site coca cola
svn://οι εικόνες


Use naming and structure conventions. Make up your own, propose them in your team and adopt them. A possible convention could be:

  1. Only lower case in folders
  2. No native characters in folders
  3. Only dashes in folders (no spaces or underscores)
  4. Folder structure always uses root/account name/project name/project artifact

These simple rules greatly improve company’s xyz repository structure:


Good luck.

Interesting reads:
KDE SVN Commit Policy
Version Control with Subversion