Archive for the ‘databases’ Category

A table that should exist in all projects with a database

Wednesday, December 8th, 2010

It’s called schema_version (or migrations, or whatever suits you) and its purpose is to keep track of structural or data changes to the database.
A possible structure (example in MySQL) is:

create table schema_version (
    `when` timestamp not null default CURRENT_TIMESTAMP,
    `key` varchar(256) not null,
    `extra` varchar(256),
    primary key (`key`)
) ENGINE=InnoDB;

insert into schema_version(`key`, `extra`) values ('001', 'schema version');

Whether you add this table from the beggining of the project or just after you’ve deployed the first version to a staging or production server is up to you.

Whenever you need to execute an SQL script to change the database structure or perform a data migration you should be adding a row in that table as well. And do that via an insert statement at the begining or end of that script (which is committed to the project’s code repository).

E.g:

insert into schema_version(`key`, `extra`)
    values ('002', 'FOO-22 user profile enhancement');

another example:

insert into schema_version(`key`, `extra`)
    values ('003', 'FOO-73 contact us form stores host & user agent');

and another example:

insert into schema_version(`key`, `extra`)
    values ('004', 'FOO-62 deletion of non validated user accounts');

This is useful because now you know which database migration scripts you’ve run on each server (development, staging, production):

mysql> select * from schema_version;
+---------------------+-----+-------------------------------------------------+
| when                | key | extra                                           |
+---------------------+-----+-------------------------------------------------+
| 2010-11-22 11:21:39 | 001 | schema version                                  |
| 2010-12-02 17:02:20 | 002 | FOO-22 user profile enhancement                 |
| 2010-12-06 15:55:41 | 003 | FOO-73 contact us form stores host & user agent |
| 2010-12-06 15:58:12 | 004 | FOO-62 deletion of non validated user accounts  |
+---------------------+-----+-------------------------------------------------+

The key and extra columns could contain anything that makes sense to you. I prefer a sequential key and a very short description (with reference to the issue ID, in our case a JIRA project called FOO) of the issue that this migration relates to.

The scripts which represent each of those database migration should be kept in the code repository together with the code and their filename should be prefixed with the key of the migration. e.g:

migrations/001-schema-version.sql
migrations/002-FOO-22-user-profile-enhancement.sql
migrations/003-FOO-73-contact-form-fixes.sql
migrations/004-FOO-62-data-fix.sql
migrations/005-FOO-88-venues-management.sql
migrations/006-FOO-89-venues-management-v2.sql
migrations/007-FOO-78-private-messages-system.sql

This keeps everything tidy and all team members work is now easier:

  • If a new developer jumps in development and has been given an old database dump which only contains migrations up to 004, he can compare that to the repository and see that he needs to execute migrations 005, 006 and 007 in order to bring his database in sync with the code.
  • If a deployer (or operations person) is asked to deploy the latest version of the project he can see what scripts have been run on the target server and act accordingly.

Happy migrations.

Edit: Since I’ve been receiving tons of traffic from dzone and reddit with lots of useful comments I need to followup with this F.A.Q.

Q: The column names of the table are the worst piece of SQL I’ve seen in my life. Why did you choose those names?
A: You are free to change the table to your needs. So, although you are missing the purpose of this post (which was to provide a methodology of tracking changes) here you go:

create table schema_migrations (
    `appliedAt` timestamp not null default CURRENT_TIMESTAMP,
    `migrationCode` varchar(256) not null,
    `extraNotes` varchar(256),
    primary key (`migrationCode`)
) ENGINE=InnoDB;

Since the migrations are 1:1 to the releases of the software project I use varchar for key/migrationCode (instead of int) so I can deal with cases such as:

prod-001
prod-002
prod-003
prod-003-quickfix-1
prod-003-quickfix-2
prod-004
...

Also note that If I’ve got something that has a property best described by “key” then I’ll happily name the column “key” any day. Domain modeling is priority number #1 for me and since I use an ORM which allows me to do that I’ll go ahead and do it.

Q: This is old news, you should use Rails or liquibase or something else
A: Thanks for your input. Will definitely try those out. My proposal is just a simple convention which works on any environment right now without any learning curve. Also note that the problem I’m solving here is not tracking the difference of 2 database schemas, but tracking what scripts have been run on each server.
Q: How often do you apply changes to the database schema for this to be helpful?
A: Most real world projects which evolve in time usually require changes in database structure. Also, most real world projects need at least a staging server and a production one. This method helps me track what migrations have been run on each server. For example we’ve got around 120 migration scripts for a medium traffic (1m pageviews/day) public site which we’ve been maintaining for 2 years. No project finishes when it goes live.
Q: Shouldn’t this data go into the code repository instead of the database?
A: The database scripts which apply changes to the database are already in source control. The point is to be able to track which ones have been executed in which database servers (dev, staging, production).
Q: The developer can forget and mess this process up with unpredicted results.
A: Yes, the developer can most probably mess any aspect of a software project.

clearing (or marking) MySQL slow query log

Tuesday, July 28th, 2009

The MySQL slow query log is a very useful feature which all applications in production should have enabled. It logs all queries which complete in more seconds than what the long_query_time server variable specifies.

Sometimes you need to clear this log without restarting mysqld. You could simply erase the file (in Linux) but the file handle would be lost and no further logging would take place.

The best way to purge the log in Linux without having to restart the server is:

cat /dev/null > /path/to/logfile

Another way is to erase or rename the logfile and then run the following statement in mysql:

flush logs;

references:
http://bugs.mysql.com/bug.php?id=3337
http://lists-archives.org/mysql/26837-purge-slow-query-log.html

When you want to do that in Windows though things are different.

I haven’t yet found a way to safely remove the log file while mysqld is running. What I’m left with is to mark the current position in the slow query log so I can look it up and examine the log from that point onwards. The way this can be done is by executing a slow query (slow enough to be logged) and the safest of all looks like:

select "prod-25", sleep(4);

I can now quickly navigate to “prod-25″ in the slow query log; which by the way is a marker of the 25th production release of a system which I’m tunning.

The deployment process could be automatically executing (probably via ant) such a marker query in order to keep all the historic (slow log) data grouped by releases for future analysis.

fix hibernate+ehcache: miss for sql

Saturday, February 21st, 2009

If you are using an entity as a named parameter in a hibernate Query or Criteria which is cachable from ehcache then this entity needs to implement hashcode and equals using a business key. Otherwise the hibernate Query or Criteria may always “look different” to ehcache so it will be a constant cache miss.

DEBUG (MemoryStore.java:138) - query.FooBarCache: query.FooBarMemoryStore miss for sql: /* criteria query */ select this_.id as y0_ from foobars this_ where this_.state=?; parameters: LIVE; max rows: 1; transformer: org.hibernate.transform.PassThroughResultTransformer@294633f0
DEBUG (Cache.java:808) - query.FooBar cache - Miss

My favourite string for testing web applications

Monday, February 16th, 2009

Weird title, huh?

When creating templates, pages and action responses for your web application you really need to take HTML escaping into consideration. Sometimes the use cases of the system are so many that you may omit HTML escaping for some piece of dynamic or user entered text.

HTML escaping means converting <foo>bar & " to &lt;foo&gt;bar &amp; &quot; in the HTML source.

One of the reasons for HTML escaping is to avoid XSS attacks or simply to make your site valid.

Reasons for making your HTML output valid include:

  1. It’s the “right thing to do”
  2. Does not tire the browser
  3. Allows you to manually detect (via CTRL+SHIFT+A on web developer toolbar) for real HTML output errors
  4. Saves you from css rendering issues due to HTML anomalies
  5. Ensures your content is easily parsable from third party agents (crawlers, scrappers, XSLT transformators etc)

So my favourite string is <script’”<i>
You can alter your development database content using statements like these:

update articles set title=concat("<script'\"<i>", title);
update users set firstname=concat("<script'\"<i>", lastname), lastname=concat("<script'\"<i>", lastname);
update categories set title=concat("<script'\"<i>", title);
...

If after this database content change your site is not functional then there is a problem. You can also check for HTML validity with CTRL+SHIFT+A on web developer toolbar and quickly spot areas where you missed HTML escaping.

You could even automate this whole process by having a tool (JTidy?) scan that all your pages and use cases produce valid HTML. So indirectly you would be testing for insecure (in XSS terms) parts of the application.

HTML escaping in JSTL
HTML escaping in freemarker
HTML escaping in velocity

The * stupidest things I’ve done in my programming job

Saturday, February 7th, 2009

I’m not ashamed of those sins any more, so here you go :)

1. ORM

Stupidity
Building my own Object Relational Mapping framework.
Consequence
Project is a mess after 2 years of maintenance with hardcore hacks to bypass my own ORM and call custom SQL queries.
What should I have done
Use hibernate, iBATIS, Cayenne or something similar.

2. EAV

Stupidity
Using an Entity-Attribute-Value model database schema design.
Consequence
Non scalable solution and total impossibility to run any useful queries on the database level.
What should I have done
Use an ordinary normalized database schema design.

3. Database Access

Stupidity
Synchronize (serialize) database access using one shared connection.
Consequence
Zero scalability. Very slow response times when more than 10 users where using the application.
What should I have done
Don’t do that and use a connection pool such as c3p0 and use a “new” (reused) connection returned from the pool for every request/response cycle.

4. IDE

Stupidity
Avoided learning and using an Integrated development environment.
Consequence
Inability to build test and deploy the application quickly and generally do anything useful.
What should I have done
Get familiar with an IDE. NetBeans, eclipse etc.

5. Transactions

Stupidity
Not using them.
Consequence
Corrupt data in an application involving e-shop like functionality.
What should I have done
Use database transactions. When in MySQL use InnoDB.

6. Prepared Statements

Stupidity
Using Statements, string concatenation and naive character escaping to assemble my own “safe” queries.
Consequence
SQL Injections possible in my application. I managed to login using ‘or 1=1;delete from users;– and alter the database state in a very nasty way.
What should I have done
Use Prepared Statements which correctly assemble and escape the query properly depending on the JDBC driver used.

7. Business Logic

Stupidity
Doing it in the template (JSP).
Consequence
Messy non maintainable application.
What should I have done
Do it in an MVC style with servlets or with a Front Controller. Even better by using an existing open source MVC framework such as Struts, Spring MVC etc.

Of course, all the bad choices above have probably made me a better programmer.

Disabling foreign key generation in hbm2ddl

Monday, August 13th, 2007

When generating the database schema using the hbm2ddl tools, foreign keys are being created for every relation. You can enhance the schema by using the foreign-key attribute to specify your own name for a particular foreign key.

But, what happens when you don’t want a foreign key generated?

There is an undocumented behavior of the foreign-key attribute, and that is to specify foreign-key=”none”. hbm2ddl will not create an FK for a relation which has such an attribute. The hibernate documentation and the two bibles Hibernate in Action and Java Persistence with Hibernate state absolutely nothing about this feature. I found it after hardcore googling in the following hibernate changelog:

Changes in version 2.1.9 (xx.x.xxxx)
------------------------------------
* foreign-key="none" can be used to disable generation of a foreign key.

Later on, I found another blog mentioning this behavior: http://blog.xebia.com/2007/02/05/let-hibernate-connect-your-world/

The most logical question now is “why don’t you want an FK?”. That depends on the system you are building. Sometimes you might have a table in your application which will be CRUDed from another system, which you do not control. You may want to be able to keep references (ids) to entities which may be deleted. foreign-key=”none” together with not-found=”ignore” can solve these kind of problems.

Overcoming the MySQL BIT datatype problems with hibernate

Friday, June 15th, 2007

I’m fond of optimization. When I code or design my database schema I try to avoid waisting CPU cycles or storage space (at least without a good reason). So when my domain class has the following field:

private boolean active; // determines whether this Person is active

I will let hibernate and the MySQL5InnoDBDialect choose what is most appropriate:

<property name="active" not-null="true" />

In that case it will generate a BIT:

...
active bit not null,
...

The problem

So far so good…
…until you read the blog post called “Why you should not use BIT columns in MySQL” by Xaprb.
Another serious deficiency is the fact that a database dump will not export bit data as “0″ or “1″. Depending on the tool used to dump and the MySQL server version you may find one of the following:

INSERT INTO `person` VALUES (1,"foo","\\0");
INSERT INTO `person` VALUES (2,"bar","");

or

INSERT INTO `person` VALUES (1,"foo"," ");
INSERT INTO `person` VALUES (2,"bar"," ");

The third field is of datatype BIT. Row number 1 is false and row number 2 is true. The problem with that is that some MySQL client tools cannot import such things. It gives you an ERROR 1064 (42000) at line 23: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ” at line 1

Solution #1

Hand edit the sql script and change all false bits to 0 and all true bits to 1, the script can be imported like a charm.

Solution #2

Extend MySQL5InnoDBDialect and make all BITs rendered as TinyInt(1). The code is very simple:

package com.foo.hibernate;

import java.sql.Types;
import org.hibernate.dialect.MySQL5InnoDBDialect;

public class MySQL5InnoDBDialectBitFixed extends MySQL5InnoDBDialect {

  public MySQL5InnoDBDialectBitFixed() {
    super();
    registerColumnType(Types.BIT, "tinyint(1)");
  }
  
}

Now when using the MySQL5InnoDBDialectBitFixed dialect, hbm2ddl will generate:

...
active tinyint(1) not null,
...

Until we get better 5.x MySQL versions, with better BIT support, this plan should do the job nicely.

Good luck

SQL Server + hbm2ddl + unicode columns

Tuesday, May 1st, 2007

Hibernate offers org.hibernate.dialect.SQLServerDialect as the dialect for SQL Server. When generating the database schema, using hbm2ddl, the string type columns do not support native characters. So the following mapping:

<property name="title" length="128" />

will produce the following SQL:

...
title varchar(128) null,
...

By extending the org.hibernate.dialect.SQLServerDialect we can achieve the generation of NCHAR, NVARCHAR, and NTEXT columns instead of CHAR, VARCHAR and TEXT.

package com.foo.hibernate;

import java.sql.Types;
import org.hibernate.dialect.SQLServerDialect;

public class SQLServerNativeDialect extends SQLServerDialect{

  public SQLServerNativeDialect() {
    super();
    registerColumnType(Types.CHAR, "nchar(1)");
    registerColumnType(Types.VARCHAR, "nvarchar($l)");
    registerColumnType(Types.LONGVARCHAR, "nvarchar($l)");
    registerColumnType(Types.CLOB, "ntext");
  }

}

All we need to do now is plug this dialect in our hibernate configuration:

<property name="hibernate.dialect">
  com.foo.hibernate.SQLServerNativeDialect
</property>

Related hibernate forums thread: http://forum.hibernate.org/viewtopic.php?t=972518
Related API method: http://www.hibernate.org/hib_docs/v3/api/org/hibernate/dialect/Dialect.html

Database Connection Pooling

Friday, February 23rd, 2007

You are building a webapp. You want database connectivity. You want pooling (because its an expensive resource). You start building your own database connection pool. STOP!!!

Who told you that you can do it well? Why did you hack your own connection pool implementation, which is seriously broken, spawns thousands of threads and turns the server into a miserable piece of shit that needs restart every 24h?

Wrong choice my friend. Next time do us all a favour and use one of the following:

And don’t forget: The standard idiom for releasing a connection is to close (return) the connection in a finally block.

Connection con = getPooledConnectionFromSomewhere();
try {
  // do stuff with connection
} catch (SQLException e) {
  // handle problems
} finally {
  con.close();
}

Of course, closing the connection can throw an SQLException, but it’s up to you on how you will handle it.

Good luck

DataSource exposed through JNDI

Friday, February 23rd, 2007

You are building a webapp and you want database connection pooling. Your container can help you manage this javax.sql.DataSource by configuring it and exposing it through the JNDI tree.

Containers usually come with Jakarta Commons DBCP out of the box. In order to use it edit the context.xml file of your webapp and set your datasource there.

<?xml version="1.0" encoding="UTF-8"?>
 <Context path="/foo-app">

 <!-- TOMCAT 5.5.xx DESCRIPTOR -->
 <Resource name="foo"
  auth="Container"
  type="javax.sql.DataSource"
  maxActive="1"
  maxIdle="1"
  maxWait="3000"
  username="user"
  password="pass"
  driverClassName="net.sourceforge.jtds.jdbc.Driver"
  url="jdbc:jtds:sqlserver://localhost:1433;DatabaseName=foobar;charset=utf8"
 />

 <!-- TOMCAT 5.0.28 DESCRIPTOR -->
 <!--
  <Resource name="foo" type="javax.sql.DataSource"/>
  <ResourceParams name="foo">
   <parameter><name>maxActive</name><value>1</value></parameter>
   <parameter><name>maxIdle</name><value>1</value></parameter>
   <parameter><name>maxWait</name><value>3000</value></parameter>
   <parameter><name>username</name><value>user</value></parameter>
   <parameter><name>password</name><value>pass</value></parameter>
   <parameter><name>driverClassName</name><value>net.sourceforge.jtds.jdbc.Driver</value></parameter>
   <parameter><name>url</name><value>jdbc:jtds:sqlserver://localhost:1433;DatabaseName=foobar;charset=utf8</value></parameter>
  </ResourceParams>
  -->

</Context>

Tomcat 5.0.xx and 5.5.xx uses different xml syntax for most of it’s configuration. Here I’m presenting both with 5.0.xx’s block commented out.

So, you’ve got your app called “foo-app”. Next time you’ll deploy it, tomcat will copy context.xml to ${catalina.home}/conf/localhost/foo-app.xml where from it will be reading the configuration on each context or container initialization. Your Datasource is called “foo”.
In your java code now, what you need to get a reference to “foo” is:

Context ctx = new InitialContext();
DataSource ds = (DataSource)ctx.lookup("java:comp/env/foo");

And then something along the lines:

Connection con = ds.getConnection();

Happy coding.