Archive for the ‘deployment’ Category

A table that should exist in all projects with a database

Wednesday, December 8th, 2010

It’s called schema_version (or migrations, or whatever suits you) and its purpose is to keep track of structural or data changes to the database.
A possible structure (example in MySQL) is:

create table schema_version (
    `when` timestamp not null default CURRENT_TIMESTAMP,
    `key` varchar(256) not null,
    `extra` varchar(256),
    primary key (`key`)
);

insert into schema_version(`key`, `extra`) values ('001', 'schema version');

Whether you add this table from the beginning of the project or just after you’ve deployed the first version to a staging or production server is up to you.

Whenever you need to execute an SQL script to change the database structure or perform a data migration you should be adding a row to that table as well. Do that via an insert statement at the beginning or end of that script (which is committed to the project’s code repository). For example:


insert into schema_version(`key`, `extra`)
    values ('002', 'FOO-22 user profile enhancement');

another example:

insert into schema_version(`key`, `extra`)
    values ('003', 'FOO-73 contact us form stores host & user agent');

and another example:

insert into schema_version(`key`, `extra`)
    values ('004', 'FOO-62 deletion of non validated user accounts');

This is useful because now you know which database migration scripts you’ve run on each server (development, staging, production):

mysql> select * from schema_version;
+---------------------+-----+-------------------------------------------------+
| when                | key | extra                                           |
+---------------------+-----+-------------------------------------------------+
| 2010-11-22 11:21:39 | 001 | schema version                                  |
| 2010-12-02 17:02:20 | 002 | FOO-22 user profile enhancement                 |
| 2010-12-06 15:55:41 | 003 | FOO-73 contact us form stores host & user agent |
| 2010-12-06 15:58:12 | 004 | FOO-62 deletion of non validated user accounts  |
+---------------------+-----+-------------------------------------------------+

The key and extra columns could contain anything that makes sense to you. I prefer a sequential key and a very short description (with reference to the issue ID, in our case a JIRA project called FOO) of the issue that this migration relates to.

The scripts which represent each of those database migrations should be kept in the code repository together with the code, and their filenames should be prefixed with the key of the migration.
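Assuming the migrations above, the repository might then contain scripts named along these lines (filenames hypothetical, following the key prefix convention):

```
001-schema-version.sql
002-FOO-22-user-profile-enhancement.sql
003-FOO-73-contact-form-host-user-agent.sql
004-FOO-62-delete-non-validated-accounts.sql
```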


This keeps everything tidy and makes all team members’ work easier:

  • If a new developer joins the project and has been given an old database dump which only contains migrations up to 004, he can compare that to the repository and see that he needs to execute migrations 005, 006 and 007 in order to bring his database in sync with the code.
  • If a deployer (or operations person) is asked to deploy the latest version of the project he can see what scripts have been run on the target server and act accordingly.
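The comparison in the first bullet is easy to script. Here is a minimal sketch in shell, where the file names, keys and the applied list are hypothetical stand-ins for a real repository checkout and a real `select key from schema_version` result:

```shell
# Compare migration scripts in the repository against the keys already
# applied to a database, and list the pending ones.

work=$(mktemp -d)

# Hypothetical scripts committed to the repository:
for key in 001 002 003 004 005 006 007; do
    touch "$work/$key-some-migration.sql"
done

# Hypothetical keys reported by the target database, e.g. via:
#   mysql -N -e 'select `key` from schema_version' mydb
printf '%s\n' 001 002 003 004 | sort > "$work/applied.txt"

# Keys present in the repository (the three-digit filename prefixes):
ls "$work" | grep -o '^[0-9]\{3\}' | sort -u > "$work/repo.txt"

# Migrations that still need to be executed on this server:
pending=$(comm -23 "$work/repo.txt" "$work/applied.txt")
echo "$pending"
```

A real version would list the repository’s migrations directory and query the target server directly instead of using fabricated data.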

Happy migrations.

Edit: Since I’ve been receiving tons of traffic from dzone and reddit with lots of useful comments, I need to follow up with this F.A.Q.

Q: The column names of the table are the worst piece of SQL I’ve seen in my life. Why did you choose those names?
A: You are free to change the table to your needs. So, although you are missing the purpose of this post (which was to provide a methodology of tracking changes) here you go:

create table schema_migrations (
    `appliedAt` timestamp not null default CURRENT_TIMESTAMP,
    `migrationCode` varchar(256) not null,
    `extraNotes` varchar(256),
    primary key (`migrationCode`)
);
Since the migrations are 1:1 with the releases of the software project, I use varchar for key/migrationCode (instead of int) so I can deal with keys which are not plain sequential integers (point releases, hotfixes and so on).


Also note that if I’ve got something that has a property best described by “key” then I’ll happily name the column “key” any day. Domain modeling is priority #1 for me, and since I use an ORM which allows me to do that, I’ll go ahead and do it.

Q: This is old news, you should use Rails or liquibase or something else
A: Thanks for your input. Will definitely try those out. My proposal is just a simple convention which works on any environment right now without any learning curve. Also note that the problem I’m solving here is not tracking the difference of 2 database schemas, but tracking what scripts have been run on each server.
Q: How often do you apply changes to the database schema for this to be helpful?
A: Most real world projects which evolve in time usually require changes in database structure. Also, most real world projects need at least a staging server and a production one. This method helps me track what migrations have been run on each server. For example we’ve got around 120 migration scripts for a medium traffic (1m pageviews/day) public site which we’ve been maintaining for 2 years. No project finishes when it goes live.
Q: Shouldn’t this data go into the code repository instead of the database?
A: The database scripts which apply changes to the database are already in source control. The point is to be able to track which ones have been executed in which database servers (dev, staging, production).
Q: The developer can forget and mess this process up with unpredicted results.
A: Yes, the developer can most probably mess any aspect of a software project.

Disabling quartz and ehcache UpdateChecker

Monday, August 16th, 2010

Last year Terracotta acquired ehcache and quartz and it was all good and exciting news. The problem is that since then they’ve included an automatic update checker in these two libraries which is turned on by default!

What this does is connect to a remote server as soon as you bootstrap your application, send some info (!) and get a response back on whether you are currently using the latest version of the library. You may also see your firewall complaining that a Java process wants to connect to the Internet.

You’ll get something like this in your logs for ehcache:

2010-08-16 11:18:04,794 DEBUG ( - Checking for update...
2010-08-16 11:18:05,934 INFO  ( - New update(s) found: 2.2.0 Please check for the latest version.

and for quartz:

2010-08-16 11:15:58,218 DEBUG ( - Checking for available updated version of Quartz...
2010-08-16 11:16:01,734 INFO  ( - New Quartz update(s) found: 1.8.4 []

Terracotta gives an explanation on why they did this, but no matter how you look at it, it still makes your brain hurt to wonder what would happen if every vendor of Java libraries did this. Complete misery.

Disabling this check is highly recommended both on development and production.

For ehcache you need to add:

updateCheck="false"

in your ehcache.xml root element (<ehcache>) and:

org.quartz.scheduler.skipUpdateCheck: true

in your quartz.properties file.

More discussions:
ehcache UpdateChecker:
quartz UpdateChecker:

Migrating from tomcat to weblogic

Thursday, March 11th, 2010

Moving from tomcat to weblogic may sound crazy. In case you need to do it though (e.g. for business reasons) here are a couple of things which may go wrong.

First of all, the classloader hierarchy in weblogic does not behave the way you’d usually expect from other servers such as tomcat, resin, jetty and jboss. If your application uses hibernate (and implicitly ANTLR) you may get the following exception:

Caused by: java.lang.Throwable: Substituted for missing class org.hibernate.QueryException - ClassNotFoundException: org.hibernate.hql.ast.HqlToken [from com.example.model.Person order by id]
        at org.hibernate.hql.ast.HqlLexer.panic(
        at antlr.CharScanner.setTokenObjectClass(
        at org.hibernate.hql.ast.HqlLexer.setTokenObjectClass(
        at antlr.CharScanner.<init>(
        at antlr.CharScanner.<init>(
        at org.hibernate.hql.antlr.HqlBaseLexer.<init>(

As explained in the Hibernate3 Migration Guide, Weblogic doesn’t seem to support proper class loader isolation: it will not see the Hibernate classes in the application’s context and will try to use its own version of ANTLR.

In the same fashion you may get the following exception for commons lang:

java.lang.NoSuchMethodError: org.apache.commons.lang.exception.ExceptionUtils.getMessage(Ljava/lang/Throwable;)Ljava/lang/String;

because weblogic internally uses commons lang 2.1 and the one you use may have more API methods.

For both these problems the solution is to instruct weblogic to prefer the jars from the WEB-INF of your application. You need to create a weblogic specific file called weblogic.xml and place it under WEB-INF, enabling the prefer-web-inf-classes option:

<?xml version="1.0" encoding="UTF-8"?>
<weblogic-web-app xmlns="http://www.bea.com/ns/weblogic/90">
    <container-descriptor>
        <prefer-web-inf-classes>true</prefer-web-inf-classes>
    </container-descriptor>
</weblogic-web-app>
Another problem is that, like in resin, the default servlet is not named “default” so if you depend on it in web.xml, your application may throw the following at the deployment phase:

Caused by: [HTTP:101170]The servlet default is referenced in servlet-mapping *.avi, but not defined in web.xml.

This is because weblogic’s default servlet is called FileServlet, so you’ll need to change all references in your web.xml from “default” to “FileServlet”.

Last, but not least, tomcat will automatically issue a 302 redirect from http://localhost:8080/context to http://localhost:8080/context/ before allowing your application to do any processing. So all instances of request.getServletPath() will never return an empty string, but will always start with “/”. Weblogic doesn’t do this, so http://localhost:8080/context resolves, and if your code contains something like:

request.getServletPath().substring(1)

you’ll get:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1

so a safer way to trim this leading slash is by doing:

request.getServletPath().replaceFirst("^/", "")
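The same trim can be sanity-checked in shell with an equivalent sed substitution (paths hypothetical); the point is that an empty path passes through safely instead of blowing up:

```shell
# Mirror of replaceFirst("^/", ""): strip at most one leading slash.
trim() { sed 's|^/||'; }

echo "/context/page" | trim   # leading slash removed
echo "" | trim                # empty input passes through, no exception
```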

Good luck, and remember: every time you use a full blown application server for something that a simple web container would be enough for, god kills a kitten.

clearing (or marking) MySQL slow query log

Tuesday, July 28th, 2009

The MySQL slow query log is a very useful feature which all applications in production should have enabled. It logs all queries which take longer to complete than the number of seconds specified by the long_query_time server variable.

Sometimes you need to clear this log without restarting mysqld. You could simply erase the file (in Linux) but the file handle would be lost and no further logging would take place.

The best way to purge the log in Linux without having to restart the server is:

cat /dev/null > /path/to/logfile
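This works where deleting the file does not because the redirection truncates the file in place: the inode, and therefore mysqld’s open file handle, is preserved. A quick sketch with a throwaway file:

```shell
f=$(mktemp)
echo "pretend this is a slow query entry" > "$f"
before=$(ls -i "$f" | awk '{print $1}')  # inode before truncation

cat /dev/null > "$f"                     # truncate in place

after=$(ls -i "$f" | awk '{print $1}')   # inode after truncation
size=$(wc -c < "$f")                     # bytes left in the file
rm -f "$f"
```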

Another way is to erase or rename the logfile and then run the following statement in mysql:

flush logs;


When you want to do that in Windows, though, things are different.

I haven’t yet found a way to safely remove the log file while mysqld is running. What I’m left with is to mark the current position in the slow query log so I can look it up and examine the log from that point onwards. The way this can be done is by executing a slow query (slow enough to be logged) and the safest of all looks like:

select "prod-25", sleep(4);

I can now quickly navigate to “prod-25” in the slow query log; which, by the way, is a marker for the 25th production release of a system which I’m tuning.

The deployment process could be automatically executing (probably via ant) such a marker query in order to keep all the historic (slow log) data grouped by releases for future analysis.
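A deployment script could build and fire that marker query itself. A minimal sketch, assuming the release id is available to the script; the mysql invocation is left commented out since the connection details are hypothetical:

```shell
release="prod-25"  # hypothetical release id, e.g. coming from the build system

# Build the marker statement exactly as in the manual example above:
marker_sql="select \"$release\", sleep(4);"

# In a real deployment this line would execute the marker, e.g.:
#   echo "$marker_sql" | mysql -h dbhost -u deploy -p mydb
echo "$marker_sql"
```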

Simple DoS protection with mod_security

Wednesday, July 22nd, 2009

ModSecurity™ is an open source, free web application firewall (WAF) Apache module. It provides protection from a range of attacks against web applications and allows for HTTP traffic monitoring and real-time analysis with little or no changes to existing infrastructure.

It can do many things for you, such as detecting XSS, SQL injection or file inclusion attacks.

A special use of mod_security can be simple protection from DoS attacks. Suppose your apache or application logs reveal that some specific IP is requesting too many pages per second (e.g 30 pages/sec from a single IP when your normal peak is 5 globally). In the best case scenario this could result in a slight decrease of the performance of the site which could be noticeable by the other users. In the worst case scenario it could bring the whole site down (denial of service). This attack could of course be unintentional. A misconfigured crawler or a spam bot could be the source of the problem, but in any case you’d like to block such requests.

Here is a possible configuration for mod_security to prevent those simple DoS attacks with explanatory comments:

SecRuleEngine On

SecAuditEngine RelevantOnly
SecAuditLogType Serial
SecAuditLog logs/mod_security.log

# a folder where mod_security will store data variables
SecDataDir logs/mod_security-data

# ignore requests from localhost or some other IP
SecRule REMOTE_ADDR "^127\.0\.0\.1$" "phase:1,nolog,allow"

# for all non static urls count requests per second per ip
# (increase var requests by one, expires in 1 second)
SecRule REQUEST_BASENAME "!(\.avi$|\.bmp$|\.css$|\.doc$|\.flv$|\.gif$|\.gz$|\.ico$|\.jpg$|\.js$|\.png$|\.swf$|\.zip$)" \
    "phase:1,pass,nolog,initcol:ip=%{REMOTE_ADDR},setvar:ip.requests=+1,expirevar:ip.requests=1"

# if there were more than 5 requests per second for this IP
# set var block to 1 (expires in 5 seconds) and increase var blocks by one (expires in an hour)
SecRule ip:requests "@eq 5" "phase:1,pass,nolog,setvar:ip.block=1,expirevar:ip.block=5,setvar:ip.blocks=+1,expirevar:ip.blocks=3600"

# if user was blocked more than 5 times (var blocks>5), log and return http 403
SecRule ip:blocks "@ge 5" "phase:1,deny,log,logdata:'req/sec: %{ip.requests}, blocks: %{ip.blocks}',status:403"

# if user is blocked (var block=1), log and return http 403
SecRule ip:block "@eq 1" "phase:1,deny,log,logdata:'req/sec: %{ip.requests}, blocks: %{ip.blocks}',status:403"

# 403 is some static page or message
ErrorDocument 403 "<center><h2>take it easy yo!"

In case you experiment with this configuration on production, make sure you keep an eye on mod_security.log to validate that you are only blocking requests that you intend to.

Good luck!

fixing javax.mail.MessagingException: Could not connect to SMTP host

Wednesday, July 22nd, 2009

You’ve done everything right. You are using the JavaMail API with the correct settings and still it doesn’t manage to connect to the SMTP host to dispatch the email. You are on a windows machine and the exception looks like:

javax.mail.MessagingException: Could not connect to SMTP host:, port: 25;
  nested exception is: Software caused connection abort: connect
	at com.sun.mail.smtp.SMTPTransport.openServer(
	at com.sun.mail.smtp.SMTPTransport.protocolConnect(
	at javax.mail.Service.connect(
	at javax.mail.Service.connect(
	at javax.mail.Service.connect(
	at javax.mail.Transport.send0(
	at javax.mail.Transport.send(
Caused by: Software caused connection abort: connect
	at Method)
	at com.sun.mail.util.SocketFetcher.createSocket(
	at com.sun.mail.util.SocketFetcher.getSocket(
	at com.sun.mail.smtp.SMTPTransport.openServer(
	... 40 more

One possibility is that something is blocking JavaMail from connecting to the local or a remote SMTP host, and this something can be an anti-virus.


p.s. Why a sysadmin would want a resident shield anti-virus on a production box serving web content via tomcat remains a mystery to me

mod_expires and Cache Killers

Sunday, May 3rd, 2009

Rule 3 of Steve Souders’ YSlow suggests that websites should Add a far future Expires header to the components. Components with a cache header could be static files such as those with extensions .css, .js, .jpg, .png, .gif etc. This gives a huge boost in client side performance of users with a primed cache. In apache this is done via mod_expires and an example configuration would be:

ExpiresActive On
ExpiresByType image/x-icon "access plus 1 month"
ExpiresByType text/css "access plus 1 month"
ExpiresByType application/javascript "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"

All this works well until you need to update a cached static file. The users with the primed cache will either have to wait 1 month to get the new file, or explicitly invalidate their cache. Some people will even ask their users to do a hard refresh but this obviously does not scale and it’s not very robust.

Since you cannot send an automatic signal to the browsers to reload those files all you can do is change the URL of those files (explicit invalidation). You could simply rename all those files, but an easier way to achieve the same effect is by adding a fake (unused – dummy) parameter at the end of the resource URL:

<img src="logo.jpg?2" />

The next logical step would be to automate this into the build system and have every production release feature new cache killer tokens. It seems that many well known sites do that already:



@import "/css/189/global.css";

@import '';


What happens with images referenced from within css files? You could rewrite the css files automatically as part of your production build process with Ant:

<tstamp>
    <format property="cacheKill" pattern="yyyyMMddhhmm" locale="en,UK"/>
</tstamp>

<target name="rewrite-css">
    <replace dir="${build.web.dir}" value="css?${cacheKill}&quot;)"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>css&quot;)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="png?${cacheKill}&quot;)"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>png&quot;)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="gif?${cacheKill}&quot;)"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>gif&quot;)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="jpg?${cacheKill}&quot;)"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>jpg&quot;)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="css?${cacheKill}')"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>css')</replacetoken></replace>
    <replace dir="${build.web.dir}" value="png?${cacheKill}')"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>png')</replacetoken></replace>
    <replace dir="${build.web.dir}" value="gif?${cacheKill}')"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>gif')</replacetoken></replace>
    <replace dir="${build.web.dir}" value="jpg?${cacheKill}')"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>jpg')</replacetoken></replace>
    <replace dir="${build.web.dir}" value="css?${cacheKill})"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>css)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="png?${cacheKill})"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>png)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="gif?${cacheKill})"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>gif)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="jpg?${cacheKill})"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>jpg)</replacetoken></replace>
</target>

This will take care of the following background image reference styles for css, png, gif and jpg files:

... background-image: url("images/ed-bg.gif");
... background-image: url('images/ed-bg.gif');
... background-image: url(images/ed-bg.gif);

and convert them to:

... background-image: url("images/ed-bg.gif?200905031126");
... background-image: url('images/ed-bg.gif?200905031126');
... background-image: url(images/ed-bg.gif?200905031126);
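For reference, the same substitution can be sketched outside Ant with sed; the rewrite function below is a stand-in for the <replace> tasks above (file names are hypothetical), covering the same three quoting styles in one pass:

```shell
cacheKill=$(date +%Y%m%d%H%M)  # same yyyyMMddhhmm style token as the Ant build

# Append ?<token> to css/png/gif/jpg references, whether the extension is
# followed by a double quote, a single quote or a closing parenthesis.
rewrite() {
    sed -E "s/\.(css|png|gif|jpg)([\"')])/.\1?${cacheKill}\2/g"
}

echo 'background-image: url(images/ed-bg.gif);' | rewrite
```

In a real build this would run over the deployed css directory, e.g. `rewrite < global.css > global.rewritten.css` for each stylesheet.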

Good luck!

A better SMTPAppender

Saturday, May 2nd, 2009

SMTPAppender for log4j is a type of appender which sends emails via an SMTP server. It’s very useful for applications released in production where you’d definitely need to know about all application errors logged. Of course every caring developer should look at the server logs every now and then, but if you’ve got hundreds of them (applications) then it becomes a full time job in itself.

Sometimes a fresh release of a high traffic website may produce hundreds or thousands of ERROR level log events. Many times this may be something minor which is being logged deep inside your code. Until the bug is fixed and a new release is deployed, your inbox and the mail server may suffer heavily.

What follows is an extension of SMTPAppender which limits the amount of emails sent in a specified period of time. It features sensible defaults which of course can be configured externally via the log4j configuration file.

package com.cherouvim;

import org.apache.log4j.net.SMTPAppender;

public class LimitedSMTPAppender extends SMTPAppender {

    private int limit = 10;           // max at 10 mails ...
    private int cycleSeconds = 3600;  // ... per hour

    public void setLimit(int limit) {
        this.limit = limit;
    }

    public void setCycleSeconds(int cycleSeconds) {
        this.cycleSeconds = cycleSeconds;
    }

    private int lastVisited;
    private long lastCycle;

    protected boolean checkEntryConditions() {
        final long now = System.currentTimeMillis();
        final long thisCycle = now - (now % (1000L * cycleSeconds));
        if (lastCycle != thisCycle) { // a new cycle has started: reset the counter
            lastCycle = thisCycle;
            lastVisited = 0;
        }
        lastVisited++;
        return super.checkEntryConditions() && lastVisited <= limit;
    }

}

The configuration would look something like this:

log4j.appender.???=com.cherouvim.LimitedSMTPAppender
log4j.appender.???.limit=3
log4j.appender.???.cycleSeconds=60
log4j.appender.???.SMTPHost=mail.example.com
log4j.appender.???.From=errors@example.com
log4j.appender.???.To=me@example.com
log4j.appender.???.Subject=An error occurred
log4j.appender.???.layout=org.apache.log4j.PatternLayout
log4j.appender.???.layout.ConversionPattern=%d{ISO8601} %-5p (%F:%L) - %m%n

The above configuration (hosts and addresses are placeholders) will limit the mail dispatch to only 3 emails per minute. Any further errors in that minute will not be emailed. The limit and cycleSeconds setting lines can be omitted and the defaults will be applied.

Happy logging!

robots.txt control for host aliases via mod_rewrite

Saturday, February 21st, 2009

Suppose you have a website launched at two different hosts, www.example.com and beta.example.com:

<VirtualHost *:80>
    ServerName www.example.com
    ServerAlias beta.example.com
</VirtualHost>
The content is the same but you want to serve a different robots.txt file, possibly excluding any indexing from the secondary host.

It would be handy if we could simply say:

User-agent: *
Allow: http://www.example.com/

User-agent: *
Disallow: http://beta.example.com/

to allow all bots to crawl the primary host and disallow them from the secondary one, but this syntax is imaginary. Firstly there is no Allow keyword in the spec, and secondly URLs must be relative.

The solution is to have 2 different robots.txt files:

robots-www.txt, served on the primary host:

User-agent: *
Disallow:

robots-beta.txt, served on the secondary host:

User-agent: *
Disallow: /

and serve them via mod_rewrite like this:

<VirtualHost *:80>
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^www\.example\.com$
    RewriteRule ^/robots.txt$ /robots-www.txt [L]
    RewriteCond %{HTTP_HOST} ^beta\.example\.com$
    RewriteRule ^/robots.txt$ /robots-beta.txt [L]
</VirtualHost>
Now www.example.com will silently serve robots-www.txt and beta.example.com will serve robots-beta.txt.

This is also handy during domain name migration periods where you are waiting for dns changes to propagate around the globe until you feel safe to completely shut down the secondary host and possibly assign 301 redirects to the primary.

Best way to integrate Google Analytics

Thursday, November 6th, 2008

This is the best solution to integrate Google Analytics into your site. It uses ideas and code from the following 2 sites:

Here is the improved snippet:

<script type="text/javascript" src="http://www.google-analytics.com/ga.js"></script>
<script type="text/javascript">
    if (typeof(_gat)=='object')
        setTimeout(function(){
            _gat._getTracker("UA-1234567-8")._trackPageview()}, 1500);
</script>

Its advantages:
  1. Does not use document.write
  2. Fixes the “_gat is not defined” issue
  3. Delayed GA access (1.5 second) so the page load time is not affected
  4. Very concise

Another improvement would be to cache ga.js locally to avoid the DNS lookup.