Archive for 2010

A Java alternative to xsendfile for apache httpd (that works)

Wednesday, December 15th, 2010

X-Sendfile is a special, non-standard HTTP header: when a backend application server returns it, the frontend webserver serves the file specified in the header instead of the backend’s response body. Quoting mod_xsendfile for apache on why this is useful:

  • Some applications require checking for special privileges.
  • Others have to lookup values first (e.g. from a DB) in order to correctly process a download request.
  • Or store values (download-counters come into mind).
  • etc.
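
In practice the backend performs those checks and then just sets the header, letting the frontend stream the file. A minimal hypothetical sketch of a download servlet for the mod_xsendfile case (class name and file path are made up):

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class XSendfileDownload extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                                 throws ServletException, IOException {
        // ... check privileges, lookup values, increase download counters ...

        // hand the actual streaming off to the frontend webserver:
        // mod_xsendfile discards the (empty) body and serves the file
        // named in the header instead
        resp.setContentType("application/pdf");
        resp.setHeader("X-Sendfile", "/var/www/protected/docs/doc1.pdf");
    }
}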

lighttpd and nginx already have this capability built in. In apache httpd though you need to install mod_xsendfile. In case you cannot get it working (I couldn’t in a sensible timeframe), or if you are in an environment where you cannot install extra apache modules, then your only hope is serving the file via Java.

For the case of access control I’ve seen (and also written) ProtectedFileServe servlets before, which check a condition and manually stream the file back to the caller. Serving the file like that can be error-prone, and a better solution is to utilize what already exists in the web container, which in the case of Tomcat is the DefaultServlet.

The following example takes a request of the form /serve?/resources/filename and delegates the serving of the file to the default servlet.

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Enforces an application authorization check before delegating to the
 * default file servlet which will serve the "protected" file
 * (found under /resources)
 *
 * Will require an apache httpd mod rewrite to convert normal requests:
 *   /resources/image1.png
 *   /resources/docs/doc1.pdf
 *
 * into:
 *   /serve?/resources/image1.png
 *   /serve?/resources/docs/doc1.pdf
 *
 */
public class ProtectedFileServe extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) 
                                 throws ServletException, IOException {
        final String query = req.getQueryString();
        if (query!=null && query.startsWith("/resources/") && isLoggedIn(req)) {
            req.getRequestDispatcher(query).forward(req, resp);
            return;
        }
        resp.sendError(HttpServletResponse.SC_UNAUTHORIZED);
    }
    
    /**
     * Determines whether the requested file should be served
     */
    private boolean isLoggedIn(HttpServletRequest request) {
        return ...;
    }
    
}

Map it in web.xml:

<servlet>
    <servlet-name>ProtectedFileServe</servlet-name>
    <servlet-class>com.example.ProtectedFileServe</servlet-class>
    <load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
    <servlet-name>ProtectedFileServe</servlet-name>
    <url-pattern>/serve</url-pattern>
</servlet-mapping>

Now a request to /serve?/resources/foo.jpg will serve the file /resources/foo.jpg via the default servlet only if the user is logged in.

An enhancement to the URL structure is to apply the following mod_rewrite rule in the apache configuration which will allow URLs such as /resources/foo.jpg to correctly reach the servlet:

RewriteEngine on
RewriteCond %{REQUEST_URI} ^/resources/.*
RewriteRule (.*) /serve?$1 [PT,L]

fixing StringIndexOutOfBoundsException on replaceAll

Wednesday, December 15th, 2010

java.lang.StringIndexOutOfBoundsException: String index out of range: 62
        at java.lang.String.charAt(String.java:686)
        at java.util.regex.Matcher.appendReplacement(Matcher.java:703)
        at java.util.regex.Matcher.replaceAll(Matcher.java:813)
        at java.lang.String.replaceAll(String.java:2189)
        at com.example.XslImportsPathFixer.fix(XslImportsPathFixer.java:50)

This “problem” with String#replaceAll can be a mind-bender sometimes, forcing you to debug for hours thinking that the regex you’ve specified (the 1st parameter of replaceAll) is wrong.

If you are getting the above exception then the problem lies in the replacement String (the 2nd parameter of replaceAll), which most probably contains $ or \.

This is mentioned in the API:

Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
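
A quick way to reproduce it is a replacement string that ends with a lone $ or \: on the JDKs of that era appendReplacement tries to read the character after it and blows up (newer JDKs may throw an IllegalArgumentException instead). A tiny made-up example:

public class ReplaceAllBlowup {
    public static void main(String[] args) {
        // the replacement ends with "$", so Matcher.appendReplacement reads
        // past the end of the string -> StringIndexOutOfBoundsException
        System.out.println("the total is X".replaceAll("X", "100$"));
    }
}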

So, fixing this is as easy as using Matcher.quoteReplacement(…), and in case you are still on Java 1.4.2 you can do it with a helper method such as:

/**
 * Escaping "$" and "\" for use as replacement values in regexes.
 */
public static String quoteReplacement(String replacement) {
    return replacement.replaceAll("(\\$|\\\\)", "\\\\$0");
}

This is especially useful in case your replacement string comes from user-controlled content, where you don’t know whether these special characters will be present.
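
A minimal usage sketch (the template and the user-supplied value are made up):

import java.util.regex.Matcher;

public class QuoteReplacementExample {
    public static void main(String[] args) {
        String template = "total: {price}";
        String userInput = "100$"; // would break replaceAll if used as-is

        String safe = template.replaceAll("\\{price\\}",
                Matcher.quoteReplacement(userInput));
        System.out.println(safe); // prints: total: 100$
    }
}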

This “issue” can obviously be avoided by consulting the API first whenever a JDK method seems to be misbehaving, but the catch is that Matcher.quoteReplacement(…) was only added in 1.5. So using the latest JDK (when possible) and always consulting the API is good practice.

A table that should exist in all projects with a database

Wednesday, December 8th, 2010

It’s called schema_version (or migrations, or whatever suits you) and its purpose is to keep track of structural or data changes to the database.
A possible structure (example in MySQL) is:

create table schema_version (
    `when` timestamp not null default CURRENT_TIMESTAMP,
    `key` varchar(256) not null,
    `extra` varchar(256),
    primary key (`key`)
) ENGINE=InnoDB;

insert into schema_version(`key`, `extra`) values ('001', 'schema version');

Whether you add this table from the beginning of the project or just after you’ve deployed the first version to a staging or production server is up to you.

Whenever you need to execute an SQL script that changes the database structure or performs a data migration, you should also add a row to that table. Do that via an insert statement at the beginning or end of that script (which is committed to the project’s code repository).

E.g:

insert into schema_version(`key`, `extra`)
    values ('002', 'FOO-22 user profile enhancement');

another example:

insert into schema_version(`key`, `extra`)
    values ('003', 'FOO-73 contact us form stores host & user agent');

and another example:

insert into schema_version(`key`, `extra`)
    values ('004', 'FOO-62 deletion of non validated user accounts');

This is useful because now you know which database migration scripts you’ve run on each server (development, staging, production):

mysql> select * from schema_version;
+---------------------+-----+-------------------------------------------------+
| when                | key | extra                                           |
+---------------------+-----+-------------------------------------------------+
| 2010-11-22 11:21:39 | 001 | schema version                                  |
| 2010-12-02 17:02:20 | 002 | FOO-22 user profile enhancement                 |
| 2010-12-06 15:55:41 | 003 | FOO-73 contact us form stores host & user agent |
| 2010-12-06 15:58:12 | 004 | FOO-62 deletion of non validated user accounts  |
+---------------------+-----+-------------------------------------------------+

The key and extra columns could contain anything that makes sense to you. I prefer a sequential key and a very short description (with reference to the issue ID, in our case a JIRA project called FOO) of the issue that this migration relates to.

The scripts which represent each of those database migrations should be kept in the code repository together with the code, and their filenames should be prefixed with the key of the migration, e.g.:

migrations/001-schema-version.sql
migrations/002-FOO-22-user-profile-enhancement.sql
migrations/003-FOO-73-contact-form-fixes.sql
migrations/004-FOO-62-data-fix.sql
migrations/005-FOO-88-venues-management.sql
migrations/006-FOO-89-venues-management-v2.sql
migrations/007-FOO-78-private-messages-system.sql

This keeps everything tidy and makes all team members’ work easier:

  • If a new developer jumps into development and has been given an old database dump which only contains migrations up to 004, he can compare that to the repository and see that he needs to execute migrations 005, 006 and 007 in order to bring his database in sync with the code.
  • If a deployer (or operations person) is asked to deploy the latest version of the project, he can see which scripts have been run on the target server and act accordingly (a quick way to automate that check is sketched below).
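
For example, a throwaway JDBC sketch that lists the scripts not yet recorded in schema_version could look like this (connection details are made up, and it assumes the numeric key prefix convention used above):

import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PendingMigrations {
    public static void main(String[] args) throws Exception {
        // collect the keys of the migrations already applied on this server
        Set<String> applied = new HashSet<String>();
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "pass");
        try {
            ResultSet rs = con.createStatement()
                    .executeQuery("select `key` from schema_version");
            while (rs.next()) {
                applied.add(rs.getString(1));
            }
        } finally {
            con.close();
        }

        // every script filename is prefixed with its key, e.g. 002-FOO-22-...
        File[] scripts = new File("migrations").listFiles();
        if (scripts == null) {
            throw new IllegalStateException("no migrations directory found");
        }
        Arrays.sort(scripts);
        for (File script : scripts) {
            String key = script.getName().split("-")[0];
            if (!applied.contains(key)) {
                System.out.println("pending: " + script.getName());
            }
        }
    }
}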

Happy migrations.

Edit: Since I’ve been receiving tons of traffic from dzone and reddit, with lots of useful comments, I need to follow up with this F.A.Q.

Q: The column names of the table are the worst piece of SQL I’ve seen in my life. Why did you choose those names?
A: You are free to change the table to suit your needs. So, although you are missing the purpose of this post (which was to provide a methodology for tracking changes), here you go:

create table schema_migrations (
    `appliedAt` timestamp not null default CURRENT_TIMESTAMP,
    `migrationCode` varchar(256) not null,
    `extraNotes` varchar(256),
    primary key (`migrationCode`)
) ENGINE=InnoDB;

Since the migrations are 1:1 with the releases of the software project, I use varchar for key/migrationCode (instead of int) so I can deal with cases such as:

prod-001
prod-002
prod-003
prod-003-quickfix-1
prod-003-quickfix-2
prod-004
...

Also note that if I’ve got something with a property best described by “key” then I’ll happily name the column “key” any day. Domain modeling is priority #1 for me, and since I use an ORM which allows me to do that, I’ll go ahead and do it.

Q: This is old news, you should use Rails or liquibase or something else
A: Thanks for your input. Will definitely try those out. My proposal is just a simple convention which works in any environment right now, without any learning curve. Also note that the problem I’m solving here is not tracking the difference between 2 database schemas, but tracking which scripts have been run on each server.
Q: How often do you apply changes to the database schema for this to be helpful?
A: Most real-world projects which evolve over time usually require changes in database structure. Also, most real-world projects need at least a staging server and a production one. This method helps me track which migrations have been run on each server. For example, we’ve got around 120 migration scripts for a medium-traffic (1m pageviews/day) public site which we’ve been maintaining for 2 years. No project finishes when it goes live.
Q: Shouldn’t this data go into the code repository instead of the database?
A: The database scripts which apply changes to the database are already in source control. The point is to be able to track which ones have been executed in which database servers (dev, staging, production).
Q: The developer can forget and mess this process up with unpredicted results.
A: Yes, the developer can most probably mess up any aspect of a software project.

ugly hack for finding the caller of a method

Friday, October 22nd, 2010

Imagine the following scenario:

  • you are trying to trace a bug in an old, ugly and improperly constructed legacy application (you know… the application with the 2kloc classes containing 14 nested ifs)
  • access to a debugger is not possible but you can recompile one or more classes and restart the server
  • you need to find who is calling a particular method, but don’t want to insert one thousand print messages in the one thousand places where the method is being called from
  • doing it with AOP gives you a headache

If all the above are true, then maybe the only hope is placing the following line at the beginning of the method in question:

System.out.println("called from: " + 
            new RuntimeException().getStackTrace()[1]);

Yes, this is ugly but it does the job. As the application is running it will print traces such as:

called from: com.spaghetti.backend.LoginAction.execute(LoginAction.java:3689)
called from: com.spaghetti.backend.LaunchMissilesAction.execute(LaunchMissilesAction.java:1062)
called from: com.spaghetti.StringHelper.getTime(StringHelper.java:501)
called from: com.spaghetti.TimeUtils.capitaliseString(TimeUtils.java:1723)
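
If the same trace is needed in more than one method, the line can be pulled into a throwaway helper class (a hypothetical sketch; note that the stack index shifts by one because of the extra frame):

public final class CallerTracer {

    private CallerTracer() {
    }

    /** Prints the caller of the method that invoked printCaller(). */
    public static void printCaller() {
        // [0] = printCaller, [1] = the traced method, [2] = its caller
        System.out.println("called from: "
                + new RuntimeException().getStackTrace()[2]);
    }
}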

Good luck, and don’t forget to remove the print statement before committing. You don’t want the next maintainers to freak out ;)

producing a beep in a Windows Shell

Monday, October 18th, 2010

You can do it by pressing CTRL+G or ALT+7 (numeric keypad) in a shell.
If you need it in a batch file then do @echo ^G. Note that ^G here is the literal BEL character (ASCII 7), entered by pressing CTRL+G or ALT+7, not a caret followed by the letter G.

Some cases where you’d need this:

  1. You are inside a messy server room with tons of windows desktop boxes (sad) and a KVM switch for interfacing. The boxes have no IP or name stickers on them (sad) and you need to detect which box the KVM screen and keyboard currently point to. Accessing the Gordian knot of cables is obviously a time consuming task.
  2. You have a batch file you need to execute and want a beep notification when it ends because you are working on something else at the same time.

modulating the throughput in JMeter for better longevity stress tests

Thursday, September 2nd, 2010

When running a longevity stress test with JMeter (a test which runs for many days) you may need to emulate a load which approximates the real traffic that the site receives in production. And that is definitely not a steady, constant load over the full 24-hour cycle.

Most normal sites (not twitter or facebook) tend to receive different amounts of traffic during a day. Although it depends on the nature of the site, usually the traffic will look like a sine wave with a wavelength of 1 day. Even if it doesn’t look as smooth as a sine wave, a sine-modulated throughput will be much better than testing with a constant one. A constant throughput can mess with the data you get from the test, since the application, db and o/s level caches and other parts of the stack (e.g. the GC) may tune themselves to that specific constant throughput.

So, first of all we need to set up some User Defined Variables in the JMeter test plan: minHitsPerSec, maxHitsPerSec, oscillationsPerDay and a hitsPerMinute placeholder (the BeanShell script below reads the first three and overwrites the last).
[screenshot: JMeter variables setup]
Setting oscillationsPerDay to 1 is what we want.

Next we set up a Constant Throughput Timer that references the hitsPerMinute variable. Note that the initial value of this variable doesn’t play any role, since we’ll be constantly changing it via a BeanShell script.
[screenshot: JMeter Constant Throughput Timer]

Lastly we need a BeanShell PreProcessor with the following script:

// variables
double minHitsPerSec = Double.parseDouble(vars.get("minHitsPerSec"));
double maxHitsPerSec = Double.parseDouble(vars.get("maxHitsPerSec"));
double oscillationsPerDay  = Double.parseDouble(vars.get("oscillationsPerDay"));

// calculation
double oscillationFrequency = 1000L * 60 * 60 * 24 / oscillationsPerDay;
double range = maxHitsPerSec - minHitsPerSec;
double hitsPerSecond = Math.sin(System.currentTimeMillis()/oscillationFrequency*(Math.PI*2))*range/2+range/2+minHitsPerSec;

// set
vars.put("hitsPerMinute", String.valueOf(hitsPerSecond*60));

// log
log.info("throughput: " + hitsPerSecond + " hits per second, or " + vars.get("hitsPerMinute") + " hits per minute");

So this will generate a load which modulates from minHitsPerSec to maxHitsPerSec as many times per day as you need. Of course, you can make the load and request behaviour more realistic by adding a Random Timer.
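
To sanity-check the chosen values before committing to a multi-day run, the same formula can be evaluated offline; a small hypothetical Java sketch using example values minHitsPerSec=1, maxHitsPerSec=5 and oscillationsPerDay=1:

public class ThroughputPreview {
    public static void main(String[] args) {
        double minHitsPerSec = 1, maxHitsPerSec = 5, oscillationsPerDay = 1;
        double oscillationFrequency = 1000L * 60 * 60 * 24 / oscillationsPerDay;
        double range = maxHitsPerSec - minHitsPerSec;

        // preview one full 24h oscillation in one-hour steps (starting at phase 0)
        for (int hour = 0; hour <= 24; hour++) {
            long t = hour * 60L * 60 * 1000;
            double hitsPerSecond = Math.sin(t / oscillationFrequency * (Math.PI * 2))
                    * range / 2 + range / 2 + minHitsPerSec;
            System.out.printf("hour %02d: %.2f hits/sec%n", hour, hitsPerSecond);
        }
    }
}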

Disabling quartz and ehcache UpdateChecker

Monday, August 16th, 2010

Last year Terracotta acquired ehcache and quartz, and it was all good and exciting news. The problem is that since then they’ve included an automatic update checker in these two libraries which is turned on by default!

What this does is connect to www.terracotta.org as soon as you bootstrap your application, send some info (!) and get a response back on whether you are currently using the latest version of the library.
[screenshot: firewall complaining that a Java process wants to connect to www.terracotta.org]

You’ll get something like this in your logs for ehcache:

2010-08-16 11:18:04,794 DEBUG (UpdateChecker.java:68) - Checking for update...
2010-08-16 11:18:05,934 INFO  (UpdateChecker.java:98) - New update(s) found: 2.2.0 Please check http://ehcache.org for the latest version.

and for quartz:

2010-08-16 11:15:58,218 DEBUG (UpdateChecker.java:56) - Checking for available updated version of Quartz...
2010-08-16 11:16:01,734 INFO  (UpdateChecker.java:86) - New Quartz update(s) found: 1.8.4 [http://www.terracotta.org/kit/reflector?kitID=default&pageID=QuartzChangeLog]

Terracotta gives an explanation of why they did this, but no matter how you look at it, it still makes your brain hurt and wonder what would happen if every vendor of Java libraries did the same. Complete misery.

Disabling this check is highly recommended, both in development and in production.

For ehcache you need to add:

updateCheck="false"

to the root element (<ehcache>) of your ehcache.xml, and for quartz you need to add:

org.quartz.scheduler.skipUpdateCheck=true

to your quartz.properties file.

More discussions:
ehcache UpdateChecker: http://forums.terracotta.org/forums/posts/list/2701.page
quartz UpdateChecker: http://forums.terracotta.org/forums/posts/list/3395.page

Improved svn post-commit hook for hudson

Wednesday, March 24th, 2010

Hudson’s wiki entry about the Subversion plugin explains how to set up a post-commit svn hook so that commits trigger the hudson builds without the need for hudson to constantly poll the repository.

The proposed post-commit hook implementation is good, but when the hudson server does not respond the commit takes place and the hook blocks forever. This can be confusing and annoying.

The hudson server may not be able to respond because it may be down or can’t be reached. In any case, and as the Fallacies of Distributed Computing explain, the network is unreliable, so a better approach is to add timeout and retries settings to the command which notifies hudson:

REPOS="$1"
REV="$2"
UUID=`svnlook uuid $REPOS`
/usr/bin/wget \
  --timeout=2 \
  --tries=2 \
  --header "Content-Type:text/plain;charset=UTF-8" \
  --post-data "`svnlook changed --revision $REV $REPOS`" \
  --output-document "-" \

http://server/hudson/subversion/${UUID}/notifyCommit?rev=$REV

Wget will now give up if the hudson server hasn’t responded within 2 seconds, and will try that twice. After that the user will get an error message, but the commit itself will have gone through.

Migrating from tomcat to weblogic

Thursday, March 11th, 2010

Moving from tomcat to weblogic may sound crazy. In case you need to do it though (e.g. for business reasons), here are a couple of things which may go wrong.

First of all, the classloader hierarchy in weblogic does not do what you usually expect from other servers such as tomcat, resin, jetty and jboss. If your application uses hibernate (and implicitly ANTLR) you may get the following exception:

Caused by: java.lang.Throwable: Substituted for missing class org.hibernate.QueryException - ClassNotFoundException: org.hibernate.hql.ast.HqlToken [from com.example.model.Person order by id]
        at org.hibernate.hql.ast.HqlLexer.panic(HqlLexer.java:80)
        at antlr.CharScanner.setTokenObjectClass(CharScanner.java:340)
        at org.hibernate.hql.ast.HqlLexer.setTokenObjectClass(HqlLexer.java:54)
        at antlr.CharScanner.<init>(CharScanner.java:51)
        at antlr.CharScanner.<init>(CharScanner.java:60)
        at org.hibernate.hql.antlr.HqlBaseLexer.<init>(HqlBaseLexer.java:56)
...

As explained in the Hibernate3 Migration Guide, Weblogic doesn’t seem to support proper class loader isolation: it will not see the Hibernate classes in the application’s context and will try to use its own version of ANTLR.

In the same fashion you may get the following exception for commons lang:

java.lang.NoSuchMethodError: org.apache.commons.lang.exception.ExceptionUtils.getMessage(Ljava/lang/Throwable;)Ljava/lang/String;

because weblogic internally uses commons lang 2.1, and the version you use may have more API methods.

For both these problems the solution is to instruct weblogic to prefer the jars from the WEB-INF of your application. You need to create a weblogic-specific file called weblogic.xml and place it under WEB-INF:

<?xml version="1.0" encoding="UTF-8"?>
<weblogic-web-app>
    <container-descriptor>
        <prefer-web-inf-classes>true</prefer-web-inf-classes>
    </container-descriptor>
</weblogic-web-app>

Another problem is that, as in resin, the default servlet is not named “default”, so if you reference it in web.xml your application may throw the following at deployment time:

Caused by: weblogic.management.DeploymentException: [HTTP:101170]The servlet default is referenced in servlet-mapping *.avi, but not defined in web.xml.

This is because in weblogic the default servlet is called FileServlet, so you’ll need to change all references in your web.xml from “default” to “FileServlet”.

Last, but not least, tomcat automatically issues a 302 redirect from http://localhost:8080/context to http://localhost:8080/context/ before your application does any processing, so request.getServletPath() never returns an empty string and always starts with “/”. Weblogic doesn’t do this, so http://localhost:8080/context resolves directly, and if your code contains something like:

request.getServletPath().substring(1)

you’ll get:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1

so a safer way to trim the leading slash is:

request.getServletPath().replaceFirst("^/", "")

Good luck, and remember: every time you use a full-blown application server for something where a simple web container would be enough, god kills a kitten.