The worst codebase I’ve seen in my life

March 30th, 2011

I recently inherited a huge legacy project (let’s call it FailApp) which features brutally substandard code. I’ll present the best gems here, not as a rant, but as a case study of what can get delivered in the software industry for a six-figure sum of money.

The gems are categorized in three levels and (due to popular demand) each comes with a one-liner on what the author should have done instead.

1) the LOLs

The LOLs are code gems that make you laugh out loud when you discover them. They are so funny that you’ll mention them to your coworkers over a beer or even post them in a forum in order to have a laugh. They are harmless little anecdotes of failure that the average Joe programmer will commit. They are usually the result of incompetent project management (stuffing serious projects with baby developers) and the total absence of industry standard practices such as code reviews, unit tests and code conventions.

2) the WTFs

The WTFs are code gems and design decisions which are totally perverted. You’ll discover some of them after an hour’s worth of debugging, and they are not very funny but rather piss you off. They damage the health of the project and they are the reason why the client ditched the original authors of the app and brought it to you.

3) the showstoppers

Things get hard and serious here. The showstoppers are fatal, dangerous and inexplicable design decisions by the original author which make your brain explode. They are hard to deal with and will usually cause serious bottlenecks in the application and irreversible data corruption. When you discover them you will immediately call an urgent meeting with the management to explain the situation. These are no fun at all and were probably written by people who have no idea what they are doing. These problems usually cost serious amounts of money.

The code snippets of FailApp have not been changed at all. In this case there isn’t any “innocent” to protect since serious harm has already been done with this codebase.

So, here we go.

LOL#1
“Prod” in the application context name. So the staging environment URL looks like http://123.12.34.123:7001/FailAppProd and the production http://123.12.34.134:7001/FailAppProd. Good luck remembering which one is which.

Solution: Do not use the environment name in the URL at all. If necessary, provide variability in the deployment descriptors (possibly at build time) which configures “dev”, “prod” or “staging” in the context name.

LOL#2

EscapeIlegalWords eiw = new EscapeIlegalWords();
foo = eiw.escapeIlegalWords(foo);

Yes, we’ve got a spelling mistake here and a really bad local variable name. But the real laugh comes when you navigate into the method:

public static String escapeIlegalWords(String code) { 
    code = code.replaceAll("&lt;", "<");
    code = code.replaceAll("&gt;", ">");
    code = code.replaceAll("&amp;", "&");
    code = code.replaceAll("&apos;", "'");
    code = code.replaceAll("&quot;", "\"");
    code = code.replaceAll("&#13;", "");
    return code;
}

- a new instance for calling the static method? ouch…
- oh I see, there aren’t any “illegal words” here, it simply does something with html which should be done in the template anyway…
- damn, this is the opposite of escaping!
- class name doesn’t make any sense at all

Solution: Fix mentioned issues.
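A minimal sketch of what fixing those issues could look like, assuming the intent really is to decode a handful of HTML entities. The names HtmlEntities and unescape are hypothetical. Note that `&amp;` is decoded last here: with the original ordering an input such as `&amp;quot;` would be decoded twice.

```java
/** Hypothetical utility: the name says what it does and it is never instantiated. */
final class HtmlEntities {
    private HtmlEntities() {}

    /** Decodes a fixed set of HTML entities; "&amp;" goes last to avoid double decoding. */
    static String unescape(String text) {
        return text
            .replace("&lt;", "<")     // String.replace is literal, no regex needed
            .replace("&gt;", ">")
            .replace("&apos;", "'")
            .replace("&quot;", "\"")
            .replace("&#13;", "")     // drops encoded carriage returns, as the original did
            .replace("&amp;", "&");
    }
}
```

The call site then becomes `foo = HtmlEntities.unescape(foo);` with no throwaway instance.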

LOL#3

session.setAttribute("contentXML", null);
session.removeAttribute("contentXML");
session.setAttribute("contentOwnerList", null);
session.removeAttribute("contentOwnerList");
session.setAttribute("structuresVector", null);
session.removeAttribute("structuresVector");
session.setAttribute("selectedThematic", null);
session.removeAttribute("selectedThematic");
session.setAttribute("selectedTarget", null);
session.removeAttribute("selectedTarget");
session.setAttribute("targetList", null);
session.removeAttribute("targetList");
session.setAttribute("thematicList", null);
session.removeAttribute("thematicList");
session.setAttribute("metadatas", null);
session.removeAttribute("metadatas");

- yes, please do make sure that you really remove those variables from the session… removing them twice will eventually do the trick.

Solution: Remove the setAttribute(…, null) statements because they are redundant.

LOL#4

ArrayList arrActions = new ArrayList();
HashMap order = new HashMap();
public ArrayList getHighLights(int idStructure, int language, int home) { ...

- thank you very much for declaring on concrete type. ever heard of Interfaces and the Collections API design philosophy?

Solution: Declare against interfaces (List, Map), which is enough for these cases.

LOL#5

ArrayList avlanguages=null;
if (request.getAttribute("avlanguages")!=null){
    avlanguages=(ArrayList)request.getAttribute("avlanguages");
}
<%if (avlanguages!=null && avlanguages.size()>0){%>
<%for(int x=0;x<avlanguages.size();x++) { %>

- yes, this is too much Java code inside JSP pages instead of using JSTL’s <c:forEach...

Solution: The view layer should use JSTL and not JSP scriptlets.

LOL#6

Some spelling mistakes which will definitely complicate grepping for information in logs and code:

END GENERTAION THEMATICS.
updateStatusDocumentDeplubishCron()

Solution: Avoid spelling mistakes.

LOL#7

CommonDatosPopUp pop = new CommonDatosPopUp();
pop = (CommonDatosPopUp)popUp[i];

- that is great. thanks for instantiating something only to throw it away. it does make sense in case the constructor is firing a missile though.

Solution: Declare and fetch from array in one line avoiding the unnecessary construction of an object.

LOL#8

ArrayList List = new ArrayList();
List = (ArrayList) baseManagerDao.getPosition(baseItemIdVar);
for (int i = 0; i < List.size(); i++)

- once again, create that empty ArrayList() and then throw it away
- declaration on concrete Collection type
- iterating a Collection datastructure using index
- exceptional naming of variable List to look like the Interface List (I wonder why the language allows such abuse of itself here)

Solution: Fix mentioned problems.
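A rough sketch of the same loop with those problems addressed. The BaseManagerDao interface below is a hypothetical stand-in for the real DAO, and the String element type is an assumption since the original snippet does not show what the list holds.

```java
import java.util.List;

final class PositionReport {
    /** Hypothetical stand-in for the DAO used in the snippet above. */
    interface BaseManagerDao {
        List<String> getPosition(int baseItemId);
    }

    static String describe(BaseManagerDao dao, int baseItemId) {
        // Declared against the List interface: no throwaway ArrayList, no cast,
        // and a variable name that does not shadow a well-known type.
        List<String> positions = dao.getPosition(baseItemId);
        StringBuilder out = new StringBuilder();
        for (String position : positions) { // for-each instead of index arithmetic
            out.append(position).append(';');
        }
        return out.toString();
    }
}
```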

WTF#1

if (cm.getBlockLevels() != null) {
    request.setAttribute("blocklevel", cm.getBlockLevels());
}
if (cm.getUsers() != null) {
    request.setAttribute("users", cm.getUsers());
}
if (pm.getSigGroupPub() != null) {
    request.setAttribute("siggroups", pm.getSigGroupPub());
}

- hm… cm and pm (really bad local variable names) are managers which delegate calls to DAOs. These DAO methods (probably hitting a database) are being called twice
- and why do we have to nullcheck in the first place? if the result is null we can simply put null (remove) into the request attrs. (and no, these variables were not previously set into the request by another piece of code).

Solution: Remove the if statements altogether. If the checks are necessary, assign the results from the managers to local variables to avoid executing each DAO method twice.
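If the null checks do have to stay, one way to guarantee a single call per DAO method is a small helper like the sketch below. To keep it self-contained, a plain Map stands in for the request attributes; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.function.Supplier;

final class RequestAttributes {
    private RequestAttributes() {}

    /** Calls the (possibly expensive) loader exactly once and stores a non-null result. */
    static <T> void setIfPresent(Map<String, Object> attributes, String key, Supplier<T> loader) {
        T value = loader.get(); // single invocation, result kept in a local variable
        if (value != null) {
            attributes.put(key, value);
        }
    }
}
```

A call would then read something like `setIfPresent(attrs, "users", () -> cm.getUsers())`.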

WTF#2

While trying to fix something I searched for “siggroup”. Notice the variations of the key, all appearing in the same (2000 line) controller:

request.setAttribute("siggroups", pm.getSigGroupPub());
request.setAttribute("sigGroups", sigGroups);
request.setAttribute("SigGroups", hm.getSigGroups(Integer...

- zero consistency is the salt of the programmer’s life…
- the best part though is in the template:

List sig = (List) request.getAttribute("sigGroups");
if(sig == null) {
    sig = (List) request.getAttribute("SigGroups");
}

- good luck finding which collection you are rendering on screen now…

Solution: Be consistent in the naming of your variables, especially those which are being used in another layer of the application, the view layer. Never do such acrobatics with the capitalization of variables. Once again, use proper names.

WTF#3

There is no javadoc and the most crucial parts of the app are commented in a non English language, including non-ascii characters. http://translate.google.com/ to the rescue.

Solution: Unless your company code style guidelines allow it, never use non-English in projects, especially EU funded ones where you’ll need to deliver full source code later on. Also, non-ascii characters may make some parts of the deployment process choke (e.g. peers with old editors, old source diff programs, badly installed CI environments etc). They can also cause bugs, e.g. can you spot the difference between String action = "NULL"; and String action = "ΝULL";? That’s why some people decide to never allow non-ascii characters into their IDE, so they can catch these issues early on.
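To make the look-alike problem concrete, here is a small sketch. It assumes the odd character in the second literal is the Greek capital letter Nu (U+039D); the AsciiCheck helper is hypothetical and simply reports the first character outside the ASCII range.

```java
final class AsciiCheck {
    private AsciiCheck() {}

    /** Returns the index of the first non-ASCII character, or -1 if there is none. */
    static int firstNonAscii(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 127) {
                return i;
            }
        }
        return -1;
    }
}
```

With it, `AsciiCheck.firstNonAscii("NULL")` returns -1 while `AsciiCheck.firstNonAscii("\u039DULL")` returns 0, and of course `"NULL".equals("\u039DULL")` is false even though both print almost identically.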

WTF#4

Classes with 2000-line methods and 14 levels of nested ifs. These are usually do-everything controllers which serve many unrelated things from the same codebase (page results, binary downloads, csv). These are usually replicated 10 times with minor differences to accommodate slightly different use cases. Enough said.

Solution: It has happened to all of us. It’s called spaghetti code. When it happens try to think of a better design. If you can’t, please consult the lead developer of the project for assistance. Spaghetti code will bite you back sooner than you expect.

WTF#5

pubHighlights = cm.getHighLights(structId, userLang,
					Constants.PREDEFINED_SEARCH);

- ok, we are using constants for variability instead of different methods or inheritance
body of method:

public ArrayList getHighLights(int idStructure, String lang, int home) {
    ...
    if (home == 1) {
        listado = commonDao.getHightLightHome(idStructure);
    } else {
        listado = commonDao.getHightLightPredefinedSearch(idStructure);
    }

- facepalm for home==1 instead of Constant. good luck with debugging when that constant changes
- by the way it turns out that the initial call should send Constants.HOME instead of Constants.PREDEFINED_SEARCH. It just happens that both equal 1.

Solution: Use OOP practices for solving such problems. If you can’t and need to use constants please use enums. If you can’t use enums and need to use primitive constants please do make full use of them (and not partial, as in this case).
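As an illustration of the enum route, here is a minimal sketch. The Scope enum and the CommonDao interface are hypothetical stand-ins (the DAO method names are copied from the snippet above, spelling included), and the String element type is an assumption.

```java
import java.util.List;

final class Highlights {
    /** Hypothetical replacement for the int constants HOME and PREDEFINED_SEARCH. */
    enum Scope { HOME, PREDEFINED_SEARCH }

    /** Hypothetical stand-in for the real DAO. */
    interface CommonDao {
        List<String> getHightLightHome(int idStructure);
        List<String> getHightLightPredefinedSearch(int idStructure);
    }

    static List<String> getHighLights(CommonDao dao, int idStructure, Scope scope) {
        switch (scope) {
            case HOME:
                return dao.getHightLightHome(idStructure);
            case PREDEFINED_SEARCH:
                return dao.getHightLightPredefinedSearch(idStructure);
            default:
                throw new IllegalArgumentException("Unknown scope: " + scope);
        }
    }
}
```

With distinct enum values, passing PREDEFINED_SEARCH where HOME was meant can no longer work by accident.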

WTF#6

Absence of templating reuse. The 150+ JSP templates contain everything from html declarations to website footer (with copyrights and everything). Only with minor and insignificant differences due to inconsistent copy pasting of headers and whole pages with minor changes. Ever heard of include?

Solution: Do not copy paste like crazy. By copy pasting sections instead of reusing them you may end up with templates which are as complex as your code. All templating systems offer facilities for reuse.

WTF#7

Form validation for the brave:

if (appForm.getFileCV() == null || StringUtils.isEmpty(appForm.getFileCV().getFileName())){
	errors.add("fileCV", new ActionError("error.fileCV.required"));
}else if (!appForm.getFileCV().getFileName().toLowerCase().endsWith(".doc") && !appForm.getFileCV().getFileName().toLowerCase().endsWith(".pdf") 
		&& !appForm.getFileCV().getFileName().toLowerCase().endsWith(".rtf") && !appForm.getFileCV().getFileName().toLowerCase().endsWith(".sdc") 
		&& !appForm.getFileCV().getFileName().toLowerCase().endsWith(".zip") )
	errorExtension = true;
if (appForm.getFileLetter() == null || StringUtils.isEmpty(appForm.getFileLetter().getFileName())){
	errors.add("fileLetter", new ActionError("error.fileLetter.required"));
}else if (!appForm.getFileLetter().getFileName().toLowerCase().endsWith(".doc") && !appForm.getFileLetter().getFileName().toLowerCase().endsWith(".pdf") 
		&& !appForm.getFileLetter().getFileName().toLowerCase().endsWith(".rtf") && !appForm.getFileLetter().getFileName().toLowerCase().endsWith(".zip")
		&& !appForm.getFileLetter().getFileName().toLowerCase().endsWith(".zip"))
	errorExtension = true;
/* if (fileForm == null || StringUtils.isEmpty(fileForm.getFileName())){
	errors.add("fileForm", new ActionError("error.fileForm.required"));
}else*/
if(appForm.getFileForm() != null && !StringUtils.isEmpty(appForm.getFileForm().getFileName()))
	if (!appForm.getFileForm().getFileName().toLowerCase().endsWith(".doc") && !appForm.getFileForm().getFileName().toLowerCase().endsWith(".pdf") 
		&& !appForm.getFileForm().getFileName().toLowerCase().endsWith(".rtf") && !appForm.getFileForm().getFileName().toLowerCase().endsWith(".sdc")
		&& !appForm.getFileForm().getFileName().toLowerCase().endsWith(".zip"))
	errorExtension = true;
if(appForm.getFileOther1() != null && !StringUtils.isEmpty(appForm.getFileOther1().getFileName()))
	if (!appForm.getFileOther1().getFileName().toLowerCase().endsWith(".doc") && !appForm.getFileOther1().getFileName().toLowerCase().endsWith(".pdf") 
			&& !appForm.getFileOther1().getFileName().toLowerCase().endsWith(".rtf")&& !appForm.getFileOther1().getFileName().toLowerCase().endsWith(".sdc")
			&& !appForm.getFileOther1().getFileName().toLowerCase().endsWith(".zip"))
		errorExtension = true;
if(appForm.getFileOther2() != null && !StringUtils.isEmpty(appForm.getFileOther2().getFileName()))
	if (!appForm.getFileOther2().getFileName().toLowerCase().endsWith(".doc") && !appForm.getFileOther2().getFileName().toLowerCase().endsWith(".pdf") 
			&& !appForm.getFileOther2().getFileName().toLowerCase().endsWith(".rtf")&& !appForm.getFileOther2().getFileName().toLowerCase().endsWith(".sdc")
			&& !appForm.getFileOther2().getFileName().toLowerCase().endsWith(".zip"))
		errorExtension = true;
if(appForm.getFileOther3() != null && !StringUtils.isEmpty(appForm.getFileOther3().getFileName()))
	if (!appForm.getFileOther3().getFileName().toLowerCase().endsWith(".doc") && !appForm.getFileOther3().getFileName().toLowerCase().endsWith(".pdf") 
			&& !appForm.getFileOther3().getFileName().toLowerCase().endsWith(".rtf")&& !appForm.getFileOther3().getFileName().toLowerCase().endsWith(".sdc")
			&& !appForm.getFileOther3().getFileName().toLowerCase().endsWith(".zip"))
		errorExtension = true;

Solution: Don’t do this. All MVC frameworks offer some sort of validation facility in order to avoid code soup such as the above. Even if yours doesn’t, try to extract the most repeated part of the validation into a method which you’ll reuse.
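A sketch of the “extract a method” option. It assumes the same five extensions are meant to apply to every upload (in the original, the fileLetter branch tests “.zip” twice and never tests “.sdc”, which looks like a copy-paste slip). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class UploadValidation {
    private static final List<String> ALLOWED_EXTENSIONS =
            Arrays.asList(".doc", ".pdf", ".rtf", ".sdc", ".zip");

    private UploadValidation() {}

    /** True when the file name ends with one of the allowed extensions, ignoring case. */
    static boolean hasAllowedExtension(String fileName) {
        if (fileName == null) {
            return false;
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String extension : ALLOWED_EXTENSIONS) {
            if (lower.endsWith(extension)) {
                return true;
            }
        }
        return false;
    }
}
```

Each of the six blocks above would then shrink to a presence check plus one call to `hasAllowedExtension(...)`.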

WTF#8

There are methods which return Vector (yes, 1999 called) and it turns out that the result is never being used but instead the purpose of the method is to mutate the parameters. Smart.

Solution: Instead of Vector use a List. Methods should not have side effects, and when they do they should mention it in the documentation or reveal it in the method name. Also, returning an unmodified object only to side-effect the arguments is very confusing. If the method’s logic changes during implementation please try to adapt its signature so it keeps making sense.

showstopper#1

cm.getHighLights is an expensive operation (calculates the nested menu of the website) and the following code is supposed to introduce caching into the game:

ArrayList pubHighlights = (ArrayList)request.getSession().getAttribute("publicationsHighlights");
if(pubHighlights == null || pubHighlights.size() == 0)
pubHighlights = cm.getHighLights(structId, userLang, Constants.PREDEFINED_SEARCH);
request.getSession().setAttribute("publicationsHighlights", pubHighlights);

- it stores the result in the http session so there is a lot of waste of memory in case we have many users
- the generated menu takes into account “structId” and “userLang”. The cache key is only one though (“publicationsHighlights” in the session), so if the user changes structId or userLang, the menu stays the same
- changes on the menu structure are not reflected to already cached clients. They’ll see these changes only if they get a new session (come back later, use another browser etc)

Solution: Think about your cache design. Does it make sense or does it break the UI and the server? In this case the cache should be in the application layer and the key should take into account all parameters which modify the appearance and behavior of the cached object.
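A minimal sketch of such an application-level cache, keyed on both parameters. HighlightsCache and its loader are hypothetical names, the String element type is an assumption, and real code would also need a sensible invalidation trigger (here just a manual invalidateAll to call when the menu structure changes).

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.BiFunction;

final class HighlightsCache {
    private final ConcurrentMap<String, List<String>> cache = new ConcurrentHashMap<>();
    private final BiFunction<Integer, String, List<String>> loader;

    /** The loader stands in for the expensive cm.getHighLights(...) call. */
    HighlightsCache(BiFunction<Integer, String, List<String>> loader) {
        this.loader = loader;
    }

    /** One shared entry per (structId, userLang) pair instead of one copy per session. */
    List<String> get(int structId, String userLang) {
        String key = structId + "|" + userLang;
        return cache.computeIfAbsent(key, k -> loader.apply(structId, userLang));
    }

    /** To be called whenever the menu structure changes. */
    void invalidateAll() {
        cache.clear();
    }
}
```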

showstopper#2

Application “does things” to the database on view. Things == if stuff is not there it silently creates it, so for example if you visit the about page of the French site and there is no content there (either from CM error or data corruption) it simply creates an empty one and inserts it into the database. This is nice and useful especially when the application thinks that there isn’t any content there, so after a couple of days you’ll find thousands of empty “about” pages under the French site waiting to be fixed by you.

Solution: This is dangerous. Don’t do it. GET requests should rarely modify the database.

showstopper#3

Total incompetence in exception design and handling. The usual anti-pattern of swallow and return new ArrayList() is followed throughout the system. Database errors are masked and the system goes on doing what it was doing (e.g. continuing with other parts of data changes, email dispatching etc).

Solution: Learn about exception handling and how they make sense in your application’s layers. Have a look at 1, 2 and 3.
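For contrast with “swallow and return new ArrayList()”, here is a rough sketch of one alternative: wrap the failure in an unchecked exception and let it propagate, so that callers stop instead of carrying on with bogus empty results. DataAccess and DataAccessFailure are hypothetical names, not part of any framework.

```java
import java.util.List;
import java.util.concurrent.Callable;

final class DataAccess {
    /** Hypothetical unchecked exception for failures in the persistence layer. */
    static final class DataAccessFailure extends RuntimeException {
        DataAccessFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private DataAccess() {}

    static <T> List<T> query(String description, Callable<List<T>> work) {
        try {
            return work.call();
        } catch (Exception e) {
            // Returning an empty list here would make "the database is down"
            // indistinguishable from "no results"; rethrow with the cause attached.
            throw new DataAccessFailure("Query failed: " + description, e);
        }
    }
}
```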

showstopper#4

“Change navigation element==edit property” anti-pattern. This is sick. Imagine a CRUD page with a couple of filters on top and an entities listing below. In the year filter you choose 2010 and hit GO. The listing is updated with entries from 2010. Now you change the year filter to 2011 but do not hit GO. Instead you hit EDIT on one of the 2010 entities below. What happens is that the 2011 value from the filter is transferred into the (hidden) element of the edit form. As soon as you hit SUBMIT the entity now belongs to 2011. Nice.

Solution: Don’t do this. The application should have clearly defined use cases for modifying objects. Never trust the UI.

showstopper#5

The search is FUBAR. A single search for “foo” issues 200,000 db queries and requires 5 minutes on the production server because:
- it first does a “select *” of the whole publications database, in sorted batches of 1000, back into a collection.
- it then feeds this collection into a method which filters things out.
- while filtering some entity methods are accessed and due to bad fetch plan from hibernate tons of N+1 statements are executed.

Solution: When developing try to fill in your test database with tons of data to see how your use cases scale. For this particular issue, learning about the ORM’s fetch plans and how these relate to your model and use cases is very useful.

showstopper#6

“Toxic servlets hierarchy”. All actions extend something (a base class) which extends servlet. The base class provides a public static spring factory field which is initialized on boot of the servlet. Yes, the only reason for this base class to exist is to provide this field, and the actions extend it in order to get access to this public static field. Great.

Solution: Don’t do this. If you use Spring then use it properly.

showstopper#7

Log4j & hibernate initialization rediscovered! Both libraries are being configured in the following fashion:
- read the log4j.properties and hibernate.cfg.xml configuration files from a custom location using a ContextListener
- write contents into a new file in the server’s root folder
- load from there
- documentation states that if application cannot boot the application’s configuration files should be removed from the server’s root folder!

Solution: Both frameworks look for their configuration file on the classpath by default. No acrobatics required.

The end.

A Java alternative to xsendfile for apache httpd (that works)

December 15th, 2010

X-Sendfile is a special, non-standard HTTP header: when a backend application server returns it, the frontend webserver serves the file specified in the header. Quoting mod_xsendfile for apache on why this is useful:

  • Some applications require checking for special privileges.
  • Others have to lookup values first (e.g. from a DB) in order to correctly process a download request.
  • Or store values (download-counters come into mind).
  • etc.

lighttpd and nginx already have this capability built in. In apache httpd though you need to install mod_xsendfile. In case you cannot get it working (I couldn’t in a sensible timeframe) or if you are in an environment where you cannot install extra apache modules then your only hope is serving the file via Java.

For the case of access control I’ve seen (and also written) ProtectedFileServe servlets before, which check a condition and manually stream the file back to the caller. Serving the file like that can be error prone and a better solution is to utilize what already exists in the web container, which in the case of Tomcat is the DefaultServlet.

The following example will delegate the filename for a request from /serve?/resources/filename to the default servlet.

/**
 * Enforces an application authorization check before delegating to the
 * default file servlet which will serve the "protected" file
 * (found under /resources)
 *
 * Will require an apache httpd mod rewrite to convert normal requests:
 *   /resources/image1.png
 *   /resources/docs/doc1.pdf
 *
 * into:
 *   /serve?/resources/image1.png
 *   /serve?/resources/docs/doc1.pdf
 *
 */
public class ProtectedFileServe extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) 
                                 throws ServletException, IOException {
        final String query = req.getQueryString();
        if (query!=null && query.startsWith("/resources/") && isLoggedIn(req)) {
            req.getRequestDispatcher(query).forward(req, resp);
            return;
        }
        resp.sendError(HttpServletResponse.SC_UNAUTHORIZED);
    }
    
    /**
     * Determines whether the requested file should be served
     */
    private boolean isLoggedIn(HttpServletRequest request) {
        return ...;
    }
    
}

Map it in web.xml:

<servlet>
    <servlet-name>ProtectedFileServe</servlet-name>
    <servlet-class>com.example.ProtectedFileServe</servlet-class>
    <load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
    <servlet-name>ProtectedFileServe</servlet-name>
    <url-pattern>/serve</url-pattern>
</servlet-mapping>

Now a request to /serve?/resources/foo.jpg will serve the file /resources/foo.jpg via the default servlet only if the user is logged in.

An enhancement to the URL structure is to apply the following mod_rewrite rule in the apache configuration which will allow URLs such as /resources/foo.jpg to correctly reach the servlet:

RewriteEngine on
RewriteCond %{REQUEST_URI} ^/resources/.*
RewriteRule (.*) /serve?$1 [PT,L]

fixing StringIndexOutOfBoundsException on replaceAll

December 15th, 2010
java.lang.StringIndexOutOfBoundsException: String index out of range: 62
        at java.lang.String.charAt(String.java:686)
        at java.util.regex.Matcher.appendReplacement(Matcher.java:703)
        at java.util.regex.Matcher.replaceAll(Matcher.java:813)
        at java.lang.String.replaceAll(String.java:2189)
        at com.example.XslImportsPathFixer.fix(XslImportsPathFixer.java:50)

This “problem” with String#replaceAll can be a mind bender sometimes, forcing you to debug for hours thinking that the regex you’ve specified (1st parameter of replaceAll) is wrong.

If you are getting the above exception then the problem lies in the replacement String (2nd parameter of replaceAll) which most probably contains $ or \.

This is mentioned in the API:

Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.

So, fixing this is as easy as using Matcher.quoteReplacement(…) and in case you still use Java 1.4.2 you can do it using a helper method such as:

/**
* Escaping "$" and "\" for use as replacement values in regexes.
* 
*/
public static String quoteReplacement(String replacement) {
    return replacement.replaceAll("(\\$|\\\\)", "\\\\$0");
}   

This is especially useful in case your replacement string comes from user controlled content where you don’t know whether these special characters will exist.
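For Java 5 and later, a small wrapper along these lines keeps the quoting in one place. SafeReplace and replaceAllLiteral are hypothetical names; Matcher.quoteReplacement is the JDK method mentioned above.

```java
import java.util.regex.Matcher;

final class SafeReplace {
    private SafeReplace() {}

    /** Replaces every regex match with the replacement taken literally ("$" and "\" included). */
    static String replaceAllLiteral(String input, String regex, String replacement) {
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }
}
```

For example, `SafeReplace.replaceAllLiteral("cost: X", "X", "$5")` yields `cost: $5`, whereas passing `"$5"` straight to replaceAll would be read as a reference to capture group 5 and fail.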

This “issue” can obviously be avoided if you first consult the API whenever a JDK method seems to be misbehaving. The catch is that Matcher.quoteReplacement was only added in 1.5. So using the latest JDK (when possible) and always consulting the API is good practice.

A table that should exist in all projects with a database

December 8th, 2010

It’s called schema_version (or migrations, or whatever suits you) and its purpose is to keep track of structural or data changes to the database.
A possible structure (example in MySQL) is:

create table schema_version (
    `when` timestamp not null default CURRENT_TIMESTAMP,
    `key` varchar(256) not null,
    `extra` varchar(256),
    primary key (`key`)
) ENGINE=InnoDB;

insert into schema_version(`key`, `extra`) values ('001', 'schema version');

Whether you add this table from the beginning of the project or just after you’ve deployed the first version to a staging or production server is up to you.

Whenever you need to execute an SQL script to change the database structure or perform a data migration you should add a row to that table as well. Do that via an insert statement at the beginning or end of that script (which is committed to the project’s code repository).

E.g:

insert into schema_version(`key`, `extra`)
    values ('002', 'FOO-22 user profile enhancement');

another example:

insert into schema_version(`key`, `extra`)
    values ('003', 'FOO-73 contact us form stores host & user agent');

and another example:

insert into schema_version(`key`, `extra`)
    values ('004', 'FOO-62 deletion of non validated user accounts');

This is useful because now you know which database migration scripts you’ve run on each server (development, staging, production):

mysql> select * from schema_version;
+---------------------+-----+-------------------------------------------------+
| when                | key | extra                                           |
+---------------------+-----+-------------------------------------------------+
| 2010-11-22 11:21:39 | 001 | schema version                                  |
| 2010-12-02 17:02:20 | 002 | FOO-22 user profile enhancement                 |
| 2010-12-06 15:55:41 | 003 | FOO-73 contact us form stores host & user agent |
| 2010-12-06 15:58:12 | 004 | FOO-62 deletion of non validated user accounts  |
+---------------------+-----+-------------------------------------------------+

The key and extra columns could contain anything that makes sense to you. I prefer a sequential key and a very short description (with reference to the issue ID, in our case a JIRA project called FOO) of the issue that this migration relates to.

The scripts which represent each of those database migrations should be kept in the code repository together with the code, and their filenames should be prefixed with the key of the migration, e.g.:

migrations/001-schema-version.sql
migrations/002-FOO-22-user-profile-enhancement.sql
migrations/003-FOO-73-contact-form-fixes.sql
migrations/004-FOO-62-data-fix.sql
migrations/005-FOO-88-venues-management.sql
migrations/006-FOO-89-venues-management-v2.sql
migrations/007-FOO-78-private-messages-system.sql

This keeps everything tidy and every team member’s work is now easier:

  • If a new developer jumps in development and has been given an old database dump which only contains migrations up to 004, he can compare that to the repository and see that he needs to execute migrations 005, 006 and 007 in order to bring his database in sync with the code.
  • If a deployer (or operations person) is asked to deploy the latest version of the project he can see what scripts have been run on the target server and act accordingly.
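The comparison described in the two points above can also be automated. The following is only a sketch under a few assumptions: the applied keys have already been read from schema_version, the script names come from listing the migrations folder, and the key is everything before the first dash in the file name. PendingMigrations is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class PendingMigrations {
    private PendingMigrations() {}

    /** Returns the script names whose key prefix is not yet recorded as applied, in input order. */
    static List<String> pending(Set<String> appliedKeys, List<String> scriptNames) {
        List<String> result = new ArrayList<>();
        for (String name : scriptNames) {
            // Assumption: the key is the part before the first '-', e.g. "005".
            int dash = name.indexOf('-');
            String key = dash < 0 ? name : name.substring(0, dash);
            if (!appliedKeys.contains(key)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Keys that contain dashes themselves would need a different naming rule for the files.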

Happy migrations.

Edit: Since I’ve been receiving tons of traffic from dzone and reddit with lots of useful comments I need to follow up with this F.A.Q.

Q: The column names of the table are the worst piece of SQL I’ve seen in my life. Why did you choose those names?
A: You are free to change the table to your needs. So, although you are missing the purpose of this post (which was to provide a methodology of tracking changes) here you go:

create table schema_migrations (
    `appliedAt` timestamp not null default CURRENT_TIMESTAMP,
    `migrationCode` varchar(256) not null,
    `extraNotes` varchar(256),
    primary key (`migrationCode`)
) ENGINE=InnoDB;

Since the migrations are 1:1 to the releases of the software project I use varchar for key/migrationCode (instead of int) so I can deal with cases such as:

prod-001
prod-002
prod-003
prod-003-quickfix-1
prod-003-quickfix-2
prod-004
...

Also note that if I’ve got something that has a property best described by “key” then I’ll happily name the column “key” any day. Domain modeling is priority #1 for me and since I use an ORM which allows me to do that I’ll go ahead and do it.

Q: This is old news, you should use Rails or Liquibase or something else
A: Thanks for your input. Will definitely try those out. My proposal is just a simple convention which works in any environment right now without any learning curve. Also note that the problem I’m solving here is not tracking the difference between 2 database schemas, but tracking what scripts have been run on each server.

Q: How often do you apply changes to the database schema for this to be helpful?
A: Most real world projects which evolve in time usually require changes in database structure. Also, most real world projects need at least a staging server and a production one. This method helps me track what migrations have been run on each server. For example we’ve got around 120 migration scripts for a medium traffic (1m pageviews/day) public site which we’ve been maintaining for 2 years. No project finishes when it goes live.

Q: Shouldn’t this data go into the code repository instead of the database?
A: The database scripts which apply changes to the database are already in source control. The point is to be able to track which ones have been executed in which database servers (dev, staging, production).

Q: The developer can forget and mess this process up with unpredictable results.
A: Yes, the developer can most probably mess up any aspect of a software project.

ugly hack for finding the caller of a method

October 22nd, 2010

Imagine the following scenario:

  • you are trying to trace a bug in an old, ugly and improperly constructed legacy application (you know… the application with the 2kloc classes containing 14 nested ifs)
  • access to a debugger is not possible but you can recompile one or more classes and restart the server
  • you need to find who is calling a particular method, but don’t want to insert one thousand print messages in the one thousand places where the method is being called from
  • doing it with AOP gives you a headache

If all the above are true, then maybe the only hope is placing the following line in the beginning of the method in question:

System.out.println("called from: " + 
            new RuntimeException().getStackTrace()[1]);

Yes, this is ugly but it does the job. As the application is running it will print traces such as:

called from: com.spaghetti.backend.LoginAction.execute(LoginAction.java:3689)
called from: com.spaghetti.backend.LaunchMissilesAction.execute(LaunchMissilesAction.java:1062)
called from: com.spaghetti.StringHelper.getTime(StringHelper.java:501)
called from: com.spaghetti.TimeUtils.capitaliseString(TimeUtils.java:1723)

Good luck, and don’t forget to remove the print statement before committing. You don’t want the next maintainers to freak out ;)
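If the same print line ends up pasted into several methods, it could be wrapped in a tiny helper, as in the sketch below. CallerTrace and the demo class are hypothetical; note that the extra method call shifts the interesting frame from index 1 to index 2.

```java
final class CallerTrace {
    private CallerTrace() {}

    /**
     * Returns the frame that called the method invoking this helper.
     * Frame 0 is this helper, frame 1 the method under investigation,
     * frame 2 the caller we are looking for.
     */
    static StackTraceElement callerOfCurrentMethod() {
        StackTraceElement[] frames = new RuntimeException().getStackTrace();
        return frames.length > 2 ? frames[2] : null;
    }
}

/** Hypothetical usage: someCaller() calls methodInQuestion(), which asks who called it. */
final class CallerTraceDemo {
    static StackTraceElement methodInQuestion() {
        return CallerTrace.callerOfCurrentMethod();
    }

    static StackTraceElement someCaller() {
        return methodInQuestion();
    }
}
```

Inside the method being investigated, the line then becomes `System.out.println("called from: " + CallerTrace.callerOfCurrentMethod());`.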

producing a beep in a Windows Shell

October 18th, 2010

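You can do it by pressing CTRL+G or ALT+7 (numeric keypad) in a shell.
If you need it in a batch file then do @echo ^G, where ^G is the single control character you get by pressing CTRL+G, not a caret followed by the letter G.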

Some cases where you’d need this:

  1. You are inside a messy server room with tons of Windows desktop boxes (sad) and a KVM switch for interfacing. The boxes have no IP or name stickers on them (sad) and you need to detect which one is the box that the KVM screen and keyboard currently point to. Tracing the Gordian knot of cables is obviously a time-consuming task.
  2. You have a batch file you need to execute and want a beep notification when it ends because you are working on something else at the same time.

modulating the throughput in JMeter for better longevity stress tests

September 2nd, 2010

When running a longevity stress test with JMeter (a test which runs for many days) you may need to emulate a load which approximates the real traffic that the site is receiving in production. And that is definitely not a steady, constant load throughout the full 24 hour cycle.

Most normal sites (not twitter or facebook) tend to receive different amounts of traffic during a day. Although it depends on the nature of the site, usually the traffic will look like a sine wave with a wavelength of 1 day. Even if it doesn’t look as smooth as a sine wave, a sine-modulated throughput will be much better than testing with a constant one. A constant throughput can skew the data you receive from the test, since the application, db and o/s level caches and other systems of the stack (e.g. the GC) may tune themselves to that specific constant throughput.

So, first of all we need to set up some variables in the JMeter test.
JMeter variables setup
Setting oscillationsPerDay to 1 is what we want.

Next we set up a Constant Throughput Timer to reference the hitsPerMinute variable. Note that the initial value of this variable doesn’t play any role since we’ll be constantly changing it via a BeanShell script.
JMeter Constant Throughput Timer

Lastly we need a BeanShell PreProcessor with the following script:

// variables
double minHitsPerSec = Double.parseDouble(vars.get("minHitsPerSec"));
double maxHitsPerSec = Double.parseDouble(vars.get("maxHitsPerSec"));
double oscillationsPerDay  = Double.parseDouble(vars.get("oscillationsPerDay"));

// calculation
// length of one full oscillation in milliseconds (a period, not a frequency)
double oscillationPeriodMillis = 1000L * 60 * 60 * 24 / oscillationsPerDay;
double range = maxHitsPerSec - minHitsPerSec;
double hitsPerSecond = Math.sin(System.currentTimeMillis()/oscillationPeriodMillis*(Math.PI*2))*range/2+range/2+minHitsPerSec;

// set
vars.put("hitsPerMinute", String.valueOf(hitsPerSecond*60));

// log
log.info("throughput: " + hitsPerSecond + " hits per second, or " + vars.get("hitsPerMinute") + " hits per minute");

So this will generate a load which will modulate from minHitsPerSec to maxHitsPerSec as many times per day as you need. Of course, you can make the load and request behavior more realistic by adding a random timer.
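
To sanity check the formula outside JMeter, the same calculation can be run as plain Java with the clock passed in as a parameter. This is only a sketch: the class name SineThroughput and the sample values 5 and 25 hits per second are assumptions made for illustration.

```java
// Standalone version of the BeanShell calculation, with the time passed in so it can be checked.
final class SineThroughput {

    static final double MILLIS_PER_DAY = 1000L * 60 * 60 * 24;

    static double hitsPerSecond(long nowMillis, double minHitsPerSec,
                                double maxHitsPerSec, double oscillationsPerDay) {
        // length of one full oscillation in milliseconds
        double periodMillis = MILLIS_PER_DAY / oscillationsPerDay;
        double range = maxHitsPerSec - minHitsPerSec;
        // position inside the current oscillation, in radians
        double phase = nowMillis / periodMillis * (Math.PI * 2);
        // shift the sine from [-1, 1] to [minHitsPerSec, maxHitsPerSec]
        return Math.sin(phase) * range / 2 + range / 2 + minHitsPerSec;
    }

    public static void main(String[] args) {
        long sixHours = 6L * 60 * 60 * 1000;
        // sample one day in six hour steps, starting from the epoch (midnight UTC)
        for (int i = 0; i <= 4; i++) {
            System.out.printf("t=%2dh -> %.2f hits/sec%n",
                    i * 6, hitsPerSecond(i * sixHours, 5, 25, 1));
        }
    }
}
```

Because the phase is derived from System.currentTimeMillis(), which counts from midnight UTC, one oscillation per day puts the peak at around 06:00 UTC and the trough at around 18:00 UTC. If your real traffic peaks at a different hour, adding an offset to the time before computing the phase would shift the wave.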

Disabling quartz and ehcache UpdateChecker

August 16th, 2010

Last year Terracotta acquired ehcache and quartz and it was all good and exciting news. The problem is that since then they’ve included an automatic update checker in these two libraries which is turned on by default!

What this does is to connect to www.terracotta.org as soon as you bootstrap your application, send some info (!) and get a response back on whether you are currently using the latest version of the library.
firewall complaining that a Java process wants to connect to www.terracotta.org

You’ll get something like this in your logs for ehcache:

2010-08-16 11:18:04,794 DEBUG (UpdateChecker.java:68) - Checking for update...
2010-08-16 11:18:05,934 INFO  (UpdateChecker.java:98) - New update(s) found: 2.2.0 Please check http://ehcache.org for the latest version.

and for quartz:

2010-08-16 11:15:58,218 DEBUG (UpdateChecker.java:56) - Checking for available updated version of Quartz...
2010-08-16 11:16:01,734 INFO  (UpdateChecker.java:86) - New Quartz update(s) found: 1.8.4 [http://www.terracotta.org/kit/reflector?kitID=default&pageID=QuartzChangeLog]

Terracotta gives an explanation on why they did this, but no matter how you look at it, it still makes your brain hurt and makes you wonder what would happen if every vendor of Java libraries did this. Complete misery.

Disabling this check is highly recommended in both development and production.

For ehcache you need to add:

updateCheck="false"

in your ehcache.xml root element (<ehcache>) and:

org.quartz.scheduler.skipUpdateCheck=true

in your quartz.properties file.

More discussions:
ehcache UpdateChecker: http://forums.terracotta.org/forums/posts/list/2701.page
quartz UpdateChecker: http://forums.terracotta.org/forums/posts/list/3395.page

Improved svn post-commit hook for hudson

March 24th, 2010

Hudson’s wiki entry about the Subversion plugin explains how to set up a post-commit svn hook so that commits trigger the hudson builds without the need for hudson to constantly poll the repository.

The proposed post-commit hook implementation is good, but when the hudson server does not respond, the commit takes place but the hook blocks forever. This can be confusing and annoying.

The hudson server may not be able to respond because it may be down or can’t be reached. In any case, and as the Fallacies of Distributed Computing explain, the network is unreliable, so a better approach is to add timeout and retries settings to the command which attempts to notify hudson:

REPOS="$1"
REV="$2"
UUID=`svnlook uuid "$REPOS"`
# give up after 2 seconds per attempt and after 2 attempts in total
/usr/bin/wget \
  --timeout=2 \
  --tries=2 \
  --header "Content-Type:text/plain;charset=UTF-8" \
  --post-data "`svnlook changed --revision "$REV" "$REPOS"`" \
  --output-document "-" \
  "http://server/hudson/subversion/${UUID}/notifyCommit?rev=$REV"

Wget will now give up if the hudson server hasn’t responded within 2 seconds, and will try at most twice. After that the user will get an error message, but the commit will already have gone through.

Migrating from tomcat to weblogic

March 11th, 2010

Moving from tomcat to weblogic may sound crazy. In case you need to do it though (e.g. for business reasons) here are a couple of things which may go wrong.

First of all, the classloader hierarchy in weblogic does not behave the way you would expect from other servers such as tomcat, resin, jetty and jboss. If your application uses hibernate (and implicitly ANTLR) you may get the following exception:

Caused by: java.lang.Throwable: Substituted for missing class org.hibernate.QueryException - ClassNotFoundException: org.hibernate.hql.ast.HqlToken [from com.example.model.Person order by id]
        at org.hibernate.hql.ast.HqlLexer.panic(HqlLexer.java:80)
        at antlr.CharScanner.setTokenObjectClass(CharScanner.java:340)
        at org.hibernate.hql.ast.HqlLexer.setTokenObjectClass(HqlLexer.java:54)
        at antlr.CharScanner.<init>(CharScanner.java:51)
        at antlr.CharScanner.<init>(CharScanner.java:60)
        at org.hibernate.hql.antlr.HqlBaseLexer.<init>(HqlBaseLexer.java:56)
...

As explained in the Hibernate3 Migration Guide, Weblogic doesn’t seem to support proper class loader isolation: it will not see the Hibernate classes in the application’s context and will try to use its own version of ANTLR.

In the same fashion you may get the following exception for commons lang:

java.lang.NoSuchMethodError: org.apache.commons.lang.exception.ExceptionUtils.getMessage(Ljava/lang/Throwable;)Ljava/lang/String;

because weblogic internally uses commons lang 2.1 and the version you ship may have more API methods.

For both these problems the solution is to instruct weblogic to prefer the jars from the WEB-INF of your application. You need to create a weblogic specific file called weblogic.xml and place it under WEB-INF:

<?xml version="1.0" encoding="UTF-8"?>
<weblogic-web-app>
    <container-descriptor>
        <prefer-web-inf-classes>true</prefer-web-inf-classes>
    </container-descriptor>
</weblogic-web-app>

Another problem is that, like in resin, the default servlet is not named “default”, so if you depend on it in web.xml, your application may throw the following at the deployment phase:

Caused by: weblogic.management.DeploymentException: [HTTP:101170]The servlet default is referenced in servlet-mapping *.avi, but not defined in web.xml.

This is because weblogic’s default servlet is called FileServlet, so you’ll need to change all references in your web.xml from “default” to “FileServlet”.

Last, but not least, tomcat will automatically issue a 302 redirect from http://localhost:8080/context to http://localhost:8080/context/ before allowing your application to do any processing. So request.getServletPath() will never return an empty string, but will always start with “/”. Weblogic doesn’t do this, so http://localhost:8080/context reaches your application with an empty servlet path, and if your code contains something like:

request.getServletPath().substring(1)

you’ll get:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1

so a safer way to trim this leading slash is by doing:

request.getServletPath().replaceFirst("^/", "")
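
The difference between the two approaches can be checked without a servlet container by using plain strings in place of request.getServletPath(). This is only a sketch: the class name ServletPathTrim and the sample paths are made up for illustration.

```java
// Plain strings stand in for the value returned by request.getServletPath().
final class ServletPathTrim {

    // Removes a single leading slash if there is one; leaves any other input untouched.
    static String trimLeadingSlash(String servletPath) {
        return servletPath.replaceFirst("^/", "");
    }

    public static void main(String[] args) {
        for (String path : new String[] {"", "/", "/reports/list"}) {
            System.out.println("[" + path + "] -> [" + trimLeadingSlash(path) + "]");
        }
        try {
            // what the unsafe version does when the servlet path is empty
            "".substring(1);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring(1) on an empty path throws: " + e);
        }
    }
}
```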

Good luck, and remember: every time you use a full blown application server for something that a simple web container would be enough for, god kills a kitten.