Archive for the ‘webapps’ Category

The worst codebase I’ve seen in my life

Wednesday, March 30th, 2011

I recently inherited a huge legacy project (let’s call it FailApp) which features brutally substandard code. I’ll present the best gems here, not as a rant, but as a case study of what is possible to be delivered in the software industry for a six figures sum of money.

The gems are categorized in three levels and (due to popular demand) each contains an oneliner of what the author should have done instead.

1) the LOLs

The LOLs are code gems that when you discover them you laugh out loud. They are so funny that you’ll mention them to your coworkers while drinking beer or even post them in a forum in order to have a laugh. They are little harmless anecdotes of failure that the average Joe programmer will do. They are usually the result of incompetent project management (stuffing serious projecs with baby developers) and total absense of industry standard practices such as code reviews, unit tests and code conventions.

2) the WTFs

The WTFs are code gems and design decisions which are totally perverted. You’ll discover some of them after an hour worth of debugging and they are not very funny but rather piss you off. They cause problems in the health of the project and they are the reason why the client ditched the original authors of the app and brought it to you.

3) the showstoppers

Things get hard and serious here. The showstoppers are fatal, dangerous and inexplicable design decisions that the original author made which make your brain explode. They are hard to deal with and will usually cause serious bottlenecks to the application and irreversible data corruptions. When you discover them you will immediatelly call an urgent meeting with the management to explain the situation. These are no fun stuff and probably written by people who have no idea of what they are doing. These problems usually cost serious amounts of money.

The code snippets of the FailApp have not been changed at all. In this case there isn’t any “innocent” to protect since serious harm has been already done with this codebase.

So, here we go.

LOL#1
“Prod” in the application context name. So the staging environment URL looks like http://123.12.34.123:7001/FailAppProd and the production http://123.12.34.134:7001/FailAppProd. Good luck remembering which one is which.

Solution: Not use the environment name in the URL at all. If necessary provide variability on deployment descriptors (possibly upon build generation) which configures “dev”, “prod”, “staging” in the context name.

LOL#2

EscapeIlegalWords eiw = new EscapeIlegalWords();
foo = eiw.escapeIlegalWords(foo);

Yes, we’ve got a spelling mistake here and a really bad local variable name. But the real laugh comes when you navigate into the method:

public static String escapeIlegalWords(String code) { 
    code = code.replaceAll("&lt;", "<");
    code = code.replaceAll("&gt;", ">");
    code = code.replaceAll("&amp;", "&");
    code = code.replaceAll("&apos;", "'");
    code = code.replaceAll("&quot;", "\"");
    code = code.replaceAll("&#13;", "");
    return code;
}

- a new instance for calling the static method? ouch…
- oh I see, there aren’t any “illegal words” here, it simply does something with html which should be done in the template anyway…
- damn, this is the opposite of escaping!
- class name doesn’t make any sense at all

Solution: Fix mentioned issues.

LOL#3

session.setAttribute("contentXML", null);
session.removeAttribute("contentXML");
session.setAttribute("contentOwnerList", null);
session.removeAttribute("contentOwnerList");
session.setAttribute("structuresVector", null);
session.removeAttribute("structuresVector");
session.setAttribute("selectedThematic", null);
session.removeAttribute("selectedThematic");
session.setAttribute("selectedTarget", null);
session.removeAttribute("selectedTarget");
session.setAttribute("targetList", null);
session.removeAttribute("targetList");
session.setAttribute("thematicList", null);
session.removeAttribute("thematicList");
session.setAttribute("metadatas", null);
session.removeAttribute("metadatas");

- yes, please do make sure that you really remove those variables from the session… removing them twice will eventually do the trick.

Solution: Remove the setAttribute(…, null) statements because they are redundant.

LOL#4

ArrayList arrActions = new ArrayList();
HashMap order = new HashMap();
public ArrayList getHighLights(int idStructure, int language, int home) { ...

- thank you very much for declaring on concrete type. ever heard of Interfaces and the Collections API design philosophy?

Solution: Declare on Interfaces which is enough for these cases.

LOL#5

ArrayList avlanguages=null;
if (request.getAttribute("avlanguages")!=null){
    avlanguages=(ArrayList)request.getAttribute("avlanguages");
}
<%if (avlanguages!=null && avlanguages.size()>0){%>
<%for(int x=0;x<avlanguages.size();x++) { %>

- yes, this is too much Java code inside JSP pages instead of using JSTL’s <c:forEach...

Solution: The view layer should use JSTL and not JSP scriptlets.

LOL#6

Some spelling mistakes which will definitely complicate greping for information in logs and code:

END GENERTAION THEMATICS.
updateStatusDocumentDeplubishCron()

Solution: Avoid spelling mistakes.

LOL#7

CommonDatosPopUp pop = new CommonDatosPopUp();
pop = (CommonDatosPopUp)popUp[i];

- that is great. thanks for instantiating something only to throw it away. it does make sense in case the constructor is firing a missile though.

Solution: Declare and fetch from array in one line avoiding the unnecessary construction of an object.

LOL#8

ArrayList List = new ArrayList();
List = (ArrayList) baseManagerDao.getPosition(baseItemIdVar);
for (int i = 0; i < List.size(); i++)

- once again, create that empty ArrayList() and then throw it away
- declaration on concrete Collection type
- iterating a Collection datastructure using index
- exceptional naming of variable List to look like the Interface List (I wonder why the language allows such a raping of itself here)

Solution: Fix mentioned problems.

WTF#1

if (cm.getBlockLevels() != null) {
    request.setAttribute("blocklevel", cm.getBlockLevels());
}
if (cm.getUsers() != null) {
    request.setAttribute("users", cm.getUsers());
}
if (pm.getSigGroupPub() != null) {
    request.setAttribute("siggroups", pm.getSigGroupPub());
}

- hm… cm and pm (really bad local variable names) are managers which delegate calls to DAOs. These DAO methods (probably hitting a database) are being called twice
- and why do we have to nullcheck in the first place? if the result is null we can simply put null (remove) into the request attrs. (and no, these variables where not previously set into the request by another piece of code).

Solution: Remove the if statements altogether. If the statements are necessary assign result from managers to local variables to avoid double DAO method execution.

WTF#2

While trying to fix something I searched for “siggroup”. Notice the variations of the key, all apearing in the same (2000 line) controller:

request.setAttribute("siggroups", pm.getSigGroupPub());
request.setAttribute("sigGroups", sigGroups);
request.setAttribute("SigGroups", hm.getSigGroups(Integer...

- zero consistency is the salt of the programmers life…
- the best part though is in the template:

List sig = (List) request.getAttribute("sigGroups");
if(sig == null) {
    sig = (List) request.getAttribute("SigGroups");
}

- good luck finding which collection you are rendering on screen now…

Solution: Be consistent in the naming of your variables, especially those which are being used in another layer of the application, the view layer. Never do such acrobatics with the capitalization of variables. Once again, use proper names.

WTF#3

There is no javadoc and the most crucial parts of the app are commented in a non English language, including non-ascii characters. http://translate.google.com/ to the rescue.

Solution: Unless your company code style guidelines allow it, never use non-English in projects, especially EU funded ones where you’ll need to deliver full source code later on. Also, non-ascii characters may make some systems of the deployment process choke (e.g peers with old editors, old source diff programs, badly installed C.I environments etc). Also they can become the causes of bugs. e.g can you spot the difference between String action = "NULL"; and String action = "╬ŁULL";? This can cause problems, that’s why some people decide to never allow non-ascii characters into their IDE so they can catch these issues early on.

WTF#4

Classes with 2000 lines of code methods and 14 level nested ifs. These are usually do-everything controllers which serve many unrelated things from the same codebase (page results, binary downloads, csv). These are usually replicated 10 times with minor differences to accomodate slightly different use cases. Nough said.

Solution: It has happened to all of us. It’s called spaghetti code. When it happens try to think of a better design. If you can’t, please consult the lead developer of the project for assistance. Spaghetti code will bite you back sooner than you expect.

WTF#5

pubHighlights = cm.getHighLights(structId, userLang,
					Constants.PREDEFINED_SEARCH);

- ok, we are using constants for variability instead of different methods or inheritance
body of method:

public ArrayList getHighLights(int idStructure, String lang, int home) {
    ...
    if (home == 1) {
        listado = commonDao.getHightLightHome(idStructure);
    } else {
        listado = commonDao.getHightLightPredefinedSearch(idStructure);
    }

- facepalm for home==1 instead of Constant. good luck with debugging when that constant changes
- by the way it turns out that the initial call should send Constants.HOME instead of Constants.PREDEFINED_SEARCH. It just happens that both equals 1.

Solution: Use OOP practices for solving such problems. If you can’t and need to use constants please use enums. If you can’t use enums and need to use primitive constants please do make full use of them (and not partial, as in this case).

WTF#6

Absence of templating reuse. The 150+ JSP templates contain everything from html declarations to website footer (with copyrights and everything). Only with minor and insignificant differences due to inconsistent copy pasting of headers and whole pages with minor changes. Ever heard of include?

Solution: Do not copy paste like crazy. By copy pasting sections instead of reusing them you may end up with templates which are as complex as your code. All templating systems offer facilities for reuse.

WTF#7

Form validation for the brave:

if (appForm.getFileCV() == null || StringUtils.isEmpty(appForm.getFileCV().getFileName())){
	errors.add("fileCV", new ActionError("error.fileCV.required"));
}else if (!appForm.getFileCV().getFileName().toLowerCase().endsWith(".doc") && !appForm.getFileCV().getFileName().toLowerCase().endsWith(".pdf") 
		&& !appForm.getFileCV().getFileName().toLowerCase().endsWith(".rtf") && !appForm.getFileCV().getFileName().toLowerCase().endsWith(".sdc") 
		&& !appForm.getFileCV().getFileName().toLowerCase().endsWith(".zip") )
	errorExtension = true;
if (appForm.getFileLetter() == null || StringUtils.isEmpty(appForm.getFileLetter().getFileName())){
	errors.add("fileLetter", new ActionError("error.fileLetter.required"));
}else if (!appForm.getFileLetter().getFileName().toLowerCase().endsWith(".doc") && !appForm.getFileLetter().getFileName().toLowerCase().endsWith(".pdf") 
		&& !appForm.getFileLetter().getFileName().toLowerCase().endsWith(".rtf") && !appForm.getFileLetter().getFileName().toLowerCase().endsWith(".zip")
		&& !appForm.getFileLetter().getFileName().toLowerCase().endsWith(".zip"))
	errorExtension = true;
/* if (fileForm == null || StringUtils.isEmpty(fileForm.getFileName())){
	errors.add("fileForm", new ActionError("error.fileForm.required"));
}else*/
if(appForm.getFileForm() != null && !StringUtils.isEmpty(appForm.getFileForm().getFileName()))
	if (!appForm.getFileForm().getFileName().toLowerCase().endsWith(".doc") && !appForm.getFileForm().getFileName().toLowerCase().endsWith(".pdf") 
		&& !appForm.getFileForm().getFileName().toLowerCase().endsWith(".rtf") && !appForm.getFileForm().getFileName().toLowerCase().endsWith(".sdc")
		&& !appForm.getFileForm().getFileName().toLowerCase().endsWith(".zip"))
	errorExtension = true;
if(appForm.getFileOther1() != null && !StringUtils.isEmpty(appForm.getFileOther1().getFileName()))
	if (!appForm.getFileOther1().getFileName().toLowerCase().endsWith(".doc") && !appForm.getFileOther1().getFileName().toLowerCase().endsWith(".pdf") 
			&& !appForm.getFileOther1().getFileName().toLowerCase().endsWith(".rtf")&& !appForm.getFileOther1().getFileName().toLowerCase().endsWith(".sdc")
			&& !appForm.getFileOther1().getFileName().toLowerCase().endsWith(".zip"))
		errorExtension = true;
if(appForm.getFileOther2() != null && !StringUtils.isEmpty(appForm.getFileOther2().getFileName()))
	if (!appForm.getFileOther2().getFileName().toLowerCase().endsWith(".doc") && !appForm.getFileOther2().getFileName().toLowerCase().endsWith(".pdf") 
			&& !appForm.getFileOther2().getFileName().toLowerCase().endsWith(".rtf")&& !appForm.getFileOther2().getFileName().toLowerCase().endsWith(".sdc")
			&& !appForm.getFileOther2().getFileName().toLowerCase().endsWith(".zip"))
		errorExtension = true;
if(appForm.getFileOther3() != null && !StringUtils.isEmpty(appForm.getFileOther3().getFileName()))
	if (!appForm.getFileOther3().getFileName().toLowerCase().endsWith(".doc") && !appForm.getFileOther3().getFileName().toLowerCase().endsWith(".pdf") 
			&& !appForm.getFileOther3().getFileName().toLowerCase().endsWith(".rtf")&& !appForm.getFileOther3().getFileName().toLowerCase().endsWith(".sdc")
			&& !appForm.getFileOther3().getFileName().toLowerCase().endsWith(".zip"))
		errorExtension = true;

Solution: Don’t do this. All MVC frameworks offer some sort of validation facility in order to avoid code soup such as the above. Even if it doesn’t, try to extract the most repeated part of the validation into a method which you’ll reuse.

WTF#8

There are methods which return Vector (yes, 1999 called) and it turns out that the result is never being used but instead the purpose of the method is to mutate the parameters. Smart.

Solution: Instead of Vector use a List. Methods should not have sideeffects and when they do they should mention it in the documentation or reveal it in the method name. Also, returning an unmodified object only to sideeffect the arguments is very confusing. If the method’s logic changes during implementation please try to adapt its signature so it keeps making sense.

showstopper#1

cm.getHighLights is an expensive operation (calculates the nested menu of the website) and the following code is supposed to introduce caching into the game:

ArrayList pubHighlights = (ArrayList)request.getSession().getAttribute("publicationsHighlights");
if(pubHighlights == null || pubHighlights.size() == 0)
pubHighlights = cm.getHighLights(structId, userLang, Constants.PREDEFINED_SEARCH);
request.getSession().setAttribute("publicationsHighlights", pubHighlights);

- it stores the result in the http session so there is a lot of waste of memory in case we have many users
- the generated menu takes into account “structId” and “userLang”. The cache key is only one though (“publicationsHighlights” in the session), so if the user changes structId or userLang, the menu stays the same
- changes on the menu structure are not reflected to already cached clients. They’ll see these changes only if they get a new session (come back later, use another browser etc)

Solution: Think of your cache design. Does it make sense or does it brake the UI and the server? In this case the cache should be in the application layer and the key should take into account all parameters which should modify the appearance and behavior of the cached object.

showstopper#2

Application “does things” to the database on view. Things == if stuff are not there it silently creates them, so for example if you visit the about page of the French site and there is no content there (either from CM error or data corruption) it simply creates an empty one and inserts it into the database. This is nice and useful especially when the application thinks that there isn’t any content there, so after a couple of days you’ll find thousand of empty “about” pages under the French site waiting to be fixed by you.

Solution: This is dangerous. Don’t do it. GET requests should rarely modify the database.

showstopper#3

Total incompetence in exception design and handling. The usual anti-pattern of swallow and return new ArrayList() is followed through the system. Database errors are masked and the system goes on doing what it was doing (e.g continuing with other parts of data changes, email dispatching etc).

Solution: Learn about exception handling and how they make sense in your application’s layers. Have a look at 1, 2 and 3.

showstopper#4

“Change navigation element==edit property” anti-pattern. This is sick. Imagine a CRUD page with a couple of filters on top and an entities listing below. In the year filter you choose 2010 and hit GO. The listing is updated with entries from 2010. Now you change the year filter to 2011 but do not hit GO. Instead you hit EDIT on one of the 2010 entities below. What happens is that the 2011 value from the filter is transfered into the (hidden) element of the edit form. As soon as you hit SUBMIT the entity now belongs on 2011. Nice.

Solution: Don’t do this. The application should have clearly defined use cases for modifying objects. Never trust the UI.

showstopper#5

The search is FUBAR. A single search for “foo” issues 200.000 db queries and requires 5 minutes on the production server because:
- it first does “select *” the whole publications database in sorted batches of 1000 back to a collection.
- it then feeds this collection into a method which filters things out.
- while filtering some entity methods are accessed and due to bad fetch plan from hibernate tons of N+1 statements are executed.

Solution: When developing try to fill in your test database with tons of data to see how your use cases scale. For this particular issue, learning about the ORM’s fetch plans and how these relate to your model and use cases is very useful.

showstopper#6

“Toxic servlets hierarchy”. All actions extend something (a base class) which extends servlet. The base class provides a public static spring factory field which is initialized on boot of the servlet. Yes, the only reason of existence of this base class is to provide this field and the actions extend it in order to get access to this public static field. Great.

Solution: Don’t do this. If you use Spring then use it properly.

showstopper#7

Log4j & hibernate initialization rediscovered! Both libraries are being configured in the following fashion:
- read the log4j.properties and hibernate.cfg.xml configuration files from a custom location using a ContextListener
- write contents into a new file in the server’s root folder
- load from there
- documentation states that if application cannot boot the application’s configuration files should be removed from the server’s root folder!

Solution: Both frameworks require the configuration file in the default classpath. No acrobatics required.
The end.

A Java alternative to xsendfile for apache httpd (that works)

Wednesday, December 15th, 2010

X-Sendfile is a special and non-standard HTTP header that when returned from a backend application server, the frontend webserver will start serving the file that was specified in the header. Quoting mod_xsendfile for apache on why is this useful:

  • Some applications require checking for special privileges.
  • Others have to lookup values first (e.g.. from a DB) in order to correctly process a download request.
  • Or store values (download-counters come into mind).
  • etc.

lighttpd and nginx already have this capability built in. In apache httpd though you need install mod_xsendfile. In case you cannot get it working (I couldn’t in a sensible timeframe) or if you are in an environment where you cannot install extra apache modules then your only hope is serving the file via Java.

For the case of access control I’ve seen (and also written) ProtectedFileServe servlets before, which check a condition and manually stream the file back to the caller. Serving the file like that can be error prone and a better solution is to utilize what already exists in the web container, which in the case of Tomcat is the DefaultServlet.

The following example will delegate the filename for a request from /serve?/resources/filename to the default servlet.

/**
 * Enforces an application authorization check before delegating to the
 * default file servlet which will serve the "protected" file
 * (found under /resources)
 *
 * Will require an apache httpd mod rewrite to convert normal requests:
 *   /resources/image1.png
 *   /resources/docs/doc1.pdf
 *
 * into:
 *   /serve?/resources/image1.png
 *   /serve?/resources/docs/doc1.pdf
 *
 */
public class ProtectedFileServe extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) 
                                 throws ServletException, IOException {
        final String query = req.getQueryString();
        if (query!=null && query.startsWith("/resources/") && isLoggedIn(req)) {
            req.getRequestDispatcher(query).forward(req, resp);
            return;
        }
        resp.sendError(HttpServletResponse.SC_UNAUTHORIZED);
    }
    
    /**
     * Determines whether the requested file should be served
     */
    private boolean isLoggedIn(HttpServletRequest request) {
        return ...;
    }
    
}

Map it in web.xml:

<servlet>
    <servlet-name>ProtectedFileServe</servlet-name>
    <servlet-class>com.example.ProtectedFileServe</servlet-class>
    <load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
    <servlet-name>ProtectedFileServe</servlet-name>
    <url-pattern>/serve</url-pattern>
</servlet-mapping>

Now a request to /serve?/resources/foo.jpg will serve the file /resources/foo.jpg via the default servlet only if the user is logged in.

An enhancement to the URL structure is to apply the following mod_rewrite rule in the apache configuration which will allow URLs such as /resources/foo.jpg to correctly reach the servlet:

RewriteEngine on
RewriteCond %{REQUEST_URI} ^/resources/.*
RewriteRule (.*) /serve?$1 [PT,L]

modulating the throughput in JMeter for better longevity stress tests

Thursday, September 2nd, 2010

When running a longevity stress test with JMeter (a test which runs for many days) you may need to emulate a load which approximates the real traffic that the site is receiving in production. And that is definitelly not a steady and constant load during the duration of the full 24 hour cycle.

Most normal sites (not twitter or facebook) tend to receive different amounts of traffic during a day. Although it depends on the nature of the site, usually the traffic will look like a sine wave with a wave length of 1 day. Even if it doesn’t look as smooth as a sine wave, a sine modulating throughput will be much better than testing with constant one. Having a constant throuput can mess up with the data you receive from the test since the application, db and o/s level caches and other systems of the stack (e.g the GC) may tune to the specific constant throughput.

So, first of all we need to setup some variables in the JMeter test.
JMeter variables setup
Setting oscillationsPerDay to 1 is what we want.

Next we setup a Constant Throughput Timer to reference the hitsPerMinute variable. Note that the initial value of this variable doesn’t play any role since we’ll be constantly changing this via a bean shell script.
JMeter Constant Throughput Timer

Lastly we need a BeanShell PreProcessor with the following script:

// variables
double minHitsPerSec = Double.parseDouble(vars.get("minHitsPerSec"));
double maxHitsPerSec = Double.parseDouble(vars.get("maxHitsPerSec"));
double oscillationsPerDay  = Double.parseDouble(vars.get("oscillationsPerDay"));

// calculation
double oscillationFrequency = 1000L * 60 * 60 * 24 / oscillationsPerDay;
double range = maxHitsPerSec - minHitsPerSec;
double hitsPerSecond = Math.sin(System.currentTimeMillis()/oscillationFrequency*(Math.PI*2))*range/2+range/2+minHitsPerSec;

// set
vars.put("hitsPerMinute", String.valueOf(hitsPerSecond*60));

// log
log.info("throughput: " + hitsPerSecond + " hits per second, or " + vars.get("hitsPerMinute") + " hits per minute");

So this will generate a load which will modulate from minHitsPerSec to maxHitsPerSec for as many times per day you need. Of course, you can make the load and requests behavior more realistic by adding a Random Timer.

Migrating from tomcat to weblogic

Thursday, March 11th, 2010

Moving from tomcat to weblogic may sound crazy. In case you need to do it though (e.g for business reasons) here are a couple of things which may go wrong.

First of all the classloader hierarchy in weblogic do not do what you usually expect from other servers such as tomcat, resin, jetty and jboss. If your application uses hibernate (and implicitly ANTLR) you may get the following exception:

Caused by: java.lang.Throwable: Substituted for missing class org.hibernate.QueryException - ClassNotFoundException: org.hibernate.hql.ast.HqlToken [from com.example.model.Person order by id]
        at org.hibernate.hql.ast.HqlLexer.panic(HqlLexer.java:80)
        at antlr.CharScanner.setTokenObjectClass(CharScanner.java:340)
        at org.hibernate.hql.ast.HqlLexer.setTokenObjectClass(HqlLexer.java:54)
        at antlr.CharScanner.<init>(CharScanner.java:51)
        at antlr.CharScanner.<init>(CharScanner.java:60)
        at org.hibernate.hql.antlr.HqlBaseLexer.<init>(HqlBaseLexer.java:56)
...

As explained in the Hibernate3 Migration Guide Weblogic doesn’t seem to support proper class loader isolation, will not see the Hibernate classes in the application’s context and will try to use it’s own version of ANTLR.

In the same fashion you may get the following exception for commons lang:

java.lang.NoSuchMethodError: org.apache.commons.lang.exception.ExceptionUtils.getMessage(Ljava/lang/Throwable;)Ljava/lang/String;

because weblogic internally uses commons lang 2.1 and the one you use may have more API methods.

For both these problems the solution is to instruct weblogic to prefer the jars from the WEB-INF of your application. You need to create a weblogic specific file called weblogic.xml and place it under WEB-INF:

<?xml version="1.0" encoding="UTF-8"?>
<weblogic-web-app>
    <container-descriptor>
        <prefer-web-inf-classes>true</prefer-web-inf-classes>
    </container-descriptor>
</weblogic-web-app>

Another problem is that, like in resin, the default servlet is not named “default” so if you depend on it in web.xml, your application may throw the following at the deployment phase:

Caused by: weblogic.management.DeploymentException: [HTTP:101170]The servlet default is referenced in servlet-mapping *.avi, but not defined in web.xml.

This is because the default servlet is called FileServlet in the web.xml, so you’ll need to change all references in your web.xml from “default” to “FileServlet”.

Last, but not least, tomcat will automatically issue a 302 redirect from http://localhost:8080/context to http://localhost:8080/context/ before allowing your application to do any processing. So all instances of request.getServletPath() will never return an empty string, but will always start with “/”. Weblogic doesn’t do this so http://localhost:8080/context resolves and if your code contains something like:

request.getServletPath().substring(1)

you’ll get:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1

so a safer way to trim this leading slash is by doing:

request.getServletPath().replaceFirst("^/", "")

Good luck, and remember. Every time you use a full blown application server for something that a simple web container would be enough, god kills a kitten.

Retaining browser scrollTop between page refreshes

Tuesday, August 18th, 2009

Sometimes when you develop web applications with CRUD pages and other backend functionalities you need retain the vertical scrollbar state between page loads. This gives the user a smoother experience, especially when your application doesn’t do any AJAX but it’s based of good old full HTML page request-responses.

We’ll be using jQuery and the cookie plugin:

$(function(){
  // if current viewport height is at least 70% of the previous
  if ($(document.body).height()>$.cookie('h2')*0.7)
    // retain previous scroll
    $('html,body').scrollTop($.cookie('h1'));
});
$(window).unload(function(){
  // store current scroll
  $.cookie('h1', $(window).scrollTop());
  // store current viewport height
  $.cookie('h2', $(document.body).height());
});

We can completely skip the h2 cookie and the if statement but this is an automatic way to prevent scrolling to the very bottom of a short page when coming from a long page. Such use case is common when jumping from the “long list of items” to the “edit an item” page.

Good luck

mod_expires and Cache Killers

Sunday, May 3rd, 2009

Rule 3 of Steve Souders’ YSlow suggests that websites should Add a far future Expires header to the components. Components with a cache header could be static files such as those with extensions .css, .js, .jpg, .png, .gif etc. This gives a huge boost in client side performance of users with a primed cache. In apache this is done via mod_expires and an example configuration would be:

ExpiresActive On
ExpiresByType image/x-icon "access plus 1 month"
ExpiresByType text/css "access plus 1 month"
ExpiresByType application/javascript "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"

All this works well until you need to update a cached static file. The users with the primed cache will either have to wait 1 month to get the new file, or explicitly invalidate their cache. Some people will even ask their users to do a hard refresh but this obviously does not scale and it’s not very robust.

Since you cannot send an automatic signal to the browsers to reload those files all you can do is change the URL of those files (explicit invalidation). You could simply rename all those files, but an easier way to achieve the same effect is by adding a fake (unused – dummy) parameter at the end of the resource URL:

<img src="logo.jpg?2" />

The next logical step would be to automate this into the build system and have every production release feature new cache killer tokens. It seems that many well known sites do that already:

http://slashdot.org/

href="//s.fsdn.com/sd/idlecore-tidied.css?T_2_5_0_254a"
src="//s.fsdn.com/sd/all-minified.js?T_2_5_0_254a" 

http://stackoverflow.com/

href="/content/all.css?v=3184"
src="/content/js/master.js?v=3141"

http://digg.com/

@import "/css/189/global.css";
src="http://media.digg.com/js/loader/187/dialog|digg|shouts"

http://www.bbc.co.uk/

@import 'http://wwwimg.bbc.co.uk/home/release-29-7/style/homepage.min.css';
src="http://wwwimg.bbc.co.uk/home/release-29-7/script/glow.homepage.compressed.js"

http://www.guardian.co.uk/

href="http://static.guim.co.uk/static/73484/common/styles/wide/ie.css"
src="http://static.guim.co.uk/static/73484/common/scripts/gu.js"

What happens with images referenced from within css files? You could rewrite the css files automatically as part of your production build process with Ant.

<tstamp>
    <format property="cacheKill" pattern="yyyyMMddhhmm" locale="en,UK"/>
</tstamp>

<target name="rewrite-css">
    <replace dir="${build.web.dir}" value="css?${cacheKill}&quot;)"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>css&quot;)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="png?${cacheKill}&quot;)"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>png&quot;)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="gif?${cacheKill}&quot;)"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>gif&quot;)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="jpg?${cacheKill}&quot;)"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>jpg&quot;)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="css?${cacheKill}')"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>css')</replacetoken></replace>
    <replace dir="${build.web.dir}" value="png?${cacheKill}')"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>png')</replacetoken></replace>
    <replace dir="${build.web.dir}" value="gif?${cacheKill}')"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>gif')</replacetoken></replace>
    <replace dir="${build.web.dir}" value="jpg?${cacheKill}')"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>jpg')</replacetoken></replace>
    <replace dir="${build.web.dir}" value="css?${cacheKill})"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>css)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="png?${cacheKill})"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>png)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="gif?${cacheKill})"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>gif)</replacetoken></replace>
    <replace dir="${build.web.dir}" value="jpg?${cacheKill})"><include name="css/**/*.css"/><include name="scripts/**/*.css"/><replacetoken>jpg)</replacetoken></replace>
</target>

This will take care of the following background image reference styles for css, png, gif and jpg files:

... background-image: url("images/ed-bg.gif");
... background-image: url('images/ed-bg.gif');
... background-image: url(images/ed-bg.gif);

and convert them to:

... background-image: url("images/ed-bg.gif?200905031126");
... background-image: url('images/ed-bg.gif?200905031126');
... background-image: url(images/ed-bg.gif?200905031126);

Good luck!

A better SMTPAppender

Saturday, May 2nd, 2009

SMTPAppender for log4j is a type of appender which sends emails via an SMTP server. It’s very useful for applications released in production where you’d definitely need to know of all application errors logged. Of course every caring developer should look at the server logs every now and then, but if you’ve got hundreds of them (applications) then it becomes a full time job in itself.

Sometimes a fresh release of a high traffic website may produce hundreds or thousands of ERROR level log events. Many times this may be something minor which is being logged deep inside your code. Until the bug is fixed and a new release is deployed, your inbox and the mail server may suffer heavily.

What follows is an extension of SMTPAppender which limits the amount of emails sent in a specified period of time. It features sensible defaults which of course can be configured externally via the log4j configuration file.

package com.cherouvim;

import org.apache.log4j.Logger;
import org.apache.log4j.net.SMTPAppender;

public class LimitedSMTPAppender extends SMTPAppender {

    private int limit = 10;           // max at 10 mails ...
    private int cycleSeconds = 3600;  // ... per hour

    public void setLimit(int limit) {
        this.limit = limit;
    }

    public void setCycleSeconds(int cycleSeconds) {
        this.cycleSeconds = cycleSeconds;
    }

    private int lastVisited;
    private long lastCycle;

    protected boolean checkEntryConditions() {
        final long now = System.currentTimeMillis();
        final long thisCycle =  now - (now % (1000L*cycleSeconds));
        if (lastCycle!=thisCycle) {
            lastCycle = thisCycle;
            lastVisited = 0;
        }
        lastVisited++;
        return super.checkEntryConditions() && lastVisited<=limit;
    }

}

The configuration would look something like this:

log4j.appender.???=com.cherouvim.LimitedSMTPAppender
log4j.appender.???.limit=3
log4j.appender.???.cycleSeconds=60
log4j.appender.???.BufferSize=25
log4j.appender.???.SMTPHost=${mail.smtp.host}
log4j.appender.???.From=${mail-sender}
log4j.appender.???.To=${sysadmin.email}
log4j.appender.???.Subject=An error occured
log4j.appender.???.layout=org.apache.log4j.PatternLayout
log4j.appender.???.layout.ConversionPattern=%d{ISO8601} %-5p (%F:%L) - %m%n
log4j.appender.???.threshold=ERROR

The above configuration will limit the mail dispatch to only 3 emails per minute. Any further errors in that minute will not be emailed. The limit and cycleSeconds setting lines can be omitted and the defaults will be applied.

Happy logging!

My favourite string for testing web applications

Monday, February 16th, 2009

Weird title, huh?

When creating templates, pages and action responses for your web application you really need to take HTML escaping into consideration. Sometimes the use cases of the system are so many that you may omit HTML escaping for some piece of dynamic or user entered text.

HTML escaping means converting <foo>bar & " to &lt;foo&gt;bar &amp; &quot; in the HTML source.

One of the reasons for HTML escaping is to avoid XSS attacks or simply to make your site valid.

Reasons for making your HTML output valid include:

  1. It’s the “right thing to do”
  2. Does not tire the browser
  3. Allows you to manually detect (via CTRL+SHIFT+A on web developer toolbar) for real HTML output errors
  4. Saves you from css rendering issues due to HTML anomalies
  5. Ensures your content is easily parsable from third party agents (crawlers, scrappers, XSLT transformators etc)

So my favourite string is <script’”<i>
You can alter your development database content using statements like these:

update articles set title=concat("<script'\"<i>", title);
update users set firstname=concat("<script'\"<i>", lastname), lastname=concat("<script'\"<i>", lastname);
update categories set title=concat("<script'\"<i>", title);
...

If after this database content change your site is not functional then there is a problem. You can also check for HTML validity with CTRL+SHIFT+A on web developer toolbar and quickly spot areas where you missed HTML escaping.

You could even automate this whole process by having a tool (JTidy?) scan that all your pages and use cases produce valid HTML. So indirectly you would be testing for insecure (in XSS terms) parts of the application.

HTML escaping in JSTL
HTML escaping in freemarker
HTML escaping in velocity

Duplicate content test and URL canonicalization

Sunday, February 15th, 2009

Days ago I uploaded the following script on my server:

<?php

  if ($_SERVER["QUERY_STRING"]=='foo&bar') {
    echo "index test one";
  }

  if ($_SERVER["QUERY_STRING"]=='bar&foo') {
    echo "bar and foo";
  }

  if ($_SERVER["QUERY_STRING"]=='bar&foo&test') {
    echo "bar and foo";
  }

?>

I then published 3 links to my site’s index so Google could follow them:

http://cherouvim.com/foo.php?foo&bar

http://cherouvim.com/foo.php?bar&foo

http://cherouvim.com/foo.php?bar&foo&test

Days later I got this result for the Google query site:cherouvim.com/foo:

The first and third result are the same (duplicate content). Google has indexed them both though. This is a common SEO problem in dynamic web sites where there can be many different URLs linking to the same page (paginators, out of date URLs, archive pages etc) or where you want to do URL Referrer Tracking.

Google has recently published a way of overcoming this problem. You can now specify which is the real (or primary) URL for the page. E.g:

<link rel="canonical" href="/foo.php?foo&bar" />

So, as SEOmoz said, this definitely is The Most Important Advancement in SEO Practices Since Sitemaps.

Re: The 140 character webapp challenge!

Saturday, February 14th, 2009

This is my response to The 140 character webapp challenge!

Epilepsy

javascript:{r=0;setInterval(function(){document.body.style.background=(r++%2==0?'#'+r%7+r%9+r%8:'0')},50);void(0)}

114 bytes of inline javascript that you can paste to your browser’s URL.

It’s a very useful webapp* in case you want to cause epilepsy to yourself.

Via: http://f055.net/article/the-140-character-webapp-challenge/
Via: http://synodinos.net/2009/02/14/re-the-140-character-webapp-challenge/

* just kidding :)