Duplicate content test and URL canonicalization

A few days ago I uploaded the following script to my server:

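<?php

  // Each branch matches one exact query string, so each URL below is a separate address.

  if ($_SERVER["QUERY_STRING"] == 'foo&bar') {
    echo "index test one";
  }

  if ($_SERVER["QUERY_STRING"] == 'bar&foo') {
    echo "bar and foo";
  }

  // Prints the same text as the previous branch, i.e. duplicate content under a different URL.
  if ($_SERVER["QUERY_STRING"] == 'bar&foo&test') {
    echo "bar and foo";
  }

?>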

I then published three links on my site’s index page so Google could follow them:

http://cherouvim.com/foo.php?foo&bar

http://cherouvim.com/foo.php?bar&foo

http://cherouvim.com/foo.php?bar&foo&test

A few days later I got this result for the Google query site:cherouvim.com/foo:

The first and third results are the same (duplicate content), yet Google has indexed them both. This is a common SEO problem on dynamic web sites, where many different URLs can lead to the same page (paginators, out-of-date URLs, archive pages, etc.) or where you want to do URL referrer tracking.

Google has recently published a way of overcoming this problem. You can now specify the real (or primary) URL for a page with a link element in its head section. For example:

<link rel="canonical" href="/foo.php?foo&bar" />
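As a rough sketch of how the test script could emit that tag itself, the snippet below maps each query string to a preferred one and prints the matching link element. The helper names (canonical_url_for, canonical_link_tag), the $canonicalQueries map, and the choice of ?bar&foo as the preferred URL for the two duplicate pages are assumptions made for illustration, not part of the original test. It also assumes PHP 7.1 or later.

```php
<?php

// Hypothetical map: query string of foo.php => preferred (canonical) query string.
// Assumption: ?bar&foo&test prints the same text as ?bar&foo, so it points there.
$canonicalQueries = [
    'foo&bar'      => 'foo&bar',
    'bar&foo'      => 'bar&foo',
    'bar&foo&test' => 'bar&foo',
];

// Hypothetical helper: returns the canonical URL for a query string,
// or null when the query string is not one we know about.
function canonical_url_for(string $queryString, array $map): ?string
{
    if (!isset($map[$queryString])) {
        return null;
    }
    return '/foo.php?' . $map[$queryString];
}

// Hypothetical helper: builds the link element, escaping the URL so that
// characters such as & are valid inside an HTML attribute.
function canonical_link_tag(string $canonicalUrl): string
{
    $href = htmlspecialchars($canonicalUrl, ENT_QUOTES, 'UTF-8');
    return '<link rel="canonical" href="' . $href . '" />';
}

// QUERY_STRING may be missing (e.g. on the command line), so default to ''.
$query = $_SERVER['QUERY_STRING'] ?? '';
$url = canonical_url_for($query, $canonicalQueries);

if ($url !== null) {
    // Meant to be printed inside the page's <head>.
    echo canonical_link_tag($url), "\n";
}
```

With a map like this, a request for ?bar&foo&test would announce ?bar&foo as its canonical URL, while unknown query strings get no tag at all.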

So, as SEOmoz said, this definitely is The Most Important Advancement in SEO Practices Since Sitemaps.

4 Responses to “Duplicate content test and URL canonicalization”

  1. Introspective Says:

    I used to publish my articles, but now I wonder whether I should stop because of the risk of a duplicate content penalty. Should I stop publishing my articles on article directories?

  2. Cristian Says:

    Introspective,

    Duplicate content theory as we know it applies only to on-site duplicate content.
    Approximately 30% of all the web’s info is duplicate content. Just think about how much duplicate content is passed between prominent news agencies worldwide. Surely you’ve read breaking news with the same content on multiple news sites.

    Whether duplicated articles are tolerated depends on the particular article directory. From an SEO perspective it is better to spin the article so that search engines perceive it as unique content. This lets you dominate the top 10 positions in the SERPs for certain keywords.

    After you’ve posted the same article on multiple directories, for a while the same piece of content will show up multiple times on the same search results page. But as Google and other search engines filter the new information, they will retain only one copy of that article, generally the one on the first page to be indexed with that copy or on the directory with the highest ranking score.

    I’ve written a post on duplicate content on my blog, if you’re interested in the subject. Here’s the link: http://trafficcpanel.com/871/duplicate-content-is-your-business-website-silently-infected/

    Hope it helps!
    Cheers
    Cristian

  3. izdelava spletnih strani Says:

    In these times, when social sites like Twitter, Facebook and the rest are taking over the net, it’s hard to tell what’s duplicate content. If someone bookmarks my post on Digg, Mixx, Delicious and others, how can I prevent my ‘duplicate content’ from being distributed over the internet and harming my original content and web page? In my opinion search engines don’t really pay that much attention to duplicate content, because they can’t really detect what is original and what is a duplicate.

  4. Eva Campbell Says:

    I usually submit 300-word articles to article directories to help me gain backlinks and readers.