robots.txt control for host aliases via mod_rewrite

Suppose you have a website launched at two different hosts.

<VirtualHost *:80>
    ServerName www.example.com
    ServerAlias beta.example.com
    ...
</VirtualHost>

The content is the same, but you want to serve a different robots.txt file on each host, possibly excluding the secondary host from any indexing.

It would be handy if we could simply say:

User-agent: *
Allow: http://www.example.com/

User-agent: *
Disallow: http://beta.example.com/

to allow all bots to crawl the primary host and disallow them from the secondary one, but this syntax is imaginary. Firstly, there is no Allow keyword in the spec, and secondly, URLs must be relative.

The solution is to have 2 different robots.txt files:

robots-www.txt:

User-agent: *
Disallow:

robots-beta.txt:

User-agent: *
Disallow: /

and serve them via mod_rewrite like this:

<VirtualHost *:80>
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^www\.example\.com$
    RewriteRule ^/robots\.txt$ /robots-www.txt [L]
    RewriteCond %{HTTP_HOST} ^beta\.example\.com$
    RewriteRule ^/robots\.txt$ /robots-beta.txt [L]
</VirtualHost>

Now http://www.example.com/robots.txt will silently serve robots-www.txt and http://beta.example.com/robots.txt will serve robots-beta.txt.

This is also handy during domain name migration periods, where you are waiting for DNS changes to propagate around the globe before you feel safe completely shutting down the secondary host and possibly assigning 301 redirects to the primary.
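When that time comes, the 301 redirects could live in the same vhost, using the same RewriteCond trick on the Host header. A minimal sketch (this block is an illustration, not part of the original setup):

<VirtualHost *:80>
    ServerName www.example.com
    ServerAlias beta.example.com
    RewriteEngine On
    # Permanently redirect any request arriving at the secondary host
    # to the same path on the primary host.
    RewriteCond %{HTTP_HOST} ^beta\.example\.com$
    RewriteRule ^(.*)$ http://www.example.com$1 [R=301,L]
</VirtualHost>

In per-vhost context the matched path in $1 already carries its leading slash, so it can be appended directly to the primary host name.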

3 Responses to “robots.txt control for host aliases via mod_rewrite”

  1. Israel WebDev Says:

    This is perfect! Thanks.
    Small note: the robots-beta.txt should read

    User-agent: *
    Disallow: /

    and robots-www.txt should be:

    User-agent: *

    or a blank file (or no file at all).

  2. Neil Ferns Says:

    We could use an
    Alias /robots.txt /home/www/
    which could prevent complicated mod_rewrite rules,

    and have 2 different virtual hosts for the 2 different domains pointing to the same htdocs.

  3. cherouvim Says:

    @Israel WebDev: I think plain “Disallow:” is OK as well.

    @Neil Ferns: The virtual host in this example is 1 with 2 domains (aliases).
