Redirect Google crawler to different robots.txt via .htaccess

Google's bots will still request /robots.txt from your subdomain, not /robots_static.txt, which would have no meaning to them. Serve the alternate file internally instead:

# In .htaccess the leading slash is stripped before matching, so the pattern has none.
RewriteCond %{HTTP_HOST} ^www\.static\. [NC]
RewriteRule ^robots\.txt$ /robots_static.txt [L]

When a request for /robots.txt arrives on your www.static host, the contents of /robots_static.txt are served as if they were /robots.txt.
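
For reference, here is a minimal sketch of what robots_static.txt might contain. It assumes the goal is to keep all crawlers off the static mirror entirely; the question only says that a Disallow rule will be used, so the exact contents are an assumption:

```
# robots_static.txt (assumed contents): served as /robots.txt on the www.static host only
User-agent: *
Disallow: /
```

The main domain keeps serving its own robots.txt unchanged, since the host condition does not match there.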

Author: user3474818

Updated on September 18, 2022

Comments

  • user3474818
    user3474818 over 1 year

    I have googled for the answer all day and still couldn't find an answer.

    I have a virtual subdomain www.static.example.com which is a mirror of www.example.com. That means the subdomain and the main domain share a single root folder.

    I want crawlers to get a different robots file, robots_static.txt, whenever the host contains .static, and in that file I will forbid indexing with a Disallow rule. I want to do this because I have duplicate content in Google search results: the subdomain shows exactly the same content as the main domain.

    Does anyone know how I can make crawlers see robots_static.txt instead of robots.txt?

    What I have managed to find so far is this:

    RewriteCond %{HTTP_HOST} ^www.static.*$ [NC]
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC]
    RewriteRule ^robots\.txt /robots_static.txt [NC,L]
    

    but when I check in Webmaster Tools, it still sees robots.txt as my robots file instead of robots_static.txt, so it crawls and indexes everything twice.

    What did I do wrong? Thanks

    EDIT: This is my .htaccess file

    ##
    # @package      Joomla
    # @copyright    Copyright (C) 2005 - 2013 Open Source Matters. All rights reserved.
    # @license      GNU General Public License version 2 or later; see LICENSE.txt
    ##
    
    ##
    # READ THIS COMPLETELY IF YOU CHOOSE TO USE THIS FILE!
    #
    # The line just below this section: 'Options +FollowSymLinks' may cause problems
    # with some server configurations.  It is required for use of mod_rewrite, but may already
    # be set by your server administrator in a way that dissallows changing it in
    # your .htaccess file.  If using it causes your server to error out, comment it out (add # to
    # beginning of line), reload your site in your browser and test your sef url's.  If they work,
    # it has been set by your server administrator and you do not need it set here.
    ##
    
    ## Can be commented out if causes errors, see notes above.
    Options +FollowSymLinks
    
    ## Mod_rewrite in use.
    
    RewriteEngine On
    
    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www\.
    RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
    
    
    
    
    RewriteCond %{HTTP_HOST} ^www.static.*$ [NC]
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC]
    RewriteRule ^robots\.txt /robots_static.txt [NC,L]
    
    
    ## Begin - Rewrite rules to block out some common exploits.
    # If you experience problems on your site block out the operations listed below
    # This attempts to block the most common type of exploit `attempts` to Joomla!
    #
    # Block out any script trying to base64_encode data within the URL.
    RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
    # Block out any script that includes a <script> tag in URL.
    RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
    # Block out any script trying to set a PHP GLOBALS variable via URL.
    RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
    # Block out any script trying to modify a _REQUEST variable via URL.
    RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
    # Return 403 Forbidden header and show the content of the root homepage
    RewriteRule .* index.php [F]
    #
    ## End - Rewrite rules to block out some common exploits.
    
    ## Begin - Custom redirects
    #
    # If you need to redirect some pages, or set a canonical non-www to
    # www redirect (or vice versa), place that code here. Ensure those
    # redirects use the correct RewriteRule syntax and the [R=301,L] flags.
    #
    ## End - Custom redirects
    
    ##
    # Uncomment following line if your webserver's URL
    # is not directly related to physical file paths.
    # Update Your Joomla! Directory (just / for root).
    ##
    
    # RewriteBase /
    
    RewriteCond %{THE_REQUEST} ^GET.*index\.php [NC]
    RewriteCond %{THE_REQUEST} !/system/.*
    RewriteRule (.*?)index\.php/*(.*) /$1$2 [R=301,L]
    RewriteCond %{THE_REQUEST} ^GET
    
    ## Begin - Joomla! core SEF Section.
    #
    RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
    #
    # If the requested path and file is not /index.php and the request
    # has not already been internally rewritten to the index.php script
    RewriteCond %{REQUEST_URI} !^/index\.php
    # and the request is for something within the component folder,
    # or for the site root, or for an extensionless URL, or the
    # requested URL ends with one of the listed extensions
    RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]
    # and the requested path and file doesn't directly match a physical file
    RewriteCond %{REQUEST_FILENAME} !-f
    # and the requested path and file doesn't directly match a physical folder
    RewriteCond %{REQUEST_FILENAME} !-d
    # internally rewrite the request to the index.php script
    RewriteRule .* index.php [L]
    #
    ## End - Joomla! core SEF Section.
    
    <FilesMatch "\.(ico|pdf|flv|jpg|ttf|jpg|jpeg|png|gif|js|css|swf)$">
    Header set Expires "Wed, 15 Apr 2020 20:00:00 GMT"
    Header set Cache-Control "public"
    </FilesMatch>
    
    <ifModule mod_headers.c>
        Header set Connection keep-alive
    </ifModule>
    
    ########## Begin - Remove Etags
        #
        FileETag none
        #
        ########## End - Remove Etags
    
    • MrWhite
      MrWhite almost 10 years
      Your .htaccess code actually looks OK (although it could be "tidied" a bit and I think the 2nd line is unnecessary?). Just to clarify, your "virtual subdomain" is www.static. and not simply static.? Do you have other directives in your .htaccess file?
    • user3474818
      user3474818 almost 10 years
      I have added my .htaccess file content for you to see if there is something suspicious. My subdomain is static, but when I type that in browser, it becomes www.static
  • MrWhite
    MrWhite almost 10 years
    Just curious why the need for mod_proxy here? Since it looks like a straightforward internal rewrite on the same server?
  • Dave Lozier
    Dave Lozier almost 10 years
    I have a bad habit of overusing mod_proxy - corrected for a simple remapping via mod_rewrite. Thanks for pointing that out.