Redirect Google crawler to different robots.txt via .htaccess
Google's bots will still request /robots.txt from your subdomain, not /robots_static.txt, which would have no meaning to them. So keep the public URL as /robots.txt and rewrite it internally:
RewriteCond %{HTTP_HOST} ^www\.static\. [NC]
RewriteRule ^robots\.txt$ /robots_static.txt [L]
When a request for /robots.txt arrives on your www.static domain, the /robots_static.txt file is served as if it were /robots.txt. Note that in .htaccess the RewriteRule pattern is matched against the path without its leading slash.
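For context, here is a minimal sketch of where such a rule could sit in a shared .htaccess. It assumes mod_rewrite is available and that robots_static.txt lives in the same document root as robots.txt; the placement advice is a general precaution, not something confirmed for this particular site.

```apache
# Sketch only: assumes mod_rewrite is available and that robots_static.txt
# sits in the shared document root next to robots.txt.
RewriteEngine On

# Keep this pair above any catch-all rules (for example a CMS front controller),
# so [L] ends this pass of rule processing as soon as the rewrite applies.
RewriteCond %{HTTP_HOST} ^www\.static\. [NC]
# In .htaccess the leading slash is stripped before matching, hence no "^/".
RewriteRule ^robots\.txt$ /robots_static.txt [L]
```

Because this is an internal rewrite and not an external redirect, the crawler only ever sees the URL /robots.txt.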
user3474818
Updated on September 18, 2022
Comments
-
user3474818 over 1 year
I have googled all day and still couldn't find an answer.
I have a virtual subdomain www.static.example.com, which is a mirror site of www.example.com. That means I have just one root folder for both the subdomain and the domain.
I want to redirect crawlers to a different robots.txt file, robots_static.txt, when they see .static in the URL. In that file I will forbid indexing via a Disallow rule. I want to do this because I have duplicated content in Google search results: the subdomain shows exactly the same content as the main domain.
Does anyone know how I can make crawlers see robots_static.txt instead of robots.txt?
What I have managed to find so far is this:
RewriteCond %{HTTP_HOST} ^www.static.*$ [NC]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC]
RewriteRule ^robots\.txt /robots_static.txt [NC,L]
but when I check in Webmaster Tools, it still sees robots.txt as my robots file instead of robots_static.txt, so it crawls and indexes everything twice.
What did I do wrong? Thanks
EDIT: This is my .htaccess file:

##
# @package Joomla
# @copyright Copyright (C) 2005 - 2013 Open Source Matters. All rights reserved.
# @license GNU General Public License version 2 or later; see LICENSE.txt
##

##
# READ THIS COMPLETELY IF YOU CHOOSE TO USE THIS FILE!
#
# The line just below this section: 'Options +FollowSymLinks' may cause problems
# with some server configurations. It is required for use of mod_rewrite, but may already
# be set by your server administrator in a way that dissallows changing it in
# your .htaccess file. If using it causes your server to error out, comment it out (add # to
# beginning of line), reload your site in your browser and test your sef url's. If they work,
# it has been set by your server administrator and you do not need it set here.
##

## Can be commented out if causes errors, see notes above.
Options +FollowSymLinks

## Mod_rewrite in use.

RewriteEngine On
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

RewriteCond %{HTTP_HOST} ^www.static.*$ [NC]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC]
RewriteRule ^robots\.txt /robots_static.txt [NC,L]

## Begin - Rewrite rules to block out some common exploits.
# If you experience problems on your site block out the operations listed below
# This attempts to block the most common type of exploit `attempts` to Joomla!
#
# Block out any script trying to base64_encode data within the URL.
RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
# Block out any script that includes a <script> tag in URL.
RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL.
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL.
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Return 403 Forbidden header and show the content of the root homepage
RewriteRule .* index.php [F]
#
## End - Rewrite rules to block out some common exploits.

## Begin - Custom redirects
#
# If you need to redirect some pages, or set a canonical non-www to
# www redirect (or vice versa), place that code here. Ensure those
# redirects use the correct RewriteRule syntax and the [R=301,L] flags.
#
## End - Custom redirects

##
# Uncomment following line if your webserver's URL
# is not directly related to physical file paths.
# Update Your Joomla! Directory (just / for root).
##

# RewriteBase /

RewriteCond %{THE_REQUEST} ^GET.*index\.php [NC]
RewriteCond %{THE_REQUEST} !/system/.*
RewriteRule (.*?)index\.php/*(.*) /$1$2 [R=301,L]
RewriteCond %{THE_REQUEST} ^GET

## Begin - Joomla! core SEF Section.
#
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
#
# If the requested path and file is not /index.php and the request
# has not already been internally rewritten to the index.php script
RewriteCond %{REQUEST_URI} !^/index\.php
# and the request is for something within the component folder,
# or for the site root, or for an extensionless URL, or the
# requested URL ends with one of the listed extensions
RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]
# and the requested path and file doesn't directly match a physical file
RewriteCond %{REQUEST_FILENAME} !-f
# and the requested path and file doesn't directly match a physical folder
RewriteCond %{REQUEST_FILENAME} !-d
# internally rewrite the request to the index.php script
RewriteRule .* index.php [L]
#
## End - Joomla! core SEF Section.

<FilesMatch "\.(ico|pdf|flv|jpg|ttf|jpg|jpeg|png|gif|js|css|swf)$">
Header set Expires "Wed, 15 Apr 2020 20:00:00 GMT"
Header set Cache-Control "public"
</FilesMatch>
<ifModule mod_headers.c>
Header set Connection keep-alive
</ifModule>
########## Begin - Remove Etags
#
FileETag none
#
########## End - Remove Etags
-
MrWhite almost 10 years
Your .htaccess code actually looks OK (although it could be "tidied" a bit and I think the 2nd line is unnecessary?). Just to clarify, your "virtual subdomain" is www.static. and not simply static.? Do you have other directives in your .htaccess file?
user3474818 almost 10 years
I have added my .htaccess file content so you can see if there is something suspicious. My subdomain is static, but when I type that in the browser, it becomes www.static.
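Since the host can appear both with and without the www prefix, one option is a condition that accepts either form. This is only a sketch: static.example.com and www.static.example.com are assumed hostnames, and it leaves out the THE_REQUEST condition questioned in the comment above.

```apache
# Sketch only: static.example.com and www.static.example.com are assumed hostnames.
# The optional (www\.)? group lets one condition cover both forms of the host.
RewriteCond %{HTTP_HOST} ^(www\.)?static\. [NC]
# Dots are escaped and the pattern is anchored at both ends,
# so only the exact path robots.txt is rewritten.
RewriteRule ^robots\.txt$ /robots_static.txt [L]
```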
-
MrWhite almost 10 years
Just curious why the need for mod_proxy here? Since it looks like a straightforward internal rewrite on the same server?
-
Dave Lozier almost 10 years
I have a bad habit of overusing mod_proxy. Corrected to a simple remapping via mod_rewrite. Thanks for pointing that out.