Redirecting bots and crawlers, but not humans, elsewhere via .htaccess

What you are trying to do could technically be classified as cloaking, which violates Google's webmaster guidelines and can result in your site being removed from the Google index. Google is very strict about what it classes as cloaking; the basic rule is that whatever the end user sees, the crawler has to see as well. If you are trying to block malicious bots, the easiest thing to do is block their user agent strings using .htaccess. If you try cloaking against a legitimate crawler such as Googlebot, however, it will be detected and can result in severe penalties and manual action notices, which can badly affect your SERP ranking.

Google does not rely only on the known Googlebot user agent. It also uses other bots that send the user agent strings of real browsers from IP addresses not affiliated with Google, precisely to detect cloaking, so there is no way to avoid being caught doing this.

Now having given that warning...

You mention the Facebook crawler specifically. Facebook has three different user agents for crawling: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php) and facebookexternalhit/1.1, which are used when a user shares your website to their wall, and Facebot, which is used to help improve advertising performance. Of these, only Facebot respects robots.txt; the others are triggered only by a user action and so are, in effect, treated the same as a web browser. If you want to block all Facebook crawling, add a .htaccess rule that detects these user agent strings and either blocks the request or returns an error page stating that crawlers are not permitted. Trying to forward them to an alternate site with different content will only complicate matters and could reduce your SERP ranking, because the pages the bots can access would not have context-appropriate content.
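
As a rough sketch of the blocking rule described above (not a tested drop-in), something like the following could go in the site's .htaccess. It assumes mod_rewrite is enabled and that a plain 403 Forbidden response is acceptable; the pattern only covers the user agent names mentioned in this answer.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match the Facebook crawler names anywhere in the User-Agent header, ignoring case
    RewriteCond %{HTTP_USER_AGENT} (facebookexternalhit|Facebot) [NC]
    # Leave the URL unchanged (-) and answer with 403 Forbidden
    RewriteRule ^ - [F]
</IfModule>
```

Note that blocking facebookexternalhit also means links to the site shared on Facebook will most likely lose their preview, since that is the agent that fetches the page when a user shares it.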


Sergio santa
Author by

Sergio santa

Updated on September 18, 2022

Comments

  • Sergio santa
    Sergio santa over 1 year

    I would like to apply this diagram via .htaccess. I tried a lot of code, but I failed every time.

    So I need to redirect bots and crawlers, especially from Facebook, via .htaccess.

  • Stephen Ostermiller
    Stephen Ostermiller over 7 years
    No need for "technically". As far as Google is concerned, that is a "sneaky redirect" and explicitly against their webmaster guidelines: support.google.com/webmasters/answer/2604723?hl=en
  • MrWhite
    MrWhite almost 7 years
    This won't work as-is. You don't check an environment variable with mod_rewrite like that - the RewriteCond condition will always fail since env=bad_bot is seen as a literal string and compared against the HTTP_USER_AGENT server variable (again, not what you are trying to do). (It looks like you are trying to borrow syntax from mod_auth_...?!). The RewriteCond directive should read something like RewriteCond %{ENV:bad_bot} 1 instead. (Although, as already stated, trying to redirect the bot is probably a bad idea to begin with - if anything it should simply be blocked.)
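
For illustration, the environment-variable approach from the last comment might look roughly like this. It is only a sketch: it assumes mod_setenvif and mod_rewrite are both available, the bot names BadBot and EvilScraper are placeholders rather than real crawlers, and it blocks the request instead of redirecting it, as the comments recommend.

```apache
# Flag requests whose User-Agent contains a listed name (case-insensitive).
# "BadBot" and "EvilScraper" are placeholder names, not real crawlers.
SetEnvIfNoCase User-Agent "(BadBot|EvilScraper)" bad_bot

<IfModule mod_rewrite.c>
    RewriteEngine On
    # Read the variable through %{ENV:...}; SetEnvIf stores the value 1 when none is given
    RewriteCond %{ENV:bad_bot} =1
    # Leave the URL unchanged (-) and answer with 403 Forbidden
    RewriteRule ^ - [F]
</IfModule>
```

The key point is the %{ENV:bad_bot} test string: writing env=bad_bot as the condition pattern would just compare that literal text against whatever test string precedes it.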