Block bingbot from crawling my site


Solution 1

This WILL affect your SEO/search ranking and will cause pages to drop from the index, so please use it with care.

You can block requests based on the user-agent string if you have the IIS URL Rewrite module installed (if not, install it first).

Then add a rule to your web.config like this:

<system.webServer>
  <rewrite>
    <rules>
      <rule name="Request Blocking Rule" stopProcessing="true">
        <match url=".*" />
        <conditions>
          <add input="{HTTP_USER_AGENT}" pattern="msnbot|BingBot" />
        </conditions>
        <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="You do not have permission to view this page." />
      </rule>
    </rules>
  </rewrite>
</system.webServer>

This will return a 403 if the bot hits your site.
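If you want a quick sanity check of which user-agent strings that pattern would catch before deploying the rule, here is a rough sketch in Python. It assumes the rule condition matches case-insensitively and anywhere in the string (the URL Rewrite defaults, as far as I know). The sample user-agent strings and the is_blocked helper are illustrative, not taken from real logs.

```python
import re

# Same alternation as the rule's pattern attribute. IGNORECASE mirrors the
# assumed default of case-insensitive condition matching in URL Rewrite.
BLOCKED = re.compile(r"msnbot|BingBot", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the pattern matches anywhere in the User-Agent string."""
    return BLOCKED.search(user_agent) is not None


# Illustrative sample strings (assumptions, not captured traffic).
samples = [
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]

for ua in samples:
    print(is_blocked(ua), ua)
```

Only the first two samples contain "bingbot" or "msnbot", so only those would get the 403.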

UPDATE

Looking at your robots.txt, I think it should be:

# robots.txt 
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
Disallow: /

User-agent: ia_archiver
Disallow: /

Solution 2

Your robots.txt is not correct:

  • You need line breaks between records (a record starts with one or more User-agent lines).

  • Disallow takes a URL path, not a bot name. Disallow: bingbot disallows crawling of URLs whose paths start with "bingbot" (e.g., http://example.com/bingbot), which is probably not what you want.

  • Not an error, but Disallow: is not needed (as it’s the default anyway).
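To see how the file in the question is actually read, here is a small sketch that groups its lines into records. The group_records helper is hypothetical and deliberately lenient: it starts a new record whenever a User-agent line follows a rule line, even without a blank line, which a strict parser might not do.

```python
def group_records(text):
    """Group robots.txt lines into (user_agents, rules) records.

    Lenient sketch: a User-agent line that follows a rule line starts a new
    record even when no blank line separates them.
    """
    records = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # the previous record already has rules, so close it
                records.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("disallow", "allow") and agents:
            rules.append((field, value))
    if agents:
        records.append((agents, rules))
    return records


ORIGINAL = """\
# robots.txt
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member
Disallow: bingbot
User-agent: ia_archiver
Disallow: /
"""

for agents, rules in group_records(ORIGINAL):
    print(agents, rules)
```

Even with this forgiving reading, "bingbot" only ever shows up as a path value under the * record. No record names bingbot as a user agent, so nothing in the file addresses that bot.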

So you probably want to use:

User-agent: *
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
User-agent: ia_archiver
Disallow: /

This disallows crawling of anything for "bingbot" and "ia_archiver". All other bots are allowed to crawl everything except URLs whose paths start with /member, /cgi-bin/, or *.axd.
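One way to check a file like this locally is Python's standard urllib.robotparser module. This is only a sketch of how one particular parser reads the file. Real crawlers may behave differently, and "ExampleBot" is a made-up name standing in for any other bot.

```python
from urllib.robotparser import RobotFileParser

CORRECTED = """\
User-agent: *
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(CORRECTED.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Ask the parser whether `agent` may fetch `path` on example.com."""
    return parser.can_fetch(agent, "http://example.com" + path)


# "ExampleBot" is a hypothetical bot that falls under the * record.
for agent, path in [
    ("bingbot", "/"),
    ("ia_archiver", "/any/page"),
    ("ExampleBot", "/member/profile"),
    ("ExampleBot", "/cgi-bin/test"),
    ("ExampleBot", "/index.html"),
]:
    print(agent, path, allowed(agent, path))
```

With this parser, bingbot and ia_archiver are refused everywhere, while the other bot is refused only under /member and /cgi-bin/.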

Note that *.axd will be interpreted literally by bots following the original robots.txt specification (so they will not crawl http://example.com/*.axd, but they will crawl http://example.com/foo.axd). However, many bots extend the spec and interpret the * as some kind of wildcard.
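The difference between the two readings can be sketched roughly like this. Both functions are hypothetical simplifications, not any real crawler's matcher. As an assumption, a rule without a leading slash is treated as if it had one, which is how this answer reads such rules.

```python
import re


def normalize(rule: str) -> str:
    # Assumption: a rule without a leading slash is read as if it had one.
    return rule if rule.startswith("/") else "/" + rule


def literal_match(rule: str, path: str) -> bool:
    """Original-spec reading: the rule is a plain prefix of the URL path."""
    return path.startswith(normalize(rule))


def wildcard_match(rule: str, path: str) -> bool:
    """Extended reading: each '*' in the rule matches any run of characters."""
    parts = normalize(rule).split("*")
    pattern = ".*".join(re.escape(part) for part in parts)
    return re.match(pattern, path) is not None


for path in ["/*.axd", "/foo.axd", "/index.html"]:
    print(path, literal_match("*.axd", path), wildcard_match("*.axd", path))
```

Under the literal reading only the path /*.axd is blocked. Under the wildcard reading /foo.axd is blocked as well, and /index.html is allowed either way.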

Author by

Zoinky

Updated on June 15, 2022

Comments

  • Zoinky
    Zoinky almost 2 years

    I would like to completely block Bing from crawling my site for now (it's hitting my site at an alarming rate: 500GB of data a month).

    I have 1000 subdomains added to Bing Webmaster Tools, so I can't go and set each one's crawl rate. I have tried blocking it using robots.txt, but it's not working. Here is my robots.txt:

    # robots.txt 
    User-agent: *
    Disallow:
    Disallow: *.axd
    Disallow: /cgi-bin/
    Disallow: /member
    Disallow: bingbot
    User-agent: ia_archiver
    Disallow: /