Block bingbot from crawling my site
Solution 1
This WILL definitely affect your SEO/search ranking and will cause pages to drop from the index, so please use it with care.
You can block requests based on the User-Agent string if you have the IIS URL Rewrite module installed (if not, install it first).
Then add a rule like this to your web.config:
<system.webServer>
  <!-- Requires the IIS URL Rewrite module; rules must sit inside <rewrite> -->
  <rewrite>
    <rules>
      <rule name="Request Blocking Rule" stopProcessing="true">
        <match url=".*" />
        <conditions>
          <!-- Matching is case-insensitive by default, so this also catches "bingbot" -->
          <add input="{HTTP_USER_AGENT}" pattern="msnbot|BingBot" />
        </conditions>
        <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="You do not have permission to view this page." />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
This returns a 403 Forbidden response whenever the request's User-Agent header matches msnbot or BingBot.
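The rule's condition is an ordinary unanchored regular expression tested against the User-Agent header. The Python sketch below mimics that test so you can see which user agents it would catch. It assumes the URL Rewrite default of case-insensitive condition matching (ignoreCase="true"); the function name `is_blocked` and the sample user-agent strings are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as the rewrite condition. IGNORECASE mirrors the assumed
# URL Rewrite default (ignoreCase="true"); the rule above does not set it.
BLOCK_PATTERN = re.compile(r"msnbot|BingBot", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the rewrite rule would answer this request with a 403."""
    # An unanchored pattern can match anywhere in the header,
    # so re.search (not re.fullmatch) is the closer analogue.
    return BLOCK_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    # Illustrative user-agent strings (assumed for this example).
    samples = [
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0",
    ]
    for ua in samples:
        print(f"{'403' if is_blocked(ua) else 'pass'}  {ua}")
```

Bingbot announces itself with a lowercase "bingbot" token, so the rule as written relies on case-insensitive matching to catch it.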
UPDATE
Looking at your robots.txt, I think it should be:
# robots.txt
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
Disallow: /

User-agent: ia_archiver
Disallow: /
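One way to sanity-check a robots.txt before deploying it is Python's standard `urllib.robotparser` module. The sketch below parses the file above and asks whether a few user agents may fetch the site root. The host example.com and the agent name SomeOtherBot are placeholders, and robotparser is a simple reference parser, not Bing's own implementation, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt suggested above, with blank lines between records.
ROBOTS_TXT = """\
# robots.txt
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
Disallow: /

User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robotparser matches on the product token (the text before the first "/"),
# so pass "bingbot" rather than Bing's full user-agent string.
for agent in ("bingbot", "ia_archiver", "SomeOtherBot"):
    allowed = parser.can_fetch(agent, "http://example.com/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

One caveat: CPython's robotparser applies the first rule in a record that matches, and it treats the empty `Disallow:` as "allow everything", so it reports /member as allowed for other bots. Parsers that pick the longest matching rule would still block /member.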
Solution 2
Your robots.txt is not correct:
- You need blank lines between records (a record starts with one or more User-agent lines).
- The line Disallow: bingbot disallows crawling of URLs whose paths start with "bingbot" (e.g., http://example.com/bingbot), which is probably not what you want.
- Not an error, but the empty Disallow: line is not needed (as it's the default anyway).
So you probably want to use:
User-agent: *
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
User-agent: ia_archiver
Disallow: /
This disallows crawling of anything for "bingbot" and "ia_archiver". All other bots are allowed to crawl everything except URLs whose paths start with /member, /cgi-bin/, or *.axd.
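To confirm that the two User-agent lines share the single `Disallow: /`, the file can be run through Python's standard `urllib.robotparser`. This is a sketch: example.com, the paths and the agent name SomeOtherBot are placeholders, and robotparser only approximates how a real crawler evaluates the rules.

```python
from urllib.robotparser import RobotFileParser

# Solution 2's robots.txt: one record for all bots, then one record
# whose two User-agent lines share a single "Disallow: /".
ROBOTS_TXT = """\
User-agent: *
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder host and paths; only the path part is compared with the rules.
checks = [
    ("bingbot", "/"),
    ("ia_archiver", "/about"),
    ("SomeOtherBot", "/"),
    ("SomeOtherBot", "/member/profile"),
    ("SomeOtherBot", "/cgi-bin/test.cgi"),
]
for agent, path in checks:
    ok = parser.can_fetch(agent, "http://example.com" + path)
    print(f"{agent:<13} {path:<20} {'allowed' if ok else 'blocked'}")
```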
Note that *.axd will be interpreted literally by bots that follow the original robots.txt specification (so they will not crawl http://example.com/*.axd, but they will crawl http://example.com/foo.axd). However, many bots extend the spec and interpret the * as a wildcard that matches any sequence of characters.
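The difference between the two readings can be sketched in a few lines of Python. Both helpers are hypothetical and simplified: `literal_match` is plain prefix matching as in the original specification, and `wildcard_match` treats `*` as "any characters" and a trailing `$` as end of path, the extension many crawlers support. Neither handles percent-encoding or the precedence between competing rules.

```python
import re


def literal_match(pattern: str, path: str) -> bool:
    # Original spec: the rule value is a plain path prefix; "*" is just a character.
    return path.startswith(pattern)


def wildcard_match(pattern: str, path: str) -> bool:
    # Extended reading: "*" matches any run of characters, and a trailing "$"
    # anchors the pattern to the end of the path. Matching starts at the
    # beginning of the path, like the prefix rule above.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


if __name__ == "__main__":
    for path in ("/foo.axd", "/member/profile"):
        print(path, literal_match("*.axd", path), wildcard_match("*.axd", path))
```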
Zoinky
Updated on June 15, 2022
I would like to completely block Bing from crawling my site for now (it's attacking my site at an alarming rate: 500 GB of data a month).
I have 1,000 subdomains added to Bing Webmaster Tools, so I can't go and set each one's crawl rate. I have tried blocking it using robots.txt, but it's not working. Here is my robots.txt:
# robots.txt
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member
Disallow: bingbot
User-agent: ia_archiver
Disallow: /
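Feeding this robots.txt to Python's standard `urllib.robotparser` (a reference parser used here as a stand-in for a real crawler; example.com is a placeholder host) shows why Bingbot kept crawling: `Disallow: bingbot` is read as a path rule inside the `*` record, not as a user-agent, so Bingbot simply falls under the `*` rules.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, one directive per line.
ROBOTS_TXT = """\
# robots.txt
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member
Disallow: bingbot
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# No record names bingbot, so it gets the "*" rules, none of which block "/".
print("bingbot may fetch /:", parser.can_fetch("bingbot", "http://example.com/"))
print("ia_archiver may fetch /:", parser.can_fetch("ia_archiver", "http://example.com/"))
```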