Why deny access to website for msnbot/bingbot?


Solution 1

I don't think people should ban Bingbot.

Bing has its own equivalent, Bing Webmaster Tools, at http://www.bing.com/toolbox/webmaster/. It includes 'Crawl Settings', where you can adjust the crawl rate, as shown in this video: http://www.bing.com/videos/watch/video/bing-webmaster-tools-crawl-rate-settings/1ii1ej9jz

Googlebot is just as notorious as msnbot for crawling sites excessively. Also, the better your site does in traffic and inbound links, the more Googlebot crawls it. Just look at how fast Stack Exchange questions get indexed after being posted. You can see how often these bots hit your server by checking your access logs.
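To make the access-log check concrete, here is a minimal sketch that counts hits per crawler. It assumes the common Apache/nginx "combined" log format, where the user agent is the last quoted field; the sample lines and the `count_bot_hits` helper are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample lines in "combined" log format (addresses are documentation ranges).
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2011:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.11 - - [10/Oct/2011:13:55:37 +0000] "GET /b HTTP/1.1" 200 734 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '198.51.100.4 - - [10/Oct/2011:13:55:40 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Oct/2011:13:56:02 +0000] "GET / HTTP/1.1" 200 901 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/7.0"',
]

# Substrings to look for in the user agent, compared in lowercase.
BOT_TOKENS = ("googlebot", "msnbot", "bingbot")

# The user agent is the last double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(lines):
    """Return a Counter mapping each known bot token to its number of requests."""
    hits = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        user_agent = match.group(1).lower()
        for token in BOT_TOKENS:
            if token in user_agent:
                hits[token] += 1
                break
    return hits


bot_hits = count_bot_hits(SAMPLE_LOG)
for bot, count in bot_hits.most_common():
    print(bot, count)
```

In practice you would pass an open log file instead of `SAMPLE_LOG`. Matching on the user-agent string trusts whatever the client sends, so treat the counts as a rough guide.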

I also discovered that msnbot supports the robots.txt Crawl-delay directive: http://www.bing.com/community/site_blogs/b/webmaster/archive/2009/08/10/crawl-delay-and-the-bing-crawler-msnbot.aspx
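As a rough illustration of the Crawl-delay directive, the sketch below feeds a two-line robots.txt to Python's standard `urllib.robotparser` and reads the delay back. The 10-second value is an arbitrary example, not a recommendation from Bing.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the 10-second delay is an arbitrary value.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay requested from msnbot, in seconds.
msnbot_delay = parser.crawl_delay("msnbot")
# No rule in this file applies to other crawlers, so no delay is reported.
other_delay = parser.crawl_delay("Googlebot")

print(msnbot_delay, other_delay)
```

The file only asks msnbot to slow down. It does not block any pages, so the crawler can still index the whole site.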

Solution 2

msnbot is quite prolific when it comes to spidering servers, and if you have a lot of pages to index it can quite easily cripple your server. As traffic from MSN is considerably less than what Google can give, it's quite common just to deny msnbot via .htaccess, iptables or robots.txt. With Googlebot you can limit the crawl speed quite easily at http://google.com/webmasters
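For the robots.txt route, a deny rule might look like the sketch below, checked with Python's standard `urllib.robotparser`. The `/products/1` path is a made-up example. Keep in mind that robots.txt is advisory and only works for crawlers that choose to obey it, whereas .htaccess and iptables rules are enforced by the server.

```python
from urllib.robotparser import RobotFileParser

# Block msnbot from the whole site; leave every other crawler unrestricted.
ROBOTS_TXT_DENY = """\
User-agent: msnbot
Disallow: /

User-agent: *
Disallow:
"""

deny_parser = RobotFileParser()
deny_parser.parse(ROBOTS_TXT_DENY.splitlines())

# "/products/1" is a hypothetical page path used only for this check.
msnbot_allowed = deny_parser.can_fetch("msnbot", "/products/1")
googlebot_allowed = deny_parser.can_fetch("Googlebot", "/products/1")

print(msnbot_allowed, googlebot_allowed)
```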

Solution 3

Whilst BingBot has a Webmaster Tools section that allows you to limit the speed at which the bot crawls your site, there are three major problems with their approach.

  1. They don't allow you to select a crawl rate by number of seconds like Google do. Instead they have a crappy low-to-high range, with no attempt to explain what low and high actually mean in terms of seconds between hits.

  2. BingBot may adhere to your wishes to crawl at a slower rate, but they often have multiple spiders crawling your site at the same time. Many spiders crawling at a low rate can be far worse than one spider crawling at a high rate.

  3. Microsoft don't care. I have contacted them about instances where they had around 20 individual bot connections to our server loading pages every few seconds and bringing the server to a halt. Their response was that there was nothing they could do about it.

A little programming effort by Microsoft could easily ensure that only one bot crawls a site at any time.

My solution is to rate-limit the MSN IP ranges in iptables. I'm still experimenting with this, but I believe it still allows them access to the sites while forcing them to slow down. When the connections become too aggressive, they are rejected.
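The iptables rules themselves are not shown here, but the idea behind them can be sketched in Python: count recent hits from a set of IP ranges and reject new ones once a limit is reached. This is only an illustration of the logic, not the author's firewall configuration. The `RangeRateLimiter` class, the limit of 3 hits per 10 seconds, and the 192.0.2.0/24 range (a documentation-only placeholder, not a real Microsoft range) are all assumptions.

```python
import ipaddress
import time
from collections import deque


class RangeRateLimiter:
    """Reject hits from the given networks once they exceed a shared rate."""

    def __init__(self, networks, max_hits, window_seconds):
        self.networks = [ipaddress.ip_network(n) for n in networks]
        self.max_hits = max_hits
        self.window = window_seconds
        # Timestamps of accepted hits from the limited ranges, oldest first.
        self.recent = deque()

    def allow(self, ip, now=None):
        """Return True if a hit from `ip` at time `now` should be accepted."""
        now = time.monotonic() if now is None else now
        address = ipaddress.ip_address(ip)
        if not any(address in net for net in self.networks):
            return True  # outside the limited ranges, never throttled here
        # Forget hits that have fallen out of the time window.
        while self.recent and now - self.recent[0] >= self.window:
            self.recent.popleft()
        if len(self.recent) >= self.max_hits:
            return False  # too aggressive, reject
        self.recent.append(now)
        return True


# 192.0.2.0/24 stands in for the crawler's real ranges (placeholder only).
limiter = RangeRateLimiter(["192.0.2.0/24"], max_hits=3, window_seconds=10)
decisions = [limiter.allow("192.0.2.7", now=t) for t in (0, 1, 2, 3, 11)]
print(decisions)
```

The counter is shared by every address in the range, so several spiders crawling at once are limited as a group, which is the case point 2 above complains about.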

Solution 4

One of my clients was making $10,000 a month from Bing Shopping alone. Organic traffic from Bing brought in even more. Banning Bing would have caused a big loss of revenue. Anyone suggesting it must have their own personal reasons. Bing generates visits, so if you want to decrease your traffic, go ahead and ban Bing. Otherwise, like Anthony said, you can work with their Webmaster Tools to improve your site for Bing.com.

Solution 5

MSNBot is extremely aggressive and has sucked up over 2.5GB of bandwidth from many of my sites in less than a month (that's 2.5GB+ for each site). Microsoft really needs to straighten that out but probably never will. Until then, I'm treating MSNBot as the malicious program it is and banning it from my systems.

Author: Quandary

Updated on September 18, 2022

Comments

  • Quandary
    Quandary over 1 year

    I've seen quite a lot of tutorials that recommend banning user agents containing the strings libwww-perl and msnbot. I understand why one would ban libwww-perl: it's mainly, if not only, used for hacking and spamming.

    But why are there so many sites recommending a ban on msnbot/bingbot?
    Since it's a search engine, even if only with a marginal market share, I would expect one would want this bot to crawl one's sites.

    What is it that msnbot does that makes people ban it?

  • Stephen Ostermiller
    Stephen Ostermiller almost 7 years
    You should disallow robots from crawling any secret URLs. It sounds like Bing has gotten a hold of quite a few of them in this manner, but you won't just have problems with Bing. Google is also notorious for finding out secret URLs.
  • Tom
    Tom almost 7 years
    Google "finding out" secret URLs is one thing; that's been the case for years. It's a whole different game now that Microsoft is starting to send Bing every single URL visited from the Edge browser. At least Google hasn't been doing that. Bing is now basically replicating every Edge user.