Robots.txt: allow only major SE
Solution 1
User-agent: * Disallow: / User-agent: Googlebot Allow: / User-agent: Slurp Allow: / User-Agent: msnbot Disallow:
Slurp is Yahoo's robot
Solution 2
Why?
Anyone doing evil (e.g., gathering email addresses to spam) will just ignore robots.txt. So you're only going to be blocking legitimate search engines, as robots.txt compliance is voluntary.
But — if you insist on doing it anyway — that's what the User-Agent:
line in robots.txt is for.
User-agent: googlebot
Disallow:
User-agent: *
Disallow: /
With lines for all the other search engines you'd like traffic from, of course. Robotstxt.org has a partial list.
Solution 3
There are more than 3 major search engines depending on which country you are talking. Facebook seem to be doing a good job listing only legitimate ones: https://facebook.com/robots.txt
So your robots.txt can be something like:
User-agent: Applebot
Allow: /
User-agent: baiduspider
Allow: /
User-agent: Bingbot
Allow: /
User-agent: Facebot
Allow: /
User-agent: Googlebot
Allow: /
User-agent: msnbot
Allow: /
User-agent: Naverbot
Allow: /
User-agent: seznambot
Allow: /
User-agent: Slurp
Allow: /
User-agent: teoma
Allow: /
User-agent: Twitterbot
Allow: /
User-agent: Yandex
Allow: /
User-agent: Yeti
Allow: /
User-agent: *
Disallow: /
Admin
Updated on August 01, 2022Comments
-
Admin almost 2 years
Is there a way to configure the robots.txt so that the site accepts visits ONLY from Google, Yahoo! and MSN spiders?