Robots.txt: allow only major SE

24,800

Solution 1

User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
User-agent: Slurp
Allow: /
User-Agent: msnbot
Disallow: 

Slurp is Yahoo's robot

Solution 2

Why?

Anyone doing evil (e.g., gathering email addresses to spam) will just ignore robots.txt. So you're only going to be blocking legitimate search engines, as robots.txt compliance is voluntary.

But — if you insist on doing it anyway — that's what the User-Agent: line in robots.txt is for.

User-agent: googlebot
Disallow: 

User-agent: *
Disallow: /

With lines for all the other search engines you'd like traffic from, of course. Robotstxt.org has a partial list.

Solution 3

There are more than 3 major search engines depending on which country you are talking. Facebook seem to be doing a good job listing only legitimate ones: https://facebook.com/robots.txt

So your robots.txt can be something like:

User-agent: Applebot
Allow: /

User-agent: baiduspider
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Facebot
Allow: /

User-agent: Googlebot
Allow: /

User-agent: msnbot
Allow: /

User-agent: Naverbot
Allow: /

User-agent: seznambot
Allow: /

User-agent: Slurp
Allow: /

User-agent: teoma
Allow: /

User-agent: Twitterbot
Allow: /

User-agent: Yandex
Allow: /

User-agent: Yeti
Allow: /

User-agent: *
Disallow: /
Share:
24,800
Admin
Author by

Admin

Updated on August 01, 2022

Comments

  • Admin
    Admin almost 2 years

    Is there a way to configure the robots.txt so that the site accepts visits ONLY from Google, Yahoo! and MSN spiders?