Allow all robots robots.txt


Solution 1

The Allow directive is non-standard, according to Wikipedia: http://en.wikipedia.org/wiki/Robots.txt.

Solution 2

User-agent: *
Disallow: /

The directives above tell robots not to crawl any URL path on the site. With this instruction, Googlebot and other search engine bots will not crawl any part of your website.

User-agent: *
Allow: /

The directives above say that everything is allowed: all visiting bots, including Googlebot, may crawl the entire website.
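
If you want to check these rules programmatically, here is a minimal sketch using Python's standard urllib.robotparser module; the robots.txt bodies are the two variants shown above, and the example.com URL is just a placeholder.

from urllib import robotparser

# The two robots.txt variants discussed above.
block_all = """\
User-agent: *
Disallow: /
"""

allow_all = """\
User-agent: *
Allow: /
"""

for label, body in (("Disallow: /", block_all), ("Allow: /", allow_all)):
    parser = robotparser.RobotFileParser()
    parser.parse(body.splitlines())
    # can_fetch(useragent, url) reports whether that agent may crawl the URL.
    allowed = parser.can_fetch("Googlebot", "https://example.com/some/page")
    print(label, "->", "allowed" if allowed else "blocked")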

Solution 3

To allow all crawling you have a few options. The clearest and most widely supported is:

User-agent: *
Disallow:

To paraphrase, it means: "All user agents have nothing disallowed; they can crawl everything." This is the version of "allow all crawling" listed on robotstxt.org.
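
As a quick sanity check, the same standard-library sketch (Python's urllib.robotparser, again with a placeholder example.com URL) confirms that an empty Disallow leaves everything crawlable:

from urllib import robotparser

parser = robotparser.RobotFileParser()
# The "nothing disallowed" form shown above.
parser.parse("User-agent: *\nDisallow:\n".splitlines())

# Any path should come back as allowed, for any user agent.
print(parser.can_fetch("*", "https://example.com/"))             # expected: True
print(parser.can_fetch("Googlebot", "https://example.com/a/b"))  # expected: True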


Another option is to have no robots.txt file. When robots encounter a 404 error at /robots.txt, they assume that crawling is not restricted.


I would not recommend using Allow: directives in robots.txt. Not all crawlers support them, and when you have both Allow: and Disallow: directives, the longest matching rule takes precedence rather than the first or last matching rule, which drastically complicates things. If you do use Allow:, be sure to test your robots.txt file with a testing tool such as the one from Google.
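
Google's tester checks a live site; for a quick local spot check you can also point Python's urllib.robotparser at a deployed file, as in the sketch below (example.com and the sample paths are placeholders). Note that the standard-library parser resolves overlapping Allow and Disallow rules in its own way, which may not match Google's longest-match behaviour, so a dedicated testing tool is still worth using.

from urllib import robotparser

parser = robotparser.RobotFileParser()
# Point the parser at the live file you want to test (placeholder URL).
parser.set_url("https://example.com/robots.txt")
parser.read()  # fetches and parses the file; a 404 is treated as "allow all"

# Spot-check the paths you care about for the bots you care about.
for path in ("/", "/private/", "/private/public-page.html"):
    print(path, "->", parser.can_fetch("Googlebot", "https://example.com" + path))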

Author: Admin

Updated on September 18, 2022

Comments

  • Admin (over 1 year ago)

    In my robots.txt file I have a list of robots that are not allowed to crawl my site, and I need to allow all other robots, but I would like to know the real difference between these two rules:

    User-agent: *
    Disallow:
    

    and this:

    User-agent: *
    Allow: /