Allow all robots in robots.txt
Solution 1
The Allow directive is non-standard, according to Wikipedia: http://en.wikipedia.org/wiki/Robots.txt
Solution 2
User-agent: *
Disallow: /
The rules above tell robots not to crawl anything whose URL path matches. With this instruction, Googlebot and other search engine bots will not crawl any page on your website.
User-agent: *
Allow: /
The rules above say that everything is allowed: all visiting bots, including Googlebot, may crawl the entire website.
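The difference between the two rule sets can be checked with Python's standard-library `urllib.robotparser` (the URL and user agent below are just placeholders):

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" blocks every path for all user agents.
block_all = RobotFileParser()
block_all.parse("User-agent: *\nDisallow: /".splitlines())
print(block_all.can_fetch("Googlebot", "https://example.com/page.html"))  # False

# "Allow: /" permits every path (in parsers that support Allow).
allow_all = RobotFileParser()
allow_all.parse("User-agent: *\nAllow: /".splitlines())
print(allow_all.can_fetch("Googlebot", "https://example.com/page.html"))  # True
```

Note that this only tells you how Python's parser interprets the file; crawlers that do not implement the non-standard `Allow` directive may behave differently.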
Solution 3
To allow all crawling you have a few options. The clearest and most widely supported is:
User-agent: *
Disallow:
To paraphrase, it means: "All user agents have nothing disallowed; they can crawl everything." This is the version of "allow all crawling" listed on robotstxt.org.
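This can be verified with Python's standard-library `urllib.robotparser` (the user agent and path are arbitrary examples):

```python
from urllib.robotparser import RobotFileParser

# An empty "Disallow:" value disallows nothing, so every path is crawlable.
rp = RobotFileParser()
rp.parse("User-agent: *\nDisallow:".splitlines())
print(rp.can_fetch("AnyBot", "https://example.com/any/path.html"))  # True
```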
Another option is to have no robots.txt file at all. When robots encounter a 404 error at /robots.txt, they assume that crawling is not restricted.
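You can simulate the "no rules" case with `urllib.robotparser` by feeding the parser an empty file; with no entries, it treats every URL as fetchable, matching how crawlers handle a missing robots.txt:

```python
from urllib.robotparser import RobotFileParser

# An empty rule set behaves like a missing robots.txt:
# with no entries at all, every URL is reported as fetchable.
rp = RobotFileParser()
rp.parse([])  # simulate "no robots.txt" by giving the parser nothing
print(rp.can_fetch("AnyBot", "https://example.com/some/page"))  # True
```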
I would not recommend using Allow: directives in robots.txt, because not all crawlers support them. When a file contains both Allow: and Disallow: directives, the longest matching rule takes precedence (rather than the first or last matching rule), which complicates things considerably. If you do use Allow:, be sure to test your robots.txt file with a testing tool such as the one from Google.
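The ambiguity is easy to demonstrate. The rules below are hypothetical; Python's standard-library parser follows the original first-match convention, so the earlier `Disallow: /folder/` wins and the page is reported as blocked, whereas a longest-match crawler such as Googlebot would allow it. Same file, two different answers:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules that mix Allow and Disallow for the same subtree.
rules = """\
User-agent: *
Disallow: /folder/
Allow: /folder/page.html
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# First-match semantics: "Disallow: /folder/" is hit first, so the
# specific Allow for page.html never gets a chance to apply here.
print(rp.can_fetch("*", "https://example.com/folder/page.html"))  # False
```

This divergence between parsers is exactly why it is worth running such files through a dedicated testing tool before deploying them.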
Admin
Updated on September 18, 2022

Comments

- Admin, over 1 year ago:
In my robots.txt file I have a list of robots that are not allowed to index my site, and I need to allow all other robots. I would like to know the real difference between these two rules:
User-agent: *
Disallow:
and this:
User-agent: *
Allow: /