Why is Disallow: /search in Blogger's robots.txt?


Solution 1

In addition to closetnoc's answer...

Should I remove /search from the Disallow: line?

No. It is a good idea to block bots from crawling your search results (which I assume is what this is referring to).

You don't normally want your own search result pages appearing in Google's search results pages! And Google doesn't want this either. Google wants to index your actual pages and return those in the SERPs. Allowing bots to crawl your search results (which could potentially be infinite) could also use up a lot of unnecessary bandwidth.

However, Mediapartners-Google (Google's AdSense bot) is permitted to crawl your /search results. I believe this is necessary if you wish to serve adverts from your search results pages.

Do I need to edit anything in it?

Not unless you want/need to block some bots from crawling certain areas of your site. Note that some bots will completely ignore your robots.txt file anyway.
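For example, if you did want to keep a particular bot out of one area of your site, you could add another group to the file. This is purely illustrative; the bot name and path below are placeholders, not anything Blogger generates:

```
# Hypothetical extra rule: keep "ExampleBot" out of /private/
User-agent: ExampleBot
Disallow: /private/
```

Bots are matched to the most specific `User-agent` group, so any bot not named here still falls under the `User-agent: *` rules.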

Solution 2

Robots.txt is a way of telling bots (robot agents) where they can and cannot go. It is placed in the root of your website, by convention, so it is easy to find. It is really that simple.

In your example:

User-agent: Mediapartners-Google is not disallowed. A Disallow: directive with nothing after it means allow everything (no restrictions).

User-agent: * is a directive that applies to all other bots, disallowing access to the /search URI path (example.com/search) and allowing access to the rest of the site.

Sitemap: tells bots that you have a sitemap available. A sitemap is an XML (a standardized data mark-up language) formatted file that lists the pages of your site. This is handy for search engines wanting to know your site's pages. Sitemaps are not always necessary; however, if some pages are not easily reachable by a search engine, the sitemap makes it easier for the search engine to find your pages.
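If you want to sanity-check how these rules actually behave, Python's standard-library robots.txt parser reads the same syntax. A rough sketch (the URLs simply reuse the blog from the question):

```python
# Sanity-check Blogger's robots.txt rules with Python's stdlib parser.
from urllib import robotparser

rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://css3wdesign.blogspot.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The AdSense bot may crawl search results (empty Disallow = allow all)...
print(rp.can_fetch("Mediapartners-Google", "http://css3wdesign.blogspot.com/search?q=css"))  # True
# ...but every other bot is kept out of /search (including label pages under it).
print(rp.can_fetch("SomeOtherBot", "http://css3wdesign.blogspot.com/search/label/css"))      # False
# Normal pages remain crawlable for everyone.
print(rp.can_fetch("SomeOtherBot", "http://css3wdesign.blogspot.com/2014/01/a-post.html"))   # True
# The Sitemap line is exposed too (Python 3.8+).
print(rp.site_maps())  # ['http://css3wdesign.blogspot.com/sitemap.xml']
```

This also confirms the point in the comments below: `/search/label/...` URLs fall under the `/search` disallow.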

Solution 3

Robots.txt is a file that search engines and other crawlers use to "ask you" what's OK to visit. It allows you to whitelist or blacklist all or specific bots from areas of your realm. It's like a treaty. It's a promise. Good bots keep the promise; bad bots do not.

As far as search: I agree that in the past it was not good practice to allow robots to crawl search. Nowadays, allowing Google to crawl search may work out well, at least in certain niches, and you don't even need search caching.

The robots.txt files across our platforms vary, but we always leave the search disallow commented out (i.e. robots are allowed to crawl search, but the rule is ready to be uncommented if needed). There are a few reasons:

  • Fills in SEO - sometimes you will see search results pop up for category niches you missed.
  • Fills in LSI - helps you create organics from organics, automagically.
  • May help RDF - this is an edge case, but allowing G to crawl search may expose rich snippets faster.
  • Makes authority - see a search-page SERP result dominating organics? Turn it into a landing page to gain PR.
  • Helps G understand - between tab-to-search in the address bar, Analytics site-search tracking, and Webmaster Tools query-string parameters, G will understand and help.

Look for areas in G Analytics, G Webmaster Tools, and other G properties to set up search tracking now and in the future.


Author: arximughal

Updated on September 18, 2022

Comments

  • arximughal
    arximughal over 1 year

    Can anyone tell me what this means in Blogger's "robots.txt" file? Do I need to edit anything in it? Should I remove /search from the Disallow: line?

    User-agent: Mediapartners-Google
    Disallow: 
    
    User-agent: *
    Disallow: /search
    Allow: /
    
    Sitemap: http://css3wdesign.blogspot.com/sitemap.xml
    
  • MrWhite
    MrWhite over 9 years
    "User-agent: * is a directive that applies to all bots" ... that don't match any of the other groups, so it won't match the "Mediapartners-Google" bot. (+1)
  • closetnoc
    closetnoc over 9 years
    Excellent additions! I think I wrote my answer a little too close to nap time. ;-) +1 back atcha! Thanks for chiming in.
  • Stephen Ostermiller
    Stephen Ostermiller over 8 years
    See this blog post from Google's Matt Cutts that explains why Google doesn't want to index your search results and why they penalize sites that allow site search to be crawled: mattcutts.com/blog/search-results-in-search-results
  • Goyllo
    Goyllo almost 7 years
    Actually, /search is not only used for searching blog posts on a Blogspot blog. It is also used in label (category) links, like https://search.googleblog.com/search/label/mobile, and label links are mostly displayed at the end of blog posts. So I think the Blogger team should move /label/ outside the /search/ directory so labels can be crawled properly.