What does "Disallow: /search" mean in robots.txt?


Solution 1

In the Disallow field you specify the beginning of the URL paths that should be blocked.

So if you have Disallow: /, it blocks everything, as every URL path starts with /.

If you have Disallow: /a, it blocks all URLs whose paths begin with /a. That could be /a.html, /a/b/c/hello, or /about.

In the same sense, if you have Disallow: /search, it blocks all URLs whose paths begin with the string /search. So it would block the following URLs, for example (if the robots.txt is in http://example.com/):

  • http://example.com/search
  • http://example.com/search.html
  • http://example.com/searchengine
  • http://example.com/search/
  • http://example.com/search/index.html

While the following URLs would still be allowed:

  • http://example.com/foo/search
  • http://example.com/sea

Note that robots.txt doesn't know or care whether the string matches a directory, a file, or nothing at all. It only compares the characters in the URL path.
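One way to check this prefix matching is with Python's standard-library `urllib.robotparser` (example.com is just a placeholder host here):

```python
from urllib.robotparser import RobotFileParser

# Parse an in-memory robots.txt containing the prefix rule.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /search",
])

# Paths beginning with /search are blocked...
print(rp.can_fetch("*", "http://example.com/search"))             # False
print(rp.can_fetch("*", "http://example.com/searchengine"))       # False
print(rp.can_fetch("*", "http://example.com/search/index.html"))  # False

# ...while other paths are still allowed.
print(rp.can_fetch("*", "http://example.com/foo/search"))  # True
print(rp.can_fetch("*", "http://example.com/sea"))         # True
```

The parser does exactly the character-by-character prefix comparison described above, which is why /searchengine is caught but /foo/search is not.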

Solution 2

Other answers explain how robots.txt is processed to apply this rule, but don't address why you would want to disallow bots from crawling your search results.

One reason might be that your search results are expensive to generate. Telling bots not to crawl those pages could reduce load on your servers.

Search results pages are also not great landing pages. A search result page typically just has a list of 10 pages from your site with titles and descriptions. Users would generally be better served by going directly to the most relevant of those pages. In fact, Google has said that they don't want your site search results indexed by Google. If you don't disallow them, Google could penalize your site.

Solution 3

It tells AdSense not to crawl any files in the /search directory or below (i.e. in any subdirectories of /search).

Solution 4

Since the OP indicated in his comments that he was only interested in the "/search directory", my answer below deals with disallowing just a "search" directory:

The following is a directive for robots not to crawl something named "search" located in the root directory:

Disallow: /search

According to the Google Webmaster Tools help doc below, directory names should be preceded and followed by a forward slash (/), as the other reference sources below also specify:

Google Webmaster Tools - Block or remove pages using a robots.txt file

To block a directory and everything in it, follow the directory name with a forward slash.

Disallow: /junk-directory/

Robotstxt.org - What to put in it

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

In this example, three directories are excluded.

Wikipedia - Robots exclusion standard

This example tells all robots not to enter three directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

So according to Google (as quoted above), the following would disallow bots with the user-agent Mediapartners-Google from crawling the "search" directory located in the root directory, but allow all other directories to be crawled:

User-agent: Mediapartners-Google
Disallow: /search/
Allow: /
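The difference between the trailing-slash directory form above and the OP's bare `Disallow: /search` can be sketched with Python's standard-library `urllib.robotparser` (example.com is a placeholder host):

```python
from urllib.robotparser import RobotFileParser

# The directory form recommended by the Google doc: note the trailing slash.
rp = RobotFileParser()
rp.parse([
    "User-agent: Mediapartners-Google",
    "Disallow: /search/",
    "Allow: /",
])

agent = "Mediapartners-Google"

# Everything inside the directory is blocked...
print(rp.can_fetch(agent, "http://example.com/search/index.html"))  # False

# ...but a bare /search (no trailing slash) no longer matches the
# /search/ prefix, so it falls through to Allow: /.
print(rp.can_fetch(agent, "http://example.com/search"))  # True
```

In other words, adding the trailing slash narrows the rule from "anything starting with /search" to "anything inside the /search/ directory".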

Solution 5

It means that the user agent Mediapartners-Google is not allowed to fetch /search or anything under it:

/search/go: blocked
/search: blocked
/: not blocked
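The user-agent scoping can also be verified with Python's standard-library `urllib.robotparser`: the OP's block names only Mediapartners-Google, so other crawlers are unaffected (example.com is a placeholder host):

```python
from urllib.robotparser import RobotFileParser

# The exact block from the OP's robots.txt.
rp = RobotFileParser()
rp.parse([
    "User-agent: Mediapartners-Google",
    "Disallow: /search",
    "Allow: /",
])

# The AdSense crawler is blocked from /search and everything under it...
print(rp.can_fetch("Mediapartners-Google", "http://example.com/search/go"))  # False

# ...but a crawler the block doesn't name has no rules applying to it,
# so it may fetch the same URL.
print(rp.can_fetch("Googlebot", "http://example.com/search/go"))  # True
```

This is why Zistoloen's comment below matters: the restriction applies only to the AdSense bot, not to all user agents.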



Updated on September 18, 2022

Comments

  • Sathiya Kumar V M
    Sathiya Kumar V M over 1 year

    In my blog's Google Webmaster Tools panel, I found the following code in my robots.txt of blocked URLs section.

    User-agent: Mediapartners-Google
    Disallow: /search
    Allow: /
    

    I know that Disallow will prevent Googlebot from indexing a webpage, but I don't understand the usage of Disallow: /search.

    What is the exact meaning of Disallow: /search?

  • Zistoloen
    Zistoloen almost 11 years
    John, I think the search/ restriction is not for all user-agents but only for Mediapartners-Google user-agent (AdSense bot).
  • John Conde
    John Conde almost 11 years
    Small detail that makes quite a difference. Thanks for pointing that out.
  • Sathiya Kumar V M
    Sathiya Kumar V M almost 11 years
John, the code mentioned a /search directory. What does it mean? My question is about this only.
  • Zistoloen
    Zistoloen almost 11 years
John answered your question: it means bots (only the AdSense bot here) are not allowed to access resources under the /search directory. For example, the AdSense bot doesn't have the right to access this type of URL: www.example.com/search/ or www.example.com/search/file.html.