What does "Disallow: /search" mean in robots.txt?
Solution 1
In the Disallow field you specify the beginning of URL paths of URLs that should be blocked.
So if you have Disallow: /, it blocks everything, as every URL path starts with /.
If you have Disallow: /a, it blocks all URLs whose paths begin with /a. That could be /a.html, /a/b/c/hello, or /about.
In the same sense, if you have Disallow: /search, it blocks all URLs whose paths begin with the string /search. So it would block the following URLs, for example (if the robots.txt is in http://example.com/):
http://example.com/search
http://example.com/search.html
http://example.com/searchengine
http://example.com/search/
http://example.com/search/index.html
While the following URLs would still be allowed:
http://example.com/foo/search
http://example.com/sea
Note that robots.txt doesn't care whether the string matches a directory, a file, or nothing at all. It only looks at the characters in the URL path.
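The prefix rule described above can be sketched in a few lines of Python. This is only an illustration, not how any particular crawler is implemented: the function name is_disallowed is made up for this example, and it deliberately ignores extensions such as the * and $ wildcards that some crawlers support.

```python
from urllib.parse import urlsplit


def is_disallowed(url: str, disallow_prefix: str) -> bool:
    """Return True if the URL's path starts with the Disallow value.

    Hypothetical helper for illustration only: plain prefix matching,
    no wildcard support, no Allow rules, no per-user-agent groups.
    """
    # Only the path is compared; scheme and host play no role.
    path = urlsplit(url).path or "/"
    return path.startswith(disallow_prefix)


urls = [
    "http://example.com/search",
    "http://example.com/search.html",
    "http://example.com/searchengine",
    "http://example.com/search/",
    "http://example.com/search/index.html",
    "http://example.com/foo/search",
    "http://example.com/sea",
]

for url in urls:
    status = "blocked" if is_disallowed(url, "/search") else "allowed"
    print(f"{url}: {status}")
```

The first five URLs are reported as blocked and the last two as allowed, matching the lists above.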
Solution 2
Other answers explain how robots.txt is processed to apply this rule, but don't address why you would want to disallow bots from crawling your search results.
One reason might be that your search results are expensive to generate. Telling bots not to crawl those pages could reduce load on your servers.
Search results pages are also not great landing pages. A search result page typically just has a list of 10 pages from your site with titles and descriptions. Users would generally be better served by going directly to the most relevant of those pages. In fact, Google has said that they don't want your site search results indexed by Google. If you don't disallow them, Google could penalize your site.
Solution 3
It tells AdSense not to crawl any files in the /search directory or below (i.e. any subdirectories of /search).
Solution 4
Since the OP indicated in his comments that he was only interested in the "/search directory", my answer below is in regards to disallowing just a "search" directory:
The following is a directive for robots not to crawl something named "search" located in the root directory:
Disallow: /search
According to the Google Webmaster Tools help doc quoted below, directory names should be preceded and followed by a forward slash /, as the other reference sources below also specify:
Google Webmaster Tools - Block or remove pages using a robots.txt file
To block a directory and everything in it, follow the directory name with a forward slash.
Disallow: /junk-directory/
Robotstxt.org - What to put in it
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
In this example, three directories are excluded.
Wikipedia - Robots exclusion standard
This example tells all robots not to enter three directories:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
So according to Google (as quoted above), the following would disallow bots with the user-agent Mediapartners-Google from crawling the "search" directory located in the root directory, but allow all other directories to be crawled:
User-agent: Mediapartners-Google
Disallow: /search/
Allow: /
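To see how the trailing slash changes what is matched, here is a small sketch using Python's standard urllib.robotparser module. It reflects how that one parser interprets the rules; real crawlers may differ in details, and the helper name build_parser and the example.com URLs are assumptions made for this illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(disallow_value: str) -> RobotFileParser:
    """Build a parser for the rules above with the given Disallow value.

    Hypothetical helper; the rules are fed in directly rather than
    fetched from a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse([
        "User-agent: Mediapartners-Google",
        f"Disallow: {disallow_value}",
        "Allow: /",
    ])
    return parser


without_slash = build_parser("/search")
with_slash = build_parser("/search/")

agent = "Mediapartners-Google"
for path in ["/search", "/searchengine", "/search/index.html", "/foo/search"]:
    url = "http://example.com" + path
    print(
        path,
        "| Disallow: /search ->", without_slash.can_fetch(agent, url),
        "| Disallow: /search/ ->", with_slash.can_fetch(agent, url),
    )
```

With Disallow: /search/, only URLs inside the directory (such as /search/index.html) are refused, while /search itself and /searchengine remain fetchable. With Disallow: /search, all three are refused.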
Solution 5
It means that the user agent Mediapartners-Google will not be allowed to go into /search or any of the directories under it:
/search/go blocked
/search blocked
/ not blocked.
Sathiya Kumar V M
Updated on September 18, 2022
Comments
-
Sathiya Kumar V M over 1 year
In my blog's Google Webmaster Tools panel, I found the following code in my robots.txt of blocked URLs section.
User-agent: Mediapartners-Google
Disallow: /search
Allow: /
I know that Disallow will prevent Googlebot from indexing a webpage, but I don't understand the usage of Disallow: /search. What is the exact meaning of Disallow: /search?
-
Zistoloen almost 11 years
John, I think the search/ restriction is not for all user-agents but only for the Mediapartners-Google user-agent (AdSense bot).
-
John Conde almost 11 years
Small detail that makes quite a difference. Thanks for pointing that out.
-
Sathiya Kumar V M almost 11 years
John, the code mentions the /search directory. What does it mean? My question is only about this.
-
Zistoloen almost 11 years
John answered your question: it means bots (only the AdSense bot here) are not allowed to access resources under the /search directory. For example, the AdSense bot is not allowed to access URLs such as www.example.com/search/ or www.example.com/search/file.html.