Best way to prevent Google from indexing a directory

In principle, robots.txt alone should be fine, but that depends on how the crawler bot is programmed. Almost all well-behaved crawlers (if not all) honor robots.txt, and for blocking a whole directory robots.txt is the best option: you don't have to add a meta tag to each file, and it also covers txt files, images, and anything else you forgot (or can't add a meta tag to) that may be linked from elsewhere. Likewise, if you upload more files to the directory in the future, you don't have to worry about remembering to add meta tags to them.
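To illustrate how a well-behaved crawler consults robots.txt before fetching anything, here is a minimal sketch using Python's standard library (the domain and paths are made up for the example):

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks one directory for every crawler
robots_txt = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_txt)

# A compliant bot checks every URL against the rules before fetching it
print(parser.can_fetch("*", "http://example.com/private/notes.txt"))  # False
print(parser.can_fetch("*", "http://example.com/public/page.html"))   # True
```

Note that everything under `/private/` is covered automatically, including images and text files, with no per-file work.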

It is definitely safer to use robots.txt to deny access to an entire directory.

EDIT: Although Disallow tells robots not to crawl a directory, Google may still index it, as stated here: https://developers.google.com/webmasters/control-crawl-index/docs/faq#h17 so in that case using the noindex tag instead of robots.txt is the better option.

However, robots.txt Disallow does not guarantee that a page will not appear in results: Google may still decide, based on external information such as incoming links, that it is relevant.
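To make the trade-off concrete, here is a sketch of the tag-based alternative for an Apache setup (the directory name is hypothetical, and this assumes mod_headers is available). An `X-Robots-Tag` response header delivers noindex for every file type, including images and .txt files that cannot carry an HTML meta tag:

```apache
# .htaccess inside /private/ (hypothetical directory):
# send a noindex directive with every response from this directory
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
```

Per the Google FAQ linked above, for this directive to take effect the directory must not also be disallowed in robots.txt; otherwise Googlebot never fetches the files and never sees the header.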


Author: Gkhan14

Updated on September 18, 2022

Comments

  • Gkhan14 over 1 year

    I've researched many methods on how to prevent Google/other search engines from crawling a specific directory. The two most popular ones I've seen are:

    1. Adding it into the robots.txt file: Disallow: /directory/
    2. Adding a meta tag: <meta name="robots" content="noindex, nofollow">

    Which method would work best? I want this directory to be neutral and "invisible" to search engines, "just there," so it does not affect any of my site's ranking.

  • Gkhan14 almost 11 years
    Awesome! Also, by adding this to the robots.txt file, will it make the directory "invisible" to the search engine? Meaning, will it avoid affecting the ranking of the other pages on my site?
  • Jorge Fuentes González almost 11 years
    Wait, I'm adding more info. I just read the Google FAQ, and its bot handles robots.txt a bit specially.
  • Jorge Fuentes González almost 11 years
    And, in reply about the PageRank: that depends on how exactly Google calculates PageRank, and they don't say much about that when talking about bad practices. But it seems that having duplicate content (Google hates that so much) doesn't affect your PageRank as long as the page has the noindex tag, as you can read here: support.google.com/webmasters/answer/66359?hl=en so maybe the same applies to other practices.
  • Gkhan14 almost 11 years
    Thanks! Lastly, do you think it's safe to just return a 404 error for the entire directory in my .htaccess file, based on IP? This directory is mainly just for me and a few other people.
  • Jorge Fuentes González almost 11 years
    What do you mean by safe? .htaccess can only be edited by the admin(s) with write access to the file, so no normal user can change it; it's as safe as your config files. If you like the main answer, please accept it by clicking the check mark at its left please :P
  • Gkhan14 almost 11 years
    I mean: if I add a 404 error and make it look like the directory doesn't even exist to Google, will it leave my rankings the way they are now?
  • Jorge Fuentes González almost 11 years
    Supposedly, no robot will know your file exists if there isn't any link to it on the internet. So if you put a blank page or an error page or something as the index file inside your directory, bots can't notice it; they can't guess file names by magic :P The problem is that if users navigate to your file, it's possible that, by "magic," Google will learn your file exists, because it crawls the links users visit even though they aren't linked anywhere.
  • Jorge Fuentes González almost 11 years
    I mean, Google gets to know every link you visit and may eventually index it. For example, some time ago I sent about 200k Spanish users a mobile message with a link in it. Because SMS messages are so short, there was no space between the link and the final dot, like this: click here http://mylink.com/mobile/.For more info call 123456789. As you can see, the link is mylink.com/mobile/, but some mobile phones, when the link was clicked, sent the user to mylink.com/mobile/.For, and after some months mylink.com/mobile/.For started appearing in search results...
  • Jorge Fuentes González almost 11 years
    So yes, Google indexes the sites that users navigate to, because of Chrome.
  • Gkhan14 almost 11 years
    Thanks! My last (and hopefully FINAL :P) question: only a few files in the directory will be available to the public; the rest of the files (including the main directory URL) will return either a 404 or a 403 error. Which one would be better in this case (404 or 403)? Keep in mind that I want to maintain the ranking of the rest of the pages on my site, so I don't want the 404 or 403 errors affecting it.
  • Jorge Fuentes González almost 11 years
    BUT, if your site is a really bad site (one that downloads malware without user permission, or a phishing site), that will hurt PageRank really badly even if you add a noindex tag. Also, access to the entire website, and to all sites that load content from that website (images or something), will be blocked by almost all well-known browsers with a "this site has malware or is a phishing site" alert. It's obvious that people who run malware and phishing sites don't want this to happen, so it's also obvious that adding tags or disallowing robots from accessing the site is not going to work...
  • Jorge Fuentes González almost 11 years
    Otherwise, it would be really easy to run malware sites without being detected.
  • Jorge Fuentes González almost 11 years
    A 404/403 error won't affect PageRank. If you want your files to be really invisible and unacknowledged, use a 404; otherwise, instead of saying "there is nothing to see," a 403 tells users "there is something here, but you can't see it." By the way, if you return a 404 error when accessing the file, how are you going to access it? I presume you're trying to hide files so that you're the only one who can access them.
  • Gkhan14 almost 11 years
    Precisely! So, all in all: if I return a 403 error for the directory and for only a FEW files in it, will that affect the overall ranking of the other content on my website? Some of the files in the directory that aren't behind the 403 error (only images) will be visible to the public and placed on some of the pages on the website.
  • Jorge Fuentes González almost 11 years
    No no, 403 errors don't affect PageRank, don't worry. But I mean, how are you going to access those files? If you return a 403 error for them, you and other users will receive the same error, so it will be impossible even for you to access them. If you want to store files but don't want to serve them as website files, you can keep them in another directory outside htdocs.
  • Gkhan14 almost 11 years
    Thanks! Let's say I have a directory that returns a 403 Forbidden error, but a few of the images in that directory don't; will that fact affect my ranking?
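The IP-restricted 404 discussed in the thread above can be sketched in .htaccess like this (assumes Apache 2.4 with mod_rewrite; the IP address is a placeholder from the documentation range and would be replaced with your own):

```apache
# .htaccess inside the private directory: everyone except the listed
# IP gets a 404, so the directory appears not to exist at all.
RewriteEngine On
RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$
RewriteRule ^ - [R=404]

# Alternative: return 403 instead, which reveals that something exists
# but is off-limits.
# Require ip 203.0.113.10
```

As noted in the comments, the 404 variant is the more "invisible" of the two, and neither status code affects the ranking of the rest of the site.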