Best way to prevent Google from indexing a directory
In principle, robots.txt alone will be fine, but that depends on how the crawler is programmed. Almost all legitimate crawlers (if not all) honor robots.txt, and for blocking a directory robots.txt is the best option: you don't have to add a meta tag to each file, and it also covers text files, images, and anything else that you forgot or that can't carry a meta tag and may be linked from elsewhere. Also, if you upload more files to the directory in the future, you don't have to remember to add meta tags to them.
It is definitely safer to use robots.txt to deny access to an entire directory.
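As a rough way to check what a Disallow rule covers, the sketch below feeds a small robots.txt to Python's standard urllib.robotparser. The rule set, the example.com URLs, and the helper name is_crawl_allowed are illustrative assumptions, and the result only says whether a compliant crawler may fetch a URL, not whether the URL can end up in an index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "/directory/" is the placeholder path from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/directory/notes.txt",
        "https://example.com/index.html",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

Every path under /directory/ is covered by the single rule, including files that cannot carry a meta tag.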
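EDIT: Although Disallow tells robots not to crawl a directory, Google may still index its URLs, as stated here: https://developers.google.com/webmasters/control-crawl-index/docs/faq#h17 so if the goal is to keep pages out of the index, using the meta tag instead of robots.txt will be better. Note that a crawler has to fetch a page to see the tag, so a page carrying the tag must not also be blocked in robots.txt.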
However, robots.txt Disallow does not guarantee that a page will not appear in results: Google may still decide, based on external information such as incoming links, that it is relevant.
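To verify that pages in the directory actually carry the robots meta tag, a minimal sketch using only Python's standard html.parser could look like this. The class and function names are hypothetical, and the sample page is made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for part in attributes.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(robots_directives(page))
```

A page whose result does not contain "noindex" would still be eligible for indexing once a crawler reaches it.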
Gkhan14
Updated on September 18, 2022
Comments
-
Gkhan14 over 1 year
I've researched many methods on how to prevent Google/other search engines from crawling a specific directory. The two most popular ones I've seen are:
- Adding it into the robots.txt file:
Disallow: /directory/
- Adding a meta tag:
<meta name="robots" content="noindex, nofollow">
Which method would work the best? I want this directory to remain "invisible" from search engines so it does not affect any of my site's ranking.
In other words, I want this directory to be neutral/invisible and "just there." I don't want it to affect any ranking. Which method would be the best to achieve this?
-
Gkhan14 almost 11 years
Awesome! Also, by adding this to the robots.txt file, will it make the directory "invisible" to the search engine? Meaning, will it not affect the ranking of the other pages on my site?
-
Jorge Fuentes González almost 11 years
Wait, I'm adding more info. I just read the Google FAQ, and its bot is a bit special with robots.txt.
-
Jorge Fuentes González almost 11 years
And, in reply about the PageRank: that depends on how exactly Google calculates PageRank, and they don't say much about that when talking about bad practices, but it seems that having 'duplicate content' (Google hates that so much) doesn't affect your PageRank as long as the page has the noindex tag, as you can read here: support.google.com/webmasters/answer/66359?hl=en so maybe that works with other practices too.
-
Gkhan14 almost 11 years
Thanks! Lastly, do you think it's safe to just place a 404 error for the entire directory in my .htaccess file, according to the IP? This directory is mainly just for me and a few other people.
-
Jorge Fuentes González almost 11 years
What do you mean by safe? .htaccess can only be edited by the admin(s) with write access to the file, so no normal user can edit it; it is as safe as your config files. If you like the main answer, please accept it by clicking the 'V' to its left :P
-
Gkhan14 almost 11 years
I mean, if I add a 404 error and make it look to Google like the directory doesn't even exist, will it just leave my rankings the way they are now?
-
Jorge Fuentes González almost 11 years
Supposedly no robot will know that your file exists if there isn't any link to it on the internet, so if you put a blank page or an error page in the index file of your directory, no bot can notice the files; they cannot guess the file names by magic :P The problem is that if users navigate to your file, it is possible that by "magic" Google will learn that it exists, because it crawls the links that users navigate to even though they aren't linked anywhere.
-
Jorge Fuentes González almost 11 years
I mean, Google will learn about all the links you visit, and eventually index them. For example, some time ago I sent a mobile message with a link in it to about 200k Spanish users. Because SMS messages are so short, there was no space between the link and the final dot, like this:
click here http://mylink.com/mobile/.For more info call 123456789
As you can see, the link is mylink.com/mobile/ but some mobile phones, when the link was clicked, sent the user to mylink.com/mobile/.For and, well, after some months mylink.com/mobile/.For started appearing in search results...
-
Jorge Fuentes González almost 11 years
So yes, it seems that Google indexes the pages that users navigate to, because of Chrome.
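
The SMS link mishap described above can be reproduced with a naive link detector. The sketch below assumes that the affected phones treated everything up to the next whitespace as part of the URL, which is a guess at their behavior rather than something the comment confirms; the function name detect_link is hypothetical.

```python
import re
from typing import Optional

# The message text quoted in the comment above.
SMS_TEXT = "click here http://mylink.com/mobile/.For more info call 123456789"


def detect_link(text: str) -> Optional[str]:
    """Naive autolinker: a URL runs until the next whitespace character."""
    match = re.search(r"https?://\S+", text)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Without a space after the URL, the dot and the next word are swallowed.
    print(detect_link(SMS_TEXT))
    # With a space before the dot, only the intended URL is detected.
    print(detect_link("click here http://mylink.com/mobile/ . For more info"))
```

Leaving whitespace after a URL in plain-text messages avoids depending on how each client decides where the link ends.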
-
Gkhan14 almost 11 years
Thanks! My last (and hopefully FINAL :P) question is: only a few files in the directory will be available to the public; the rest of the files (including the main directory URL) will have either a 404 or a 403 error. Which one would be better for this case (404 or 403)? Keep in mind that I want to maintain the ranking of the rest of the pages on my site, so I don't want this 404 or 403 error affecting it.
-
Jorge Fuentes González almost 11 years
BUT, if your site is a really bad site (one that downloads malware without user permission, or a phishing site), that will affect PageRank really badly even if you add a noindex tag, and access to the entire website and to all sites that load content from it (images or anything else) will be blocked by almost all well-known browsers with a "this site has malware or is a phishing site" alert. Obviously, people who run malware and phishing sites don't want this to happen, so it is also obvious that adding tags or disallowing robots from accessing the site is not going to work...
-
Jorge Fuentes González almost 11 years
If not, it would be really easy to have malware sites without being detected.
-
Jorge Fuentes González almost 11 years
Having a 404/403 error won't affect PageRank. If you want your files to be really invisible and undiscoverable, use a 404; otherwise, instead of saying "There is nothing to see", you are telling users who receive the 403 that "There is something, but you can't see it". By the way, if you return a 404 error when the file is accessed, how are you going to access it? I presume that you are trying to hide the files so that you are the only one who can access them.
-
Gkhan14 almost 11 years
Precisely! So all in all, if I add a 403 error to the directory and to only a FEW files in the directory, will it affect the overall ranking of the other content on my website? Some of the files in the directory that aren't affected by the 403 error (only images) will be visible to the public and placed on some of the pages on the website.
-
Jorge Fuentes González almost 11 years
No no, 403 errors do not affect PageRank, don't worry. But I mean, how are you going to access those files? If you add a 403 error to those files, you are going to receive the same error as other users, so it is impossible for you to access them. If you want to store files but don't want to serve them as website files, you can save them in another directory outside htdocs.
-
Gkhan14 almost 11 years
Thanks! Let's say I have a directory that has a 403 forbidden error, however a few of the images in that directory don't have a 403 forbidden error, will that fact affect my ranking?
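
The setup discussed in these comments (a hidden directory, a few public images inside it, and access for a handful of known IPs) can be sketched as plain decision logic. This is not .htaccess syntax, only an illustration of the rule order; the paths, file names, and the allowed network (a documentation-only address range) are assumptions.

```python
import ipaddress
from http import HTTPStatus

# All values below are hypothetical placeholders.
PRIVATE_PREFIX = "/directory/"
PUBLIC_FILES = {"/directory/logo.png", "/directory/banner.jpg"}
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def status_for(path: str, client_ip: str) -> HTTPStatus:
    """Pick the response status for a request to the given path."""
    if not path.startswith(PRIVATE_PREFIX):
        return HTTPStatus.OK  # the rest of the site is unaffected
    if path in PUBLIC_FILES:
        return HTTPStatus.OK  # the few images meant for the public
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in ALLOWED_NETWORKS):
        return HTTPStatus.OK  # the owner and trusted people
    # 404 hides that anything exists; 403 would admit there is something.
    return HTTPStatus.NOT_FOUND


if __name__ == "__main__":
    print(status_for("/directory/", "198.51.100.7"))
    print(status_for("/directory/logo.png", "198.51.100.7"))
    print(status_for("/directory/private.txt", "203.0.113.5"))
```

Checking the IP before returning the error answers the question of how the owner can still reach the hidden files.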