Can I invoke Google to check my robots.txt?


Solution 1

You can't make Google re-download your robots.txt on demand; it re-crawls the file and applies the new rules on its own schedule. It tends to fetch robots.txt fairly regularly, so I wouldn't expect the updated file to go unnoticed for long. Keep in mind, though, that once the new robots.txt has been picked up it can still take a while before your pages are re-crawled, and even longer before they reappear in Google's search results.
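
Until Google does re-fetch the file, you can at least verify what the copy currently being served allows, which is exactly what Googlebot will see on its next visit. A minimal sketch in Python using the standard library's urllib.robotparser; the domain and paths are placeholders, not your actual site:

    # Minimal sketch: parse the live robots.txt and test a few URLs against
    # the Googlebot user agent. Domain and paths are placeholders.
    from urllib.robotparser import RobotFileParser

    SITE = "https://www.example.com"  # replace with your own domain

    rp = RobotFileParser()
    rp.set_url(SITE + "/robots.txt")
    rp.read()  # downloads and parses the robots.txt currently being served

    for path in ("/", "/some-page.html"):
        url = SITE + path
        print(url, "allowed for Googlebot:", rp.can_fetch("Googlebot", url))

If the corrected file is live, every URL you expect to be crawlable should print True here.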

Solution 2

I faced the same problem when I started my new website www.satyabrata.com on June 16.

I had a Disallow: / in my robots.txt, exactly like Oliver. There was also a warning message in Google Webmaster Tools about blocked URLs.

The problem was solved yesterday, June 18. I did the following. I am not sure which step worked.

  1. Health -> Fetch as Google: fetched robots.txt and the home page, then submitted them to the index.
  2. Settings -> Preferred domain: Display URL as www.satyabrata.com
  3. Optimization -> Sitemaps: Added an XML sitemap (a sketch for generating one follows this list).
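
For step 3, any valid XML sitemap will do. Here is a minimal sketch that writes one with Python's standard library; the domain and page list are placeholders, not my real site structure:

    # Minimal sketch: write a bare-bones sitemap.xml for a handful of pages.
    # Domain and paths are placeholders.
    import xml.etree.ElementTree as ET

    SITE = "https://www.example.com"               # placeholder domain
    pages = ["/", "/about.html", "/contact.html"]  # placeholder page list

    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for path in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = SITE + path

    ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)

Upload the resulting sitemap.xml to the site root and submit its URL under Optimization -> Sitemaps.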

The warning message about blocked URLs is gone now, and Google Webmaster Tools shows a freshly downloaded robots.txt.
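
You can also confirm the re-download from the server side: the access log records when Googlebot last requested the file. A rough sketch, assuming an Apache/nginx combined-format log at a placeholder path:

    # Rough sketch: print recent Googlebot requests for /robots.txt from an
    # access log in common/combined format. The log path is a placeholder.
    LOG_PATH = "/var/log/nginx/access.log"

    with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
        for line in log:
            if "GET /robots.txt" in line and "Googlebot" in line:
                print(line.rstrip())

(Anyone can fake the Googlebot user agent, but for a quick check this is usually good enough.)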

Presently, I have only two pages indexed in Google, the home page and robots.txt. I have 10 pages on the website. I hope the rest will get indexed soon.


Comments

  • Oliver Salzburg
    Oliver Salzburg over 1 year

    I read the answers in this question, but they still leave my question open: Does Google cache robots.txt?

    I didn't find a way in the Google Webmaster Tools to invoke a re-download of my robots.txt.

    Through some error, my robots.txt was replaced with:

    User-agent: *
    Disallow: /
    

    And now all my content has been removed from Google's search results.

    Obviously, I'm interested in correcting this as soon as possible. I already replaced the robots.txt, but I can't find a way to make Google update the cached version.
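
    In the meantime, one sanity check I can do is fetch my own robots.txt with a Googlebot-style User-Agent, to make sure the corrected file (and not some stale copy from a proxy or CDN) is what the server actually hands out. A small sketch with Python's standard library; the URL is a placeholder:

        # Small sketch: fetch robots.txt with a Googlebot-style User-Agent and
        # print what the server returns. The URL is a placeholder.
        import urllib.request

        req = urllib.request.Request(
            "https://www.example.com/robots.txt",
            headers={"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                                   "+http://www.google.com/bot.html)"},
        )
        with urllib.request.urlopen(req) as resp:
            print(resp.status, resp.headers.get("Content-Type"))
            print(resp.read().decode("utf-8", errors="replace"))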


    • Admin
      Admin about 12 years
      Just disallowing all your pages in robots.txt should generally not be enough to completely remove them from Google's results, as long as other sites still link to them.
    • Admin
      Admin about 11 years
      Hmm, it's a tricky one. ZenCart's dynamic URLs seem to confuse the robots.txt crawler, and before you know it you have blocked URLs that you didn't intend to block. I lost many web rank places because valid URLs got blocked this way. In my experience you are better off without robots.txt and just keeping a clean web site. Not sure if it relates to disabling a category in ZenCart and then moving products out of that category…
  • Oliver Salzburg
    Oliver Salzburg about 12 years
    They re-read the file about 6 hours after I posted. Everything is back to normal by now.
  • Fiasco Labs
    Fiasco Labs about 12 years
    Whew! Back on track then!
  • studgeek
    studgeek almost 12 years
    According to them they check every day or so, but they probably check more often for busy sites. See webmasters.stackexchange.com/a/32949/17430.
  • studgeek
    studgeek almost 12 years
    I tried asking Webmaster Tools to fetch robots.txt, but it complained that the request was denied by robots.txt :). So apparently that trick will not work if you have robots.txt doing a full block.
  • Kasapo
    Kasapo over 11 years
    Same here... Request for robots.txt denied by robots.txt! Hah!
  • Fiasco Labs
    Fiasco Labs over 11 years
    Whelp, if you put deny on the root then I guess you're kind of SOL. In my case, it was a subfolder that was being refused, so forcing a reread of robots.txt through the mechanisms provided actually worked.
  • Oliver Salzburg
    Oliver Salzburg over 10 years
    That's not what this question is about.
  • Stephen Ostermiller
    Stephen Ostermiller about 6 years
    I don't see what DNS or having a free site has to do with robots.txt or telling Google to re-fetch it.
  • Stefan Monov
    Stefan Monov about 6 years
    @StephenOstermiller: I don't see either, but the fact is, this helped in my case.
  • Stephen Ostermiller
    Stephen Ostermiller about 6 years
    It helped Google check your robots.txt?
  • Stefan Monov
    Stefan Monov about 6 years
    @StephenOstermiller: Yes.