What does Google do with indexed pages returning 403?


Solution 1

As you know, Google will eventually remove pages that return 404 Not Found or 410 Gone. A 403 Forbidden, however, only says that the server refuses access, not that the page no longer exists.

Google will most likely keep rechecking the page and keep it in its index in case it becomes accessible again. If you do not want it indexed, add the noindex robots meta tag to the page.
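To confirm that a page actually carries the noindex directive, something like the following could be used. This is only a minimal sketch using Python's standard library; it assumes the tag has the usual form `<meta name="robots" content="noindex">`, and the names `RobotsMetaParser` and `has_noindex` are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            content = attr_map.get("content", "")
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Note that this only inspects the HTML it is given. A crawler has to be able to fetch the page to see the tag at all.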

Solution 2

They get removed from the index. There is no official documentation from Google on this, but you can find plenty of cases where websites have been returning 403 by mistake and got deindexed.

Example case: http://wordpress.org/support/topic/403-crawl-error-on-google-all-pages-have-been-removed-from-google-index
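Since the risk described here is a server returning 403 by mistake, a periodic status check can catch it early. Below is a rough sketch in Python using only the standard library. The function names, the grouping of status codes, and the User-Agent string are assumptions for illustration, and the script says nothing about how Google itself treats each code.

```python
import urllib.error
import urllib.request

# Assumed User-Agent, for illustration only. Real Googlebot traffic comes from
# Google's own IP ranges, so a server that filters by IP may answer differently.
USER_AGENT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def describe_status(code: int) -> str:
    """Group a status code into the rough categories discussed in this thread."""
    if code in (404, 410):
        return "gone"
    if code in (401, 403):
        return "blocked"
    if 200 <= code < 300:
        return "ok"
    return "other"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the final HTTP status code."""
    request = urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT}, method="HEAD"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still available.
        return err.code
```

For example, `describe_status(fetch_status(url))` returning "blocked" for a page that should be public would point to a misconfiguration like the one in the example case.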


Author: Admin

Updated on September 18, 2022

Comments

  • Admin
    Admin over 1 year

    I guess that Google removes already indexed pages that are now returning the HTTP error 403.

    Unfortunately I can't find any definitive statement on that matter. Do you know the answer?

  • Zoltan Kochan
    Zoltan Kochan almost 12 years
    @ilanco Unfortunately your answers conflict. Does either of you have a source supporting your opinion?
  • BeowulfNode42
    BeowulfNode42 about 7 years
    support.google.com/webmasters/answer/1663419 says "To remove content or a URL from Google search permanently you must take one or more of the following additional actions: return 404, 410, or 'Block access to the content, for example by requiring a password' or 'use the noindex metatag'". Blocking access could be done with a 403, but it would be nice if their help page used the standard terminology.
  • Barry
    Barry over 6 years
    The example case says his server was misconfigured in June and he noticed his sites had been removed by August. He then marks it solved after fixing his misconfiguration. Unfortunately we still don't know how quickly Google re-indexed his sites, nor do we know how quickly it de-listed them after the misconfiguration (it might have been before he noticed it).
  • Barry
    Barry over 6 years
    Requiring a password would normally be status code 401, not 403. But the generic "block access to the content" advice might be referring to the fact that Google periodically checks that your pages still host the same content as they did before. For example, say you have a page about astronomy. It should appear in search results about astronomy only as long as it continues to be about astronomy. If it gets replaced by a page about a completely different subject, or by a password prompt, then it would seem reasonable for Google to remove it from the astronomy results.