Google keeps crawling pages that don't exist

This is going to happen for a long time. Other sites might be linking to the old URLs, which will prompt Google to crawl them, and/or Google might assume your site is just having problems and give you the benefit of the doubt that the pages might return. Either way, Google continues to crawl old pages for a long time. It would be far worse for Google to suddenly stop crawling your pages after getting a bunch of 404s.

If the pages genuinely do not exist and there is no alternative, then it is correct to return a 404 (Not Found). Alternatively, you can return a 410 (Gone) for pages that are never going to return; that is a far stronger and more definite indication to Google that the pages aren't coming back.
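As a rough illustration of the 404 versus 410 distinction, here is a minimal sketch using Python's standard `http.server`. The contents of `LIVE_PAGES` and `GONE_FOREVER`, and the helper names `status_for` and `serve`, are assumptions made up for this example; a real site would normally express the same rule in its web server or framework configuration.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths, for illustration only.
LIVE_PAGES = {"/index.php", "/products.php"}
GONE_FOREVER = {"/old-catalogue.php", "/old-news.php"}


def status_for(path: str) -> HTTPStatus:
    """Pick the status code for a request path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    if path in GONE_FOREVER:
        # Deliberately removed and never coming back.
        return HTTPStatus.GONE
    # Unknown URL, or one that might return later.
    return HTTPStatus.NOT_FOUND


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Ignore any query string when matching the path.
        path = self.path.split("?", 1)[0]
        status = status_for(path)
        body = f"{status.value} {status.phrase}\n".encode("utf-8")
        self.send_response(status.value)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the demo server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` starts the demo on localhost. The point is only that the status line itself carries the signal: the crawler sees 410 for the deliberately retired URLs and 404 for everything else it cannot find.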

Note that serving a 404 (or 410) for these pages, and having them reported as such in Google Webmaster Tools (GWT), is not necessarily a bad thing and will not in itself harm your site. This is a private report for your benefit.

However, what can be bad for SEO is if other sites link to your old page and would otherwise have passed PageRank. When you return a 404, those links no longer pass that PageRank. If you have alternative pages for those removed, then 301 redirect to the new location in order to preserve your ranking, help search engines re-index your content and, not least, please your users. But if you have simply removed the old content and not replaced it, then you need to be prepared to take the potential SEO hit.
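The redirect rule can be sketched as a simple lookup. This is an illustration under stated assumptions, not a definitive setup: the old URLs in `REDIRECTS` are invented, the targets are borrowed from the new site's navigation, and `resolve` is a hypothetical helper name.

```python
from typing import Optional, Tuple

# Hypothetical old-to-new mapping; the old URLs are invented for illustration.
REDIRECTS = {
    "/catalogue.html": "/products.php",
    "/company.html": "/about_us.php",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location header value or None) for an old URL."""
    target = REDIRECTS.get(path)
    if target is not None:
        # A replacement exists: send a permanent redirect to it.
        return 301, target
    # No replacement: report the page as missing rather than redirecting.
    return 404, None
```

Only old URLs with a genuine replacement appear in the map; everything else falls through to a plain 404 instead of being redirected somewhere unrelated.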


Author: Hashman

just an ordinary web dev

Updated on September 18, 2022

Comments

  • Hashman
    Hashman over 1 year

    I made a simple new website which went live about a month and a half ago (replacing the last one). The site contains new pages, as the navigation markup shows:

    <div class="header">
        <div class="navigation menu">
            <ul>
                <li><a href="index.php">Home page</a></li>
                <li><a href="products.php">Products</a></li>
                <li><a href="reference.php">Reference</a></li>
                <li><a href="about_us.php">About us</a></li>
                <li><a href="contact.php">Contact</a></li>
            </ul>
        </div>
    </div>
    

    The problem is that Google's crawlers keep looking for the old pages, which I removed and which are no longer on the server.

    I keep removing the crawl errors (marking them as fixed) in Webmaster Tools, but the crawlers keep attempting to crawl those pages.

    I believe that the last website version is cached somewhere, so is there a way to remove it? (I don't have a lot of experience with Google Webmaster Tools.)

    • Tim Fountain
      Tim Fountain over 10 years
      Do you redirect the old URLs to their new equivalents?
    • Hashman
      Hashman over 10 years
      @TimFountain I completely removed the old pages from the server, and I'm not using them anymore.
    • Beda Schmid
      Beda Schmid almost 3 years
      This is just one more piece of Google garbage. Notice that neither Bing nor Yahoo keeps indexing 404 pages like this. In some circumstances you will have to wait years until Google realises the page is gone and stops indexing it. It has nothing to do with links from other pages; it is just how badly Google works. The only thing you can do to avoid it is never to delete or change a page or post without putting a 301 redirect in place immediately afterwards. Google tries to blame the users (us) for it, but clearly it is a flaw in their algorithms, or Bing would do the same, and it does not.
  • Hashman
    Hashman over 10 years
    So you're saying that it is OK that Webmaster Tools reports 404s for those pages that are no longer on the server? Does that affect SEO? (When I type the keywords into Google search, my new website ranks badly.)
  • Simon Hayter
    Simon Hayter over 10 years
    404s are completely normal if the page no longer exists. The only time you should be worried is if you have some good links pointing to pageB but pageB no longer exists, so you lose out on those good links. In that case you'd 301 the old link to the new link, but that is only necessary if you have a large volume of links; otherwise a 404 is perfectly valid.
  • Hashman
    Hashman over 10 years
    And will those errors affect SEO ranking? And is there a way to somehow remove links on other pages that point to pages on my site that don't exist?
  • MrWhite
    MrWhite over 10 years
    As bybe states above, the detrimental effect on SEO is if there are other sites linking to your old page. By returning a 404 you no longer benefit from any PageRank that the other sites would otherwise be passing to you. If you have a suitable new page on your site then you should 301-redirect requests for the old page to the new one. (You cannot do anything else to influence the other sites' links, other than manually informing them!) A 301-redirect will also help the search engines to update their index.
  • MrWhite
    MrWhite over 10 years
    However, if you have chosen to remove the old content and there is nothing related that replaces it then that is a decision you have made and should be prepared to take the SEO hit. A 410 is probably the correct response in this instance. The fact that Google is reporting a 404 in the GWT report is not itself a problem in terms of SEO - if it is expected - that is what I was referring to in my answer. I'll update my answer to clarify this.
  • MrWhite
    MrWhite over 10 years
    This simply masks the error in GWT and does not benefit you. It is better to see the 404 in GWT, if it is indeed a genuine 404 with no alternative content. Redirecting what should be a 404 to the homepage just confuses search engines and users.
  • Arne Kröger
    Arne Kröger over 10 years
    @w3d That's what I said. I redirected to a 404. I confused you by also saying 'homepage' because one of the redirects needed to go to the homepage, because that's what it was for. (It's complicated)
  • MrWhite
    MrWhite over 10 years
    Sorry, but that also confused me, since you don't normally "redirect to a 404". Often when people say that, what they are actually doing is an external redirection to what looks like a 404 page while inadvertently returning a 200 status, which could potentially hide the crawl error in GWT (or result in a "soft-404"). Apologies if I took your statement too literally, but it seemed ambiguous to me so I thought it was worth pointing out.