Prevent XML sitemaps from showing up in Google search results


Solution 1

Google does index XML sitemaps, just as it can index any other XML file. If Google is aware of a URL and that URL returns a valid response, it passes Google's inclusion rules and could get indexed. Personally, I only submit the sitemap through Google Webmaster Tools (GWT) and include a Sitemap: reference in robots.txt, and this is certainly enough to get it indexed.

The recommended method to prevent these files from being indexed by Google is to include an X-Robots-Tag HTTP response header when serving the XML sitemap. For example:

X-Robots-Tag: noindex

The X-Robots-Tag header works like a robots META tag in an HTML file, except that it can be applied to any type of file, including XML.
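If the sitemap is served by application code rather than by a web server configuration, the same header can be added there. The following is a minimal sketch using Python's built-in http.server, not a production setup. The helper name is_sitemap, the class name NoIndexSitemapHandler, and the file naming rule (name starts with "sitemap" and ends in .xml or .xml.gz) are assumptions made for illustration.

```python
import posixpath
from http.server import HTTPServer, SimpleHTTPRequestHandler
from urllib.parse import urlsplit


def is_sitemap(request_path: str) -> bool:
    """Return True if the requested file name looks like an XML sitemap.

    The naming rule used here is an assumption for this sketch; adjust it
    to match your own sitemap file names.
    """
    name = posixpath.basename(urlsplit(request_path).path).lower()
    return name.startswith("sitemap") and name.endswith((".xml", ".xml.gz"))


class NoIndexSitemapHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory and mark sitemaps as noindex."""

    def end_headers(self):
        # Add the header just before the header section is closed, so it is
        # sent with every response for a sitemap-like path.
        if is_sitemap(self.path):
            self.send_header("X-Robots-Tag", "noindex")
        super().end_headers()


def run(port: int = 8000) -> None:
    """Start the server; call this manually to try the sketch."""
    HTTPServer(("", port), NoIndexSitemapHandler).serve_forever()
```

Calling run() from a directory that contains sitemap.xml should serve that file with the extra header, while other files are served unchanged.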

Reference: This document (from Nov 2008!) appears to quote our very own John Mueller (Google) with regard to the use of the X-Robots-Tag response header when dealing with XML sitemaps.
Yes, Google Will Index & Rank Your XML Sitemap File

For more information see Google's developer guide:
Robots meta tag and X-Robots-Tag HTTP header specifications

Solution 2

MrWhite's answer about using X-Robots-Tag appears to be the correct way to do this.

Here is code that can be used in .htaccess or Apache configuration files to do so. (Reference: WebmasterWorld - Sitemaps showing up in SERP - How to prevent this?)

# Requires mod_headers to be enabled.
<Files ~ "sitemap.*\.xml(\.gz)?$">
  Header append X-Robots-Tag "noindex"
</Files>
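To see which file names the regular expression in the Apache block above would cover, it can be tried out in Python. This is only a sketch: the file names are made up, and it assumes that Python's re module treats this simple pattern the same way Apache's regex engine does.

```python
import re

# The same pattern as in the Apache block above.
PATTERN = re.compile(r"sitemap.*\.xml(\.gz)?$")

# Hypothetical file names, chosen only to illustrate the pattern.
candidates = [
    "sitemap.xml",
    "sitemap-123.xml",
    "sitemap.xml.gz",
    "old-sitemap.xml",
    "sitemap.html",
    "feed.xml",
    "sitemap.xml.bak",
]


def matches(name: str) -> bool:
    """Return True if the pattern is found anywhere in the file name."""
    return PATTERN.search(name) is not None


matched = [name for name in candidates if matches(name)]
print(matched)
# prints ['sitemap.xml', 'sitemap-123.xml', 'sitemap.xml.gz', 'old-sitemap.xml']
```

Note that the pattern is not anchored at the start, so a name such as old-sitemap.xml also matches.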

Under nginx the configuration would be as follows. (Reference: Yoast X-Robots-Tag examples)

location ~* sitemap.*\.xml(\.gz)?$ {
    add_header X-Robots-Tag "noindex";
}
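After changing either configuration, it is worth checking that the header is really being sent. The sketch below does this with Python's standard library. The function names are hypothetical, and the parsing is deliberately simple: it assumes a plain comma-separated list of directives and does not handle the user-agent-prefixed form of the header.

```python
from typing import Optional
from urllib.request import Request, urlopen


def has_noindex(header_value: Optional[str]) -> bool:
    """Return True if an X-Robots-Tag value contains the noindex directive.

    Assumes a plain comma-separated list such as "noindex, nofollow".
    """
    if not header_value:
        return False
    directives = [part.strip().lower() for part in header_value.split(",")]
    return "noindex" in directives


def sitemap_is_noindexed(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request to the URL and inspect its X-Robots-Tag header."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return has_noindex(response.headers.get("X-Robots-Tag"))
```

For example, sitemap_is_noindexed("https://example.com/sitemap.xml") (a placeholder URL) should return True once the header is in place.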

Solution 3

Why does it matter?

If you can actually find your sitemap in SERP then you have bigger problems.

I would focus more on getting pages up with useful content instead. That way, you will have a very hard time even finding your sitemap. Not that you would care at that point anyway.

P.S.

Pretty much everyone keeps sitemaps in the same place. So if someone wants to find where you keep yours, they will :)


Author: Stephen Ostermiller

Updated on September 18, 2022

Comments

  • Stephen Ostermiller
    Stephen Ostermiller over 1 year

    How do I prevent my XML sitemap files from showing up in Google search results like this result of a site: search query:

    [Image: sitemap in search results]

    I don't understand why Google would choose to show sitemap files in search results to begin with. These files are not meant for human consumption.

    Google needs to be able to crawl it to be able to process it, so I can't disallow it in robots.txt. I just don't want them to put it into the search results after processing it.

    • Admin
      Admin almost 10 years
      Huh. Interesting. The only thought I have is whether you have a link to it on your site, or whether it appears in your sitemap file. I am also not sure whether referring to it in your robots.txt file may be a factor. I would not think so, but it is something to consider. I supply my sitemap only through Google WMT and have not seen this issue, at least not yet. I can understand not wanting your sitemap public. I do not want mine public. Too many hackers/scrapers out there.
    • Admin
      Admin almost 10 years
      On this particular site I have /sitemap.xml listed in robots.txt and then that links to a different set of other sitemaps like /sitemap-123.xml and /sitemap-124.xml. I regenerate the sitemaps every day and the numbers change daily. The one that is indexed is a fairly old one. I don't link to it anywhere on my site, but it is possible that some other site has a link to it somewhere.
    • Admin
      Admin almost 10 years
      If it is not used, make sure it is deleted, then exclude it in your robots.txt file and it will drop from the SERPs fairly quickly. Oddly, the remove URL thing in Google WMT takes forever (months for me) while robots.txt is fairly quick.
    • Admin
      Admin almost 10 years
      Did you submit the XML sitemap to your GWMT account?
    • Admin
      Admin almost 10 years
      I submitted /sitemap.xml to GWT, but I don't submit the sub-sitemaps that change daily.
    • Admin
      Admin almost 10 years
      I personally submit a sitemap to Google WMT that is a unique name as opposed to just sitemap.xml. From there, it is not necessary to resubmit it. Google will check it from time to time and upload the file if it changes. In fact, I decided to let Google know I had a new sitemap once by resubmitting an updated file and Boy! was that a mistake. It slowed down the existing natural process by a few weeks. The sitemap that I submitted to Google is an index of sitemaps. All of my sitemaps can change anytime. This is okay. It all works. Google gets it.
    • Admin
      Admin almost 10 years
      Does the sitemap still exist? If not, does it show up as a 404 in GWT? I have this same problem with a few, but not all of my sites. I always just assumed there was some link out there to the ones that are indexed. Perhaps I was wrong.
    • Admin
      Admin almost 10 years
      The sitemap file still existed until today. I removed it and now it redirects to /sitemap.xml. I assume that this particular sitemap will now fall out of the index. I'd like to prevent Google from showing them to search users in the future too.
    • Admin
      Admin over 6 years
      The only thing I can think of: maybe there was a bug in the past where the sitemap wasn't a valid XML file, or you shared a public link to the sitemap. In that situation Google might have indexed it. My advice would be to add the sitemap to Sitemap: in robots.txt and remove the indexed version with the removal tool: google.com/webmasters/tools/removals. This only removes it from the index; Google will still be able to crawl it and use it as a sitemap. And because you point to it in your robots.txt file, Google can be sure that it is your sitemap and should be treated as such.
  • MrWhite
    MrWhite almost 10 years
    Can you please clarify your logic - your first sentence appears to conflict with your last?
  • Stephen Ostermiller
    Stephen Ostermiller almost 10 years
    Will Google still crawl a sitemap that is blocked in robots.txt? Do you have a reference to support the assertion?
  • Stephen Ostermiller
    Stephen Ostermiller almost 10 years
    I use Google for site search and I stumbled across a sitemap when using it. It would be very confusing for my users if they were to click on it.
  • dasickle
    dasickle almost 10 years
    How many of your users do you think use Google for site search?
  • Stephen Ostermiller
    Stephen Ostermiller almost 10 years
    All the users that type in search terms into the search box at the top of my pages.
  • dasickle
    dasickle almost 10 years
    In that case, have you considered using something like swiftype.com for your site search? There are a ton of others you can use. You can re-order, remove and add results. You also get great stats, etc.
  • Duarte Patrício
    Duarte Patrício over 9 years
    If you disallow the sitemap.xml, I'm pretty sure it would no longer be crawled. Not something you would want to happen!
  • Admin
    Admin over 9 years
    Google will not crawl any doc with the robots.txt disallowed. Not ordinarily, anyway...site maps included.
  • xameeramir
    xameeramir over 8 years
    Where shall I write the X-Robots-Tag: noindex header code? Inside sitemap.xml or robots.txt?
  • MrWhite
    MrWhite over 8 years
    @student It's an HTTP response header, so it must be set when serving those files (as part of the HTTP response); it can't be set "inside" them. Depending on how you are serving these files, you can set this in your server-side code (e.g. in PHP, header('X-Robots-Tag: noindex',true)) or, if you are using Apache, in your .htaccess file or server config. See Stephen's answer for example code. Also see Google's developer guide linked to above.
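Several comments above point out that disallowing the sitemap in robots.txt would stop it from being crawled at all. That effect can be illustrated with Python's urllib.robotparser. This is only a sketch: the robots.txt content and the example.com URLs are made up, and Python's parser is an approximation of how a real crawler such as Googlebot interprets the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the sitemap for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A rule-abiding crawler may not fetch the sitemap, but other pages are fine.
sitemap_allowed = parser.can_fetch("Googlebot", "https://example.com/sitemap.xml")
page_allowed = parser.can_fetch("Googlebot", "https://example.com/index.html")

print(sitemap_allowed, page_allowed)
# prints: False True
```

This is why the X-Robots-Tag header is the better fit here: the file stays crawlable, and only its appearance in search results is suppressed.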