Google Webmaster Tools says my XML sitemap "appears to be an HTML page"

Solution 1

Farseeker's suggestion is a good first step in troubleshooting: a text/html Content-Type would certainly produce this result. If the sitemap file contained invalid XML, Google Webmaster Tools should display a different error message.

Given the temporary nature of the issue, have you checked your server logs to determine whether an error page was produced on Google's prior requests?
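To make the log check concrete, here is a rough sketch for scanning IIS logs in the W3C extended format. It assumes the log has a `#Fields:` line that includes `cs-uri-stem`, `cs(User-Agent)` and `sc-status` (which fields are logged depends on your IIS configuration), and the function name, the `sitemap.xml` suffix and the `Googlebot` hint are just illustrative defaults, not anything from the original question.

```python
def find_failed_sitemap_hits(log_lines, path_suffix="sitemap.xml", agent_hint="Googlebot"):
    """Yield (date, time, uri, status) for matching requests that got a 4xx/5xx status.

    Assumes W3C extended log lines: a '#Fields:' directive names the columns,
    and each data line is space-separated in that order.
    """
    fields = []
    for raw in log_lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            # Column names for the data lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip other directives, blanks, and data before any #Fields line
        row = dict(zip(fields, line.split(" ")))
        uri = row.get("cs-uri-stem", "")
        agent = row.get("cs(User-Agent)", "")
        status = row.get("sc-status", "")
        is_sitemap = uri.lower().endswith(path_suffix.lower())
        is_bot = agent_hint.lower() in agent.lower()
        if is_sitemap and is_bot and status.startswith(("4", "5")):
            yield row.get("date", ""), row.get("time", ""), uri, status
```

You would feed it the lines of a log file, e.g. `find_failed_sitemap_hits(open(path, encoding="utf-8", errors="replace"))`. Note that it only catches error status codes; an error page served with status 200 would not show up here.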

If you are dynamically generating sitemap files, a scripting error, database timeout, or other issue could produce an HTML error page intermittently.
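If you want to catch that intermittent case yourself, one option is to poll the sitemap URL and check whether the body starts like an HTML document. The sketch below only covers the body check. How Google actually decides a sitemap "appears to be an HTML page" isn't documented in this thread, so the heuristic (and the function name) is purely an assumption for illustration.

```python
import re


def looks_like_html(body: bytes) -> bool:
    """Rough guess at whether a response body is an HTML page rather than a sitemap.

    Heuristic only: look at the first 1 KB, skip a UTF-8 BOM, whitespace and an
    optional XML declaration, then test for an HTML doctype or <html> root.
    """
    head = body[:1024].lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    # XHTML error pages may start with an XML declaration, so drop it first.
    head = re.sub(rb"^<\?xml[^>]*\?>\s*", b"", head)
    return head.startswith((b"<!doctype html", b"<html"))
```

Run against whatever your polling script downloads, this would flag a database-timeout or scripting-error page even when the server returns it with a 200 status.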

Solution 2

It's most likely because of the Content-Type header the server is sending. Inspect the response with your favourite tool (Firebug, etc.) and see what it actually contains.
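If you'd rather script the check than use a browser tool, something like the following works with just the Python standard library. Treating `text/xml` and `application/xml` as the acceptable types is an assumption based on what is reported in the comments; the helper names and the `sitemap-check/0.1` user agent are made up for the example.

```python
from urllib.request import Request, urlopen

# Media types assumed acceptable for a sitemap in this sketch.
XML_TYPES = {"application/xml", "text/xml"}


def media_type(content_type_header: str) -> str:
    """Strip parameters such as '; charset=UTF-8' and normalise case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_xml_content_type(content_type_header: str) -> bool:
    return media_type(content_type_header) in XML_TYPES


def fetch_content_type(url: str, user_agent: str = "sitemap-check/0.1"):
    """Return (status, Content-Type header) for a GET request to url."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        return resp.status, resp.headers.get("Content-Type", "")
```

For example, `status, ctype = fetch_content_type(your_sitemap_url)` followed by `is_xml_content_type(ctype)`. Note that `urlopen` raises `HTTPError` for 4xx/5xx responses, which is itself a useful signal here.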

Solution 3

You could extend the root element to include the schema attributes:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

and then validate it online.

If it passes, it must be Google's problem.
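With a lot of sitemaps, checking each one by hand in an online validator gets tedious. As a lightweight stand-in, here is a sketch of a local sanity check using Python's standard library. It is not full XSD validation (that would need a schema-aware library); it only checks well-formedness, the root element and namespace, and that each `<url>` has a `<loc>`. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace from the sitemap protocol, as used in the urlset element above.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_sitemap(xml_data):
    """Return a list of problems found; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    if root.tag != f"{{{NS}}}urlset":
        problems.append(f"unexpected root element: {root.tag}")
    for i, url in enumerate(root.findall(f"{{{NS}}}url"), start=1):
        loc = url.find(f"{{{NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> #{i} has no <loc>")
    return problems
```

An HTML error page would typically either fail to parse or be reported with an unexpected root element, so this also helps spot the case where the server returned something other than the sitemap.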


Author: Adam Lynch

Updated on September 18, 2022

Comments

  • Adam Lynch
    Adam Lynch almost 2 years

    We're running a lot of sites and we've started to get a lot of these errors in Webmaster Tools:

    Sitemap is HTML
    Your Sitemap appears to be an HTML page. Please use a supported sitemap format instead.

    One of the problematic sitemaps:

    <?xml version="1.0" encoding="UTF-8"?> 
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
        <url> 
            <loc>http://www.same_domain.co.uk/folder/file1.shtml</loc> 
            <lastmod>2011-05-11</lastmod> 
            <changefreq>weekly</changefreq> 
            <priority>0.5</priority> 
        </url> 
        <url> 
            <loc>http://www.same_domain.co.uk/folder/file2.shtml</loc> 
            <lastmod>2011-05-11</lastmod> 
            <changefreq>weekly</changefreq> 
            <priority>0.5</priority> 
        </url> 
        <url> 
            <loc>http://www.same_domain.co.uk/folder/file3.shtml</loc> 
            <lastmod>2011-05-11</lastmod> 
            <changefreq>weekly</changefreq> 
            <priority>0.5</priority> 
        </url> 
        <url> 
            <loc>http://www.same_domain.co.uk/folder/file4.shtml</loc> 
            <lastmod>2011-05-11</lastmod> 
            <changefreq>weekly</changefreq> 
            <priority>0.5</priority> 
        </url> 
    </urlset>
    

    Why would GWT think this is anything but XML?

    (Server: IIS)


    Edit:

    "This document was successfully checked as well-formed XML!" -W3C Validator.


    Edit:

    I resubmitted two problematic sitemaps, one with no changes, and one with a couple of extra lines to ensure it's treated as XML. I then ran the "Fetch as Googlebot" diagnostic tool. Both are fine now. I'm just going to re-submit all sitemaps with the "Sitemap is HTML" error.

    The question remains:

    Why did this happen? Why did GWT think these XML sitemaps were HTML?

  • Adam Lynch
    Adam Lynch about 13 years
    Nope. Content-Type text/xml
  • Adam Lynch
    Adam Lynch about 13 years
    "Line 2 (<urlset>): 192 SchemaLocation: schemaLocation value = 'sitemaps.org/schemas/sitemap/0.9/sitemap.xsd' must have even number of URI's. Line 2: 192 cvc-elt.1: Cannot find the declaration of element 'urlset'"
  • Adam Lynch
    Adam Lynch about 13 years
    "No errors were found" but does that mean that this will solve the errors? We have a lot of sitemaps
  • paulmorriss
    paulmorriss about 13 years
    It means it's valid, so I guess it's Google's problem. You could report it on the webmaster forums google.com/support/forum/p/Webmasters/… with all the info you've put here (right content type, extension, valid XML).
  • Adam Lynch
    Adam Lynch about 13 years
    I've now added the schema attributes & values you've given so I'm waiting to see if the problem is gone
  • paulmorriss
    paulmorriss about 13 years
    They aren't necessary according to the minimal sitemap example on sitemaps.org/protocol.php, however they do mean the XML validator can check the file, and won't do any harm.
  • Adam Lynch
    Adam Lynch about 13 years
    Ok then, I'll post in the forum
  • Jonx
    Jonx about 13 years
    @Adam Lynch - The correct content type is application/xml (Edit: A review of RFC 3023 leaves some ambiguity on this point, but try the application type for troubleshooting) ... Strike that - tested and GWT works with either content-type.
  • John Conde
    John Conde about 13 years
    An error message from the server would definitely send out HTML headers and is a very plausible explanation.