Ampersand (&) in actual URL and sitemap


Solution 1

if this difference of ampersand in URL and sitemap will cause any issue.

tl;dr No issue, because the URLs are the same.

Since in sitemap & has to be escaped I replaced & with &amp; ...

Your sitemap is an XML document. As with any XML document, the data values must be stored XML-entity encoded. The & character is a special character (it itself denotes the start of an XML-entity) and therefore must be encoded to negate its special meaning. This is just the way data is stored inside an XML document.

When the XML document is read by an XML parser, the data values are XML-entity decoded back to their actual values. So, &amp; becomes & when the XML document is read.

So, a URL of the form /page?foo=1&amp;bar=2 stored inside an XML document is identical to the URL /page?foo=1&bar=2 in your HTML5 document.
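The round trip described above can be checked with a small sketch. It assumes Python's standard-library XML parser and a hypothetical one-entry sitemap for example.com; the function name `sitemap_urls` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-entry sitemap; the & in the URL is stored as &amp;.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page?foo=1&amp;bar=2</loc></url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the <loc> values as an XML parser sees them (entity-decoded)."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]


if __name__ == "__main__":
    # The parser hands back the plain ampersand, i.e. the real URL.
    print(sitemap_urls(SITEMAP))  # ['https://example.com/page?foo=1&bar=2']
```

Any consumer that reads the sitemap with a conforming XML parser should see the same decoded value, which is why the two spellings refer to one URL.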

My actual page URLs contain just &

In HTML5 that is perfectly OK, provided there is no ambiguity. In HTML 4.01 and earlier, however, you would have needed to HTML-entity encode the & as &amp; in your HTML source code for the markup to be valid. Browsers are very tolerant, though, so your HTML document would most probably still have "worked".

In HTML5 you only strictly need to HTML-entity encode the & if there is an ambiguity. Take the following contrived example. We want to pass the literal string "&dollar;" in the foo URL parameter.

<!-- In an HTML document (WRONG) -->
<a href="/page?foo=&dollar;">link</a>

The desired URL is http://example.com/page?foo=&dollar;, however, the above HTML anchor results in sending the user to http://example.com/page?foo=$ - which is not the intention. To create the desired result, the & must be HTML-entity encoded to negate its special meaning, resulting in the following (correct) HTML:

<!-- In an HTML document (CORRECT) -->
<a href="/page?foo=&amp;dollar;">link</a>

It is always safer to consistently HTML-entity encode the & in your HTML-document. If you are generating your content through a CMS, then this should be automatic.
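The ambiguity can be reproduced outside a browser. The sketch below assumes Python's `html` module as a stand-in for the entity decoding an HTML parser applies to attribute values; the helper names are hypothetical, and a real browser applies a few extra attribute-specific rules that this does not model.

```python
import html


def decoded_by_html_parser(attr_value: str) -> str:
    """Approximate what an HTML parser does to an attribute value."""
    return html.unescape(attr_value)


def encode_for_href(url: str) -> str:
    """HTML-entity encode a URL before placing it in an href attribute."""
    return html.escape(url, quote=True)


if __name__ == "__main__":
    wanted = "/page?foo=&dollar;"

    # Unencoded: &dollar; is read as a named character reference.
    print(decoded_by_html_parser(wanted))  # /page?foo=$

    # Encoded: the & survives, so the intended URL is recovered.
    safe = encode_for_href(wanted)
    print(safe)                            # /page?foo=&amp;dollar;
    print(decoded_by_html_parser(safe))    # /page?foo=&dollar;
```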

I am able to access the site after replacing & with &amp; in the URL.

Presumably you mean "in the URL, in your HTML"? Because if you were to HTML-entity encode the & as &amp; in the browser's address bar (for instance), i.e. outside of an HTML context, then you would not get the expected results. For example, if you typed the following directly into the browser's address bar:

/page?foo=1&amp;bar=2

Then you would get the two URL parameters [foo] => 1 and [amp;bar] => 2, which is clearly not the intention.
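This parameter split can be reproduced with a short sketch. It assumes Python's `urllib.parse` as the query-string parser (server frameworks may differ in detail) and a recent Python version where `parse_qsl` accepts the `separator` argument; `query_params` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlsplit


def query_params(url: str) -> list[tuple[str, str]]:
    """Split a URL's query string on '&' only, with no HTML decoding."""
    return parse_qsl(urlsplit(url).query, separator="&")


if __name__ == "__main__":
    # The real URL: two parameters, foo and bar.
    print(query_params("/page?foo=1&bar=2"))
    # [('foo', '1'), ('bar', '2')]

    # The HTML-encoded form used outside HTML: "amp;" leaks into the name.
    print(query_params("/page?foo=1&amp;bar=2"))
    # [('foo', '1'), ('amp;bar', '2')]
```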

Solution 2

The URLs you include in the sitemap must follow the RFC 3986 standard and, because the sitemap is an XML file, must also be entity-escaped. As you can read in Google's official guide, & must be escaped as &amp;, so you are good to go.

Nevertheless, once you submit your sitemap through Google Search Console, you will be able to see whether there is any problem with the submitted URLs.
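If the sitemap is generated by a script rather than by hand, the escaping can be left to an XML library. The following is a minimal sketch assuming Python's standard library; `build_sitemap` and the example URL are hypothetical, and a production sitemap would normally also carry an XML declaration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Serialize plain URLs into a <urlset>; the library escapes & for us."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # Assign the real URL with a plain &; do not pre-escape it,
        # or it would end up double-encoded as &amp;amp;.
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(["https://example.com/shoes?order=1&color=blue"]))
```

The serialized `<loc>` contains `order=1&amp;color=blue`, which is the form the sitemap needs.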

Also, you should include only final URLs, which means there may be some parameterized URLs you can leave out. For example:

 - https://example.com/shoes (good one)
 - https://example.com/shoes?order=1&amp;color=blue (remove this one)

Author: user109308

Updated on September 18, 2022

Comments

  • user109308
    user109308 over 1 year

    I have a sitemap (which is submitted to Google for indexing) in which I am adding URLs which have an ampersand (&) in it. Since in sitemap & has to be escaped I replaced & with &amp; in the URLs in the sitemap. My actual page URLs contain just &.

    As I am new to Google webmasters and SEO, I wish to understand if this difference of ampersand in URL and sitemap will cause any issue. Will my pages get indexed? This may seem like a noob level question because I am able to access the site after replacing & with &amp; in the URL. But still any help on this front will be highly appreciated.

    • MrWhite
      MrWhite over 5 years
      In which part of the URL are you using an ampersand?
    • user109308
      user109308 over 5 years
      @MrWhite I am passing a few parameters in the URL and I am using the ampersand to separate them.
  • user109308
    user109308 over 5 years
    Can we use example.com/shoes?order=1&color=blue as the actual URL and example.com/shoes?order=1&amp;color=blue in the sitemap?
  • Emirodgar
    Emirodgar over 5 years
    Yes, you can as long as it's the main link. Meaning, that's the primary URL you use in internal linking and the content value for users is higher than the URL without the parameters.
  • Mikko Rantalainen
    Mikko Rantalainen over 5 years
    TL;DR: the URL contains a plain ampersand (&) but if you use sitemap.xml you need to encode plain ampersand as &amp;. The meaning of encoded ampersand is still plain ampersand. If you want to skip most of the encoding, use sitemap.txt format.