Special Characters in XML


Solution 1

You are trying to use an HTML entity in a document that is not HTML or XHTML. Named entities such as &raquo; are declared in the HTML/XHTML Document Type Definition (DTD); plain XML predefines only &amp;, &lt;, &gt;, &quot; and &apos;.

You should use the numeric character reference (the character's Unicode code point) instead. For example, instead of &raquo; you should use &#187;.
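A quick way to see the difference is with Python's standard-library `xml.etree.ElementTree`, used here only as an illustrative parser (the original question uses a different one). The named reference fails because nothing declares it, while the numeric reference parses to the character itself:

```python
import xml.etree.ElementTree as ET

# A numeric character reference needs no DTD: 187 is the code point of »
numeric = ET.fromstring("<item>&#187; Home</item>")
print(numeric.text)  # » Home

# A named HTML entity is undeclared in plain XML, so parsing fails
try:
    ET.fromstring("<item>&raquo; Home</item>")
    failed = False
except ET.ParseError as err:
    failed = True
    print("ParseError:", err)
```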

Alternatively, you can define them in your XML document's DTD:

<!ENTITY entity-name "entity-value">
<!ENTITY raquo "&#187;">
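As a sketch of where those declarations go: they sit in an internal subset inside `<!DOCTYPE>`, and the parser then expands `&raquo;` itself. The root element `nav`, the `link` elements and their text are hypothetical, and Python's `xml.etree.ElementTree` stands in for whatever parser you actually use:

```python
import xml.etree.ElementTree as ET

# Hypothetical navigation document; the internal DTD subset declares
# raquo and laquo in terms of their numeric character references
doc = """<?xml version="1.0"?>
<!DOCTYPE nav [
  <!ENTITY raquo "&#187;">
  <!ENTITY laquo "&#171;">
]>
<nav><link>&laquo; Back</link><link>Next &raquo;</link></nav>"""

root = ET.fromstring(doc)
# The parser has already replaced each entity with its character
texts = [link.text for link in root.iter("link")]
print(texts)  # ['« Back', 'Next »']
```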

Finally, if your document is encoded as UTF-8, I believe you can just use the actual character directly in your XML document:

»
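A small sketch, again assuming Python's `xml.etree.ElementTree` as a stand-in parser: the document is encoded as UTF-8 bytes containing the literal » character and parses without any entity. Serializing the same element to ASCII makes ElementTree fall back to a numeric character reference, which shows that » and `&#187;` mean the same thing to an XML parser:

```python
import xml.etree.ElementTree as ET

# Encode the document as UTF-8; » is written as the literal character
source = '<?xml version="1.0" encoding="UTF-8"?><item>Next »</item>'.encode("utf-8")
item = ET.fromstring(source)
print(item.text)  # Next »

# Serializing to a str keeps the character as-is
as_text = ET.tostring(item, encoding="unicode")
print(as_text)  # <item>Next »</item>

# Serializing to ASCII bytes forces a numeric character reference
as_ascii = ET.tostring(item, encoding="us-ascii")
print(as_ascii)  # b'<item>Next &#187;</item>'
```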

Solution 2

Did you specify a doctype for your file?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

I think you might get such errors if you forget to specify it.

Also, the entities will work if you specify them by number instead of by name, because numeric character references need no declaration:

&#187; and &#171; instead of &raquo; and &laquo;

Solution 3

You don't need to declare an entity in your DTD, or even use a DTD. You probably don't need to use the Unicode representation of the character. You certainly don't need to use a CDATA section.

What you need to do is use a DOM to build your XML instead of trying to build it with string manipulation. The DOM will fix this problem for you.

In C#, this code:

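 using System.Diagnostics;
 using System.Xml;

 XmlDocument d = new XmlDocument();
 d.LoadXml("<foo/>");
 char c = (char)187; // U+00BB, the » character
 d.DocumentElement.InnerText = "Here's that character: " + c;
 Debug.WriteLine(d.OuterXml);
 // InnerText stores plain text, so the & is escaped on output
 d.DocumentElement.InnerText = "Here it is as an HTML entity: &raquo;";
 Debug.WriteLine(d.OuterXml);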

produces this output:

<foo>Here's that character: »</foo>
<foo>Here it is as an HTML entity: &amp;raquo;</foo>

As you can see from the first example, the » character is perfectly legal in XML text. But I don't think you're trying to represent that character.

I think you're trying to do what's in the second example, based on the error message that you reported. You're trying to represent the string of characters &raquo;. The proper way to represent that string of characters in XML text is by escaping the ampersand; thus: &amp;raquo;.

So if you must use string manipulation to build your XML, just make sure that you escape any ampersands in your source data. Not to belabor the point, but if you were using a DOM, this would have been done for you automatically.
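The same point in Python, as a sketch (the element name `foo` mirrors the C# example; the choice of library is an assumption): building the element with `xml.etree.ElementTree` escapes the ampersand automatically, and if you do build markup by hand, `xml.sax.saxutils.escape` performs the same escaping of `&`, `<` and `>`:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "Here it is as an HTML entity: &raquo;"

# Tree-based construction: the serializer escapes & for you
foo = ET.Element("foo")
foo.text = text
from_tree = ET.tostring(foo, encoding="unicode")

# String manipulation: escape the data yourself before concatenating
from_string = "<foo>" + escape(text) + "</foo>"

print(from_tree)  # <foo>Here it is as an HTML entity: &amp;raquo;</foo>
print(from_string == from_tree)  # True

# Parsing the result gives back the original string, &raquo; included
print(ET.fromstring(from_string).text == text)  # True
```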

One other thing. It's quite likely that in your original question, which now reads "I am using »", what you actually typed is "I am using &raquo;". The actual post doesn't look like that, though. If you need to represent text literally in markdown, enclose it in backticks; otherwise, HTML entities will get converted to their character representation when the post is rendered.

Solution 4

This is an issue because not all HTML entities are XML entities. You can import the HTML DTD into your document as Pat suggested, or do one of the following:

Replace all occurrences of the special character with its numeric character reference:

&raquo; becomes &#187;
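If a document already contains many named references, the replacement can be automated. This is a hypothetical helper, not part of the original answer: it uses Python's `html.entities.name2codepoint` table (HTML 4 entity names) and a regular expression, leaves the five XML-predefined entities alone, and does not skip CDATA sections or comments, so treat it as a starting point:

```python
import re
from html.entities import name2codepoint

# Entities every XML parser already knows; these must not be rewritten
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_refs(xml_text: str) -> str:
    """Replace named HTML entity references with numeric character references."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Keep XML's own entities and unknown names unchanged
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return ENTITY_REF.sub(replace, xml_text)


print(to_numeric_refs("<link>&laquo; Back &amp; &raquo;</link>"))
# <link>&#171; Back &amp; &#187;</link>
```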

Wrap all occurrences of the special characters in a CDATA section:

<![CDATA[&raquo;]]>
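One consequence to check before choosing this option: the parser does not expand references inside a CDATA section, so the text it reports is the literal string `&raquo;`, not the » character, and whatever consumes the XML would have to write that text out unescaped for a browser to show ». A quick check with Python's `xml.etree.ElementTree`, used only for illustration:

```python
import xml.etree.ElementTree as ET

# CDATA content is taken literally: no entity expansion happens inside it
item = ET.fromstring("<item><![CDATA[&raquo;]]> Home</item>")
print(item.text)  # &raquo; Home

# Re-serializing escapes the ampersand, just like any other text
print(ET.tostring(item, encoding="unicode"))  # <item>&amp;raquo; Home</item>
```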

Define entities at the top of your document:

<!DOCTYPE ROOT_XML_ELEMENT [ <!ENTITY raquo "&#187;"> ]>
Author: BillZ

Updated on June 01, 2022

Comments

  • BillZ
    BillZ almost 2 years

    I am creating a left navigation system using XML and XSL. Everything had been going great until I tried to use a special character in my XML document. I am using &raquo; and I get this error:

    reason: Reference to undefined entity 'raquo'.
    error code: -1072898046

    How do I make this work?

  • BillZ
    BillZ over 15 years
    I am defining it as &raquo;. I double-checked, and no, I didn't forget the semicolon; I just missed it when I pasted it in here.
  • James Sulak
    James Sulak over 15 years
    Definitely use Unicode characters or numeric character references if you can. Named character references should be avoided in XML.
  • bortzmeyer
    bortzmeyer over 15 years
    I wonder why it has been downvoted. It is a perfectly correct answer.
  • Quentin
    Quentin over 13 years
    No, that's an XHTML Doctype. XHTML is an XML application and it defines &raquo;.
  • Nic Gibson
    Nic Gibson over 13 years
    It's quite possible that the OP doesn't have a DTD for his XML. Even then, your answer could be used inside an internal subset if the user wanted. However, you are right that the simple answer is UTF-8 and just use the character.