How to handle the HTML entity &nbsp; in XSLT without changing the input file


Solution 1

You could create a wrapper XML file that declares the nbsp entity and pulls in the (broken) XML fragment as an external entity.

For example, assume that your fragment is saved as a file called "invalid.xml":

<div><span>&nbsp;some text</span></div>

Create an XML file like this:

<!DOCTYPE wrapper [
   <!ENTITY nbsp "&#160;">
   <!ENTITY invalid-xml-document SYSTEM "./invalid.xml">
]><wrapper>
&invalid-xml-document;</wrapper>

When that file gets parsed, the parser will have the nbsp entity defined, include the content from "invalid.xml", and resolve the nbsp entity properly. The result is this (the leading space inside <span> is the non-breaking space character, U+00A0):

<wrapper>
  <div>
    <span> some text</span> 
  </div>
</wrapper>

Then just adjust your XSLT to accommodate the new document element (in this example, <wrapper>).
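
A minimal sketch of that adjustment, assuming an XSLT 1.0 stylesheet built on the identity transform. The original stylesheet is not shown in the question, so both templates here are illustrative rather than the asker's actual code. The second template drops the artificial <wrapper> element and keeps processing its children, so the rest of the stylesheet can match <div>, <span>, and so on as before.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template (assumed baseline): copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Do not copy the artificial wrapper element; only process what it contains. -->
  <xsl:template match="/wrapper">
    <xsl:apply-templates select="node()"/>
  </xsl:template>

</xsl:stylesheet>
```

The transformation is then run against the wrapper file rather than against "invalid.xml" directly, so the original input stays untouched.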

Solution 2

As far as I know, you're going to need to make changes to the input file.

You can either change your &nbsp; to &#160; or declare a custom doctype that will do the conversion for you:

<!DOCTYPE doctypeName [
   <!ENTITY nbsp "&#160;">
]> 

This is necessary because &nbsp; isn't one of XML's five predefined entities (&lt;, &gt;, &amp;, &apos;, and &quot;).
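
Applied to the fragment from the question, the declaration would sit at the top of the input file, ahead of the root element. This is only a sketch: the doctype name div is chosen here to match the root element, which matters to a validating parser but not to one that merely checks well-formedness.

```xml
<!DOCTYPE div [
   <!-- Map the HTML name nbsp to the non-breaking space character, U+00A0. -->
   <!ENTITY nbsp "&#160;">
]>
<div><span>&nbsp;some text</span></div>
```

Unlike Solution 1, this does modify the input file, which is the trade-off for not needing a separate wrapper document.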

Author: Ramesh

Updated on July 01, 2022

Comments

  • Ramesh
    Ramesh almost 2 years

    I am trying to convert an HTML file into an XML file using XSLT (using Oxygen 9.0 for the transformation).

    When I configure and run the XSLT transformation with the HTML file, Oxygen outputs:

    The entity 'nbsp' was referenced, but not declared.

    My input html file is:

    <div><span>&nbsp;some text</span></div>
    

    Note: I want to know how to handle that entity using only the XSLT; I don't want to make any changes to the input file.