How can I escape text for an XML document in Perl?


Solution 1

I personally prefer XML::LibXML, the Perl binding for libxml2. One advantage is that it uses one of the fastest XML processing libraries available. Here is an example of creating a text node:

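use strict;
use warnings;
use XML::LibXML;
my $name = 'comment';                 # element name (example value)
my $text = 'x < y & "z"';             # user-entered text; no manual escaping needed
my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
my $element = $doc->createElement($name);
$element->appendText($text);          # special characters are escaped when serialized
$doc->setDocumentElement($element);   # attach the element, or $doc->toString() omits it
my $xml_fragment = $element->toString();
my $xml_document = $doc->toString();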

And, never, ever create XML by hand. It's gonna be bad for your health when people find out what you did.
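A quick way to convince yourself that XML::LibXML does the escaping is to build a document from awkward input and parse it back. This sketch assumes XML::LibXML is installed; the element name note, the attribute author and the sample strings are made up for illustration. setAttribute escapes attribute values on output just as appendText escapes text, and if the output were not well-formed, load_xml would die.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical user input containing every character that needs escaping
my $user_text   = q{x < y && "quoted" > 'single'};
my $user_author = q{O'Brien & "Sons"};

my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('note');       # 'note' is an illustrative name
$root->setAttribute(author => $user_author);  # attribute value escaped on output
$root->appendText($user_text);                # text content escaped on output
$doc->setDocumentElement($root);

my $xml = $doc->toString();
print $xml;

# Round trip: a well-formed document parses back to the original strings
my $parsed = XML::LibXML->load_xml(string => $xml);
my $elem   = $parsed->documentElement;
print "text survived\n"   if $elem->textContent eq $user_text;
print "author survived\n" if $elem->getAttribute('author') eq $user_author;
```

The point of the round trip is that you never have to decide which characters to escape: the library escapes on output and unescapes on input.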

Solution 2

XML::Simple's escape_value could also be used, but XML::Simple is not recommended for new programs. See post 17436965.

A manual escape can be done with regular expressions (copied from escape_value):

# Replace & first, so the & in the entities added below is not escaped again
$data =~ s/&/&amp;/sg;
$data =~ s/</&lt;/sg;
$data =~ s/>/&gt;/sg;
$data =~ s/"/&quot;/sg;
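The substitutions above can also be wrapped in a reusable subroutine. In this sketch (xml_escape is a hypothetical name, not part of any module), every special character is replaced in a single pass through a lookup table, so the order of the replacements no longer matters. It also escapes ' as &apos;, which escape_value does not; that only matters for attribute values delimited by single quotes. As an extra beyond escape_value, it deletes the C0 control characters that XML 1.0 does not allow at all, since no escape makes them legal there.

```perl
use strict;
use warnings;

# Entity for each character that is special in XML text or attribute values
my %XML_ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;',
);

# Hypothetical helper: escape a string for use as XML text or an attribute value
sub xml_escape {
    my ($s) = @_;
    # XML 1.0 forbids these control characters outright (even as &#...;),
    # so they are deleted rather than escaped
    $s =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//d;
    # One pass over the string: each special character is replaced exactly once,
    # so the & inside a newly inserted entity is never escaped again
    $s =~ s/([&<>"'])/$XML_ENTITY{$1}/g;
    return $s;
}

print xml_escape(q{a&b <c> "d" 'e'}), "\n";
# prints: a&amp;b &lt;c&gt; &quot;d&quot; &apos;e&apos;
```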

Solution 3

I am not sure why you need to escape text that is in an XML file. If your file contains:

<foo>x < y</foo>

The file is not an XML file despite the proliferation of angle brackets. An XML file must contain well-formed data, meaning something like this:

<foo>x &lt; y</foo>

or

<foo><![CDATA[x < y]]></foo>

Therefore, either:

  1. You are not asking about escaping data that is already in an XML file. Rather, you want to figure out how to put character data into an XML file so the resulting file is valid XML; or

  2. You have some data in an XML file that needs to be escaped for some other reason.

Care to elaborate?
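For the CDATA form above, the one sequence a CDATA section cannot contain is ]]>, because it ends the section. A minimal sketch in plain Perl (xml_cdata is a hypothetical helper) handles it by closing the section between ]] and > and opening a new one; the character data of the two sections together is the original text.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap arbitrary text in a CDATA section.
# "]]>" would end the section early, so each occurrence is split between
# two sections: "]]" stays in the first, ">" starts the second.
sub xml_cdata {
    my ($s) = @_;
    $s =~ s/\]\]>/]]]]><![CDATA[>/g;
    return "<![CDATA[$s]]>";
}

print '<foo>', xml_cdata('x < y'), "</foo>\n";
# prints: <foo><![CDATA[x < y]]></foo>
```

CDATA avoids escaping < and & but not the ]]> case, and it does not make forbidden control characters legal, so entity escaping is usually the simpler choice.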

Solution 4

Use XML::Code.

From CPAN

XML::Code escape()

Normally any content of the node is escaped during rendering (i.e. special symbols like '&' are replaced by the corresponding entities). Call escape() with an argument of zero to prevent it:

        use XML::Code;

        my $p = XML::Code->new('p');
        $p->set_text("&#8212;");
        $p->escape(0);
        print $p->code(); # prints <p>&#8212;</p>
        $p->escape(1);
        print $p->code(); # prints <p>&amp;#8212;</p>

Solution 5

XML::Entities:

use XML::Entities;
my $a_encoded = XML::Entities::numify('all', $a);

Edit: XML::Entities::numify only converts named entities to numeric character references; it does not escape raw characters such as < and &. Use encode_entities($a) from HTML::Entities instead.
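One caveat with encode_entities: called with only the string, it also turns non-ASCII characters such as é into named HTML entities like &eacute;, which XML does not define unless a DTD declares them. Passing the second argument (the set of unsafe characters) restricts it to the XML-special characters. A sketch, assuming HTML::Entities (from the HTML-Parser distribution) is installed; the sample string is made up:

```perl
use strict;
use warnings;
use utf8;    # the literal below contains a non-ASCII character
use HTML::Entities qw(encode_entities);

my $text = q{Café "x < y" & more};

# Only the characters that are special in XML are escaped; é stays as is
my $for_xml = encode_entities($text, q{<>&"'});
binmode STDOUT, ':encoding(UTF-8)';
print "$for_xml\n";
# prints: Café &quot;x &lt; y&quot; &amp; more
```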

Author: tacoscool

Updated on April 26, 2020

Comments

  • tacoscool
    tacoscool about 4 years

    Anyone know of any Perl module to escape text in an XML document?

    I'm generating XML which will contain text that was entered by the user. I want to correctly handle the text so that the resulting XML is well formed.

  • jrockway
    jrockway almost 15 years
    People get mad when you remind them that their pseudo-XML is not actually real XML. It is amusing... and sad. Anyway, I upvoted you :)
  • tacoscool
    tacoscool almost 15 years
    My question would be #1. I didn't realise my question wasn't clear. I'll update the question to clarify.
  • tacoscool
    tacoscool almost 15 years
    XML::Entities::numify seems only to convert named XML entities to numeric XML entities.
  • Updo2008
    Updo2008 almost 15 years
    You are right, my mistake. It is possible to use HTML::Entities and encode_entities instead.
  • tacoscool
    tacoscool almost 15 years
    Point taken. I shouldn't have created the XML by hand (they were simple XML documents when I started). I'll need to get around to rewriting those bits of code.
  • tacoscool
    tacoscool almost 15 years
    I've accepted this answer not for the XML::LibXML recommendation (I used XML::Writer) but for pointing out that it is not good practice to create XML by hand.
  • nick
    nick over 10 years
    Nice interface but too slow if you are writing millions of lines of XML.
  • zinking
    zinking almost 10 years
    In my case I am putting one XML document inside another (a SOAP message), and the SOAP parser has problems parsing the encapsulated message.
  • zinking
    zinking almost 10 years
    This naive implementation usually works, but for cases like a.txt = " a&b " it won't work.
  • muenalan
    muenalan about 9 years
    Note that XML::LibXML has non-perl dependencies and might not readily install on your platform.
  • arhak
    arhak almost 8 years
    You're missing $doc->setDocumentElement($element); if you want the element to end up in the document.
  • mivk
    mivk over 5 years
    Useless non-answer. Yes, the original question was not very clear, but it was still easy to guess, or you could have asked for clarification. I understand your point, but it would come across better if it came with a useful answer (like most other answers on this page).
  • schulwitz
    schulwitz over 3 years
    @zinking It seems to work fine for me using the string you provided.