Is there a C++ function that replaces XML special characters with their escape sequences?


Solution 1

As has been stated, it would be possible to write your own. For example:

#include <iostream>
#include <string>
#include <map>

int main()
{
    std::string xml("a < > & ' \" string");
    std::cout << xml << "\n";

    // Characters to be transformed.
    //
    std::map<char, std::string> transformations;
    transformations['&']  = std::string("&amp;");
    transformations['\''] = std::string("&apos;");
    transformations['"']  = std::string("&quot;");
    transformations['>']  = std::string("&gt;");
    transformations['<']  = std::string("&lt;");

    // Build list of characters to be searched for.
    //
    std::string reserved_chars;
    for (auto ti = transformations.begin(); ti != transformations.end(); ti++)
    {
        reserved_chars += ti->first;
    }

    size_t pos = 0;
    while (std::string::npos != (pos = xml.find_first_of(reserved_chars, pos)))
    {
        xml.replace(pos, 1, transformations[xml[pos]]);
        pos++;
    }

    std::cout << xml << "\n";

    return 0;
}

Output:

a < > & ' " string
a &lt; &gt; &amp; &apos; &quot; string

Add an entry into transformations to introduce new transformations.
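As a rough sketch of that extension point, the same table-driven loop can be wrapped in a function. The name escape_with_table and the extra '\r' entry are assumptions made for illustration and are not part of the answer above. The sketch also advances past the whole replacement instead of one character, so a replacement containing a reserved character is never rescanned.

```cpp
#include <map>
#include <string>

// Escapes the reserved characters in `text` using a lookup table.
// escape_with_table is a hypothetical name chosen for this sketch.
std::string escape_with_table(std::string text)
{
    static const std::map<char, std::string> transformations = {
        {'&',  "&amp;"},
        {'\'', "&apos;"},
        {'"',  "&quot;"},
        {'>',  "&gt;"},
        {'<',  "&lt;"},
        {'\r', "&#13;"},  // hypothetical extra entry, shown only as an example
    };

    // Build the list of characters to search for from the table keys.
    std::string reserved_chars;
    for (const auto& entry : transformations)
    {
        reserved_chars += entry.first;
    }

    std::string::size_type pos = 0;
    while ((pos = text.find_first_of(reserved_chars, pos)) != std::string::npos)
    {
        const std::string& replacement = transformations.at(text[pos]);
        text.replace(pos, 1, replacement);
        // Skip the whole replacement so its own '&' is never searched again.
        pos += replacement.length();
    }
    return text;
}
```

Calling escape_with_table on the sample string from the answer should give the same escaped line as the output shown above.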

Solution 2

Writing your own is easy enough, but scanning the string multiple times to search/replace individual characters can be inefficient:

#include <sstream>
#include <string>

// Escape the five predefined XML entities in a single pass over src.
std::string escape(const std::string& src) {
    std::stringstream dst;
    for (char ch : src) {
        switch (ch) {
            case '&': dst << "&amp;"; break;
            case '\'': dst << "&apos;"; break;
            case '"': dst << "&quot;"; break;
            case '<': dst << "&lt;"; break;
            case '>': dst << "&gt;"; break;
            default: dst << ch; break;
        }
    }
    return dst.str();
}

Note: I used a C++11 range-based for loop for convenience, but you can easily do the same thing with an iterator.
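For readers without C++11, the iterator form mentioned in the note might look like the sketch below. The function name escape_with_iterator is hypothetical, and appending to a plain std::string instead of a stringstream is an assumption made here to keep the example small, not something the answer prescribes.

```cpp
#include <string>

// Single-pass escape written with an explicit iterator instead of a
// range-based for loop. escape_with_iterator is a hypothetical name.
std::string escape_with_iterator(const std::string& src)
{
    std::string dst;
    dst.reserve(src.size());  // lower bound; the string grows if anything is escaped
    for (std::string::const_iterator it = src.begin(); it != src.end(); ++it)
    {
        switch (*it)
        {
            case '&':  dst += "&amp;";  break;
            case '\'': dst += "&apos;"; break;
            case '"':  dst += "&quot;"; break;
            case '<':  dst += "&lt;";   break;
            case '>':  dst += "&gt;";   break;
            default:   dst += *it;      break;
        }
    }
    return dst;
}
```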

Solution 3

Functions like this should be standard, and we should never have to rewrite them. If you are using Visual Studio, have a look at atlenc.h, which is part of the VS installation. Inside the file there is a function called EscapeXML, which is much more complete than any of the examples above.

Solution 4

There is a function; I just wrote it:

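#include <string>

// Replace every occurrence of old in str with repl, in place.
void replace_all(std::string& str, const std::string& old, const std::string& repl) {
    if (old.empty()) return;  // an empty search string would never terminate
    size_t pos = 0;
    while ((pos = str.find(old, pos)) != std::string::npos) {
        str.replace(pos, old.length(), repl);
        pos += repl.length();  // continue after the text just inserted
    }
}

std::string escape_xml(std::string str) {
    // '&' must be replaced first; otherwise the '&' of the entities
    // added by the later calls would be escaped a second time.
    replace_all(str, std::string("&"), std::string("&amp;"));
    replace_all(str, std::string("'"), std::string("&apos;"));
    replace_all(str, std::string("\""), std::string("&quot;"));
    replace_all(str, std::string(">"), std::string("&gt;"));
    replace_all(str, std::string("<"), std::string("&lt;"));

    return str;
}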
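One detail of this multi-pass approach is that the order of the calls matters: the ampersand has to be handled before any entity is introduced. The sketch below illustrates this with two hypothetical helpers, escape_amp_first and escape_amp_last, that handle only '&' and '<'. The replace_all helper is repeated so the example compiles on its own.

```cpp
#include <string>

// Same helper as in the answer, repeated so the sketch is self-contained.
void replace_all(std::string& str, const std::string& old, const std::string& repl)
{
    std::string::size_type pos = 0;
    while ((pos = str.find(old, pos)) != std::string::npos)
    {
        str.replace(pos, old.length(), repl);
        pos += repl.length();
    }
}

// Correct order: '&' is escaped before any entity is introduced.
// "a < b & c" becomes "a &lt; b &amp; c".
std::string escape_amp_first(std::string str)
{
    replace_all(str, "&", "&amp;");
    replace_all(str, "<", "&lt;");
    return str;
}

// Wrong order (for illustration only): the '&' that belongs to "&lt;"
// is escaped again. "a < b & c" becomes "a &amp;lt; b &amp; c".
std::string escape_amp_last(std::string str)
{
    replace_all(str, "<", "&lt;");
    replace_all(str, "&", "&amp;");
    return str;
}
```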

Solution 5

I slightly modified Ferruccio's solution to also remove other characters that get in the way, such as anything below 0x20 (found somewhere on the Internet). Tested and working.

#include <map>
#include <string>
#include <boost/regex.hpp>

// Removes markup tags from *s in place, then escapes any reserved
// characters that are left over.
void strip_tags(std::string* s) {
    // [^>]* keeps each match inside a single tag.
    boost::regex tag("</?[^>]*>");
    *s = boost::regex_replace(*s, tag, "", boost::format_all);

    // Each replacement ends with a space, so every entity is followed by a blank.
    std::map<char, std::string> transformations;
    transformations['&']  = std::string("&amp; ");
    transformations['\''] = std::string("&apos; ");
    transformations['"']  = std::string("&quot; ");
    transformations['>']  = std::string("&gt; ");
    transformations['<']  = std::string("&lt; ");

    // Build list of characters to be searched for.
    //
    std::string reserved_chars;
    for (std::map<char, std::string>::iterator ti = transformations.begin(); ti != transformations.end(); ti++)
    {
        reserved_chars += ti->first;
    }

    size_t pos = 0;
    while (std::string::npos != (pos = s->find_first_of(reserved_chars, pos)))
    {
        s->replace(pos, 1, transformations[(*s)[pos]]);
        pos++;
    }
}


std::string removeTroublesomeCharacters(const std::string& inString)
{
    std::string newString;
    newString.reserve(inString.length());

    for (std::string::size_type i = 0; i < inString.length(); i++)
    {
        // Compare as unsigned so that bytes above 0x7F are not treated as negative.
        unsigned char ch = static_cast<unsigned char>(inString[i]);
        // Keep bytes in the range 0x20..0xFC plus tab, line feed and carriage
        // return; drop all other control characters.
        if ((ch < 0x00FD && ch > 0x001F) || ch == '\t' || ch == '\n' || ch == '\r')
        {
            newString.push_back(inString[i]);
        }
    }
    return newString;
}

So in this case, there are two functions. We can get the result with something like:

std::string StartingString("Some_value");
strip_tags(&StartingString);  // modifies StartingString in place and returns nothing
std::string FinalString = removeTroublesomeCharacters(StartingString);

Hope it helps!

(Oh yeah: credit for the other function goes to the author of the answer here: How do you remove invalid hexadecimal characters from an XML-based data source prior to constructing an XmlReader or XPathDocument that uses the data? )
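If only the control-character part is needed, a narrower filter could follow the XML 1.0 character rules, which allow tab (0x09), line feed (0x0A) and carriage return (0x0D) but no other character below 0x20. The sketch below is a byte-level approximation under that assumption. The names is_allowed_xml10_byte and strip_disallowed_control_chars are hypothetical, and bytes of 0x80 and above are passed through untouched on the assumption that the input is already valid UTF-8.

```cpp
#include <string>

// Returns true if the byte may appear in XML 1.0 text. Only the control
// range is checked; bytes >= 0x80 are accepted unchanged (assumed UTF-8).
bool is_allowed_xml10_byte(unsigned char c)
{
    return c == 0x09 || c == 0x0A || c == 0x0D || c >= 0x20;
}

// Copies `in`, dropping NUL and the other disallowed control bytes.
std::string strip_disallowed_control_chars(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char ch : in)
    {
        if (is_allowed_xml10_byte(static_cast<unsigned char>(ch)))
        {
            out.push_back(ch);
        }
    }
    return out;
}
```

This only removes characters; escaping the five reserved characters still has to be done separately, for example with one of the functions from the earlier solutions.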

Author: Dor Cohen, C# and web developer

Updated on July 27, 2022

Comments

  • Dor Cohen almost 2 years

    I searched the web a lot and didn't find a C++ function that replaces XML special characters with their escape sequences. Is there something like this?

    I know about the following:

    Special Character   Escape Sequence Purpose  
    &                   &amp;           Ampersand sign 
    '                   &apos;          Single quote 
    "                   &quot;          Double quote
    >                   &gt;            Greater than 
    <                   &lt;            Less than
    

    Are there more? What about writing a hexadecimal value like 0x00? Is this also a problem?

  • Mr Lister about 12 years
    Most xml generators and xml readers are very generous with characters under 0x20; so that would be not that much of a problem. The xml 1.1 standard even formally accepts them (as character references, not the characters themselves). The exception is 0x00, which is not allowed in any shape or form.
  • Dor Cohen
    Dor Cohen about 12 years
  • Mr Lister about 12 years
    Yes, that article confirms that you can't store 0x00 chars in an XML file and demonstrates how to remove them. Does that help you?