Is there a C++ function that replaces XML special characters with their escape sequences?
Solution 1
As has been stated, it would be possible to write your own. For example:
#include <iostream>
#include <string>
#include <map>

int main()
{
    std::string xml("a < > & ' \" string");
    std::cout << xml << "\n";

    // Characters to be transformed.
    std::map<char, std::string> transformations;
    transformations['&']  = std::string("&amp;");
    transformations['\''] = std::string("&apos;");
    transformations['"']  = std::string("&quot;");
    transformations['>']  = std::string("&gt;");
    transformations['<']  = std::string("&lt;");

    // Build list of characters to be searched for.
    std::string reserved_chars;
    for (auto ti = transformations.begin(); ti != transformations.end(); ti++)
    {
        reserved_chars += ti->first;
    }

    size_t pos = 0;
    while (std::string::npos != (pos = xml.find_first_of(reserved_chars, pos)))
    {
        const std::string& replacement = transformations[xml[pos]];
        xml.replace(pos, 1, replacement);
        // Skip the whole entity so its leading '&' is not escaped again.
        pos += replacement.length();
    }

    std::cout << xml << "\n";
    return 0;
}
Output:
a < > & ' " string
a &lt; &gt; &amp; &apos; &quot; string
Add an entry to transformations to introduce a new transformation.
Solution 2
Writing your own is easy enough, but scanning the string multiple times to search/replace individual characters can be inefficient:
#include <sstream>
#include <string>

// Escapes the five XML special characters in a single pass over src.
std::string escape(const std::string& src) {
    std::stringstream dst;
    for (char ch : src) {
        switch (ch) {
            case '&':  dst << "&amp;";  break;
            case '\'': dst << "&apos;"; break;
            case '"':  dst << "&quot;"; break;
            case '<':  dst << "&lt;";   break;
            case '>':  dst << "&gt;";   break;
            default:   dst << ch;       break;
        }
    }
    return dst.str();
}
Note: I used a C++11 range-based for loop for convenience, but you can easily do the same thing with an iterator.
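For compilers without C++11 support, the iterator form mentioned in the note might look like the sketch below. The name escape_with_iterator is hypothetical, and the sketch appends to a std::string rather than a stream, which is a substitution and not part of the original answer.

```cpp
#include <string>

// Hypothetical pre-C++11 variant: same single pass over the input,
// written with an explicit const_iterator instead of a range-based for.
std::string escape_with_iterator(const std::string& src) {
    std::string dst;
    dst.reserve(src.size());  // the result is at least as long as the input
    for (std::string::const_iterator it = src.begin(); it != src.end(); ++it) {
        switch (*it) {
            case '&':  dst += "&amp;";  break;
            case '\'': dst += "&apos;"; break;
            case '"':  dst += "&quot;"; break;
            case '<':  dst += "&lt;";   break;
            case '>':  dst += "&gt;";   break;
            default:   dst += *it;      break;
        }
    }
    return dst;
}
```

Because each input character is inspected exactly once and the output is built separately, an '&' produced by an entity is never escaped a second time.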
Solution 3
These types of functions should be standard, and we should never have to rewrite them. If you are using Visual Studio, have a look at atlenc.h, which is part of the VS installation. It contains a function called EscapeXML that is much more complete than any of the examples above.
Solution 4
Here is a function I just wrote:
// Replaces every occurrence of old in str with repl.
void replace_all(std::string& str, const std::string& old, const std::string& repl) {
    size_t pos = 0;
    while ((pos = str.find(old, pos)) != std::string::npos) {
        str.replace(pos, old.length(), repl);
        pos += repl.length();
    }
}

std::string escape_xml(std::string str) {
    // '&' must be replaced first, before the other entities introduce new '&' characters.
    replace_all(str, std::string("&"), std::string("&amp;"));
    replace_all(str, std::string("'"), std::string("&apos;"));
    replace_all(str, std::string("\""), std::string("&quot;"));
    replace_all(str, std::string(">"), std::string("&gt;"));
    replace_all(str, std::string("<"), std::string("&lt;"));
    return str;
}
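With this multi-pass approach the order of the calls matters: the ampersand has to be handled first, otherwise the ampersands introduced by the other entities are escaped a second time. A small sketch of that failure mode follows; escape_wrong_order and escape_right_order are hypothetical names, only two of the five characters are handled to keep it short, and replace_all is repeated so the block compiles on its own.

```cpp
#include <string>

// Same helper as above, repeated so this sketch is self-contained.
void replace_all(std::string& str, const std::string& old, const std::string& repl) {
    std::string::size_type pos = 0;
    while ((pos = str.find(old, pos)) != std::string::npos) {
        str.replace(pos, old.length(), repl);
        pos += repl.length();
    }
}

// Hypothetical: '<' is replaced before '&', so the '&' of "&lt;" gets escaped again.
std::string escape_wrong_order(std::string str) {
    replace_all(str, "<", "&lt;");
    replace_all(str, "&", "&amp;");
    return str;
}

// Hypothetical: '&' is replaced first, so later entities are left intact.
std::string escape_right_order(std::string str) {
    replace_all(str, "&", "&amp;");
    replace_all(str, "<", "&lt;");
    return str;
}
```

For the input "<", the wrong order yields "&amp;lt;" while the right order yields "&lt;".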
Solution 5
I slightly modified Ferruccio's solution to also eliminate the other characters that are in the way, such as anything < 0x20 and so on (found somewhere on the Internet). Tested and working.
// Assumes <boost/regex.hpp>, <map>, <string>, and using-directives for std and boost.
void strip_tags(string* s) {
    regex kj("</?(.*)>");
    *s = regex_replace(*s, kj, "", boost::format_all);

    // Each replacement keeps the trailing space used in this answer.
    std::map<char, std::string> transformations;
    transformations['&']  = std::string("&amp; ");
    transformations['\''] = std::string("&apos; ");
    transformations['"']  = std::string("&quot; ");
    transformations['>']  = std::string("&gt; ");
    transformations['<']  = std::string("&lt; ");

    // Build list of characters to be searched for.
    std::string reserved_chars;
    for (std::map<char, std::string>::iterator ti = transformations.begin(); ti != transformations.end(); ti++)
    {
        reserved_chars += ti->first;
    }

    size_t pos = 0;
    while (std::string::npos != (pos = (*s).find_first_of(reserved_chars, pos)))
    {
        s->replace(pos, 1, transformations[(*s)[pos]]);
        pos++;
    }
}
string removeTroublesomeCharacters(string inString)
{
    if (inString.empty()) return "";
    string newString;
    for (size_t i = 0; i < inString.length(); i++)
    {
        // Compare as unsigned so bytes >= 0x80 are not treated as negative values.
        unsigned char ch = static_cast<unsigned char>(inString[i]);
        // Remove all control characters except tabs and new lines, as well as
        // the bytes 0xFD-0xFF, which never occur in valid UTF-8.
        if ((ch < 0x00FD && ch > 0x001F) || ch == '\t' || ch == '\n' || ch == '\r')
        {
            newString.push_back(inString[i]);
        }
    }
    return newString;
}
So in this case, there are two functions. We can get the result with something like:
string StartingString ("Some_value");
strip_tags(&StartingString);  // modifies StartingString in place and returns nothing
string FinalString = removeTroublesomeCharacters(StartingString);
Hope it helps!
(Oh yeah: credit for the other function goes to the author of the answer here: How do you remove invalid hexadecimal characters from an XML-based data source prior to constructing an XmlReader or XPathDocument that uses the data? )
Comments
-
Dor Cohen almost 2 years
I searched the web a lot and didn't find a C++ function that replaces XML special characters with their escape sequences. Is there something like this?
I know about the following:
Special Character   Escape Sequence   Purpose
&                   &amp;             Ampersand sign
'                   &apos;            Single quote
"                   &quot;            Double quote
>                   &gt;              Greater than
<                   &lt;              Less than
Are there more? What about writing a hexadecimal value like 0x00? Is this also a problem?
-
Mr Lister about 12 years
Most XML generators and XML readers are very generous with characters under 0x20, so that would not be much of a problem. The XML 1.1 standard even formally accepts them (as character references, not the characters themselves). The exception is 0x00, which is not allowed in any shape or form.
-
Dor Cohen about 12 years
@MrLister read this: seattlesoftware.wordpress.com/2008/09/11/…
-
Mr Lister about 12 years
Yes, that article confirms that you can't store 0x00 chars in an XML file and demonstrates how to remove them. Does that help you?
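To make the point about control characters concrete: XML 1.0 allows only tab (0x09), line feed (0x0A), and carriage return (0x0D) below 0x20, and 0x00 is never allowed. A minimal sketch of a filter built on that rule follows; the names is_disallowed_control and strip_disallowed_controls are hypothetical, and the sketch works byte by byte without looking at bytes of 0x20 and above.

```cpp
#include <string>

// True for bytes below 0x20 that XML 1.0 does not allow
// (everything except tab, line feed, and carriage return).
bool is_disallowed_control(char ch) {
    unsigned char u = static_cast<unsigned char>(ch);
    return u < 0x20 && u != 0x09 && u != 0x0A && u != 0x0D;
}

// Returns a copy of src with the disallowed control bytes removed,
// including any embedded 0x00.
std::string strip_disallowed_controls(const std::string& src) {
    std::string dst;
    dst.reserve(src.size());
    for (std::string::size_type i = 0; i < src.size(); ++i) {
        if (!is_disallowed_control(src[i])) {
            dst += src[i];
        }
    }
    return dst;
}
```

Running such a filter before escaping keeps a stray 0x00 from reaching the XML output.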