PHP-generated XML shows "invalid Char value 27" message
Solution 1
A useful function for getting rid of that error is suggested on this website: http://www.phpwact.org/php/i18n/charsets#common_problem_areas_with_utf-8
When you put UTF-8 encoded strings in an XML document, remember that not every character UTF-8 can encode is allowed in XML: http://www.w3.org/TR/REC-xml/#charsets
So you should strip out the disallowed characters; otherwise you will get a fatal XML parsing error like the one in the question.
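function utf8_for_xml($string)
{
    // Replace each run of characters outside the XML 1.0 Char ranges with a single space.
    // With the /u modifier, preg_replace() returns NULL if $string is not valid UTF-8.
    return preg_replace('/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u', ' ', $string);
}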
Hope that saves someone else some time.
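The character class in that regex transcribes the Char production from the XML 1.0 spec linked above. As a rough sketch, written in JavaScript (the other language used on this page), the same rule can be spelled out as an explicit predicate. `isXmlChar` and `stripInvalidXmlChars` are hypothetical names, and this version follows the spec's full production, including the supplementary range U+10000 to U+10FFFF.

```javascript
// Sketch: the XML 1.0 "Char" production as an explicit predicate.
// Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function isXmlChar(cp) {
  return (
    cp === 0x9 || cp === 0xa || cp === 0xd ||
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd) ||
    (cp >= 0x10000 && cp <= 0x10ffff)
  );
}

// Replace each run of disallowed characters with one space, like the
// preg_replace() version. for...of walks the string by code point, so
// surrogate pairs (e.g. emoji) stay intact, while lone surrogates are replaced.
function stripInvalidXmlChars(str) {
  let out = '';
  let inBadRun = false;
  for (const ch of str) {
    if (isXmlChar(ch.codePointAt(0))) {
      out += ch;
      inBadRun = false;
    } else if (!inBadRun) {
      out += ' ';
      inBadRun = true;
    }
  }
  return out;
}

// Two ESC characters (char value 27) collapse into a single space.
console.log(JSON.stringify(stripInvalidXmlChars('a\x1b\x1bb'))); // "a b"
```

The explicit loop is slower than a single regex, but it makes each allowed range visible and easy to check against the spec.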
Solution 2
Prashant is absolutely right. You can also strip out invalid characters in JavaScript with:
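function utf8_for_xml(inputStr) {
  // Remove every character outside the XML 1.0 Char ranges (this also drops U+FDD0 to U+FDDF).
  // The u flag matches by code point, so supplementary characters such as emoji survive.
  return inputStr.replace(/[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFDCF\uFDE0-\uFFFD\u{10000}-\u{10FFFF}]/gu, '');
}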
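JavaScript strings are UTF-16, so whether a pattern like this one uses the u flag matters for characters above U+FFFF: they are legal in XML but stored as two surrogate code units. A small illustration, using a made-up input string:

```javascript
// U+1F600 (an emoji, a legal XML character) is stored as the
// surrogate pair \uD83D\uDE00 in a JavaScript string.
const text = 'ok \u{1F600}';

// Without the u flag the class is matched per code unit: each surrogate
// half falls outside \x20-\uD7FF and \uE000-\uFFFD, so the emoji is deleted.
const perCodeUnit = text.replace(/[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]/g, '');

// With the u flag the class is matched per code point, and the
// \u{10000}-\u{10FFFF} range keeps supplementary characters.
const perCodePoint = text.replace(
  /[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu,
  ''
);

console.log(perCodeUnit === 'ok ');           // true
console.log(perCodePoint === 'ok \u{1F600}'); // true
```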
Prashant
Updated on April 02, 2021
Comments
-
Prashant about 3 years
I am generating XML with PHP's DOMDocument class as below:
$dom = new DOMDocument("1.0","utf-8");
Doing the above results in a page that shows this message at the top of the output:
This page contains the following errors:
error on line 16 at column 274505: PCDATA invalid Char value 27
Below is a rendering of the page up to the first error.
I have tried fixing it with the Tidy library, and used iconv to get the Chinese characters into UTF-8.
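"Char value 27" is decimal 27, i.e. 0x1B, the ASCII ESC control character. The XML 1.0 Char production does not allow it (below U+0020 only tab, line feed and carriage return are permitted), so re-encoding with iconv cannot help; the character has to be removed. To find where such characters sit in generated output, a scan like the following may help. `findInvalidXmlChars` is a hypothetical helper written in JavaScript; it counts columns in code points, so its numbers may not match the column a browser reports, which can be counted differently (for example in bytes).

```javascript
// Hypothetical helper: report where disallowed XML 1.0 characters occur,
// as 1-based line/column pairs (columns counted in code points).
const DISALLOWED = /[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/u;

function findInvalidXmlChars(text) {
  const hits = [];
  let line = 1;
  let column = 0;
  for (const ch of text) {
    column += 1;
    if (DISALLOWED.test(ch)) {
      hits.push({ line, column, codePoint: ch.codePointAt(0) });
    }
    if (ch === '\n') {
      line += 1;
      column = 0;
    }
  }
  return hits;
}

// "\x1b" is char value 27 (ESC), the character named in the error message.
console.log(JSON.stringify(findInvalidXmlChars('<a>\n<b>x\x1b</b>\n</a>')));
// → [{"line":2,"column":5,"codePoint":27}]
```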
-
Michal over 7 years
Thank you very much. I am quite surprised that PHP's XML writer does not do this itself.
-
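Tom Lord over 7 years
Here is an equivalent sanitisation function in Ruby, in case anyone finds it useful:
string.gsub(/[^\u{0009}\u{000a}\u{000d}\u{0020}-\u{D7FF}\u{E000}-\u{FFFD}\u{10000}-\u{10FFFF}]+/u, ' ')
... Or, more efficiently, this can also be achieved with the following (note that tr replaces each invalid character with its own space, whereas the gsub version collapses each run into a single space):
string.tr("^\u{0009}\u{000a}\u{000d}\u{0020}-\u{D7FF}\u{E000}-\u{FFFD}\u{10000}-\u{10FFFF}", ' ')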
-
ijpatricio over 7 years
Thank you so much Prashant!!
-
Michal over 7 years
This is awesome. I see that I have liked this already. I want to give you another like.
-
Supun Kavinda over 3 years
I wasted 2 days because of this. Thank you very much!
-
Wouter almost 3 years
For me, this function returns NULL, possibly because the input is not UTF-8. I'm not sure what the input is...
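A likely explanation: when the pattern has the /u modifier, preg_replace() returns NULL if the subject is not valid UTF-8. In PHP, mb_check_encoding($string, 'UTF-8') can test for this before sanitising. The JavaScript sketch below shows the analogous check where raw bytes are decoded. `decodeUtf8Strict` and `decodeUtf8Lenient` are hypothetical names, and returning null on failure is an assumption chosen to mirror the PHP behaviour.

```javascript
// Strict: TextDecoder with fatal: true throws a TypeError on malformed UTF-8.
function decodeUtf8Strict(bytes) {
  try {
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    return null; // malformed input; mirrors preg_replace() returning NULL
  }
}

// Lenient: malformed sequences become U+FFFD, which is itself a legal XML character.
function decodeUtf8Lenient(bytes) {
  return new TextDecoder('utf-8').decode(bytes);
}

// 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
const bad = new Uint8Array([0x41, 0xc3, 0x28]);
console.log(decodeUtf8Strict(bad));                    // null
console.log(decodeUtf8Lenient(bad) === 'A\uFFFD(');    // true
```

Which variant fits depends on whether silently substituting U+FFFD is acceptable for the data being exported.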