PHP generated XML shows invalid Char value 27 message

36,581

Solution 1

A useful function for getting rid of this error is suggested at http://www.phpwact.org/php/i18n/charsets#common_problem_areas_with_utf-8

When you put UTF-8 encoded strings into an XML document, remember that not every valid UTF-8 character is allowed in XML: http://www.w3.org/TR/REC-xml/#charsets. Char value 27 in the error message is the ESC control character, which is one of the disallowed ones.

So you should strip or replace the unwanted characters, otherwise you will get a fatal XML parsing error like the one above. The function below replaces each run of disallowed characters with a single space:

function utf8_for_xml($string)
{
    return preg_replace ('/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u', ' ', $string);
}

Hope that saves someone else some time.
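As a rough sketch of how this fits into the DOMDocument code from the question, the example below filters a string containing ESC (char 27) before adding it as a text node. The function name utf8_for_xml_full and the element names are made up for illustration. The pattern is a variant of the one above that also keeps U+10000 to U+10FFFF, which the XML 1.0 Char production allows; the original pattern would replace those characters (most emoji, for instance) with spaces.

```php
<?php
// Hypothetical variant of utf8_for_xml(): same idea, but it also keeps
// U+10000..U+10FFFF, which the XML 1.0 Char production permits.
// Returns null if preg_replace() fails, e.g. when the input is not valid UTF-8.
function utf8_for_xml_full(string $string): ?string
{
    return preg_replace(
        '/[^\x{0009}\x{000A}\x{000D}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u',
        ' ',
        $string
    );
}

// "\x1B" is ESC (decimal 27), the character named in the error message.
$raw = "price\x1B: 10";

// Element names below are placeholders, not taken from the original question.
$dom  = new DOMDocument('1.0', 'utf-8');
$root = $dom->appendChild($dom->createElement('items'));
$item = $root->appendChild($dom->createElement('item'));
$item->appendChild($dom->createTextNode(utf8_for_xml_full($raw)));

echo $dom->saveXML();
```

The filter is applied to the text content only, just before it goes into the document, so the markup itself is left alone.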

Solution 2

Prashant is absolutely right. You can also strip invalid characters in JavaScript:

function utf8_for_xml(inputStr) {
  return inputStr.replace(/[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFDCF\uFDE0-\uFFFD]/g, '');
}
Author: Prashant


Updated on April 02, 2021

Comments

  • Prashant
    Prashant about 3 years

    I am generating XML using PHP library as below:

    $dom = new DOMDocument("1.0","utf-8");
    

    This produces a page that shows the following message above the output:

    This page contains the following errors: error on line 16 at column 274505: PCDATA invalid Char value 27 Below is a rendering of the page up to the first error.

    I have tried fixing this with the Tidy library, and I used iconv to convert the Chinese characters to UTF-8.

  • Michal
    Michal over 7 years
    Thank you very much. I am quite surprised that php xml writer does not do these things itself.
  • Tom Lord
    Tom Lord over 7 years
    Here is an equivalent sanitisation function in Ruby, in case anyone finds it useful: string.gsub(/[^\u{0009}\u{000a}\u{000d}\u{0020}-\u{D7FF}\u{E000}-\u{FFFD}]+/u, ' ') ... Or, more efficiently, this can also be achieved with: string.tr("^\u{0009}\u{000a}\u{000d}\u{0020}-\u{D7FF}\u{E000}-\u{FFFD}", ' ')
  • ijpatricio
    ijpatricio over 7 years
    Thank you so much Prashant!!
  • Michal
    Michal over 7 years
    This is awesome. I see that I have liked this already. I want to give you another like.
  • Supun Kavinda
    Supun Kavinda over 3 years
    I wasted 2 days because of this. Thank you very much!
  • Wouter
    Wouter almost 3 years
    For me, this function returns NULL. Possibly because the input is not UTF-8. Not sure what the input is...
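
On the NULL result mentioned in the last comment: with the /u modifier, preg_replace() returns null when the subject is not valid UTF-8, so that guess is plausible. A minimal sketch of one way to guard against it follows. It assumes the mbstring extension is available, and the function name utf8_for_xml_safe and the ISO-8859-1 fallback are assumptions for illustration; the real source encoding has to be known or supplied by the caller.

```php
<?php
// Sketch: make sure the input is valid UTF-8 before applying the /u pattern.
// Assumes the mbstring extension is loaded. $fallbackEncoding is a guess the
// caller must supply, since a byte string does not record its own encoding.
function utf8_for_xml_safe(string $string, string $fallbackEncoding = 'ISO-8859-1'): string
{
    if (!mb_check_encoding($string, 'UTF-8')) {
        $string = mb_convert_encoding($string, 'UTF-8', $fallbackEncoding);
    }

    $clean = preg_replace(
        '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u',
        ' ',
        $string
    );

    if ($clean === null) {
        // preg_last_error() reports why PCRE gave up, e.g. PREG_BAD_UTF8_ERROR.
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }

    return $clean;
}

// "café" followed by ESC, encoded as ISO-8859-1 (so not valid UTF-8).
$latin1 = "caf\xE9\x1B";
var_dump(utf8_for_xml_safe($latin1));
```

Throwing on failure instead of returning null makes the problem visible at the point where the bad input arrives, rather than later as an empty XML node.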