PHP invalid character error

Solution 1

Simply put, an XML element name cannot start with a number:

1OhmStable  <-- rename this
_1OhmStable <-- this is fine

Related question: php parse xml - error: StartTag: invalid element name

A good article on the topic: http://www.xml.com/pub/a/2001/07/25/namingparts.html

A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters.
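
The rule quoted above can be checked before calling createElement(). The following is a minimal sketch, not a full implementation of the XML Name rule: it only accepts ASCII letters, digits, underscores, hyphens and full stops, and the helper name isValidXmlName() is hypothetical. Colons are left out to keep the sketch simple, since they are conventionally reserved for namespace prefixes.

```php
<?php
// Hypothetical helper: a simplified, ASCII-only approximation of the XML Name rule.
// The real rule also allows many non-ASCII letters, which this sketch rejects.
function isValidXmlName(string $name): bool
{
    // First character: a letter or underscore; then letters, digits, '_', '.', '-'.
    // The D modifier stops '$' from matching before a trailing newline.
    return preg_match('/^[A-Za-z_][A-Za-z0-9_.-]*$/D', $name) === 1;
}

var_dump(isValidXmlName('1OhmStable'));   // bool(false): starts with a digit
var_dump(isValidXmlName('_1OhmStable'));  // bool(true)
var_dump(isValidXmlName('1 Ohm Stable')); // bool(false): leading digit and spaces
```

A check like this could run right before createElement() so that a bad name is reported with a clearer message than the DOMException.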

Solution 2

You have not said where the error occurs. If it happens after you clean the value, my guess is this line:

preg_replace("/[^A-Za-z0-9]/", "",$c);

This replacement is not written for UTF-8 encoded strings (which DOMDocument uses). You can make it UTF-8 compatible by adding the u modifier (PCRE_UTF8):

preg_replace("/[^A-Za-z0-9]/u", "",$c);
                            ^

It's just a guess; I suggest you make clear in your question which part of your code triggers the error.
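
To see what the u modifier actually changes, here is a small sketch. The sample string "1 Ω Stable" is made up for illustration and assumes the script file is saved as UTF-8. With u, PCRE treats pattern and subject as UTF-8, so "." matches whole characters instead of single bytes. For the ASCII-only class used here, both variants happen to return the same result on valid UTF-8 input, so the modifier alone may not make the error go away.

```php
<?php
// Made-up sample containing a two-byte UTF-8 character (Ω).
$sample = "1 Ω Stable";

// Without u, '.' matches single bytes; with u, it matches whole UTF-8 characters.
$bytes = preg_match_all('/./', 'Ω');
$chars = preg_match_all('/./u', 'Ω');
var_dump($bytes, $chars); // int(2), int(1)

// Both variants strip everything outside A-Z, a-z and 0-9.
$withoutU = preg_replace('/[^A-Za-z0-9]/', '', $sample);
$withU    = preg_replace('/[^A-Za-z0-9]/u', '', $sample);
var_dump($withoutU, $withU); // string(7) "1Stable", twice

// With u, a subject that is not valid UTF-8 makes preg_replace() return null.
$invalid = preg_replace('/[^A-Za-z0-9]/u', '', "abc\xFF");
var_dump($invalid === null, preg_last_error() === PREG_BAD_UTF8_ERROR); // true, true
```

Note that the cleaned value still starts with a digit in both cases.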

Solution 3

Even if __cleanData() removes every character other than the Latin letters a-z and digits, that does not guarantee that the result is a valid XML name. Your function can return strings that begin with a digit, but digits are illegal name start characters in XML; they can only appear after the first character of a name. Spaces are also forbidden in names, which is another reason your expected XML output cannot be produced.
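
One way to close that gap is to repair the cleaned string before handing it to createElement(). This is a sketch under assumptions, not the asker's code: toElementName() is a hypothetical helper, and prefixing an underscore is just one possible convention (it mirrors the _1OhmStable rename suggested in Solution 1).

```php
<?php
// Hypothetical helper: turn arbitrary text into an ASCII-only XML element name.
function toElementName(string $text): string
{
    // Same stripping as __cleanData(): keep only ASCII letters and digits.
    $name = preg_replace('/[^A-Za-z0-9]/', '', $text);

    // An empty name or a leading digit is not allowed, so prefix an underscore.
    if (!preg_match('/^[A-Za-z]/', $name)) {
        $name = '_' . $name;
    }
    return $name;
}

$xml = new DOMDocument('1.0', 'UTF-8');

try {
    // The unrepaired name triggers the DOMException from the question.
    $xml->createElement('1OhmStable', 'a');
} catch (DOMException $e) {
    echo 'Rejected: ', $e->getMessage(), "\n";
}

$element = $xml->createElement(toElementName('1 Ohm Stable'), '600 x 1');
$xml->appendChild($element);
echo $xml->saveXML($element), "\n"; // <_1OhmStable>600 x 1</_1OhmStable>
```

If the original label matters, it could be kept in an attribute or in a child element, since those may contain spaces and leading digits.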

Author by

Ryan

Updated on July 23, 2022

Comments

  • Ryan, almost 2 years ago

    I'm getting this error when running this code: Fatal error: Uncaught exception 'DOMException' with message 'Invalid Character Error' in test.php:29 Stack trace: #0 test.php(29): DOMDocument->createElement('1OhmStable', 'a') #1 {main} thrown in test.php on line 29

    The node names taken from the original XML file do contain invalid characters, but since I am stripping those characters out, the nodes should be created. What kind of encoding do I need to apply to the original XML document? Do I need to decode the output of saveXML()?

    function __cleanData($c)
    {
        return preg_replace("/[^A-Za-z0-9]/", "", $c);
    }

    $xml = new DOMDocument('1.0', 'UTF-8');
    $xml->load('test.xml');
    $xml->formatOutput = true;

    $append = array();
    foreach ($xml->getElementsByTagName('product') as $product)
    {
        foreach ($product->getElementsByTagName('name') as $name)
        {
            $append[] = $name;
        }
        foreach ($append as $a)
        {
            $nodeName = __cleanData($a->textContent);
            $element = $xml->createElement(htmlentities($nodeName), 'a');
        }
        $product->removeChild($xml->getElementsByTagName('details')->item(0));
        $product->appendChild($element);
    }

    $result = $xml->saveXML();
    $file = "data.xml";
    file_put_contents($file, $result);
    

    This is what the original XML looks like:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet href="/v1/xsl/xml_pretty_printer.xsl" type="text/xsl"?>
    <products>
      <product>
        <modelNumber>M100</modelNumber>
        <itemId>1553725</itemId>
        <details>
          <detail>
            <name>1 Ohm Stable</name>
            <value>600 x 1</value>
          </detail>
        </details>
      </product>
    </products>
    

    The new document is supposed to look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet href="/v1/xsl/xml_pretty_printer.xsl" type="text/xsl"?>
    <products>
      <product>
        <modelNumber>M100</modelNumber>
        <itemId>1553725</itemId>
        <1 Ohm Stable>

        </1 Ohm Stable>
      </product>
    </products>