PHP library for parsing XML with colons in tag names?

Solution 1

Say you have some XML like this:

<xhtml:div>
  <xhtml:em>italic</xhtml:em>
  <date>2010-02-01 06:00</date>
</xhtml:div>

You can access 'em' like this: $xml->children('xhtml', true)->div->em;

However, if you want the date field, $xml->children('xhtml', true)->div->date; won't work, because you are still in the xhtml namespace.

You must call children() again, with no arguments, to get back to the default namespace:

$xml->children('xhtml', true)->div->children()->date;
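A runnable sketch of Solution 1. It assumes the xhtml prefix is declared on a wrapper <root> element; the snippet above leaves the declaration out, and without one the parser reports a namespace error instead of binding xhtml to a namespace. The <root> element and the URI http://www.w3.org/1999/xhtml are assumptions for illustration.

```php
<?php
// Assumed wrapper: <root> declares the xhtml prefix so the parser binds it to a URI.
$source = <<<XML
<root xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xhtml:div>
    <xhtml:em>italic</xhtml:em>
    <date>2010-02-01 06:00</date>
  </xhtml:div>
</root>
XML;

$xml = simplexml_load_string($source);

// Select children by prefix; true means the first argument is a prefix, not a URI.
$div = $xml->children('xhtml', true)->div;

// Still in the xhtml namespace, so <xhtml:em> is found.
$em = (string) $div->em;

// Also still in the xhtml namespace, so <date> (no namespace) is NOT found.
$stuck = (string) $div->date;

// children() with no arguments switches back to elements without a prefix.
$date = (string) $div->children()->date;

echo $em, PHP_EOL;   // italic
echo $date, PHP_EOL; // 2010-02-01 06:00
```

Note that $xml here is the parent of the div, which is why ->div appears in every chain.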

Solution 2

If you want a quick fix, do this (I do when I feel lazy):

// Replaces ':' in tag and attribute names with '_', allowing easy access
$xml = preg_replace('~(</?|\s)([a-z0-9_]+):~is', '$1$2_', $xml);

This converts <xhtml: to <xhtml_ and </xhtml: to </xhtml_. It's kind of hacky and can fail if CDATA blocks contain namespaced XML or if tag names use Unicode characters, but I'd say you are usually safe using it (it hasn't failed me yet).
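A minimal sketch of this regex applied to the example above, assuming the same hypothetical <root> wrapper that declares xmlns:xhtml. After the rewrite the document has no namespaces left (xmlns:xhtml becomes an ordinary attribute named xmlns_xhtml), so plain property access works. It also shows a side effect of the \s alternative in the pattern: any whitespace followed by a word and a colon is rewritten, including text content, so the time 06:00 comes out as 06_00.

```php
<?php
$raw = <<<XML
<root xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xhtml:div>
    <xhtml:em>italic</xhtml:em>
    <date>2010-02-01 06:00</date>
  </xhtml:div>
</root>
XML;

// Replace "prefix:" after "<", "</" or any whitespace with "prefix_".
$flat = preg_replace('~(</?|\s)([a-z0-9_]+):~is', '$1$2_', $raw);

$xml = simplexml_load_string($flat);

// Plain property access now works on the renamed elements.
$em = (string) $xml->xhtml_div->xhtml_em;
$date = (string) $xml->xhtml_div->date;

echo $em, PHP_EOL;   // italic
echo $date, PHP_EOL; // 2010-02-01 06_00 (the text content was rewritten too)
```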

Solution 3

The colon denotes an XML namespace prefix. PHP's DOM extension has good support for namespaces.
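As a sketch of what that looks like with DOMDocument and DOMXPath, using the question's markup wrapped in an assumed <root> that declares the xhtml prefix (the namespace URI is also an assumption). Lookups match on namespace URI plus local name, so the prefix used in the document does not matter.

```php
<?php
$ns = 'http://www.w3.org/1999/xhtml';

$doc = new DOMDocument();
$doc->loadXML(
    '<root xmlns:xhtml="' . $ns . '">'
    . '<xhtml:div>sample <xhtml:em>italic</xhtml:em> text</xhtml:div>'
    . '</root>'
);

// Namespace-aware lookup by URI + local name, not by the literal "xhtml:em".
$em = $doc->getElementsByTagNameNS($ns, 'em')->item(0);
echo $em->nodeName, ' / ', $em->localName, PHP_EOL; // xhtml:em / em

// XPath needs its own prefix registered for the URI; it need not match the document's.
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('h', $ns);
$div = $xpath->query('//h:div')->item(0);
echo $div->textContent, PHP_EOL; // sample italic text
```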

Solution 4

I don't think it's a good idea to get rid of the colon or replace it with something else, as some people have suggested. You can easily access elements that have a namespace prefix: either pass the URI that identifies the namespace to the children() method, or pass the namespace prefix plus true as the second argument. The second approach requires PHP 5.2 and up.

SimpleXMLElement::children
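A sketch of both variants, again assuming the xhtml prefix is declared on a hypothetical wrapper <root> with an assumed URI. getDocNamespaces() is used here to look up the URI so it doesn't have to be hard-coded.

```php
<?php
$xml = simplexml_load_string(
    '<root xmlns:xhtml="http://www.w3.org/1999/xhtml">'
    . '<xhtml:div>sample <xhtml:em>italic</xhtml:em> text</xhtml:div>'
    . '</root>'
);

// Prefix => URI for the namespaces declared on the root element.
$namespaces = $xml->getDocNamespaces();
$uri = $namespaces['xhtml'];

// Option 1: pass the namespace URI.
$byUri = (string) $xml->children($uri)->div->em;

// Option 2: pass the prefix and true (PHP 5.2 and up).
$byPrefix = (string) $xml->children('xhtml', true)->div->em;

echo $byUri, PHP_EOL;    // italic
echo $byPrefix, PHP_EOL; // italic
```

Passing the URI is the more robust choice when the same namespace might be bound to a different prefix in other documents.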

Author: mpen

Updated on November 24, 2020

Comments

  • mpen, over 3 years

    I've been trying to use SimpleXML, but it doesn't seem to like XML that looks like this:

    <xhtml:div>sample <xhtml:em>italic</xhtml:em> text</xhtml:div>
    

    So what library will handle tags that look like that (have a colon in them)?

  • Matthew Scharley, over 14 years
    SimpleXML probably does too, but the OP is looking for a tag "xhtml:div" instead of just "div".
  • mpen, over 14 years
    SimpleXML has some stuff for dealing with it, but I still can't get it to work right.
  • daveomcd, almost 13 years
    Not sure why this isn't the selected answer, but for anyone in the future, this is the one that solved my question/problem! :)
  • mpen, about 11 years
    Isn't this exactly what Nathan Reed suggested in the answer I accepted? I agree that regex-fu is a dirty hack, but having to go through the children() selector isn't very fun either.
  • Patryk K, about 11 years
    Yes, the same. I just wanted to point out that you can also pass the URL that identifies the namespace to the children() method, which works with PHP 5 and up. IMHO, there is no need for a dirty hack when there is a core method available.
  • supersan, almost 7 years
    Works, but it is a bit hacky: it screwed up the contents of my <pubDate> tags and changed the time.
  • user2924019, about 4 years
    What about multiple tags with ':'? This is how I was trying to access it: $array['body']['Order']['Extensions']['data:AdditionalReferences']['data:YourRef']