PHP convert XML to JSON

Solution 1

I figured it out. json_encode handles objects differently than strings. I cast the object to a string and it works now.

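$states = array();
foreach ($xml->children() as $state)
{
    // Cast to string; otherwise json_encode() serializes the SimpleXMLElement object.
    $states[] = array('state' => (string)$state->name);
}
echo json_encode($states);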
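
The same cast works for attributes, which was the original complaint in the question. The following is a minimal sketch, assuming the states XML from the question is available as a string (the `$xmlString` variable and the `id` output key are illustrative names, not part of the original answer):

```php
<?php
// Sample input adapted from the question (XML declaration omitted).
$xmlString = <<<XML
<states>
    <state id="AL">
        <name>Alabama</name>
    </state>
    <state id="AK">
        <name>Alaska</name>
    </state>
</states>
XML;

$xml = simplexml_load_string($xmlString);

$states = array();
foreach ($xml->children() as $state) {
    $states[] = array(
        // Attributes are read with array syntax and need the same string cast.
        'id'    => (string) $state['id'],
        'state' => (string) $state->name,
    );
}

$json = json_encode($states);
echo $json, "\n"; // [{"id":"AL","state":"Alabama"},{"id":"AK","state":"Alaska"}]
```

Building the array by hand keeps the output shape fully under your control, at the cost of having to list every field you want.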

Solution 2

JSON and an array from XML in 3 lines:

$xml = simplexml_load_string($xml_string);
$json = json_encode($xml);
$array = json_decode($json,TRUE);
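
As the comments below point out, this shortcut has limits worth checking against your own data. The sketch below uses made-up inline XML strings to illustrate three cases: attributes on a parent element, attributes on an element that also carries text, and CDATA content loaded with the `LIBXML_NOCDATA` flag suggested in the comments.

```php
<?php
// Helper for this sketch only: XML string -> associative array via json_encode.
function xml_to_array($xmlString, $options = 0) {
    $xml = simplexml_load_string($xmlString, 'SimpleXMLElement', $options);
    return json_decode(json_encode($xml), true);
}

// 1. Attributes on a parent element are kept under the "@attributes" key.
$a = xml_to_array('<states><state id="AL"><name>Alabama</name></state></states>');
echo json_encode($a), "\n"; // {"state":{"@attributes":{"id":"AL"},"name":"Alabama"}}

// 2. An element with both text and attributes keeps only the text.
$b = xml_to_array('<root><foo bar="3">Lorem</foo></root>');
echo json_encode($b), "\n"; // {"foo":"Lorem"}

// 3. LIBXML_NOCDATA merges CDATA sections into plain text nodes.
$c = xml_to_array('<root><note><![CDATA[Hello]]></note></root>', LIBXML_NOCDATA);
echo json_encode($c), "\n"; // {"note":"Hello"}
```

If case 2 matters for your documents, one of the attribute-aware solutions further down is probably a better fit.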

Solution 3

Sorry for answering an old post, but this article outlines an approach that is short, concise and easy to maintain. I tested it myself and it works pretty well.

http://lostechies.com/seanbiefeld/2011/10/21/simple-xml-to-json-with-php/

<?php   
class XmlToJson {
    public function Parse ($url) {
        $fileContents= file_get_contents($url);
        $fileContents = str_replace(array("\n", "\r", "\t"), '', $fileContents);
        $fileContents = trim(str_replace('"', "'", $fileContents));
        $simpleXml = simplexml_load_string($fileContents);
        $json = json_encode($simpleXml);

        return $json;
    }
}
?>

Solution 4

I guess I'm a bit late to the party, but I have written a small function to accomplish this task. It also takes care of attributes, text content, and multiple sibling nodes that share the same node name.

Disclaimer: I'm not a PHP native, so please bear with simple mistakes.

function xml2js($xmlnode) {
    $root = (func_num_args() > 1 ? false : true);
    $jsnode = array();

    if (!$root) {
        if (count($xmlnode->attributes()) > 0){
            $jsnode["$"] = array();
            foreach($xmlnode->attributes() as $key => $value)
                $jsnode["$"][$key] = (string)$value;
        }

        $textcontent = trim((string)$xmlnode);
        if (strlen($textcontent) > 0) // strlen(), not count(): $textcontent is a string
            $jsnode["_"] = $textcontent;

        foreach ($xmlnode->children() as $childxmlnode) {
            $childname = $childxmlnode->getName();
            if (!array_key_exists($childname, $jsnode))
                $jsnode[$childname] = array();
            array_push($jsnode[$childname], xml2js($childxmlnode, true));
        }
        return $jsnode;
    } else {
        $nodename = $xmlnode->getName();
        $jsnode[$nodename] = array();
        array_push($jsnode[$nodename], xml2js($xmlnode, true));
        return json_encode($jsnode);
    }
}   

Usage example:

$xml = simplexml_load_file("myfile.xml");
echo xml2js($xml);

Example Input (myfile.xml):

<family name="Johnson">
    <child name="John" age="5">
        <toy status="old">Trooper</toy>
        <toy status="old">Ultrablock</toy>
        <toy status="new">Bike</toy>
    </child>
</family>

Example output:

{"family":[{"$":{"name":"Johnson"},"child":[{"$":{"name":"John","age":"5"},"toy":[{"$":{"status":"old"},"_":"Trooper"},{"$":{"status":"old"},"_":"Ultrablock"},{"$":{"status":"new"},"_":"Bike"}]}]}]}

Pretty printed:

{
    "family" : [{
            "$" : {
                "name" : "Johnson"
            },
            "child" : [{
                    "$" : {
                        "name" : "John",
                        "age" : "5"
                    },
                    "toy" : [{
                            "$" : {
                                "status" : "old"
                            },
                            "_" : "Trooper"
                        }, {
                            "$" : {
                                "status" : "old"
                            },
                            "_" : "Ultrablock"
                        }, {
                            "$" : {
                                "status" : "new"
                            },
                            "_" : "Bike"
                        }
                    ]
                }
            ]
        }
    ]
}

Quirks to keep in mind: Several tags with the same tag name can be siblings. Other solutions will most likely drop all but the last sibling. To avoid this, every node, even one that occurs only once, is stored as an array that holds one object per instance of the tag name. (See the multiple <toy> elements in the example.)

Even the root element, of which only one should exist in a valid XML document, is stored as an array holding one object, just to keep the data structure consistent.

To distinguish between XML node content and XML attributes, each object's attributes are stored in the "$" child and its content in the "_" child.

Edit: I forgot to show the output for your example input data

{
    "states" : [{
            "state" : [{
                    "$" : {
                        "id" : "AL"
                    },
                    "name" : [{
                            "_" : "Alabama"
                        }
                    ]
                }, {
                    "$" : {
                        "id" : "AK"
                    },
                    "name" : [{
                            "_" : "Alaska"
                        }
                    ]
                }
            ]
        }
    ]
}
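
Because every node is an array, reading the result back means indexing with `[0]` even for single elements. The following is a small sketch of walking the family output shown above after decoding it in PHP; the variable names are illustrative only.

```php
<?php
// Compact JSON copied from the family example above.
$json = '{"family":[{"$":{"name":"Johnson"},"child":[{"$":{"name":"John","age":"5"},"toy":[{"$":{"status":"old"},"_":"Trooper"},{"$":{"status":"old"},"_":"Ultrablock"},{"$":{"status":"new"},"_":"Bike"}]}]}]}';

$data = json_decode($json, true);

// Single elements still live at index 0 of their array.
$family = $data['family'][0];
$child  = $family['child'][0];

echo $family['$']['name'], "\n"; // Johnson (attribute, under "$")

// Collect the text content ("_") of every toy whose status attribute is "new".
$newToys = array();
foreach ($child['toy'] as $toy) {
    if ($toy['$']['status'] === 'new') {
        $newToys[] = $toy['_'];
    }
}
echo implode(', ', $newToys), "\n"; // Bike
```

The extra indexing is the price of the consistent structure: consumer code never has to check whether a key holds one element or a list.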

Solution 5

A common pitfall is to forget that json_encode() does not preserve both parts of an element that has a text value and one or more attributes. It keeps only one of them, which means data loss. The function below solves that problem by moving the text into an extra attribute. If you decide to go the json_encode/json_decode way, using it is advised.

function json_prepare_xml($domNode) {
  foreach($domNode->childNodes as $node) {
    if($node->hasChildNodes()) {
      json_prepare_xml($node);
    } else {
      if($domNode->hasAttributes() && strlen($domNode->nodeValue)){
         $domNode->setAttribute("nodeValue", $node->textContent);
         $node->nodeValue = "";
      }
    }
  }
}

$dom = new DOMDocument();
$dom->loadXML( file_get_contents($xmlfile) );
json_prepare_xml($dom);
$sxml = simplexml_load_string( $dom->saveXML() );
$json = json_decode( json_encode( $sxml ) );

By doing so, <foo bar="3">Lorem</foo> will not end up as {"foo":"Lorem"} in your JSON.
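
To see what the preparation step does before the JSON conversion, the sketch below runs it on a made-up inline document and inspects the DOM afterwards. The helper is repeated (with the nested if folded into an elseif) so the snippet runs on its own; the `<root>` wrapper is an assumption for the demo, not part of the answer.

```php
<?php
// Copy of the helper above, repeated so this snippet is self-contained.
function json_prepare_xml($domNode) {
    foreach ($domNode->childNodes as $node) {
        if ($node->hasChildNodes()) {
            json_prepare_xml($node);
        } elseif ($domNode->hasAttributes() && strlen($domNode->nodeValue)) {
            // Move the text into a "nodeValue" attribute so it travels with the others.
            $domNode->setAttribute('nodeValue', $node->textContent);
            $node->nodeValue = '';
        }
    }
}

$dom = new DOMDocument();
$dom->loadXML('<root><foo bar="3">Lorem</foo></root>');
json_prepare_xml($dom);

// <foo> now carries its former text as an attribute next to bar="3".
$foo = $dom->documentElement->firstChild;
echo $foo->getAttribute('nodeValue'), "\n"; // Lorem

// Print the JSON to check how the result looks for your own input.
echo json_encode(simplexml_load_string($dom->saveXML())), "\n";
```

Note that the attribute name "nodeValue" is chosen by the function; a real attribute with that name in the input would be overwritten.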

Author: Bryan Hadlock

Updated on February 24, 2022

Comments

  • Bryan Hadlock
    Bryan Hadlock about 2 years

    I am trying to convert XML to JSON in PHP. If I do a simple conversion using SimpleXML and json_encode, none of the attributes in the XML show up.

    $xml = simplexml_load_file("states.xml");
    echo json_encode($xml);
    

    So I am trying to manually parse it like this.

    foreach($xml->children() as $state)
    {
        $states[]= array('state' => $state->name); 
    }       
    echo json_encode($states);
    

    and the output for state is {"state":{"0":"Alabama"}} rather than {"state":"Alabama"}

    What am I doing wrong?

    XML:

    <?xml version="1.0" ?>
    <states>
        <state id="AL">
            <name>Alabama</name>
        </state>
        <state id="AK">
            <name>Alaska</name>
        </state>
    </states>
    

    Output:

    [{"state":{"0":"Alabama"}},{"state":{"0":"Alaska"}}]
    

    var dump:

    object(SimpleXMLElement)#1 (1) {
    ["state"]=>
    array(2) {
    [0]=>
    object(SimpleXMLElement)#3 (2) {
      ["@attributes"]=>
      array(1) {
        ["id"]=>
        string(2) "AL"
      }
      ["name"]=>
      string(7) "Alabama"
    }
    [1]=>
    object(SimpleXMLElement)#2 (2) {
      ["@attributes"]=>
      array(1) {
        ["id"]=>
        string(2) "AK"
      }
      ["name"]=>
      string(6) "Alaska"
    }
    }
    }
    
    • nikc.org
      nikc.org over 12 years
      Please include a snippet of the XML and the final array structure you have after parsing it. (A var_dump works fine.)
    • Bryan Hadlock
      Bryan Hadlock over 12 years
      added input, output and var_dump
    • Peter Krauss
      Peter Krauss over 7 years
      Some applications need a "perfect XML-to-JSON map", that is, jsonML; see solution here.
  • Bryan Hadlock
    Bryan Hadlock over 12 years
    looks like the attributes are arrays but not $state->name
  • ethree
    ethree over 10 years
    This will not work if you have multiple instances of the same tag in your XML, json_encode will end up only serializing the last instance of the tag.
  • Sabbir
    Sabbir almost 10 years
    the best I get. BTW how about large xml around 150MB. How much memory it'll take?
  • iXcoder
    iXcoder almost 10 years
    split the big file to small
  • Richard Kiefer
    Richard Kiefer over 9 years
    Does not compile and does not produce the described output if syntax-errors are corrected.
  • Jake Wilson
    Jake Wilson over 9 years
    This solution is not flawless. It completely discards XML attributes. So <person my-attribute='name'>John</person> is interpreted as <person>John</person>.
  • Jake Wilson
    Jake Wilson over 9 years
    What is $dom? Where did that come from?
  • useless
    useless about 9 years
    Jackobud, well then you are talking about a specific structure. for the general purposes what Antonio provided is just great.
  • txyoji
    txyoji almost 9 years
    $xml = simplexml_load_string($xml_string,'SimpleXMLElement',LIBXML_NOCDATA); to flatten cdata elements.
  • Scott
    Scott over 8 years
    $dom = new DOMDocument(); is where it comes from
  • Lawrence Cooke
    Lawrence Cooke about 8 years
    Last line of code: $json = json_decode( json_encode( $sxml ) ) ); should be : $json = json_decode( json_encode( $sxml ) );
  • Volatil3
    Volatil3 about 8 years
    Can it parse large XML data?
  • Octavio Perez Gallegos
    Octavio Perez Gallegos almost 8 years
    It is a small and universal solution based on an array of data can be a JSON transformed json_decode ...lucky
  • Dan R
    Dan R almost 8 years
    In what way does this answer the original question? Your answer seems more complicated than the original question, and also doesn't seem to even mention JSON anywhere.
  • Peter Krauss
    Peter Krauss over 7 years
    This solution is better because it does not discard XML attributes. See also why this complex structure is better than simplified ones, at xml.com/lpt/a/1658 (see "Semi-Structured XML").... Oops, for CDATA, as @txyoji suggested, flatten CDATA elements with $xml = simplexml_load_file("myfile.xml",'SimpleXMLElement',LIBXML_NOCDATA); .
  • Peter Krauss
    Peter Krauss over 7 years
    @AntonioMax and others, try <states> <state>Alabama</state> <p>John</p> <state>Alaska</state> </states>, it lost tag order, so it is a bug... The solution is to change representation-map, see stackoverflow.com/a/39889010/287948
  • TheStoryCoder
    TheStoryCoder over 7 years
    I have made an improved version of this which also works with namespaces. See answer further below (stackoverflow.com/a/40866796/2404541)
  • Alex
    Alex over 7 years
    @JakeWilson maybe it's the 2 years that have passed, and various version fixes, but on PHP 5.6.30, this method produces ALL of the data. Attributes are stored in the array under the @attributes key, so it works absolutely flawlessly, and beautifully. 3 short lines of code solve my problem beautifully.
  • Alex
    Alex over 7 years
    One does not use Regex to parse XML, unless it's a simple XML with trivial structure and very predictable data. I can't stress enough how bad this solution is. This BREAKS DATA. Not to mention that it's incredibly slow (you parse with regex, and then you re-parse again?) and doesn't handle self-closing tags.
  • TheStoryCoder
    TheStoryCoder over 7 years
    I don't think you really looked at the function. It doesn't use regex to do the actual parsing, only as a simple fix to deal with namespaces - which has been working for all my xml cases - and that it is working is the most important, rather than being "politically correct". You're welcome to improve it if you want, though!
  • TheStoryCoder
    TheStoryCoder over 7 years
    Very unusual xml structure that I doubt would have real life use cases.
  • Alex
    Alex about 7 years
    The fact that it has worked for you doesn't mean it's right. It's code like this that generates bugs that are immensely hard to diagnose, and generates exploits. I mean even looking superficially at XML specs on sites like this w3schools.com/xml/xml_elements.asp show a lot of reasons why this solution wouldn't work. Like I said, it fails to detect self-closing tags like <element/>, fails to address elements that start with, or contain underscores, which is allowed in XML. Fails to detect CDATA. And as I've said, it's SLOW. It's an O(n^2) complexity because of inner parsing.
  • Alex
    Alex about 7 years
    The thing is that dealing with namespaces wasn't even asked here, and there are PROPER ways to deal with namespaces. Namespaces exist as a helpful construction, NOT to be parsed like that and turned into an abomination that won't be processed by any reasonable parser. And all you needed to do for that is not to create the contender for the prize of "slowest algorithm of 2016", but to do a bit of searching, to come up with a myriad of actual solutions, like this one stackoverflow.com/questions/16412047/… And to call this an improvement? Wow.
  • jirislav
    jirislav over 6 years
    This doesn't work if you have multiple namespaces, you can choose only one, which will pass into the $json_string :'(
  • ryabenko-pro
    ryabenko-pro over 6 years
    I used this approach, but JSON is empty. XML is valid.
  • nanocv
    nanocv over 6 years
    @AlexanderMP Not flawless, sorry. 3v4l.org/S3jP8 This solution requires attributes to be only on parent to work well.
  • Klesun
    Klesun over 5 years
    Keep in mind that with this solution, when there may be multiple nodes with same name, one node will result in a key just pointing to an element, but multiple nodes will result in key pointing to array of elements: <list><item><a>123</a><a>456</a></item><item><a>123</a></item></list> -> {"item":[{"a":["123","456"]},{"a":"123"}]}. A solution at php.net by ratfactor solves that issue by always storing elements in an array.
  • TheStoryCoder
    TheStoryCoder about 5 years
    @AlexanderMP I'm running 7.1.15 and it still doesn't include the attributes in <logentry revision="7"><paths><path action="M" text-mods="true" kind="file">module.php</path><path action="A" text-mods="true" kind="file">js/module.js</path></paths></logentry>. It includes the one in <logentry> but not in <path>!
  • Marc Pope
    Marc Pope over 4 years
    @txyoji This answer on stripping out CDATA was something I had been looking for for hours. Excellent answer.
  • lucifer63
    lucifer63 over 4 years
    Many thanks for a custom function! It makes tuning pretty easy. Btw, added an edited version of your function that parses XML in a JS way: every entry has its own object (entries aren't stored in a single array if they have equal tagnames), thus the order is preserved.
  • KingRider
    KingRider over 4 years
    Error Fatal error: Uncaught Error: Call to a member function getName() on bool .. i think a version php is fail :-( .. please help!
  • aaron
    aaron almost 4 years
    this actually works for multi-namespace cases, better than other solutions, why got a down vote...
  • G Chris DCosta
    G Chris DCosta about 3 years
    After trying tens of solutions this one is the only one that worked for me, thank you so much!
  • Coreus
    Coreus almost 3 years
    To everyone looking at this old answer: Please bear in mind the times this was written in, and perhaps consider more modern approaches.