Best way to process large XML in PHP


Solution 1

For a large file, you'll want to use a SAX parser rather than a DOM parser.

A DOM parser reads the whole file and loads it into an object tree in memory. A SAX parser reads the file sequentially and calls your user-defined callback functions to handle the data (start tags, end tags, character data, etc.).

With a SAX parser you need to maintain state yourself (e.g. which tag you are currently in), which makes it a bit more complicated, but for a large file it is much more memory-efficient.
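PHP's built-in XML Parser extension (the xml_* functions) is an event-based, SAX-style parser. Below is a minimal sketch, assuming a hypothetical input made of repeated <item><title>text</title></item> elements and a hypothetical collectTitles() helper. It shows the three callbacks and the state (a stack of open tags plus a text buffer) that you have to track yourself, while the file is fed to the parser in fixed-size chunks.

```php
<?php
// SAX-style sketch with PHP's XML Parser extension.
// Assumed input layout: repeated <item><title>...</title></item> elements.
// collectTitles() and the element names are hypothetical.
function collectTitles(string $path): array
{
    $titles = [];
    $stack = [];   // names of currently open elements: the state we track ourselves
    $buffer = '';  // character data collected for the current <title>

    $parser = xml_parser_create('UTF-8');
    // Keep element names as written (by default they are upper-cased).
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    // Start-tag callback: remember the element, reset the buffer for <title>.
    $onStart = function ($p, string $name, array $attrs) use (&$stack, &$buffer): void {
        $stack[] = $name;
        if ($name === 'title') {
            $buffer = '';
        }
    };
    // End-tag callback: store the finished title, then close the element.
    $onEnd = function ($p, string $name) use (&$stack, &$buffer, &$titles): void {
        if ($name === 'title') {
            $titles[] = trim($buffer);
        }
        array_pop($stack);
    };
    // Text callback: may fire several times for one text node, so append.
    $onText = function ($p, string $data) use (&$stack, &$buffer): void {
        if (end($stack) === 'title') {
            $buffer .= $data;
        }
    };

    xml_set_element_handler($parser, $onStart, $onEnd);
    xml_set_character_data_handler($parser, $onText);

    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Feed the file in 8 KB chunks, so memory use does not grow with file size.
    $ok = true;
    while ($ok && ($chunk = fread($fh, 8192)) !== false && $chunk !== '') {
        $ok = (bool) xml_parse($parser, $chunk, false);
    }
    fclose($fh);
    // An empty final call tells the parser that the input is complete.
    if (!$ok || !xml_parse($parser, '', true)) {
        throw new RuntimeException(sprintf(
            'XML error at line %d: %s',
            xml_get_current_line_number($parser),
            xml_error_string(xml_get_error_code($parser))
        ));
    }
    return $titles;
}
```

Only the current chunk, the stack of open tags and the collected titles are held in memory; the parser itself never builds a tree.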

Solution 2

My take on it:

https://github.com/prewk/XmlStreamer

A simple class that extracts every child of the XML root element while streaming the file. Tested on a 108 MB XML file from pubmed.com.

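// Requires the XmlStreamer base class from the repository above to be loaded first.
class SimpleXmlStreamer extends XmlStreamer {
    // Called for each child element of the XML root, passed in as an XML string.
    public function processNode($xmlString, $elementName, $nodeIndex) {
        $xml = simplexml_load_string($xmlString);

        // Do something with your SimpleXML object

        return true;
    }
}

$streamer = new SimpleXmlStreamer("myLargeXmlFile.xml");
$streamer->parse();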
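The same "one root child at a time" idea can be sketched with PHP's built-in XMLReader, without the library. This is not the XmlStreamer code, just a sketch of the technique: forEachRootChild() is a hypothetical helper, and it assumes the children do not rely on namespace declarations made on the root element (readOuterXml() serializes the element on its own).

```php
<?php
// Stream a file with XMLReader and hand each child of the root element to SimpleXML.
// forEachRootChild() is a hypothetical helper; it returns the number of children seen.
function forEachRootChild(string $path, callable $handle): int
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    $index = 0;
    $more = $reader->read();
    while ($more) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->depth === 1) {
            // Only this one element is turned into a SimpleXML object.
            $node = simplexml_load_string($reader->readOuterXml());
            if ($node !== false) {
                $handle($node, $reader->name, $index++);
            }
            // Jump to the next sibling without descending into this subtree.
            $more = $reader->next();
        } else {
            $more = $reader->read();
        }
    }
    $reader->close();
    return $index;
}
```

Usage mirrors processNode(): pass a callback such as `function (SimpleXMLElement $xml, string $elementName, int $nodeIndex) { ... }`.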

Solution 3

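When using DOMDocument with large XML files, don't forget to pass the LIBXML_PARSEHUGE flag in the options of the load() method (the same applies to DOMDocument's other load methods, such as loadXML()). The flag relaxes libxml's hardcoded parser limits, such as the maximum document depth and the size of text nodes; it does not reduce memory use, since the whole document is still loaded into memory.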

    $checkDom = new \DOMDocument('1.0', 'UTF-8');
    $checkDom->load($filePath, LIBXML_PARSEHUGE);

(Works with a 120 MB XML file)
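A sketch of loading with the flag while still surfacing parse errors, assuming a hypothetical loadHugeDom() helper. It switches libxml to internal error collection so a failed load becomes an exception with libxml's messages instead of a stream of PHP warnings.

```php
<?php
// Load a large file into DOM with LIBXML_PARSEHUGE and report parse errors.
// loadHugeDom() is a hypothetical helper name.
function loadHugeDom(string $filePath): DOMDocument
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    try {
        // LIBXML_PARSEHUGE lifts libxml's hardcoded limits; the tree is still fully in memory.
        if (!$dom->load($filePath, LIBXML_PARSEHUGE)) {
            $messages = array_map(fn (LibXMLError $e) => trim($e->message), libxml_get_errors());
            throw new RuntimeException('Could not parse ' . $filePath . ': ' . implode('; ', $messages));
        }
    } finally {
        // Reset the error buffer and restore the previous error mode.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
    return $dom;
}
```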

Solution 4

A SAX parser, as Eric Petroelje recommends, would be better for large XML files. A DOM parser loads the entire XML file into memory and lets you run XPath queries on it; a SAX (Simple API for XML) parser reads the document as a stream and gives you hook points (callbacks) for processing each piece as it is encountered.
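For the DOM side of this comparison, a minimal sketch: the whole file is parsed into a tree first, after which arbitrary XPath queries can run against it in any order. findTitlesByYear() and the <book year="..."><title> layout are hypothetical.

```php
<?php
// DOM + XPath sketch: random access to an in-memory tree.
// findTitlesByYear() and the element/attribute names are hypothetical.
function findTitlesByYear(string $path, int $year): array
{
    $dom = new DOMDocument();
    if (!$dom->load($path)) {
        throw new RuntimeException("Could not parse $path");
    }
    $xpath = new DOMXPath($dom);
    $titles = [];
    // $year is an int, so it can be embedded in the expression safely.
    foreach ($xpath->query('//book[@year="' . $year . '"]/title') as $node) {
        $titles[] = $node->textContent;
    }
    return $titles;
}
```

This convenience is exactly what a SAX parser gives up: it can only react to elements as they stream past.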

Solution 5

It really depends on what you want to do with the data. Do you need all of it in memory to work with it effectively?

6.5 MB is not that big by today's standards. You could, for example, raise the memory limit with ini_set('memory_limit', '128M');
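A sketch of that in-memory route for a moderately sized file: raise the limit for this script only, then let SimpleXML load the whole document. The 128M value is the answer's own example; sumAmounts() and the <amount> elements are hypothetical.

```php
<?php
// Raise the memory limit for this script only (value taken from the answer).
ini_set('memory_limit', '128M');

// Load the whole file with SimpleXML and total every <amount> element.
// sumAmounts() and the element name are hypothetical.
function sumAmounts(string $path): float
{
    $xml = simplexml_load_file($path);
    if ($xml === false) {
        throw new RuntimeException("Could not parse $path");
    }
    $total = 0.0;
    // The whole tree is in memory, so it can be queried freely.
    foreach ($xml->xpath('//amount') ?: [] as $amount) {
        $total += (float) $amount;
    }
    return $total;
}
```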

However, if your data can be processed as a stream, you may want to use a SAX parser instead. The right choice depends on how you need to use the data.

Author: Petruza

General software engineer, golang advocate, also typescript, C, C++, GDScript dev. Interested in emulation, video games, image processing, machine learning, computer vision, natural language processing, web scraping.

Updated on July 27, 2022

Comments

  • Petruza

    I have to parse large XML files in PHP; one of them is 6.5 MB, and others could be even bigger. As I've read, the SimpleXML extension loads the entire file into an object, which may not be very efficient. In your experience, what is the best way?