Indentation with DOMDocument in PHP

Solution 1

DOMDocument will do the trick. I personally spent a couple of hours Googling and trying to figure this out, and I noticed that if you use

$xmlDoc = new DOMDocument ();
$xmlDoc->loadXML ( $xml );
$xmlDoc->preserveWhiteSpace = false;
$xmlDoc->formatOutput = true;
$xmlDoc->save($xml_file);

in that order, it just doesn't work. But if you use the same code in this order:

$xmlDoc = new DOMDocument ();
$xmlDoc->preserveWhiteSpace = false;
$xmlDoc->formatOutput = true;
$xmlDoc->loadXML ( $xml );
$xmlDoc->save($xml_file);

it works like a charm. Hope this helps.
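
The order matters because preserveWhiteSpace is consulted while the document is being parsed, so setting it after loadXML() comes too late, whereas formatOutput is only read when the document is serialized. Below is a minimal sketch of the comparison. It assumes a small inline XML string and uses saveXML() instead of save() so the result can be inspected directly; the helper name formatXml and its $setBeforeLoad parameter are made up for this illustration.

```php
<?php
// Hypothetical helper for illustration: formats $xml with the two
// properties set either before or after loadXML().
function formatXml(string $xml, bool $setBeforeLoad): string
{
    $doc = new DOMDocument();
    if ($setBeforeLoad) {
        // Whitespace-only text nodes are dropped while parsing.
        $doc->preserveWhiteSpace = false;
        $doc->formatOutput = true;
        $doc->loadXML($xml);
    } else {
        $doc->loadXML($xml);
        // Too late for preserveWhiteSpace: the whitespace nodes already exist.
        $doc->preserveWhiteSpace = false;
        $doc->formatOutput = true;
    }
    return $doc->saveXML();
}

// Sample input (an assumption): line breaks are present, indentation is not.
$xml = "<root>\n<a><b>text</b></a>\n</root>";

echo formatXml($xml, true);   // nested elements are re-indented
echo formatXml($xml, false);  // the layout of the input is kept
```

If the source string contains no whitespace between tags at all, both orders should give the same indented result, which may be why the problem only shows up with some inputs.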

Solution 2

After some help from John and playing around with this on my own, it seems that even DOMDocument's inherent support for formatting didn't meet my needs. So, I decided to write my own indentation function.

This is a pretty crude function that I just threw together quickly, so if anyone has any optimization tips or anything to say about it in general, I'd be glad to hear it!

function indent($text)
{
    // Put every tag and every run of text on its own line
    $find = array('>', '</', "\n\n");
    $replace = array(">\n", "\n</", "\n");
    $text = str_replace($find, $replace, $text);
    $text = trim($text); // for the \n that was added after the final tag

    $open_tags = 0;
    $new_array = array();
    foreach (explode("\n", $text) as $line)
    {
        $line = trim($line); // drop any indentation already present
        if ($line === '')
            continue;

        $is_closing = (strpos($line, '</') === 0);
        // Declarations (<?...), comments and doctypes (<!...) and
        // self-closing tags (.../>) do not open a new level
        $is_opening = !$is_closing
            && ($line[0] === '<')
            && (strpos($line, '<?') !== 0)
            && (strpos($line, '<!') !== 0)
            && (substr($line, -2) !== '/>');

        // A closing tag sits one level above the content it closes
        if ($is_closing && ($open_tags > 0))
            $open_tags--;

        $new_array[] = str_repeat("\t", $open_tags) . $line;

        if ($is_opening)
            $open_tags++;
    }

    return implode("\n", $new_array);
}

Solution 3

I have tried running the code below setting formatOutput and preserveWhiteSpace in different ways, and the only member that has any effect on the output is formatOutput. Can you run the script below and see if it works?

<?php
    echo "<pre>";
    $foo = new DOMDocument();
    //$foo->preserveWhiteSpace = false;
    $foo->formatOutput = true;
    $root = $foo->createElement("root");
    $root->setAttribute("attr", "that");
    $bar = $foo->createElement("bar", "some text in bar");
    $baz = $foo->createElement("baz", "some text in baz");
    $foo->appendChild($root);
    $root->appendChild($bar);
    $root->appendChild($baz);
    echo htmlspecialchars($foo->saveXML());
    echo "</pre>";
?>
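
One likely explanation for preserveWhiteSpace having no visible effect here is that the property is only consulted when a document is parsed (load(), loadXML()), and this script builds the tree with createElement() and never parses anything. The following sketch checks that; the helper buildDoc is hypothetical and simply wraps the same construction as the script above.

```php
<?php
// Hypothetical helper: builds the same tree as the script above and
// returns the serialized XML for a given preserveWhiteSpace setting.
function buildDoc(bool $preserveWhiteSpace): string
{
    $foo = new DOMDocument();
    $foo->preserveWhiteSpace = $preserveWhiteSpace;
    $foo->formatOutput = true;
    $root = $foo->createElement("root");
    $root->setAttribute("attr", "that");
    $foo->appendChild($root);
    $root->appendChild($foo->createElement("bar", "some text in bar"));
    $root->appendChild($foo->createElement("baz", "some text in baz"));
    return $foo->saveXML();
}

// Nothing is parsed, so both settings should serialize identically.
var_dump(buildDoc(true) === buildDoc(false));
echo buildDoc(false);
```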

Solution 4

Most answers in this topic operate on the XML as text. Here is another approach that uses the DOM itself to do the indentation. The loadXML() method imports the indentation characters present in the XML source as text nodes. The idea is to remove those text nodes from the DOM and then recreate correctly formatted ones (see the comments in the code below for more details).

The xmlIndent() function is implemented as a method of the indentDomDocument class, which extends DOMDocument. Below is a complete example of how to use it:

$dom = new indentDomDocument("1.0");
$xml = file_get_contents("books.xml");

$dom->loadXML($xml);
$dom->xmlIndent();
echo $dom->saveXML();

class indentDomDocument extends DOMDocument {
    public function xmlIndent() {
        // Retrieve all text nodes using XPath
        $x = new DOMXPath($this);
        $nodeList = $x->query("//text()");
        foreach($nodeList as $node) {
            // 1. "Trim" each text node by removing its leading and trailing spaces and newlines.
            $node->nodeValue = preg_replace("/^[\s\r\n]+/", "", $node->nodeValue);
            $node->nodeValue = preg_replace("/[\s\r\n]+$/", "", $node->nodeValue);
            // 2. Resulting text node may have become "empty" (zero length nodeValue) after trim. If so, remove it from the dom.
            if(strlen($node->nodeValue) == 0) $node->parentNode->removeChild($node);
        }
        // 3. Starting from root (documentElement), recursively indent each node. 
        $this->xmlIndentRecursive($this->documentElement, 0);
    } // end function xmlIndent

    private function xmlIndentRecursive($currentNode, $depth) {
        $indentCurrent = true;
        if(($currentNode->nodeType == XML_TEXT_NODE) && ($currentNode->parentNode->childNodes->length == 1)) {
            // A text node being the unique child of its parent will not be indented.
            // In this special case, we must tell the parent node not to indent its closing tag.
            $indentCurrent = false;
        }
        if($indentCurrent && $depth > 0) {
            // Indenting a node consists of inserting before it a new text node
            // containing a newline followed by a number of tabs corresponding
            // to the node depth.
            $textNode = $this->createTextNode("\n" . str_repeat("\t", $depth));
            $currentNode->parentNode->insertBefore($textNode, $currentNode);
        }
        if($currentNode->childNodes) {
            $indentClosingTag = false;
            foreach($currentNode->childNodes as $childNode) $indentClosingTag = $this->xmlIndentRecursive($childNode, $depth+1);
            if($indentClosingTag) {
                // If children have been indented, then the closing tag
                // of the current node must also be indented.
                $textNode = $this->createTextNode("\n" . str_repeat("\t", $depth));
                $currentNode->appendChild($textNode);
            }
        }
        return $indentCurrent;
    } // end function xmlIndentRecursive

} // end class indentDomDocument
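
The statement that loadXML() imports indentation as text nodes can be checked with a short sketch. The sample XML string and the helper name childNodeNames are assumptions made for this illustration; the books.xml file from the example above is not needed.

```php
<?php
// Hypothetical helper: lists the node names of the root element's children.
function childNodeNames(string $xml, bool $preserveWhiteSpace): array
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = $preserveWhiteSpace;
    $dom->loadXML($xml);
    $names = [];
    foreach ($dom->documentElement->childNodes as $child) {
        $names[] = $child->nodeName;
    }
    return $names;
}

// Sample input (an assumption): two empty elements indented with tabs.
$xml = "<books>\n\t<book/>\n\t<book/>\n</books>";

// Default behaviour: '#text' entries surround each book element.
print_r(childNodeNames($xml, true));
// With preserveWhiteSpace = false only the two book elements remain.
print_r(childNodeNames($xml, false));
```

These whitespace-only '#text' nodes are what xmlIndent() strips and then recreates at the right depth.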

Solution 5

Which method do you call when printing the xml?

I use this:

$doc = new DOMDocument('1.0', 'utf-8');
$root = $doc->createElement('root');
$doc->appendChild($root);

(...)

$doc->formatOutput = true;
echo $doc->saveXML($root);

It works perfectly, but it prints only the element, so you must print the <?xml ... ?> part manually.
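
A minimal sketch of that manual step follows. The helper name saveRootWithDeclaration, the hard-coded declaration string, and the sample item element are assumptions made for this illustration.

```php
<?php
// Hypothetical helper: serializes only the root element and prepends
// a hand-written XML declaration.
function saveRootWithDeclaration(DOMDocument $doc): string
{
    $doc->formatOutput = true;
    $declaration = '<?xml version="1.0" encoding="utf-8"?>';
    return $declaration . "\n" . $doc->saveXML($doc->documentElement) . "\n";
}

$doc = new DOMDocument('1.0', 'utf-8');
$root = $doc->createElement('root');
$doc->appendChild($root);
$root->appendChild($doc->createElement('item', 'value'));
$doc->formatOutput = true;

// Passing a node serializes just that node, without the declaration.
$elementOnly = $doc->saveXML($root);

echo saveRootWithDeclaration($doc);
```

Calling saveXML() with no argument serializes the whole document, declaration included, which avoids the manual step when the full document is wanted.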

Author by

Josh Leitzel

Updated on July 23, 2020

Comments

  • Josh Leitzel almost 4 years

    I'm using DOMDocument to generate a new XML file and I would like for the output of the file to be indented nicely so that it's easy to follow for a human reader.

    For example, when DOMDocument outputs this data:

    <?xml version="1.0"?>
    <this attr="that"><foo>lkjalksjdlakjdlkasd</foo><foo>lkjlkasjlkajklajslk</foo></this>
    

    I want the XML file to be:

    <?xml version="1.0"?>
    <this attr="that">
        <foo>lkjalksjdlakjdlkasd</foo>
        <foo>lkjlkasjlkajklajslk</foo>
    </this>
    

    I've been searching around looking for answers, and everything that I've found seems to say to try to control the white space this way:

    $foo = new DOMDocument();
    $foo->preserveWhiteSpace = false;
    $foo->formatOutput = true;
    

    But this does not seem to do anything. Perhaps this only works when reading XML? Keep in mind I'm trying to write new documents.

    Is there anything built into DOMDocument to do this? Or a function that can accomplish this easily?