XPath to get one level of childnodes

Solution 1

If you use

/div/*

then you get a node list of all direct child elements of the div. Each returned node still carries its own descendants, though: XPath selects nodes, it does not copy or prune them, so I don't think you can strip the children of a child with XPath alone.

The path uses the default axis, child::. This axis selects only nodes exactly one level below the context node.

On that axis, * matches every element, but neither attributes nor text() nodes.

You have to spell out the path to your node. Be careful with //node: // is short for /descendant-or-self::node()/, so //node returns every element named node anywhere in the document, at any depth.
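The points above can be checked against the markup from the question. This is only a minimal sketch, assuming the div is the document element; the variable names are illustrative and not part of the original answer.

```php
<?php
// Markup from the question (whitespace text nodes are kept by default).
$xml = <<<XML
<div>
    <span>
        <cite>
        </cite>
    </span>
    <span>
        <cite>
        </cite>
    </span>
</div>
XML;

$doc = new DOMDocument;
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Default (child) axis: only the elements one level below div.
$children = $xpath->query('/div/*');
// The same path with the axis written out.
$explicit = $xpath->query('/child::div/child::*');
// "//" searches the whole tree, at any depth.
$allCites = $xpath->query('//cite');

$names = array();
foreach ($children as $node) {
    $names[] = $node->nodeName;
}

echo implode(',', $names), "\n"; // span,span
echo $explicit->length, "\n";    // 2
echo $allCites->length, "\n";    // 2

// The result holds the live span nodes, so each cite is still attached.
$citesInFirstSpan = $children->item(0)->getElementsByTagName('cite')->length;
echo $citesInFirstSpan, "\n";    // 1
```

The last line shows why the node list alone cannot hide the cites: the query returns references into the tree, not trimmed copies.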

Solution 2

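Your question is a bit under-specified, so there are several ways to interpret it. If you want all direct child elements of the div (each still containing all of its sub-elements), use the path below. It assumes the context node sits one level above the div; if the div itself is the context node, a single * is enough: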

*/*

For your example, this gives you

<span>
    <cite>
    </cite>
</span>

and

<span>
    <cite>
    </cite>
</span>

If you want all child nodes, then use node() instead of *:

*/node()

For your example, this gives you both sub-elements as above, along with the whitespace-only text() nodes (newlines and indentation) around them.
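To see the difference between * and node() on the markup from the question, a small sketch follows. It assumes the div is the document element and uses the absolute forms /*/* and /*/node() so the result does not depend on which context node the query starts from; the variable names are illustrative.

```php
<?php
$xml = <<<XML
<div>
    <span>
        <cite>
        </cite>
    </span>
    <span>
        <cite>
        </cite>
    </span>
</div>
XML;

$doc = new DOMDocument;
$doc->loadXML($xml); // whitespace-only text nodes are preserved by default
$xpath = new DOMXPath($doc);

// Child elements of the document element only.
$elements = $xpath->query('/*/*');
// Every child node of the document element, text nodes included.
$nodes = $xpath->query('/*/node()');

$types = array();
foreach ($nodes as $node) {
    $types[] = $node->nodeName; // text nodes report "#text"
}

echo $elements->length, "\n";    // 2
echo $nodes->length, "\n";       // 5
echo implode(',', $types), "\n"; // #text,span,#text,span,#text
```

The three extra nodes are the newline and indentation runs before, between, and after the two spans.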

If, however, you want to have only the child nodes and not their children as well (i.e. only the span elements, but without their child elements), you must use two expressions:

  1. select the direct child elements via */*
  2. process those child elements and select only their text nodes, not the grandchild elements, via text()

My PHP is a bit rusty, but it should work a bit like this:

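$doc = new DOMDocument;
// set up $doc, e.g. with $doc->loadXML(...) or $doc->load(...)
$xpath = new DOMXPath($doc);

// step #1: select the child elements (the spans in the example)
$childElements = $xpath->query('*/*');

$directChildren = array();
foreach ($childElements as $child) {
  // step #2: the second argument makes the query relative to $child,
  // so text() only matches text nodes directly inside it
  $textChildren = $xpath->query('text()', $child);
  foreach ($textChildren as $text) {
    $directChildren[] = $text;
  }
}
// now, $directChildren contains the text nodes of every child element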
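
If what you actually need is the span elements themselves with nothing inside them, one alternative (not part of either answer above) is to make a shallow copy of each selected element with DOMNode::cloneNode(false). The sketch below assumes the div is the document element and uses the absolute path /*/* to select its child elements; the variable names are illustrative.

```php
<?php
$xml = <<<XML
<div>
    <span>
        <cite>
        </cite>
    </span>
    <span>
        <cite>
        </cite>
    </span>
</div>
XML;

$doc = new DOMDocument;
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$shallow = array();
foreach ($xpath->query('/*/*') as $child) {
    // cloneNode(false) copies the element and its attributes,
    // but none of its child nodes; the original tree is untouched.
    $shallow[] = $child->cloneNode(false);
}

foreach ($shallow as $copy) {
    echo $copy->nodeName, ' ', $copy->childNodes->length, "\n"; // span 0
}
```

The copies are detached from the tree, so changes to them do not affect the document unless you insert them somewhere.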
Author: Greg

Updated on June 05, 2022

Comments

  • Greg
    Greg almost 2 years

    Using DOMXPath::query is it possible to get just one level deep of childNodes?

    For example if I had a document like:

    <div>
        <span>
            <cite>
            </cite>
        </span>
        <span>
            <cite>
            </cite>
        </span>
    </div>
    

    I would want the NodeList to contain just the spans and not the cites.

    Should also mention that it won't always be the same elements (divs, spans, etc). I would need it to work with any type of element.

    This is what I tried and it didn't seem to work:

    //*[not(ancestor::div)]