DOMXpath - Get href attribute and text value of an a element

Solution 1

Fetch

//td[@class='name']/a

and then pluck the text with nodeValue and the attribute with getAttribute('href').

Apart from that, you can combine XPath queries with the union operator |, so you can use

//td[@class='name']/a/@href|//td[@class='name']

as well.
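A union query returns one flat node list that mixes td elements and href attribute nodes, so the results still have to be paired up. The following is only a sketch of one way to do that: the wrapping table/tr markup, the $rows array and the use of getNodePath() as a grouping key are illustrative choices, not part of the answer above.

```php
<?php
// Markup from the question; the <table><tr> wrapper is an assumption added
// so the HTML parser keeps the cells in place.
$html = '<table><tr>'
      . '<td class="name"><a href="/blah/somename23123">Some Name</a></td>'
      . '<td class="name"><a href="/blah/somename28787">Some Name2</a></td>'
      . '</tr></table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$domXpath = new DOMXPath($dom);

$nodes = $domXpath->query("//td[@class='name']/a/@href|//td[@class='name']");

// Group by the path of the owning td, so the pairing does not rely on
// the order in which the union returns its nodes.
$rows = [];
foreach ($nodes as $node) {
    $td  = $node instanceof DOMAttr ? $node->ownerElement->parentNode : $node;
    $key = $td->getNodePath();
    if (!isset($rows[$key])) {
        $rows[$key] = ['text' => null, 'href' => null];
    }
    if ($node instanceof DOMAttr) {
        $rows[$key]['href'] = $node->value;
    } else {
        $rows[$key]['text'] = trim($node->textContent);
    }
}
ksort($rows);
$rows = array_values($rows);
```

Compared with querying the a elements directly, this needs noticeably more bookkeeping, which is why the single query on //td[@class='name']/a is usually the more practical choice here.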

Solution 2

To reduce the code to a single loop, try:

$anchors = $domXpath->query("//td[@class='name']/a");
foreach($anchors as $a)
{ 
    print $a->nodeValue." - ".$a->getAttribute("href")."<br/>";
}

Same as the answer above :) I was too slow..

Solution 3

The simplest way: evaluate() is made for this task!

The simplest way to obtain a single value is the evaluate() method:

$xp = new DOMXPath($dom);
$v = $xp->evaluate("string(/etc[1]/@stringValue)");

Note: it is important to limit the XPath result to one item (the first a in this case) and to cast the value with string(), round(), etc.
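To make the "typed result" point concrete, here is a small sketch using the markup from the question (the table/tr wrapper and the variable names $firstHref, $count and $nodes are assumptions for illustration). With string() or count() the call returns a PHP scalar; without such a cast, evaluate() hands back a DOMNodeList just like query().

```php
<?php
// Markup from the question, wrapped in <table><tr> (an assumption) so the
// HTML parser keeps the cells in place.
$html = '<table><tr>'
      . '<td class="name"><a href="/blah/somename23123">Some Name</a></td>'
      . '<td class="name"><a href="/blah/somename28787">Some Name2</a></td>'
      . '</tr></table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// string(...) makes evaluate() return a PHP string.
$firstHref = $xp->evaluate("string((//td[@class='name']/a)[1]/@href)");

// count(...) is an XPath number, which arrives in PHP as a float.
$count = $xp->evaluate("count(//td[@class='name']/a)");

// Without a cast, the result is a node list, the same as with query().
$nodes = $xp->evaluate("//td[@class='name']/a");
```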


So, in a set of multiple items, using your foreach code,

 $names = $domXpath->query("//td[@class='name']");
 foreach($names as $contextNode) {
    $text = $domXpath->evaluate("string(./a[1])",$contextNode);
    $href = $domXpath->evaluate("string(./a[1]/@href)",$contextNode);
 }

PS: this example only illustrates evaluate(). When the information is already available on the node, use whatever performs best: methods such as getAttribute() and saveXML(), or properties such as nodeValue and textContent, all supplied by the DOM classes.
See @Gordon's answer for this particular problem.
An XPath subquery (evaluated at a context node) is good for complex cases, or for simplifying your code by avoiding hasChildNodes() checks plus loops over childNodes, etc., which bring no significant gain in performance.
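As a rough side-by-side of the two approaches mentioned in this answer, the sketch below collects the same pairs once with context-node subqueries and once with plain DOM methods. The table/tr wrapper, the array names $viaEvaluate and $viaDom, and the use of getElementsByTagName() for the DOM variant are assumptions made for the illustration.

```php
<?php
// Markup from the question, wrapped in <table><tr> (an assumption) so the
// HTML parser keeps the cells in place.
$html = '<table><tr>'
      . '<td class="name"><a href="/blah/somename23123">Some Name</a></td>'
      . '<td class="name"><a href="/blah/somename28787">Some Name2</a></td>'
      . '</tr></table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$domXpath = new DOMXPath($dom);

$viaEvaluate = [];
$viaDom = [];
foreach ($domXpath->query("//td[@class='name']") as $td) {
    // Variant 1: XPath subqueries relative to the current td.
    $viaEvaluate[] = [
        $domXpath->evaluate("string(./a[1])", $td),
        $domXpath->evaluate("string(./a[1]/@href)", $td),
    ];

    // Variant 2: plain DOM access on the first a element inside the td.
    $a = $td->getElementsByTagName('a')->item(0);
    if ($a !== null) {
        $viaDom[] = [$a->textContent, $a->getAttribute('href')];
    }
}
```

For markup this simple both variants give the same pairs; the subquery form mainly pays off once the path from the context node to the value becomes more involved.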

Author: Marko Jovanović

Updated on July 11, 2020

Comments

  • Marko Jovanović almost 4 years

    So I have an HTML string like this:

    <td class="name">
       <a href="/blah/somename23123">Some Name</a>
    </td>
    <td class="name">
       <a href="/blah/somename28787">Some Name2</a>
    </td>
    

    Using XPath I'm able to get the value of the href attribute with this XPath query:

     $domXpath = new \DOMXPath($this->domPage);
     $hrefs = $domXpath->query("//td[@class='name']/a/@href");
     foreach($hrefs as $href) {...}
    

    And it's even easier to get the text value, like this:

     // Xpath auto. strips any html tags so we are 
     // left with clean text value of a element
     $domXpath = new \DOMXPath($this->domPage);
     $names = $domXpath->query("//td[@class='name']");
     foreach($names as $name) {...}
    

    Now I'm curious to know how I can combine those two queries to get both values with only one query (if something like that is even possible).

  • Marko Jovanović almost 13 years
    Thank you for the solution, and for the side note about the union operator (it will come in handy).
  • Marko Jovanović almost 13 years
    Thanks for the solution. I gave you an upvote, but Gordon gets the upvote and the accepted answer (after all, he was the first to post the answer :)).
  • Peter Krauss over 3 years
    evaluate here is not a "dangerous thing" like JavaScript's eval. It is only an alternative method name for a "subquery" (in an XPath or DOM context). As its definition says, "Evaluates the given XPath expression and returns a typed result if possible".