Simple HTML DOM getting all attributes from a tag


Solution 1

To grab all those attributes, you should first inspect the parsed element, like this:

foreach($html->find('div[class=bar] a') as $a){
  var_dump($a->attr);
}

...and check whether those attributes exist. They aren't valid HTML attribute names, so the parser may discard them.

If they exist, you can read them like this:

foreach($html->find('div[class=bar] a') as $a){
  $article = array($a->href, $a->innertext);
  if (isset($a->attr['data1'])) {
    $article['data1'] = $a->attr['data1'];
  }
  if (isset($a->attr['data2'])) {
    $article['data2'] = $a->attr['data2'];
  }
  //...
  $articles[] = $article;
}
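If you want to try the same pattern without Simple HTML DOM installed, here is a rough equivalent using PHP's built-in DOM extension (DOMDocument and DOMXPath). This is a swapped-in technique, not the library's API: hasAttribute() and getAttribute() stand in for isset($a->attr[...]) and $a->attr[...], and the sample markup is made up.

```php
<?php
// Sketch using PHP's built-in DOM extension, not Simple HTML DOM.
// The markup below is invented for the example.
$markup = '<div class="bar"><a data1="111" href="http://foo.bar/1">First</a></div>'
    . '<div class="bar"><a data1="222" data2="333" href="http://foo.bar/2">Second</a></div>';

libxml_use_internal_errors(true); // don't print warnings about non-standard markup
$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

$articles = [];
// Counterpart of find('div[class=bar] a'): any <a> inside a div whose class is exactly "bar".
foreach ($xpath->query('//div[@class="bar"]//a') as $a) {
    // textContent is plain text, whereas Simple HTML DOM's innertext keeps inner markup.
    $article = [$a->getAttribute('href'), $a->textContent];
    foreach (['data1', 'data2'] as $name) {
        if ($a->hasAttribute($name)) { // copy only the attributes that are present
            $article[$name] = $a->getAttribute($name);
        }
    }
    $articles[] = $article;
}
print_r($articles);
```

The first link has no data2, so its entry simply lacks that key, the same effect as the isset() checks above.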

To get both classes you can use a multiple selector, separated by a comma:

foreach($html->find('div[class=bar] a, div[class=bar2] a') as $a){
  // ... same loop body as above
}
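With PHP's built-in DOMXPath (a different tool from Simple HTML DOM), the XPath union operator | does the job of the comma. A sketch under that assumption, with made-up markup:

```php
<?php
// Sketch using PHP's built-in DOMXPath; markup invented for the example.
$markup = '<div id="foo">'
    . '<div class="bar"><a href="http://foo.bar/a">Inner text</a></div>'
    . '<div class="bar2"><a href="http://foo.bar/b">more text</a></div>'
    . '<div class="other"><a href="http://foo.bar/c">not selected</a></div>'
    . '</div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// "|" unions the two node sets; libxml returns the result in document order.
$texts = [];
foreach ($xpath->query('//div[@class="bar"]//a | //div[@class="bar2"]//a') as $a) {
    $texts[] = $a->textContent;
}
print_r($texts);
```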

Solution 2

I know this question is old, but the OP asked how they could get all the attributes in one statement. I just did this for a project I'm working on.

You can get all the attributes for an element with the getAllAttributes() method. It returns them as an associative array, the same array the element exposes through its public attr property.

In the example below I am grabbing all links, but you can use this with any element. NOTE: This also works with data- attributes. So if there is an attribute called data-url, it will be accessible as $e->attr['data-url'] or in the array returned by getAllAttributes().

In your case, the attributes you're looking for will be $e->attr['data1'] and $e->attr['data2']. Hope this helps someone, if not the OP.

Get all Attributes

$html = file_get_html('somefile.html');
foreach ($html->find('a') as $e) {   // used the a tag here, but use whatever you want
    $attributes = $e->getAllAttributes();

    // testing that it worked
    print_r($attributes);
}
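For comparison, here is a sketch that collects every attribute of each link, data- ones included, with PHP's built-in DOM extension instead of Simple HTML DOM. Hyphenated names like data-url simply become array keys; the markup and the data-url / data-id values are hypothetical.

```php
<?php
// Sketch using PHP's built-in DOM extension, not Simple HTML DOM.
// The links and their data-url / data-id values are hypothetical.
$markup = '<a href="/one" data-url="http://example.com/1" data-id="7">One</a>'
    . '<a href="/two" title="plain link">Two</a>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);

$all = [];
foreach ($doc->getElementsByTagName('a') as $e) {
    $attrs = [];
    // ->attributes holds every attribute node, whatever its name.
    foreach ($e->attributes as $attr) {
        $attrs[$attr->name] = $attr->value;
    }
    $all[] = $attrs;
}

// Hyphenated names are ordinary array keys, just like $e->attr['data-url'].
echo $all[0]['data-url'], "\n";
print_r($all);
```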

Solution 3

$data1 = $html->find('.bar > a', 0)->attr['data1'];
$data2 = $html->find('.bar > a', 0)->attr['data2'];
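One caveat: if nothing matches, find(..., 0) returns null, and reading ->attr on null won't give you a value. A guarded version of the same idea, sketched with PHP's built-in DOMXPath rather than Simple HTML DOM (the XPath class test approximates .bar, and the markup is invented):

```php
<?php
// Sketch using PHP's built-in DOMXPath, not Simple HTML DOM; markup is invented.
$markup = '<div class="bar"><a data1="aaa" data2="bbb" href="#">first</a></div>'
    . '<div class="bar"><a data1="ccc" href="#">second</a></div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Approximates '.bar > a': an <a> that is a direct child of an element whose
// class list contains "bar". item(0) is the first match, or null if none.
$query = '//*[contains(concat(" ", normalize-space(@class), " "), " bar ")]/a';
$first = $xpath->query($query)->item(0);

// Only read attributes when something matched.
$data1 = $first !== null ? $first->getAttribute('data1') : null;
$data2 = $first !== null ? $first->getAttribute('data2') : null;
var_dump($data1, $data2);
```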

Solution 4

Use the getAttribute() method, which also works with hyphenated names such as data-filter-string:

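<?php
$html = file_get_html('somefile.html');
$filters = array();
foreach ($html->find('a') as $e) {
    // collect each value instead of overwriting a single variable on every iteration
    $filters[] = $e->getAttribute('data-filter-string');
}
?>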
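The same collect-instead-of-overwrite approach with PHP's built-in DOM extension, for anyone without Simple HTML DOM at hand (markup invented for the example):

```php
<?php
// Sketch with PHP's DOM extension: getElementsByTagName() + getAttribute(),
// collecting every value instead of overwriting one variable per iteration.
$markup = '<a href="#" data-filter-string="red">Red</a>'
    . '<a href="#" data-filter-string="blue">Blue</a>'
    . '<a href="#">No filter</a>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);

$filters = [];
foreach ($doc->getElementsByTagName('a') as $e) {
    // DOMElement::getAttribute() returns '' for a missing attribute, so check first.
    if ($e->hasAttribute('data-filter-string')) {
        $filters[] = $e->getAttribute('data-filter-string');
    }
}
print_r($filters);
```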
Author: TheEditor

Just a systems/network guy, dreaming of being a programmer.

Updated on August 02, 2022

Comments

  • TheEditor
    TheEditor almost 2 years

    Sort of a two-part question, but maybe one answers the other. I'm trying to get a piece of information out of an

    <div id="foo">
    <div class="bar"><a data1="xxxx" data2="xxxx" href="http://foo.bar">Inner text</a></div>
    <div class="bar2"><a data3="xxxx" data4="xxxx" href="http://foo.bar">more text</a></div>
    </div>

    Here is what I'm using now.

    $articles = array();
    $html=file_get_html('http://foo.bar');
    foreach($html->find('div[class=bar] a') as $a){
        $articles[] = array($a->href,$a->innertext);
    }
    

    This works perfectly to grab the href and the inner text from the first div class. I tried adding $a->data1 to the foreach, but that didn't work.

    How do I grab those data attributes at the same time I grab the href and innertext?

    Also, is there a good way to get both classes with one statement? I assume I could build the find off of the id and grab all the div information.

    Thanks

  • Fabian Schmengler
    Fabian Schmengler over 11 years
    Addition: if the classes really both start with the same prefix ("bar" in your example), this might work too for you: div[class^=bar] (meaning: class begins with "bar")
  • TheEditor
    TheEditor over 11 years
    Thanks much. Not to be dense, though: how do I assign that? What I mean is, once I do the find that gets multiple divs, what do I do with it? Do I still assign it "as $a"? Sorry to be dense here.
  • TheEditor
    TheEditor over 11 years
    Thanks ermannob. The multiple selector worked great. I still can't seem to pull that custom <a> attribute, which is actually data-content-id. I looked at stackoverflow.com/questions/11830933/… but I'm not sure what to do with that last example to actually display the attributes. I also looked at stackoverflow.com/questions/11773940/… but using that returns an error. For both I'm feeding it with my $html=file_get_html('foo.bar'); line.
  • ermannob
    ermannob over 11 years
    The first link you provided is very useful. I edited my answer again, to show you how to read those attributes.
  • TheEditor
    TheEditor over 11 years
    Ermannob thank you so much! I didn't realize I could use the $a->attr!
  • TheEditor
    TheEditor over 11 years
    I wanted to update this with what I hope is an easy follow-up question. In the example above, foreach($html->find('div[class=bar] a, div[class=bar2] a') as $a){, I'm getting the data I want. I'm writing it to an array with $article = array($a->href, $a->innertext); (I'm scraping Metafilter here, just FYI.) For the first selector there is only ever one match, but for the second there can be multiple. Since I'm writing everything to a single array, it's not really usable, because I can't associate the second selector's items with the first. Multidimensional array? I've succeeded in just confusing myself. Thanks
  • Stephen Ostermiller
    Stephen Ostermiller almost 7 years
    A code only answer is not high quality. Please edit your answer to explain your code. Show how and why it answers the question. Add links to any relevant documentation.
  • Nico Haase
    Nico Haase almost 5 years
    Please add some explanation to your code such that others can learn from it
  • PJunior
    PJunior about 4 years
    Hooray! The 'attr' was giving me a PHP fatal error. With 'getAttribute' I got it working.
  • tony gil
    tony gil about 2 years
    clean solution, works as well
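For the follow-up about keeping the bar2 links associated with the single bar link, one option is a nested (multidimensional) array per page. A sketch using PHP's built-in DOM extension rather than Simple HTML DOM; the 'bar' / 'bar2' keys and the markup are hypothetical:

```php
<?php
// Sketch with PHP's built-in DOM extension; the markup and the array keys
// 'bar' / 'bar2' are illustrative, not taken from the real page.
$markup = '<div id="foo">'
    . '<div class="bar"><a href="http://foo.bar/post">Post title</a></div>'
    . '<div class="bar2"><a href="http://foo.bar/c1">comment one</a>'
    . '<a href="http://foo.bar/c2">comment two</a></div>'
    . '</div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// One nested array per page: the single "bar" link plus a list of "bar2" links.
$page = ['bar' => null, 'bar2' => []];

$main = $xpath->query('//div[@class="bar"]//a')->item(0);
if ($main !== null) {
    $page['bar'] = ['href' => $main->getAttribute('href'), 'text' => $main->textContent];
}
foreach ($xpath->query('//div[@class="bar2"]//a') as $a) {
    $page['bar2'][] = ['href' => $a->getAttribute('href'), 'text' => $a->textContent];
}
print_r($page);
```

If you scrape several pages, append each one to an outer array ($pages[] = $page;) so every page keeps its own bar/bar2 grouping.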