Simple HTML DOM getting all attributes from a tag
Solution 1
To grab all those attributes, you should first inspect the parsed element, like this:
foreach($html->find('div[class=bar] a') as $a){
var_dump($a->attr);
}
...and see if those attributes exist. They don't seem to be valid HTML, so maybe the parser discards them.
If they exist, you can read them like this:
foreach($html->find('div[class=bar] a') as $a){
$article = array($a->href, $a->innertext);
if (isset($a->attr['data1'])) {
$article['data1'] = $a->attr['data1'];
}
if (isset($a->attr['data2'])) {
$article['data2'] = $a->attr['data2'];
}
//...
$articles[] = $article;
}
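If the attribute names are not known in advance, the same idea can be written as a loop over attr instead of one isset() check per name. This is only a sketch: linkRecord is a hypothetical helper name, the 'href' and 'text' keys are my own choice, and the stdClass object stands in for a parsed element so the snippet runs without the library.

```php
<?php
// Hypothetical helper: builds one record per link from any object that
// exposes href, innertext and an attr array, as Simple HTML DOM elements do.
function linkRecord($a)
{
    $record = array(
        'href' => $a->href,
        'text' => $a->innertext,
    );
    // Copy every other attribute (data1, data-content-id, ...) under its own name.
    foreach ($a->attr as $name => $value) {
        if ($name !== 'href') {
            $record[$name] = $value;
        }
    }
    return $record;
}

// Stand-in for a parsed <a> element (assumption: not a real library object).
$stub = new stdClass();
$stub->href = 'http://foo.bar';
$stub->innertext = 'Inner text';
$stub->attr = array('data1' => 'xxxx', 'data2' => 'yyyy', 'href' => 'http://foo.bar');

print_r(linkRecord($stub));
```

With the real parser you would call $articles[] = linkRecord($a); inside the foreach. Note that an attribute literally named text would overwrite the inner text in this sketch.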
To get both classes you can use a multiple selector, separated by a comma:
foreach($html->find('div[class=bar] a, div[class=bar2] a') as $a){
...
Solution 2
I know this question is old, but the OP asked how they could get all the attributes in one statement. I just did this for a project I'm working on.
You can get all the attributes for an element with the getAllAttributes() method. The results are automatically stored in an array property called attr.
In the example below I am grabbing all links, but you can use this with whatever you want. NOTE: This also works with data- attributes. So if there is an attribute called data-url, it will be accessible with $e->attr['data-url'] after you run the getAllAttributes method.
In your case the attributes you're looking for will be $e->attr['data1'] and $e->attr['data2']. Hope this helps someone if not the OP.
Get all Attributes
$html = file_get_html('somefile.html');
foreach ($html->find('a') as $e) { // matching <a> tags here, but any selector works
$e->getAllAttributes();
//testing that it worked
print_r($e->attr);
}
Solution 3
$data1 = $html->find('.bar > a', 0)->attr['data1'];
$data2 = $html->find('.bar > a', 0)->attr['data2'];
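One caveat with the chained form: as far as I know, find() with an index returns null when nothing matches, so reading ->attr directly on the result can fail. Below is a minimal sketch of a guarded lookup. firstAttr and StubDocument are hypothetical names, and the stub only imitates the find($selector, $index) call so the snippet runs on its own.

```php
<?php
// Hypothetical helper: returns the named attribute of the first element
// matching $selector, or $default when there is no match or no such attribute.
function firstAttr($html, $selector, $name, $default = null)
{
    $node = $html->find($selector, 0); // null when nothing matches
    if ($node === null || !isset($node->attr[$name])) {
        return $default;
    }
    return $node->attr[$name];
}

// Minimal stand-in for the parsed document (assumption: not the real class).
class StubDocument
{
    private $nodes;

    public function __construct(array $nodes)
    {
        $this->nodes = $nodes;
    }

    public function find($selector, $idx = null)
    {
        $found = isset($this->nodes[$selector]) ? $this->nodes[$selector] : array();
        if ($idx === null) {
            return $found;
        }
        return isset($found[$idx]) ? $found[$idx] : null;
    }
}

$link = new stdClass();
$link->attr = array('data1' => 'xxxx', 'href' => 'http://foo.bar');
$doc = new StubDocument(array('.bar > a' => array($link)));

var_dump(firstAttr($doc, '.bar > a', 'data1'));  // the value of data1
var_dump(firstAttr($doc, '.bar2 > a', 'data3')); // no match, so the default
```

With the real library you would pass the object returned by file_get_html() as $html.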
Solution 4
Check this code
<?php
$html = file_get_html('somefile.html');
foreach ($html->find('a') as $e) {
$filter = $e->getAttribute('data-filter-string');
}
?>
TheEditor
Updated on August 02, 2022
Comments
-
TheEditor almost 2 years
Sort of a two part question but maybe one answers the other. I'm trying to get a piece of information out of an
<div id="foo">
<div class="bar"><a data1="xxxx" data2="xxxx" href="http://foo.bar">Inner text"</a>
<div class="bar2"><a data3="xxxx" data4="xxxx" href="http://foo.bar">more text"</a>
Here is what I'm using now.
$articles = array();
$html = file_get_html('http://foo.bar');
foreach($html->find('div[class=bar] a') as $a){
$articles[] = array($a->href, $a->innertext);
}
This works perfectly to grab the href and the inner text from the first div class. I tried adding a $a->data1 to the foreach but that didn't work.
How do I grab those inner data tags at the same time I grab the href and innertext?
Also is there a good way to get both classes with one statement? I assume I could build the find off of the id and grab all the div information.
Thanks
-
Fabian Schmengler over 11 years
Addition: if the classes really both start with the same prefix ("bar" in your example), this might work too for you:
div[class^=bar]
(meaning: class begins with "bar")
-
TheEditor over 11 years
Thanks much. Not to be dense though... How do I assign that? What I mean is, once I do the find that gets multiple divs, what do I do with that? Do I still assign that " as $a"? Sorry to be dense here.
-
TheEditor over 11 years
Thanks ermannob. The multiple selector worked great. I still can't seem to pull that custom <a> attribute, which is actually data-content-id. I looked at stackoverflow.com/questions/11830933/… but I'm not sure what to do with that last example to actually display the attributes. I was also looking at stackoverflow.com/questions/11773940/… but using that returns an error. For both I'm feeding it with my $html=file_get_html('foo.bar'); line.
-
ermannob over 11 years
The first link you provided is very useful. I edited my answer again, to show you how to read those attributes.
-
TheEditor over 11 years
Ermannob thank you so much! I didn't realize I could use the $a->attr!
-
TheEditor over 11 years
I wanted to update this with what I hope is an easy follow-up question. In the example above, foreach($html->find('div[class=bar] a, div[class=bar2] a') as $a){, I'm getting the data I want. I'm writing it to an array with $article = array($a->href, $a->innertext); (I'm scraping Metafilter here, just FYI.) For the first selector there is only ever 1 match, but for the second there can be multiple. Since I'm writing it to a single array it's not really usable, because I can't associate the second selector's items with the first. Multidimensional array? I've succeeded in just confusing myself. Thanks
-
Stephen Ostermiller almost 7 years
A code only answer is not high quality. Please edit your answer to explain your code. Show how and why it answers the question. Add links to any relevant documentation.
-
Nico Haase almost 5 years
Please add some explanation to your code such that others can learn from it
-
PJunior about 4 years
Hooray! The 'attr' was returning me a PHP fatal error. With 'getAttribute' I got it working.
-
tony gil about 2 years
clean solution, works as well