Get DIV content from external Website

php html domdocument

96,008

Solution 1

This is what I always use:

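```php
$url = 'https://somedomain.com/somesite/';
$content = file_get_contents($url);

if ($content !== false) {
    $first_step = explode('<div id="thediv">', $content);
    // The div was found only if the split produced a second part
    if (count($first_step) > 1) {
        // Cut at the first </div>; nested divs inside "thediv" truncate the result
        $second_step = explode('</div>', $first_step[1]);
        echo $second_step[0];
    }
}
```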
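The explode() approach stops at the first `</div>` it finds, so it truncates any div that contains nested divs (the problem raised in the comments). As an alternative, here is a minimal sketch that uses DOMDocument instead of string splitting. The helper name `divInnerHtml` and the sample markup are hypothetical; it parses an HTML string, so in practice you would pass it the result of `file_get_contents($url)`.

```php
<?php
// Hypothetical helper: inner HTML of the first <div> whose id equals $id,
// or null if there is no such div. Nested divs are kept intact.
function divInnerHtml(string $html, string $id): ?string
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $id contains no single quotes (it is concatenated into the query)
    $nodes = $xpath->query("//div[@id='" . $id . "']");
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialize each child node to rebuild the div's inner HTML
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

// Hypothetical sample markup with a nested div
$html = '<div id="thediv"><div class="inner">A</div> B</div><div>C</div>';
echo divInnerHtml($html, 'thediv');
```

With the same input, the explode() version returns only `<div class="inner">A`, because it stops at the nested div's closing tag.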

Solution 2

This may be a little overkill, but you'll get the gist.

```php
<?php

$doc = new DOMDocument;

// We don't want to bother with whitespace
$doc->preserveWhiteSpace = false;

// Real-world HTML is often invalid, so collect the parser's warnings
// internally instead of printing them
libxml_use_internal_errors(true);
$doc->strictErrorChecking = false;
$doc->recover = true;

$doc->loadHTMLFile('http://www.isitdownrightnow.com/check.php?domain=youtube.com');
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$query = "//div[@class='statusup']";

$entries = $xpath->query($query);

// item(0) is null when nothing matched, so check the count first
if ($entries->length > 0) {
    var_dump($entries->item(0)->textContent);
}

?>
```
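Because `check.php` has to be fetched over the network, here is a hedged offline sketch of the same DOMXPath lookup run against the markup quoted in the question. It swaps `@class='statusup'` for the common `contains(concat(' ', normalize-space(@class), ' '), ' statusup ')` idiom, which still matches if the site adds a second class to the div, and it returns the text (or null) instead of dumping it. The helper name `firstDivTextByClass` and the extra `highlight` class are hypothetical.

```php
<?php
// Hypothetical helper: text content of the first <div> carrying $class
// (possibly among other classes), or null if none matches.
// Assumes $class contains no spaces or quotes.
function firstDivTextByClass(string $html, string $class): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // don't print parse warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad the class list with spaces so only whole class names match
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' " . $class . " ')]";
    $entries = $xpath->query($query);

    // item(0) would be null if nothing matched
    return ($entries !== false && $entries->length > 0)
        ? $entries->item(0)->textContent
        : null;
}

$html = '<div class="statusup highlight">The website is probably down just for you...</div>';
$var = firstDivTextByClass($html, 'statusup');
echo $var;
```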

Solution 3

I used the xpath method proposed by @mightyuhu and it worked great with his addition of the assignment. Depending on the web page you get the info from and the availability of an 'id' or 'class' which identifies the tag you wish to get, you will have to change the query you use. If the tag has an 'id' assigned to it, you can use this (the sample is for extracting the USD exchange rate):

```php
$query = "//div[@id='USD']";
```

However, site developers won't always make it that easy for us, so there may be several more 'unnamed' tags to dig into. In my example:

```html
<div id="USD" class="tab">
  <table cellspacing="0" cellpadding="0">
    <tbody>
      <tr>
        <td>Ask Rate</td>
        <td align="right">1.77400</td>
      </tr>
      <tr class="even">
        <td>Bid Rate</td>
        <td align="right">1.70370</td>
      </tr>
      <tr>
        <td>BNB Fixing</td>
        <td align="right">1.735740</td>
      </tr>
    </tbody>
  </table>
</div>
```

So I had to change the query to get the 'Ask Rate':

```php
$doc->loadHTMLFile('http://www.fibank.bg/en');
$xpath = new DOMXPath($doc);
$query = "//div[@id='USD']/table/tbody/tr/td";
```

So, I used the query above, but changed the item to 1 instead of 0 to get the second column where the exchange rate is (the first column contains the text 'Ask Rate'):

```php
$entries = $xpath->query($query);
$usdrate = $entries->item(1)->textContent;
```
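To check the indexing without requesting fibank.bg, here is a hedged sketch that runs the same query against the table markup quoted above, trimmed to two rows. `//div[@id='USD']/table/tbody/tr/td` returns every cell in document order, so `item(0)` is the label "Ask Rate" and `item(1)` is the rate.

```php
<?php
// The exchange-rate markup from above (first two rows only)
$html = <<<'HTML'
<div id="USD" class="tab">
  <table cellspacing="0" cellpadding="0">
    <tbody>
      <tr><td>Ask Rate</td><td align="right">1.77400</td></tr>
      <tr class="even"><td>Bid Rate</td><td align="right">1.70370</td></tr>
    </tbody>
  </table>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // ignore parse warnings for this snippet
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$entries = $xpath->query("//div[@id='USD']/table/tbody/tr/td");

// Cells in document order: 0 => "Ask Rate", 1 => "1.77400", 2 => "Bid Rate", ...
$usdrate = $entries->item(1)->textContent;
echo $usdrate; // 1.77400
```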

Another method is to reference the value directly in the query. When the tags have no ids or classes, you do this by indexing them, which I learned from my Maxthon browser's "Inspect element" feature combined with its "Copy XPath" context-menu option (neat, yeah?):

```php
$query = '//*[@id="USD"]/table/tbody/tr[1]/td[2]';
```

Notice it also inserts an asterisk (*) after the //. The * matches an element with any tag name, so //*[@id="USD"] selects whatever element has that id. In this case you should again get the value with item(0), since the query matches only a single node.
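Below is a hedged sketch of the indexed query, run against a trimmed copy of the same table. XPath positions start at 1, so `tr[1]/td[2]` is the second cell of the first row. As an alternative (my suggestion, not part of the original answer), `tr[td[1]='Ask Rate']/td[2]` selects the row by its label, so it keeps working if the rows are reordered. One caution about copied XPaths: browsers insert a `<tbody>` into every table, so an inspector XPath may contain `tbody` even when the page source does not, and libxml's HTML parser (used by DOMDocument) does not add it, so such a query would match nothing.

```php
<?php
$html = '<div id="USD"><table><tbody>'
    . '<tr><td>Ask Rate</td><td>1.77400</td></tr>'
    . '<tr><td>Bid Rate</td><td>1.70370</td></tr>'
    . '</tbody></table></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // ignore parse warnings for this snippet
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Positional query, as copied from the browser (XPath indexes start at 1)
$byIndex = $xpath->query('//*[@id="USD"]/table/tbody/tr[1]/td[2]');

// Alternative: pick the row by its label instead of its position
$byLabel = $xpath->query("//*[@id='USD']//tr[td[1]='Ask Rate']/td[2]");

echo $byIndex->item(0)->textContent, "\n"; // 1.77400
echo $byLabel->item(0)->textContent, "\n"; // 1.77400
```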

If needed, you can process the extracted string further, for example changing the number format to match your preference:

```php
// Cast the extracted text to float before formatting it
$usdrate = number_format((float) $usdrate, 5, ',', ' ');
```
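number_format() works on a number, so casting the extracted text first avoids relying on PHP's implicit string-to-float conversion. A small sketch with the arguments used above (5 decimals, comma as decimal separator, space as thousands separator); the second call with 1234.5 is just an illustration of the thousands separator:

```php
<?php
$usdrate = '1.77400'; // text extracted by the XPath query
// 5 decimals, ',' as the decimal point, ' ' as the thousands separator
$formatted = number_format((float) $usdrate, 5, ',', ' ');
echo $formatted, "\n"; // 1,77400

echo number_format(1234.5, 2, ',', ' '), "\n"; // 1 234,50
```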

I hope someone finds this helpful, as I found the answers above, and that it spares them some time searching for the correct query and syntax.


Kallewallex

Updated on March 24, 2020

Comments

Kallewallex over 4 years
I want to get a DIV from an external website with pure PHP.

External website: http://www.isitdownrightnow.com/youtube.com.html

Div text I want from isitdownrightnow (statusup div): <div class="statusup">The website is probably down just for you...</div>

I already tried file_get_contents with DOMDocument and str_get_html, but I could not get it to work.

For example, this:
```php
$page = file_get_contents('http://css-tricks.com/forums/topic/jquery-selector-div-variable/');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    // Loop through the DIVs looking for one with a class of "bbp-template-notice"
    // Then echo out its contents (pardon the pun)
    if ($div->getAttribute('class') === 'bbp-template-notice') {
        echo $div->nodeValue;
    }
}
```
It will just display an error in the console:

Failed to load resource: the server responded with a status of 500 (Internal Server Error)
- markasoftware over 10 years
  
  Well, it has to load... so I'm guessing it is dynamically generated with JS, which makes this very difficult.
- Mike over 10 years
  
  If you tried file_get_contents et al, please show your code and explain what didn't work.
- PeeHaa over 10 years
  
  @Markasoftware why would that be very difficult? requestable.pieterhordijk.com/cBg2b
- PeeHaa over 10 years
  
  @OP you really need to show us what the specific problem is you are having or you cannot be helped. "I could not get it to work." is not a valid problem description.
- Darragh Enright over 10 years
  
  You could curl the page, save its contents, load the content into a DOMDocument object and traverse the tree with DOMXPath.
- markasoftware over 10 years
  
  @PeeHaa that is for a different URL. If he did that, it would work, but the exact URL in the question wouldn't.
- PeeHaa over 10 years
  
  OP doesn't say he wants to use that URI. He just wants the result.
- Kallewallex over 10 years
  
  Thank you guys for answering. Actually, I just chose this site as an example, since I don't have anything on the web myself. It could be any other site, even a simple HTML file. @PeeHaa I deleted it because it got really messy; mostly, when I echoed my result, it was just blank.
- PeeHaa over 10 years
  
  You still need to tell us your problem... Related: sscce.org
- Kallewallex over 10 years
  
  Yes, just give me a minute I'll reproduce it and update the post
- PeeHaa over 10 years
  
  Check the error log to find out why it is throwing a 500 error.
- worenga over 10 years
  
  The element you are trying to fetch is actually reloaded by an ajax call (isitdownrightnow.com/check.php?domain=youtube.com) so this is kinda pointless on this url.
- Kallewallex over 10 years
  
  @mightyuhu what about the second one I added (css-tricks.com)? It can be any URL. I am not working on a project or anything like that, just trying to learn a bit of PHP.
- worenga over 10 years
  
  Works for me (phpfiddle.org/main/code/8i4-0vb), check your server configuration.
- worenga over 10 years
  
  Link update (phpfiddle.org/main/code/278-fki): if you get an error 500 while running your script, your display_errors configuration should be adjusted, see php.net/manual/en/errorfunc.configuration.php
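For the 500 error, a common development-only setup is to turn on error display at the top of the script, so the real PHP error is printed instead of a bare 500. A minimal sketch; keep `display_errors` off in production and check the error log there instead. Note that ini_set() cannot reveal parse errors in the same file, because the script never starts running; those need the setting in php.ini.

```php
<?php
// Development only: report every error and print it to the output
error_reporting(E_ALL);
ini_set('display_errors', '1');
```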
Kallewallex over 10 years

This actually works. Awesome. How do I get it without the "string(XX)" and just get the text in a var?
worenga over 10 years

change var_dump to an assignment like $var = $entries->item(0)->textContent
Kallewallex over 10 years

Thank you very much. That did it. I played around with it, but I have real trouble using it on other websites: sometimes it works, sometimes it does not. For example, I am trying to get the element <h2 id="place-one" class="success">Yes.</h2>, but using "//h2[@class='success']" did not work.
worenga over 10 years

hard to say without any further details about the specific url.
ThW over 10 years

$var = $xpath->evaluate('string(//div[@class="statusup"])'); would return the text content directly as a string.
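A hedged sketch of that evaluate() approach, using the `statusup` markup from the question as an inline string. `string(...)` converts the first matching node to its text, and yields an empty string (not null) when nothing matches, so there is no `item(0)` to guard.

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true); // ignore parse warnings for this snippet
$doc->loadHTML('<div class="statusup">The website is probably down just for you...</div>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// string(...) returns the text of the first matching node as a PHP string
$var = $xpath->evaluate('string(//div[@class="statusup"])');
echo $var, "\n";

// No match: evaluate() returns '' rather than a node list
$missing = $xpath->evaluate('string(//div[@class="nope"])');
```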
Kallewallex over 10 years

It does work for me on some sites. However, on the site I am trying to scrape it does not work... any idea?
FlyingLemon over 10 years

I can't tell without the domain. But it is possible that the content you are trying to get is not generated when you request the page this way instead of visiting it in a browser. You can experiment with an HTTP client/debugger; I am using Paw http. Just send a request and change the header information. You can then see the output and check whether your div's content gets displayed.
Kallewallex over 10 years

Finally. Okay. I tried it. It only displays the div if I modify the header. Thanks a lot.
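Since the content only appeared after changing the request headers, here is a hedged sketch of sending custom headers from PHP with a stream context; the fetched HTML can then go to `DOMDocument::loadHTML()`. Which headers a given site needs is an assumption: the `User-Agent` and `Accept-Language` values below are placeholders, and `makeHttpContext` is a hypothetical helper.

```php
<?php
// Hypothetical helper: build an HTTP stream context carrying custom request headers
function makeHttpContext(array $headers)
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => implode("\r\n", $lines),
        ],
    ]);
}

// Placeholder header values; adjust them to whatever the site expects
$context = makeHttpContext([
    'User-Agent' => 'Mozilla/5.0 (compatible; MyScraper/1.0)',
    'Accept-Language' => 'en-US,en;q=0.8',
]);

// Usage (performs a network request, so it is left commented out here):
// $html = file_get_contents('http://www.isitdownrightnow.com/youtube.com.html', false, $context);
// $doc = new DOMDocument();
// $doc->loadHTML($html);
```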
user2718671 almost 10 years

It works just fine but I get a lot of warnings when using it: " htmlParseEntityRef: expecting ';'", "ID ... already defined in ...", "htmlParseEntityRef: no name" and "Unexpected end tag" - is there a workaround for this without disabling the error messages?
worenga almost 10 years

see stackoverflow.com/questions/1148928/…
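One workaround that leaves PHP's global error settings alone is libxml's internal error buffer: `libxml_use_internal_errors(true)` stops the parser warnings from being emitted, and `libxml_get_errors()` still lets you inspect them. A minimal sketch with deliberately broken, hypothetical markup (an unescaped `&`, a duplicate id, a stray end tag):

```php
<?php
$html = '<div id="a">Tom & Jerry</div><div id="a"></div></span>';

$previous = libxml_use_internal_errors(true); // buffer warnings instead of printing
$doc = new DOMDocument();
$doc->loadHTML($html);

// Inspect what the parser complained about, then empty the buffer
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the previous setting
```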
Phil Sturgeon over 9 years

There are so many better ways to do this than string manipulation. If they add a new class to that HTML, or make any sort of minor tweak then you're screwed. Try goutte github.com/FriendsOfPHP/Goutte
Hiren Kubavat about 9 years

It's OK, but what about child content? If the div contains nested divs, there are also multiple closing </div> tags (the code is correct, but only for a single div).
Sjon almost 9 years

Why in the world do you use fopen/fwrite/require_once? Also; you are duplicating the accepted answer..?
Maximillian Laumeister almost 9 years

Thank you for posting an answer to this question! Code-only answers are discouraged on Stack Overflow, because it can be difficult for the original poster (or future readers) to understand the logic behind them. Please, edit your question and include an explanation of your code so that others can benefit from your answer. Thanks!
Mr. Bhosale about 7 years

@worenga how can I fetch all the values here, from item(0) to the last item?
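To read every match rather than a single `item()`, iterate over the DOMNodeList that `query()` returns; `$entries->length` holds the count, so the last item is `item($entries->length - 1)`. A sketch on hypothetical inline markup:

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true); // ignore parse warnings for this snippet
$doc->loadHTML('<table><tr><td>Ask Rate</td><td>1.77400</td></tr>'
    . '<tr><td>Bid Rate</td><td>1.70370</td></tr></table>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$entries = $xpath->query('//td');

// DOMNodeList is traversable, so foreach visits item(0) .. item(length - 1)
$values = [];
foreach ($entries as $node) {
    $values[] = $node->textContent;
}

echo implode(' | ', $values), "\n"; // Ask Rate | 1.77400 | Bid Rate | 1.70370
$last = $entries->item($entries->length - 1)->textContent; // 1.70370
```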