Get DIV content from external Website


Solution 1

This is what I always use:

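$url = 'https://somedomain.com/somesite/';
$content = file_get_contents($url);
if ($content === false) {
    exit('Could not fetch ' . $url);
}
// Take everything after the opening tag, then everything before the first </div>.
// Note: this only works when the target div contains no nested divs.
$first_step = explode('<div id="thediv">', $content);
$second_step = explode('</div>', $first_step[1] ?? '');

echo $second_step[0];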

Solution 2

This may be a little overkill, but you'll get the gist.

<?php 

$doc = new DOMDocument;

// We don't want to bother with white spaces
$doc->preserveWhiteSpace = false;

// Most HTML Developers are chimps and produce invalid markup...
$doc->strictErrorChecking = false;
$doc->recover = true;

$doc->loadHTMLFile('http://www.isitdownrightnow.com/check.php?domain=youtube.com');

$xpath = new DOMXPath($doc);

$query = "//div[@class='statusup']";

$entries = $xpath->query($query);
var_dump($entries->item(0)->textContent);

?>
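
A minimal sketch of the same approach run against an inline HTML string, so it can be tried without a network connection. The `$html` sample is the div text from the question, `$status` is an illustrative variable name, and `libxml_use_internal_errors()` is used here on the assumption that you want parser warnings about invalid markup collected rather than printed:

```php
<?php
// Hypothetical sample markup, standing in for the fetched page.
$html = '<html><body><div class="statusup">The website is probably down just for you...</div></body></html>';

// Collect libxml parse warnings instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$entries = $xpath->query("//div[@class='statusup']");

// item(0) is null when nothing matched, so guard before reading textContent.
$status = $entries->length > 0 ? $entries->item(0)->textContent : null;

echo $status, PHP_EOL;
```

Assigning `textContent` to a variable, instead of passing it to `var_dump()`, gives the plain text without the `string(XX)` prefix.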

Solution 3

I used the XPath method proposed by @mightyuhu, together with his suggestion to assign the result to a variable, and it worked great. The query you need depends on the page you are reading and on whether the tag you want has an 'id' or 'class' that identifies it. If the tag has an 'id', you can use this (the sample extracts the USD exchange rate):

$query = "//div[@id='USD']";

However, site developers rarely make it that easy, so there are usually several more 'unnamed' tags to dig through. In my example:

<div id="USD" class="tab">
  <table cellspacing="0" cellpadding="0">
    <tbody>
     <tr>
        <td>Ask Rate</td>
        <td align="right">1.77400</td>
     </tr>
     <tr class="even">
        <td>Bid Rate</td>
        <td align="right">1.70370</td>
     </tr>
     <tr>
        <td>BNB Fixing</td>
        <td align="right">1.735740</td>
     </tr>
   </tbody>
  </table>
</div>

So I had to change the query to get the 'Ask Rate':

$doc->loadHTMLFile('http://www.fibank.bg/en');
$xpath = new DOMXPath($doc);
$query = "//div[@id='USD']/table/tbody/tr/td";

This query matches every td in the table, so I took item 1 instead of item 0 to get the second cell, which holds the exchange rate (the first cell contains the text 'Ask Rate'):

$entries = $xpath->query($query);
$usdrate = $entries->item(1)->textContent;
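
As a rough sketch of what that query returns, the snippet below runs it against the sample markup from this answer instead of the live page. The `$cells` array is only there for illustration, to show that all six cells come back in document order:

```php
<?php
// Sample markup copied from the answer; a real page would be loaded with loadHTMLFile().
$html = <<<'HTML'
<div id="USD" class="tab">
  <table cellspacing="0" cellpadding="0">
    <tbody>
     <tr><td>Ask Rate</td><td align="right">1.77400</td></tr>
     <tr class="even"><td>Bid Rate</td><td align="right">1.70370</td></tr>
     <tr><td>BNB Fixing</td><td align="right">1.735740</td></tr>
   </tbody>
  </table>
</div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$entries = $xpath->query("//div[@id='USD']/table/tbody/tr/td");

// The query matches every cell in document order: label, value, label, value, ...
$cells = [];
foreach ($entries as $td) {
    $cells[] = trim($td->textContent);
}

// Index 1 is the second cell, i.e. the value next to 'Ask Rate'.
$usdrate = $entries->item(1)->textContent;

echo $usdrate, PHP_EOL;
```

Looping over the node list with `foreach` is also how you would read every match rather than a single `item()`.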

Another method is to reference the value directly in the query. When there are no ids or classes to go by, this is done by indexing the tags. I learned this from my Maxthon browser, whose "Inspect element" feature has a "Copy XPath" option in the right-click menu (neat, yeah?):

$query = "//*[@id='USD']/table/tbody/tr[1]/td[2]";

Notice that it also inserts an asterisk (*) after the //. The asterisk matches an element with any tag name, so the element is selected by its id alone. The browser copies the id in double quotes; inside a double-quoted PHP string, use single quotes as shown. In this case you get the value with item(0) again, since the query matches only one node.
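
A small sketch of the indexed query, again on inline sample markup rather than the live site. The shortened `$html` string is an assumption for illustration. One caveat worth checking on your own pages: browsers may show a tbody in the inspector that is not present in the page source, in which case a copied path might need that step removed before DOMXPath finds anything.

```php
<?php
// Shortened sample markup (assumed for illustration), with tbody present in the source.
$html = '<div id="USD"><table><tbody>'
      . '<tr><td>Ask Rate</td><td align="right">1.77400</td></tr>'
      . '<tr><td>Bid Rate</td><td align="right">1.70370</td></tr>'
      . '</tbody></table></div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// XPath positions are 1-based: tr[1] is the first row, td[2] its second cell.
$query = "//*[@id='USD']/table/tbody/tr[1]/td[2]";
$entries = $xpath->query($query);

$count = $entries->length;                 // only one node matches
$usdrate = $entries->item(0)->textContent; // so item(0) holds the value

// evaluate() with string() returns the text directly, without a node list.
$same = $xpath->evaluate("string($query)");

echo $usdrate, PHP_EOL;
```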

If needed, you can post-process the extracted string, for example to change the number format to match your preference:

$usdrate = number_format((float) $usdrate, 5, ',', ' ');
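
For reference, a short sketch of what that call produces. The input values are assumed samples, and the cast to float is there because the extracted text is a string:

```php
<?php
$usdrate = '1.77400'; // text as extracted from the page (assumed sample value)

// number_format(value, decimals, decimal separator, thousands separator)
$formatted = number_format((float) $usdrate, 5, ',', ' ');

// A larger, made-up value shows the thousands separator as well.
$large = number_format(1234.5, 2, ',', ' ');

echo $formatted, PHP_EOL; // 1,77400
echo $large, PHP_EOL;     // 1 234,50
```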

I hope someone finds this helpful, as I found the answers above, and that it saves them time searching for the correct query and syntax.

Author: Kallewallex

Updated on March 24, 2020

Comments

  • Kallewallex
    Kallewallex over 4 years

    I want to get a DIV from an external website with pure PHP.

    External website: http://www.isitdownrightnow.com/youtube.com.html

    Div text I want from isitdownrightnow (statusup div): <div class="statusup">The website is probably down just for you...</div>

    I already tried file_get_contents with DOMDocument and str_get_html, but I could not get it to work.

    For example this

    $page = file_get_contents('http://css-tricks.com/forums/topic/jquery-selector-div-variable/');
        $doc = new DOMDocument();
        $doc->loadHTML($page);
        $divs = $doc->getElementsByTagName('div');
        foreach($divs as $div) {
            // Loop through the DIVs looking for one with an id of "content"
            // Then echo out its contents (pardon the pun)
            if ($div->getAttribute('class') === 'bbp-template-notice') {
                 echo $div->nodeValue;
            }
        }
    

    It will just display an error in the console:

    Failed to load resource: the server responded with a status of 500 (Internal Server Error)

    • markasoftware
      markasoftware over 10 years
      well it has to load...so im guessing it is dynamically generated with JS...which makes this very difficult
    • Mike
      Mike over 10 years
      If you tried file_get_contents et al, please show your code and explain what didn't work.
    • PeeHaa
      PeeHaa over 10 years
      @Markasoftware why would that be very difficult? requestable.pieterhordijk.com/cBg2b
    • PeeHaa
      PeeHaa over 10 years
      @OP you really need to show us what the specific problem is you are having or you cannot be helped. "I could not get it to work." is not a valid problem description.
    • Darragh Enright
      Darragh Enright over 10 years
      You could curl the page, save its contents, load the content into a DOMDocument object and traverse the tree with DOMXPath.
    • markasoftware
      markasoftware over 10 years
      @PeeHaa that is for a different url. If he did that, it would work, but the exact url in the question wouldn't
    • PeeHaa
      PeeHaa over 10 years
      OP doesn't say he wants to use that URI. He just wants the result.
    • Kallewallex
      Kallewallex over 10 years
      Thank you guys for answering. Actually I just chose this site as an example, since I myself don't have anything on the web. It could also be any other site, even a simple html file. @PeeHaa I deleted it because it got really messy, mostly if I would echo my result it was just blank.
    • PeeHaa
      PeeHaa over 10 years
      You still need to tell us your problem... Related: sscce.org
    • Kallewallex
      Kallewallex over 10 years
      Yes, just give me a minute I'll reproduce it and update the post
    • PeeHaa
      PeeHaa over 10 years
      Check the error log to find out why it is throwing a 500 error.
    • worenga
      worenga over 10 years
      The element you are trying to fetch is actually reloaded by an ajax call (isitdownrightnow.com/check.php?domain=youtube.com) so this is kinda pointless on this url.
    • Kallewallex
      Kallewallex over 10 years
      @mightyuhu what about the second one I added (css-tricks.com) ...it can be any url. I am not working on a project or something like that. Just trying to learn a bit php
    • worenga
      worenga over 10 years
      Works for me (phpfiddle.org/main/code/8i4-0vb), check your server configuration.
    • worenga
      worenga over 10 years
      link update phpfiddle.org/main/code/278-fki If you get an error 500 while running your script, your display_error configuration should be adjusted, see php.net/manual/en/errorfunc.configuration.php
  • Kallewallex
    Kallewallex over 10 years
    This actually works. Awesome. How do I get it without the "string(XX)" and just get the text in a var?
  • worenga
    worenga over 10 years
    change var_dump to an assignment like $var = $entries->item(0)->textContent
  • Kallewallex
    Kallewallex over 10 years
    Thank you very much. That did it. I played around with it..... but I really have trouble using it on other websites, sometimes it works sometimes it does not. For example I am trying to get a div <h2 id="place-one" class="success">Yes.</h2> But using "//h2[@class='success']"; did not work.
  • worenga
    worenga over 10 years
    hard to say without any further details about the specific url.
  • ThW
    ThW over 10 years
    $var = $xpath->evaluate('string(//div[@class="statusup"])'); would return the text content directly as a string.
  • Kallewallex
    Kallewallex over 10 years
    It does work for me on some sites. However on the site that I am trying to get it does not work... any Idea?
  • FlyingLemon
    FlyingLemon over 10 years
    I can't tell without the domain. But it is possible that the content you are trying to get is not generated when using this instead of visiting the domain. You can experiment by using a HTTP client/debugger. I am using Paw http. Just try a request and change the header informations. You can then see the output and check if your divs content gets displayed.
  • Kallewallex
    Kallewallex over 10 years
    Finally. Okay. I tried it. It only displays the div if I modify the header. Thanks a lot.
  • user2718671
    user2718671 almost 10 years
    It works just fine but I get a lot of warnings when using it: " htmlParseEntityRef: expecting ';'", "ID ... already defined in ...", "htmlParseEntityRef: no name" and "Unexpected end tag" - is there a workaround for this without disabling the error messages?
  • worenga
    worenga almost 10 years
  • Phil Sturgeon
    Phil Sturgeon over 9 years
    There are so many better ways to do this than string manipulation. If they add a new class to that HTML, or make any sort of minor tweak then you're screwed. Try goutte github.com/FriendsOfPHP/Goutte
  • Hiren Kubavat
    Hiren Kubavat about 9 years
    It's ok, but what about child content if there are multiple divs with multiple closing div tags? (The code is correct, but only for a single div.)
  • Sjon
    Sjon almost 9 years
    Why in the world do you use fopen/fwrite/require_once? Also; you are duplicating the accepted answer..?
  • Maximillian Laumeister
    Maximillian Laumeister almost 9 years
    Thank you for posting an answer to this question! Code-only answers are discouraged on Stack Overflow, because it can be difficult for the original poster (or future readers) to understand the logic behind them. Please, edit your question and include an explanation of your code so that others can benefit from your answer. Thanks!
  • Mr. Bhosale
    Mr. Bhosale about 7 years
    @worenga how can fetch all item(0) to item([last]) values here ?
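
The header change mentioned in the comments above (the div only showed up after the request headers were modified) could be sketched roughly like this. `buildBrowserContext()` is a hypothetical helper, and the header values are placeholders rather than values any particular site is known to require:

```php
<?php
// Hypothetical helper: builds a stream context that sends browser-like headers.
// The header values are placeholders, not ones any particular site is known to need.
function buildBrowserContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'header'     => "Accept: text/html\r\nAccept-Language: en\r\n",
            'timeout'    => 10,
        ],
    ]);
}

$context = buildBrowserContext('Mozilla/5.0 (compatible; ExampleFetcher/1.0)');
$options = stream_context_get_options($context);

// Usage (needs network access, so it is left commented out here):
// $page = file_get_contents('https://example.com/', false, $context);
// $doc = new DOMDocument();
// $doc->loadHTML($page);
```

The fetched `$page` can then be passed to `loadHTML()` and queried with DOMXPath as in the answers above.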