PHP cURL Web Scraping

First, you forgot to instantiate the DOMDocument class (at least in the code shown in this question). The code also passes the cURL handle, not the fetched HTML, to loadHTML(); pass $page instead.

$curl = curl_init('http://www.flipkart.com/apple-iphone-5s/p/itmdv6f75dyxhmt4?pid=MOBDPPZZDX8WSPAT');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');

$page = curl_exec($curl);

if(!empty($page)) { // if any HTML was actually returned

    $pokemon_doc = new DOMDocument;
    libxml_use_internal_errors(true);
    $pokemon_doc->loadHTML($page);
    libxml_clear_errors();

    $pokemon_xpath = new DOMXPath($pokemon_doc);

    $price = $pokemon_xpath->evaluate('string(//div[@class="prices"]/meta[@itemprop="price"]/@content)');
    echo $price;

    $rupees = $pokemon_xpath->evaluate('string(//div[@class="prices"]/div/span)');
    echo $rupees;
}
else {
    print "Not found";
}
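
To try the two XPath expressions without hitting the live site, the sketch below runs them against an inline copy of the markup quoted in the question. The `$html` string is a hypothetical stand-in for what `curl_exec()` would return; the real page may have changed since, so treat this as an illustration of the technique rather than a guaranteed match for the current site.

```php
<?php
// Assumption: $html stands in for the fetched page; it is the markup
// quoted in the question, not a live response.
$html = <<<'HTML'
<div class="pricing line">
  <div class="prices" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">
    <div>
      <span class="selling-price omniture-field" data-omnifield="eVar48" data-eVar48="37500">Rs. 37,500</span>
    </div>
    <span class="sticky-message">Selling Price</span>
    <meta itemprop="price" content="37,500">
    <meta itemprop="priceCurrency" content="INR">
  </div>
</div>
HTML;

$doc = new DOMDocument;
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Value of the content attribute on <meta itemprop="price">.
$price = $xpath->evaluate('string(//div[@class="prices"]/meta[@itemprop="price"]/@content)');

// Text of the first <span> nested one <div> below the "prices" container.
$rupees = trim($xpath->evaluate('string(//div[@class="prices"]/div/span)'));

echo $price, PHP_EOL;
echo $rupees, PHP_EOL;
```

With the markup above, this should print 37,500 followed by Rs. 37,500. Wrapping the expression in string() makes evaluate() return a plain string, so there is no node list to loop over.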

Sample Output

Author by

Jiteen


Updated on January 28, 2020

Comments

  • Jiteen
    Jiteen over 4 years

    I want to scrape the price of the mobile phone from this URL: http://www.flipkart.com/apple-iphone-5s/p/itmdv6f75dyxhmt4?pid=MOBDPPZZDX8WSPAT

    If you view the page source, the price is in the following SPAN:

    <div class="pricing line">
            <div class="prices" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">
                        <div>
                            <span class="selling-price omniture-field" data-omnifield="eVar48" data-eVar48="37500">Rs. 37,500</span> // Fetch this price
                        </div>
                        <span class="sticky-message">Selling Price</span>
                <meta itemprop="price" content="37,500"> 
                <meta itemprop="priceCurrency" content="INR">
            </div>
    </div>
    

    My code to fetch this so far is:

    <?php
    $curl = curl_init('http://www.flipkart.com/apple-iphone-5s/p/itmdv6f75dyxhmt4?pid=MOBDPPZZDX8WSPAT');
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    
    $page = curl_exec($curl);
    
    if(!empty($curl)){ //if any html is actually returned
    
        $pokemon_doc->loadHTML($curl);
        libxml_clear_errors(); //remove errors for yucky html
    
        $pokemon_xpath = new DOMXPath($pokemon_doc);
    
        //get all the h2's with an id
        $pokemon_row = $pokemon_xpath->query('//h2[@id]');
    
        if($pokemon_row->length > 0){
            foreach($pokemon_row as $row){
                echo $row->nodeValue . "<br/>";
            }
        }
    }
    
    else
        print "Not found";
    ?>
    

    This shows an error:

    Fatal error: Call to a member function loadHTML() on a non-object in D:\xampp\htdocs\jiteen\php-scrape\phpScrape.php on line 9

    What should I do? I can't trace the error.

  • Jiteen
    Jiteen over 9 years
    Thanks for your time and advice. It removed the error, but now it's a blank page and I do not get the desired output, i.e. the price from the URL. Please help me with that as well.
  • Kevin
    Kevin over 9 years
    @Jiteen which price is it anyway? the rupees?
  • Jiteen
    Jiteen over 9 years
    Yes, it is in Rs. I believe if it were in US dollars we could have searched for the '$' sign, I guess. As you may have noticed, I just want to fetch the value in the following line: <span class="selling-price omniture-field" data-omnifield="eVar48" data-eVar48="37500">Rs. 37,500</span>
  • Kevin
    Kevin over 9 years
    @Jiteen oh okay, just as i thought, check out my revision, there's also a sample output
  • Jiteen
    Jiteen over 9 years
    This works GREAT. Sorry to bother you again, but I want to fetch the price from SnapDeal as well. Check this link: snapdeal.com/product/apple-iphone-5s-16-gb/… The price is in: <span id="selling-price-id" itemprop="price">36920</span> How should I change $price = $pokemon_xpath->evaluate('string(//div[@class="prices"]/meta[@itemprop="price"]/@content)'); to fetch the desired data?
  • Kevin
    Kevin over 9 years
    @Jiteen just the same concept, instead of class just use id, glad this helped
  • Jiteen
    Jiteen over 9 years
    I am having trouble adding that. Please check: did I write this piece of code correctly? $price = $pokemon_xpath->evaluate('string(//div[@class="prodbuy-price"]/span[@itemprop="price"]/@)'); It should look in the DIV with class 'prodbuy-price' for the SPAN with attribute itemprop = 'price' and get its value.
  • Kevin
    Kevin over 9 years
    @Jiteen just right click -> inspect element on the price, you'll see the markup, it's actually id="selling-price-id"
  • Jiteen
    Jiteen over 9 years
    I tried that but wasn't very successful. Can you please provide the correct code for the line: $price = $pokemon_xpath->evaluate('string(//div[@class="prodbuy-price"]/span[@id=selling-price-id]/)'); I want to fetch the value of the price. Please check what the error is and what can be done to correct it. I really appreciate your help so far. Thanks!
  • Kevin
    Kevin over 9 years
    @Jiteen no need for the div, just point to it directly, string(//span[@id="id here"])
  • Jiteen
    Jiteen over 9 years