Get the content (text) of an URL after Javascript has run with PHP

36,981

Solution 1

Update 2 Adds more details on how to use phantomjs from PHP.

Update 1 (after clarification that javascript on target page need to run first)

Method 1:Use phantomjs(will execute javascript);

1. Download phantomjs and place the executable in a path that your PHP binary can reach.

2. Place the following 2 files in the same directory:

get-website.php

<?php
    
    $phantom_script= dirname(__FILE__). '/get-website.js'; 


    $response =  exec ('phantomjs ' . $phantom_script);

    echo  htmlspecialchars($response);
    ?>

get-website.js

var webPage = require('webpage');
var page = webPage.create();

page.open('http://google.com/', function(status) {
 console.log(page.content);
  phantom.exit();
});

3. Browse to get-website.php and the target site, http://google.com contents will return after executing inline javascript. You can also call this from a command line using php /path/to/get-website.php.

Method 2:Use Ajax with PHP (No phantomjs so won't run javascript);

/get-website.php

<?php
    
    $html=file_get_contents('http://google.com');
    echo $html;
    ?>

test.html

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>on demo</title>
<style>
p {
color: red;
}
span {
color: blue;
}
</style>
<script src="https://code.jquery.com/jquery-1.10.2.js"></script>
</head>
<body>
<button id='click_me'>Click me</button>
<span style="display:none;"></span>
<script>

$( "#click_me" ).click(function () {
    $.get("/get-website.php", function(data) {
        var json = {
            html: JSON.stringify(data),
            delay: 1
        };
        alert(json.html);
        });
});
</script>
</body>
</html>

Solution 2

I found a fantastic page on this, it's an entire tutorial on how to process the DOM of a page in PHP which is entirely created using javascript.

https://www.jacobward.co.uk/using-php-to-scrape-javascript-jquery-json-websites/ "PhantomJS development is suspended until further notice" so that option isn't a good one.

Share:
36,981
Victor Ferreira
Author by

Victor Ferreira

Web developer, majored in Information Systems and Informatics, now learning how to program for Android. In love with flat design.

Updated on September 25, 2020

Comments

  • Victor Ferreira
    Victor Ferreira over 3 years

    Is it possible to get the content of a URL with PHP (using some sort of function like file_get_contents or header) but only after the execution of some JavaScript code?

    Example:

    mysite.com has a script that does loadUrlAfterJavascriptExec('http://exampletogetcontent.com/') and prints/echoes the content. imagine that some jQuery runs on http://exampletogetcontent.com/ that changes DOM, and loadUrlAfterJavascriptExec will get the resulting HTML

    Can we do that?

    Just to be clear, what I want is to get the content of a page through a URL, but only after JavaScript runs on the target page (the one PHP is getting its content).

    I am aware PHP runs before the page is sent to the client, and JS only after that, but thought that maybe there was an expert workaround.

  • AndrewD
    AndrewD about 9 years
    @victor-ferreira Did you have a chance to look at this solution?
  • Adamantus
    Adamantus over 5 years
    This is out of date and PhantomJS is no longer in production.
  • valepu
    valepu over 2 years
    The article does not seem to be available anymore but it's available on waybackmachine