Get the content (text) of a URL after JavaScript has run, with PHP
Solution 1
Update 2: Adds more details on how to use PhantomJS from PHP.
Update 1: Added after clarification that the JavaScript on the target page needs to run first.
Method 1: Use PhantomJS (will execute JavaScript)
1. Download phantomjs and place the executable in a path that your PHP binary can reach.
2. Place the following 2 files in the same directory:
get-website.php
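<?php
// Path to the PhantomJS script that sits next to this file.
$phantom_script = dirname(__FILE__) . '/get-website.js';
// exec() returns only the last line of output, so collect every line in $output.
exec('phantomjs ' . escapeshellarg($phantom_script), $output);
// Escape the markup so the browser shows the HTML source as text.
echo htmlspecialchars(implode("\n", $output));
?>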
get-website.js
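var webPage = require('webpage');
var page = webPage.create();
// Load the page; the callback fires once loading has finished.
page.open('http://google.com/', function(status) {
    if (status === 'success') {
        // page.content is the current HTML, including changes made by scripts that have already run.
        console.log(page.content);
    }
    phantom.exit();
});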
3. Browse to get-website.php and the contents of the target site, http://google.com, will be returned after its inline JavaScript has executed. You can also call this from a command line using php /path/to/get-website.php.
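To fetch an arbitrary page rather than a hard-coded one, the target URL could be passed to PhantomJS on the command line. The following is a minimal sketch, not part of the original answer: the function names build_phantom_command and fetch_rendered_html are hypothetical, and it assumes get-website.js is changed to read the URL from require('system').args[1] instead of hard-coding it.

```php
<?php
// Hypothetical helper: builds the shell command that runs a PhantomJS
// script with the target URL as its first argument.
function build_phantom_command($script, $url)
{
    // escapeshellarg() quotes each value so that spaces or shell
    // metacharacters in the path or URL cannot alter the command.
    return 'phantomjs ' . escapeshellarg($script) . ' ' . escapeshellarg($url);
}

// Hypothetical helper: runs the command and returns everything it printed.
function fetch_rendered_html($script, $url)
{
    // shell_exec() returns the complete output as one string; it returns
    // null or false when the command fails or prints nothing.
    $output = shell_exec(build_phantom_command($script, $url));
    return is_string($output) ? $output : '';
}

// Assumed usage from the command line: php get-website.php http://example.com/
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    echo fetch_rendered_html(dirname(__FILE__) . '/get-website.js', $argv[1]);
}
```

Unlike exec(), which returns only the last line of output, shell_exec() returns the whole output, which suits multi-line HTML.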
Method 2: Use Ajax with PHP (no PhantomJS, so it won't run JavaScript)
/get-website.php
<?php
$html=file_get_contents('http://google.com');
echo $html;
?>
test.html
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>on demo</title>
  <style>
    p {
      color: red;
    }
    span {
      color: blue;
    }
  </style>
  <script src="https://code.jquery.com/jquery-1.10.2.js"></script>
</head>
<body>
  <button id='click_me'>Click me</button>
  <span style="display:none;"></span>
  <script>
    $( "#click_me" ).click(function () {
      $.get("/get-website.php", function(data) {
        var json = {
          html: JSON.stringify(data),
          delay: 1
        };
        alert(json.html);
      });
    });
  </script>
</body>
</html>
Solution 2
I found a fantastic page on this: an entire tutorial on how to process, in PHP, the DOM of a page that is created entirely using JavaScript.
https://www.jacobward.co.uk/using-php-to-scrape-javascript-jquery-json-websites/
"PhantomJS development is suspended until further notice", so that option isn't a good one.
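Since PhantomJS is no longer developed, one alternative (not from the original answers) is to let headless Chrome or Chromium render the page and print the resulting DOM with its --dump-dom flag. A minimal sketch, assuming a binary named chromium is on the PATH (on some systems it is chromium-browser or google-chrome); the function names are hypothetical:

```php
<?php
// Hypothetical helper: builds the command that asks headless Chrome to
// load $url, run its JavaScript, and print the serialized DOM to stdout.
function build_dump_dom_command($url, $binary = 'chromium')
{
    // The binary name is an assumption; adjust it for your system.
    return escapeshellarg($binary) . ' --headless --dump-dom ' . escapeshellarg($url);
}

// Hypothetical helper: returns the rendered HTML, or '' on failure.
function dump_dom($url, $binary = 'chromium')
{
    $output = shell_exec(build_dump_dom_command($url, $binary));
    return is_string($output) ? $output : '';
}

// Assumed usage from the command line: php dump-dom.php http://example.com/
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    echo dump_dom($argv[1]);
}
```

This swaps PhantomJS for a browser that is still maintained while keeping the same idea: PHP shells out to a tool that executes the page's JavaScript and returns the final HTML.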
Victor Ferreira
Updated on September 25, 2020
Comments
-
Victor Ferreira over 3 years
Is it possible to get the content of a URL with PHP (using some sort of function like file_get_contents or header) but only after the execution of some JavaScript code?
Example: mysite.com has a script that does loadUrlAfterJavascriptExec('http://exampletogetcontent.com/') and prints/echoes the content. Imagine that some jQuery runs on http://exampletogetcontent.com/ that changes the DOM, and loadUrlAfterJavascriptExec will get the resulting HTML.
Can we do that?
Just to be clear, what I want is to get the content of a page through a URL, but only after JavaScript runs on the target page (the one whose content PHP is fetching).
I am aware PHP runs before the page is sent to the client, and JS only after that, but thought that maybe there was an expert workaround.
-
AndrewD about 9 years
@victor-ferreira Did you have a chance to look at this solution?
-
Adamantus over 5 years
This is out of date and PhantomJS is no longer in production.
-
valepu over 2 years
The article does not seem to be available anymore, but it's available on the Wayback Machine.