Scraping dynamic page content with PhantomJS

Solution 1

You can use page.content to get the full HTML of the rendered DOM.
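
A minimal sketch of what that looks like (the URL is only a placeholder):

    var page = require('webpage').create();

    // placeholder URL; swap in the page you actually want to dump
    page.open('http://example.com/faq', function (status) {
        // page.content is the current HTML of the rendered DOM,
        // including markup added by JavaScript after the initial load
        console.log(page.content);
        phantom.exit();
    });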

Solution 2

I would recommend pjscrape (http://nrabinowitz.github.com/pjscrape/) if you want to scrape using PhantomJS.
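
A rough sketch of a pjscrape config file, based on the project's documented pattern; the URL, selector, and file name are placeholders, so check the pjscrape docs for the exact options:

    // save as faq_scraper.js and run with:
    //   phantomjs /path/to/pjscrape.js faq_scraper.js
    pjs.config({
        format: 'json',   // emit the scraped items as JSON
        writer: 'stdout'  // print the results to standard output
    });

    pjs.addSuite({
        // placeholder URL for the page to scrape
        url: 'http://example.com/faq',
        scraper: function() {
            // runs in the page context; pjscrape injects jQuery for you,
            // and '.comment' is only an example selector
            return $('.comment').map(function() {
                return $(this).text();
            }).toArray();
        }
    });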

Comments

  • user985590, almost 2 years ago

    My company is using a website that hosts all of our FAQs and customer questions. We plan to go through, wipe out all of the old data, and enter new content, but the service does not have a backup or archive option for questions we no longer want to appear.

    I've gone through and tried to scrape the site using Perl and Mechanize, but I'm missing the customer comments on the page because they are loaded through AJAX. I have looked at PhantomJS and can get pages to save to an image using an example page; however, I'd like to get a full-page HTML dump and can't figure out how. I used this example code on our site:

        var page = new WebPage();

        page.open('http://espn.go.com/nfl/', function (status) {
            // once the page has loaded, include jQuery from a CDN
            page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function() {
                // once jQuery has loaded, run some code:
                // insert our custom text into the page
                page.evaluate(function() {
                    $("h2").html('Many NFL Players Scared that Chad Moon Will Enter League');
                });
                // take a screenshot and exit
                page.render('espn.png');
                phantom.exit();
            });
        });


    Is there a way, using PhantomJS, to get a full dump of the rendered page, similar to doing a view-source in Chrome? I can do this with Perl + Mechanize, but I don't see how to do it with PhantomJS.
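
    A rough sketch of how the snippet above could be adapted to dump the rendered HTML instead of a screenshot, using the page.content property that Solution 1 refers to; the fixed delay is an arbitrary guess at how long the AJAX-loaded comments take to appear:

        var page = require('webpage').create();

        page.open('http://espn.go.com/nfl/', function (status) {
            if (status !== 'success') {
                console.log('Failed to load page');
                phantom.exit(1);
                return;
            }
            // give AJAX-loaded content (e.g. customer comments) time to arrive;
            // 5 seconds is an arbitrary delay, not a recommended value
            window.setTimeout(function () {
                // page.content holds the full HTML of the DOM as it stands now
                console.log(page.content);
                phantom.exit();
            }, 5000);
        });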