Get page generated with Javascript in Python

javascript python html download urllib2

32,376

#!/usr/bin/env python
from contextlib import closing
from selenium.webdriver import Firefox # pip install selenium
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

url = 'http://example.com/'  # placeholder (assumption): replace with the real page

# use Firefox to get a page with JavaScript-generated content
with closing(Firefox()) as browser:
    browser.get(url)
    # Selenium 4 removed find_element_by_*; use find_element(By.<KIND>, value)
    button = browser.find_element(By.NAME, 'button')
    button.click()
    # wait until an element that exists only on the new page appears
    WebDriverWait(browser, timeout=10).until(
        lambda x: x.find_element(By.ID, 'someId_that_must_be_on_new_page'))
    # store the rendered HTML in a string variable
    page_source = browser.page_source
print(page_source)

xralf

Updated on January 23, 2020

Comments

xralf over 4 years

I'd like to download a web page generated by JavaScript and store it in a string variable in Python code. The page is generated when you click a button.

If I knew the resulting URL I would use urllib2, but that is not the case.

Thank you.
- e-satis over 12 years
  
  Is this generated completely in JS, or just built from an AJAX call?
- xralf over 12 years
  
  @e-satis I think it's generated completely in JS.
- e-satis over 12 years
  
  Then I'd go with J.F.'s solution, or with Python WebKit. Just keep in mind that both require a running display server, so if you plan to run this on a headless server, you'll need to hack a little bit.
xralf over 12 years

Is the WebDriverWait with someId_that_must_be_on_new_page necessary? Could it be done with just a sleep or delay function? And is it possible to set the user-agent string?
xralf over 12 years

There is still one problem. The web page has a select element, and something has to be selected; if nothing is selected, the button won't work. Also, is it necessary to open and close Firefox? Won't this work without quit?
jfs over 12 years

You could use any condition you like, e.g., x.title == 'New Title'. You could probably modify the user agent by using an appropriate Firefox profile.
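
The two suggestions in this comment (an arbitrary wait condition and a user-agent override) could be combined roughly as follows. This is a minimal sketch, assuming Selenium 4 and geckodriver are installed. The helper names `title_is_exactly` and `fetch_after_click` and all of their arguments are hypothetical, and the user agent is set through the Firefox preference `general.useragent.override` instead of a saved profile.

```python
def title_is_exactly(expected):
    """Return a wait condition that is true once the page title equals `expected`."""
    def condition(driver):
        return driver.title == expected
    return condition


def fetch_after_click(url, button_name, new_title, user_agent, timeout=10):
    """Click a button, wait for the new title, and return the rendered HTML."""
    # Imported here so title_is_exactly() can be used without Selenium installed.
    from selenium.webdriver import Firefox
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    # Assumption: overriding this preference is enough to change the user agent.
    options.set_preference("general.useragent.override", user_agent)

    browser = Firefox(options=options)
    try:
        browser.get(url)
        browser.find_element(By.NAME, button_name).click()
        # Polls the condition until it returns a truthy value or the timeout expires.
        WebDriverWait(browser, timeout).until(title_is_exactly(new_title))
        return browser.page_source
    finally:
        browser.quit()
```

A fixed `time.sleep()` would also work, but it either waits longer than needed or fails when the page is slower than expected, which is why an explicit condition is usually preferred.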
jfs over 12 years

Here's an example of how to select an option. .quit() is not necessary.
xralf over 12 years

The method select_option(self, selector, value) takes a selector parameter. I'm not sure what this parameter should be. Let's say I want to click the option with value = 100 in the select with id = 'sel_id' and name = 'sel_name'. Could this be expressed in CSS?
jfs over 12 years

@xralf: select_option('select#sel_id', '100'). You could also pass an element instead: select_option(browser.find_element_by_id('sel_id'), '100').
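
The `select_option` helper appears to come from the linked example, not from Selenium itself. A rough equivalent that expresses the same choice as a single CSS selector might look like the sketch below. The names `option_css` and `choose_option` are hypothetical, and the code assumes Selenium 4 and that `value` contains no double quotes.

```python
def option_css(select_id, value):
    """Build a CSS selector for the <option> with `value` inside select#<select_id>."""
    return 'select#{} option[value="{}"]'.format(select_id, value)


def choose_option(browser, select_id, value):
    """Click the matching option in an already-open Selenium browser."""
    # Imported here so option_css() can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    browser.find_element(By.CSS_SELECTOR, option_css(select_id, value)).click()
```

For the case in the question this would be called as `choose_option(browser, 'sel_id', '100')`. Selenium also ships a `Select` wrapper in `selenium.webdriver.support.ui`, whose `select_by_value('100')` method does the same job on a located select element.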
xralf over 12 years

Thanks. I had already used the following, and it worked too:

    options = browser.find_elements_by_tag_name('option')
    for option in options:
        if option.get_attribute('value') == "100":
            option.click()
alper over 3 years

Can this be done by opening the Firefox window in the background?
jfs over 2 years

@alper yes, there are headless options.
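
One way to use the headless mode mentioned here is to pass Firefox's `-headless` argument through Selenium's options object. The sketch below assumes Selenium 4 with geckodriver installed; `make_headless` and `page_source_headless` are hypothetical helper names.

```python
def make_headless(options):
    """Add Firefox's headless flag to an Options-like object and return it."""
    options.add_argument("-headless")
    return options


def page_source_headless(url):
    """Load `url` in a Firefox instance with no visible window and return its HTML."""
    # Imported here so make_headless() can be exercised without Selenium installed.
    from selenium.webdriver import Firefox
    from selenium.webdriver.firefox.options import Options

    browser = Firefox(options=make_headless(Options()))
    try:
        browser.get(url)
        return browser.page_source
    finally:
        browser.quit()
```

Because no window is opened, this should also cover the earlier concern about running on a server without a display, although that has not been verified here.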