Take a screenshot of a website from the command line or with Python

Solution 1

Sometimes you need extra HTTP headers, such as User-Agent, to get downloads to work. In Python 2.7, you can do this:

import urllib2
request = urllib2.Request(
    r'http://books.google.de/books?id=gikDAAAAMBAJ&pg=PA1&img=1&w=2500',
    headers={'User-Agent':'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 firefox/2.0.0.11'})
page = urllib2.urlopen(request)

with open('somefile.png','wb') as f:
    f.write(page.read())

Or you can look at the options for adding HTTP headers in wget or curl (for example, wget's --user-agent option or curl's -A option).
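For Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request. This is only a minimal sketch: the names build_request, download and DEFAULT_USER_AGENT are made up for illustration, and the User-Agent string is an arbitrary browser-like value, not one the server is known to require.

```python
import urllib.request

# Hypothetical default: any browser-like User-Agent string could be used here.
DEFAULT_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/115.0"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, path, user_agent=DEFAULT_USER_AGENT):
    """Fetch a single resource and write its raw bytes to path."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(path, "wb") as f:
        f.write(response.read())
```

Calling download(url, 'somefile.png') with the Google Books URL from the question would mirror the Python 2.7 snippet above; whether the server still answers that request the same way is not something this sketch can promise.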

Solution 2

You can use Ghost.py if you like: https://github.com/jeanphix/Ghost.py

Here is an example of how to use it.

from ghost import Ghost
ghost = Ghost(wait_timeout=4)
ghost.open('http://www.google.com')
ghost.capture_to('screen_shot.png')

The last line saves the image in your current directory.

Hope this helps

Solution 3

I had difficulty getting Ghost to take a screenshot consistently on a headless CentOS VM. Selenium and PhantomJS worked for me:

from selenium import webdriver
br = webdriver.PhantomJS()
br.get('http://www.stackoverflow.com')
br.save_screenshot('screenshot.png')
br.quit()
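PhantomJS support has since been dropped from Selenium, so on a current install the snippet above may not run at all. A rough equivalent with headless Chrome might look like the following. It assumes Selenium 4 and a reasonably recent Chrome are installed; the helper names headless_arguments and capture_page and the default window size are invented for this sketch.

```python
def headless_arguments(width=1280, height=1024):
    """Build the Chrome command-line flags used for a headless screenshot."""
    return ["--headless=new", "--window-size={},{}".format(width, height)]


def capture_page(url, path, width=1280, height=1024):
    """Render url in headless Chrome and save a PNG screenshot to path."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_arguments(width, height):
        options.add_argument(argument)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.save_screenshot(path)
    finally:
        # quit() must be called (with parentheses) to end the browser process.
        driver.quit()
```

Something like capture_page('http://www.stackoverflow.com', 'screenshot.png') would then play the role of the four PhantomJS lines. The try/finally is there so the browser is shut down even if loading the page fails.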
Author: danbruegge

Updated on July 29, 2022

Comments

  • danbruegge
    danbruegge almost 2 years

    I want to take a screenshot of this page: http://books.google.de/books?id=gikDAAAAMBAJ&pg=PA1&img=1&w=2500 or save the image that it outputs.

    But I can't find a way. With wget/curl I get an "unavailable" error, and the same happens with other tools like webkit2png/wkhtmltoimage/wkhtmltopng.

    Is there a clean way to do it with Python or from the command line?

    Best regards!

  • danbruegge
    danbruegge about 11 years
    Nice one. Looks really good, but I don't want to install Qt. :/
  • Ashish Gupta
    Ashish Gupta over 9 years
    I am getting this error when running this:

    Traceback (most recent call last):
      File "C:\bunker\Lib\site-packages\custom_selenium.py", line 2, in <module>
        br = webdriver.PhantomJS()
      File "C:\bunker\Lib\site-packages\selenium\webdriver\phantomjs\webdriver.py", line 49, in __init__
        service_args=service_args,log_path=service_log_path)
    TypeError: __init__() got an unexpected keyword argument 'log_path'
  • billrichards
    billrichards over 9 years
    Hmm, not sure, but I wonder what happens if you edit __init__ in webdriver.py and remove the log_path argument.
  • Pant
    Pant over 8 years
    Yet it will not produce an image of the captured website. The image will be broken.
  • tdelaney
    tdelaney over 8 years
    @SarvagyaPant I ran this script and verified that a non-broken image is downloaded. This took me less than a minute. Can you please put a minimum of work in before making unsubstantiated claims?
  • Pant
    Pant over 8 years
    It will produce a correct image only when the URL is a direct link to an image. For other HTML-based web pages, this won't work. Moreover, one can directly use urllib.urlretrieve if the URL is guaranteed to be an image.
  • tdelaney
    tdelaney over 8 years
    It works for any single resource such as an image, a web page, an MP3, a PDF, etc. It doesn't follow links or build a composite web page, but that's not what the user was after. He showed us a URL to an image and said he wanted a "screenshot" of the image. But the "screenshot" is just the image file itself. There are multiple ways to download web content; my example is a perfectly normal, accepted way.
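The disagreement in these comments comes down to what the server actually sends back: a plain download saved as .png only opens as an image if the bytes really are an image. One way to check, sketched here with a hypothetical helper named sniff_format, is to look at the first bytes of the response before trusting the file extension. The list of formats is deliberately short and is an assumption of this example.

```python
def sniff_format(data):
    """Guess the format of downloaded bytes from their leading signature."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # HTML has no fixed signature, so look for a typical opening tag.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

If sniff_format(page.read()) reports "html" rather than an image format, the URL is a web page and needs a rendering tool such as the ones in Solutions 2 and 3, not a plain download.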