Python Selenium download images (jpeg, png) or PDF using ChromeDriver

12,247

Solution 1

Instead of relying on specific browser / driver options, I would implement a more generic solution that uses the image URL to perform the download.

You can get the image URL using similar code:

driver.find_element_by_id("your-image-id").get_attribute("src")

And then I would download the image using, for example, urllib.

Here's some pseudo-code for Python2:

import urllib

url = driver.find_element_by_id("your-image-id").get_attribute("src")
urllib.urlretrieve(url, "local-filename.jpg")

Here's the same for Python3:

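import urllib.request
from selenium.webdriver.common.by import By

# Selenium 4.3+ removed find_element_by_id; use find_element(By.ID, ...) instead
url = driver.find_element(By.ID, "your-image-id").get_attribute("src")
urllib.request.urlretrieve(url, "local-filename.jpg")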
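The Python documentation lists urlretrieve under the legacy interface of urllib.request. As a minimal sketch of an alternative that uses only the standard library, the response from urlopen can be streamed into a file with shutil.copyfileobj. The helper name download_file is made up for illustration.

```python
import shutil
import urllib.request


def download_file(url, local_filename):
    """Stream the response body for `url` into `local_filename`."""
    # urlopen returns a file-like response; copyfileobj copies it in chunks,
    # so the whole file never has to sit in memory at once.
    with urllib.request.urlopen(url) as response, open(local_filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_filename
```

With the url obtained from the driver as above, the call would be download_file(url, "local-filename.jpg").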

Edit, following the comment: here is another example of how to download a file once you know its URL:

import requests
from PIL import Image
from io import BytesIO

image_name = 'image.jpg'
url = 'http://example.com/image.jpg'

r = requests.get(url)

# r.content is bytes, so wrap it in BytesIO (StringIO only accepts str)
i = Image.open(BytesIO(r.content))
i.save(image_name)

Solution 2

Here is another simple way, but @Pitto's answer above is slightly more succinct.

import requests
from selenium.webdriver.common.by import By

# ff is assumed to be an existing Selenium WebDriver instance
webelement_img = ff.find_element(By.XPATH, '//img')
url = webelement_img.get_attribute('src') or 'https://someimages.com/path-to-image.jpg'
data = requests.get(url).content
local_filename = 'filename_on_your_computer.jpg'

with open(local_filename, 'wb') as f:
    f.write(data)
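
If the image is only reachable from an authenticated browser session, a plain HTTP request made outside the browser will not carry that session. Here is a minimal sketch, assuming the site keeps the login in cookies (which may not hold for Kerberos/Negotiate setups): copy the cookies from the WebDriver instance into a Cookie header and fetch the file with the standard library. The helper names cookie_header and download_with_driver_cookies are hypothetical.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from driver.get_cookies() output."""
    # Each entry is a dict with at least "name" and "value" keys.
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def download_with_driver_cookies(driver, url, local_filename):
    """Fetch `url` with the browser session's cookies and save the bytes."""
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header(driver.get_cookies())}
    )
    with urllib.request.urlopen(request) as response, open(local_filename, "wb") as f:
        f.write(response.read())
    return local_filename
```

Note that driver.get_cookies() only returns cookies visible to the page currently loaded, so the driver should be on the same site as the image before calling this.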
Author by

animesharma

Updated on June 28, 2022

Comments

  • animesharma
    animesharma almost 2 years

    I have a Selenium script in Python (using ChromeDriver on Windows) that fetches the download links of various attachments (of different file types) from a page and then opens these links to download the attachments. This works fine for file types that ChromeDriver can't preview, as they are downloaded by default. But images (JPEG, PNG) and PDFs are previewed by default and hence aren't downloaded automatically.

    The ChromeDriver options I am currently using (they work for non-previewable files):

    chrome_options = webdriver.ChromeOptions()
    prefs = {'download.default_directory' : 'custom_download_dir'}
    chrome_options.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome("./chromedriver.exe", chrome_options=chrome_options)
    

    This downloads the files to 'custom_download_dir', no issues. But the preview-able files are just previewed in the ChromeDriver instance and not downloaded.

    Are there any ChromeDriver Settings that can disable this preview behavior and directly download all files irrespective of the extensions?

    If not, can this be done using Firefox for instance?

  • animesharma
    animesharma about 6 years
    The problem is that viewing the image requires authentication. I tried the Python Requests library, but the server requires Kerberos authentication; I tried supplying credentials and using the Python Kerberos library, but it just doesn't work. I can view the image in Selenium WebDriver, so I am looking for a way to download it via the WebDriver instance itself.
  • Pitto
    Pitto about 6 years
    What about disabling the auto-open for images on Google Chrome? That could trigger the automatic download... presentermedia.com/blog/2013/10/…
  • animesharma
    animesharma about 6 years
    Is there an option to disable auto-open using the Chrome WebDriver settings in Python?
  • oldboy
    oldboy almost 3 years
    @halfer apparently, urlretrieve is legacy. is there a newer, better way to do this?
  • halfer
    halfer almost 3 years
    No probs @oldboy. Pitto, thanks for making an edit - don't forget to draw people's attention to them. People will only see the change if they subscribe to your answer, so it might have been missed in this case.
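
On the original question of a ChromeDriver setting that skips the preview: for PDFs, Chrome has a plugins.always_open_pdf_externally preference that sends the file to the download manager instead of the built-in viewer. The sketch below assumes a recent Selenium with chromedriver on the PATH; the helper names build_download_prefs and make_driver are made up for illustration. It covers PDFs only, and I'm not aware of an equivalent preference for images.

```python
def build_download_prefs(download_dir):
    """Return Chrome prefs that save PDFs to disk instead of previewing them."""
    return {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        # Hand PDFs to the download manager instead of Chrome's built-in viewer.
        "plugins.always_open_pdf_externally": True,
    }


def make_driver(download_dir):
    """Start Chrome with the prefs above (needs selenium and chromedriver)."""
    # Imported here so build_download_prefs has no third-party dependency.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```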