How to download images from BeautifulSoup?

Solution 1

You need to download each image and write it to disk:

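import requests
from bs4 import BeautifulSoup
from os.path import basename

r = requests.get("xxx")
soup = BeautifulSoup(r.content, "html.parser")

# assumption: "links" holds every <img> tag on the page
links = soup.find_all("img")

for link in links:
    lnk = link.get('src')
    # skip tags with no src and tags whose src is not an http(s) URL
    if lnk and "http" in lnk:
        # save under the last path component of the URL
        with open(basename(lnk), "wb") as f:
            f.write(requests.get(lnk).content)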

You can also pass a CSS selector to select so that you only get the tags whose src starts with http:

for link in soup.select("img[src^=http]"):
    lnk = link["src"]
    with open(basename(lnk), "wb") as f:
        f.write(requests.get(lnk).content)
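
Both loops above skip images whose src is relative (for example /static/logo.png), and they use the raw tail of the URL as the file name, which can include a query string. Here is a minimal sketch of one way to handle both using only the standard library. The helper names absolute_url and local_name and the fallback name "image" are hypothetical, not part of the original answer:

```python
from os.path import basename
from urllib.parse import urljoin, urlparse


def absolute_url(page_url, src):
    """Resolve an <img> src against the URL of the page it came from."""
    # urljoin leaves absolute URLs untouched and resolves relative ones
    return urljoin(page_url, src)


def local_name(url, default="image"):
    """Build a file name from the URL path, ignoring any query string."""
    # hypothetical fallback: use `default` when the path ends in a slash
    return basename(urlparse(url).path) or default
```

With these, the loop could iterate over soup.select("img[src]"), fetch absolute_url(page_url, link["src"]) and open local_name(...) for writing, where page_url is the address passed to the first requests.get.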

Solution 2

The other answers are perfectly correct, but with very high-resolution images I found the download slow and had no way to see its progress, so I wrote this version.

from bs4 import BeautifulSoup
import requests
import subprocess

url = "https://example.site/page/with/images"
html = requests.get(url).text # get the html
soup = BeautifulSoup(html, "lxml") # give the html to soup

# get all the anchor links with the custom class
# the element or the class name will change based on your case
imgs = soup.find_all("a", {"class": "envira-gallery-link"})
for img in imgs:
    imgUrl = img['href']  # get the href from the tag
    cmd = ['wget', imgUrl]  # just download it using wget
    subprocess.Popen(cmd)  # start the download without waiting, so downloads run in parallel
    # to wait for each image before starting the next one,
    # replace the line above with:
    # subprocess.Popen(cmd).communicate()

Warning: This needs wget installed and on your PATH, which is not the case by default on Windows or macOS.
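
Where wget is not available, a thread pool from the standard library is one way to keep the downloads parallel in pure Python. This is a sketch under assumptions, not the answer's method: download_all is a hypothetical helper, and fetch stands for any function you supply that downloads one URL (for instance one that calls requests.get and writes .content to disk as in Solution 1).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL using a pool of worker threads.

    Returns (results, errors): two dicts keyed by URL.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map each future back to the URL it is downloading
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one download fails
                errors[url] = exc
    return results, errors
```

Failed downloads end up in errors instead of stopping the rest, and max_workers caps how many requests hit the server at once.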

Bonus: You can see the progress of each image if you are not using communicate.
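
Progress can also be reported from Python itself by streaming the response in chunks instead of reading .content in one go. The sketch below assumes the server sends a Content-Length header for the percentage; otherwise it falls back to a byte count. save_with_progress and download are hypothetical names, and the 8192-byte chunk size and 30-second timeout are arbitrary choices.

```python
import sys


def save_with_progress(chunks, path, total=None, out=sys.stderr):
    """Write byte chunks to path, printing running progress. Returns bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
            if total:
                out.write(f"\r{path}: {written * 100 // total}%")
            else:
                out.write(f"\r{path}: {written} bytes")
    out.write("\n")
    return written


def download(url, path):
    """Stream url to path with requests, reporting progress as it goes."""
    import requests  # third-party; imported here so the helper above stays stdlib-only

    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        # Content-Length may be missing, so treat 0 as "unknown"
        total = int(r.headers.get("Content-Length", 0)) or None
        return save_with_progress(r.iter_content(chunk_size=8192), path, total)
```

Calling download(imgUrl, "picture.jpg") inside the loop would then replace the wget command. The percentage is approximate if the server compresses the response.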

Author by

Fist Heart

Updated on February 24, 2021

Comments

  • Fist Heart about 3 years

    Image http://i.imgur.com/OigSBjF.png

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("xxxxxxxxx")
    soup = BeautifulSoup(r.content)
    
    for link in links:
        if "http" in link.get('src'):
           print(link.get('src'))
    

    I get the printed URL but don't know how to work with it.

    • Alex Hall almost 8 years
      BeautifulSoup is for parsing HTML, requests is for making requests over HTTP. Downloading falls into the latter category. requests.get that URL and then check the documentation on how to save the body of the response.
  • Padraic Cunningham almost 8 years
    No worries. Using select would be the best approach overall to filter the tags.
  • Corey Goldberg almost 6 years
    Launching a subprocess for wget shouldn't be faster than using a Python library for HTTP.
  • Farhang Amaji about 3 years
    I can't run this code. I get FileNotFoundError: [WinError 2] The system cannot find the file specified when running subprocess.Popen(cmd) or subprocess.Popen(cmd).communicate().