How to download images from BeautifulSoup?

python python-2.7 beautifulsoup scrape

23,539

Solution 1

You need to download and write to disk:

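import requests
from bs4 import BeautifulSoup
from os.path import basename

r = requests.get("xxx")
soup = BeautifulSoup(r.content, "html.parser")

# collect every <img> tag on the page
links = soup.find_all("img")

for link in links:
    lnk = link.get("src")
    # skip tags without a src and links that do not contain "http" (e.g. relative ones)
    if lnk and "http" in lnk:
        # save under the last path component of the URL
        with open(basename(lnk), "wb") as f:
            f.write(requests.get(lnk).content)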

You can also use a CSS selector with select to keep only the img tags whose src starts with http:

for link in soup.select("img[src^=http]"):
    lnk = link["src"]
    with open(basename(lnk), "wb") as f:
        f.write(requests.get(lnk).content)
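
Both loops above skip relative links, and basename(lnk) keeps any query string in the file name. A minimal sketch of one way to handle that, assuming Python 3 (on Python 2.7 the same functions live in the urlparse module); resolve_image is a hypothetical helper name, not part of BeautifulSoup or requests:

```python
from posixpath import basename
from urllib.parse import urljoin, urlsplit


def resolve_image(page_url, src):
    """Return (absolute_url, filename) for an <img> src, or None if unusable."""
    # skip missing attributes and inline data: URIs, which are not downloadable links
    if not src or src.startswith("data:"):
        return None
    # resolve relative and protocol-relative links against the page URL
    absolute = urljoin(page_url, src)
    # take the file name from the path only, so "?w=200" does not end up in it
    name = basename(urlsplit(absolute).path)
    if not name:
        return None
    return absolute, name
```

Inside the loop, this could be called as resolve_image("xxx", link.get("src")), downloading only when the result is not None.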

Solution 2

The other answers are perfectly correct, but I found downloading very high-resolution images slow, and there was no way to see the progress.

So I made this one.

from bs4 import BeautifulSoup
import requests
import subprocess

url = "https://example.site/page/with/images"
html = requests.get(url).text # get the html
soup = BeautifulSoup(html, "lxml") # give the html to soup

# get all the anchor links with the custom class
# the element or the class name will change based on your case
imgs = soup.find_all("a", {"class": "envira-gallery-link"})
for img in imgs:
    imgUrl = img['href'] # get the href from the tag
    cmd = [ 'wget', imgUrl ] # just download it using wget.
    subprocess.Popen(cmd) # start the download without waiting, so downloads run in parallel
    # if you don't want to run them in parallel and would rather
    # wait for each image to finish, use this line INSTEAD of the one above:
    # subprocess.Popen(cmd).communicate()

Warning: This requires wget to be installed and on your PATH. It is not available by default on Windows or macOS.
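
The FileNotFoundError reported in the comments is what subprocess.Popen raises when the executable cannot be found. A minimal sketch of checking for wget first, assuming Python 3.3+ for shutil.which; build_wget_command is a hypothetical helper name:

```python
import shutil


def build_wget_command(url, executable="wget"):
    """Return the argument list for wget, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        # the caller can fall back to requests or print an install hint
        return None
    return [path, url]
```

A loop could then call subprocess.Popen(cmd) only when build_wget_command(imgUrl) returns a list.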

Bonus: wget prints its own progress for each image to the terminal.
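
Progress output is also possible without wget, which sidesteps the platform issue. A minimal sketch, assuming Python 3 and the streaming mode of requests (stream=True with iter_content); save_chunks and download are hypothetical helper names, and the 8192-byte chunk size is an arbitrary choice:

```python
import sys


def save_chunks(chunks, out_file, total=None, report=sys.stderr.write):
    """Write byte chunks to out_file, reporting progress; return bytes written."""
    written = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty chunks
        out_file.write(chunk)
        written += len(chunk)
        if total:
            report("\r%d%%" % (written * 100 // total))
        else:
            # size unknown (no Content-Length), so report a byte count instead
            report("\r%d bytes" % written)
    report("\n")
    return written


def download(url, path):
    """Stream url to path, printing progress to stderr."""
    import requests  # third-party; imported here so save_chunks works without it

    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        total = int(r.headers.get("Content-Length", 0)) or None
        with open(path, "wb") as f:
            return save_chunks(r.iter_content(chunk_size=8192), f, total)
```

Calling download(imgUrl, "image.jpg") inside the loop would fetch the images one at a time; running them in parallel would need threads or a similar mechanism, which this sketch does not cover.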


Fist Heart

Updated on February 24, 2021

Comments

Fist Heart about 3 years
Image http://i.imgur.com/OigSBjF.png

```
import requests
from bs4 import BeautifulSoup

r = requests.get("xxxxxxxxx")
soup = BeautifulSoup(r.content)

for link in links:
    if "http" in link.get('src'):
        print link.get('src')
```
I get the printed URL but don't know how to work with it.
Alex Hall almost 8 years

BeautifulSoup is for parsing HTML, requests is for making requests over HTTP. Downloading falls into the latter category. requests.get that URL and then check the documentation on how to save the body of the response.
Padraic Cunningham almost 8 years

No worries. Using the select approach would be the best way overall to filter the tags.
Corey Goldberg almost 6 years

launching a subprocess for wget shouldn't be faster than using a python lib for http.
Farhang Amaji about 3 years

I can't run this code and I get FileNotFoundError: [WinError 2] The system cannot find the file specified related to run subprocess.Popen(cmd) or subprocess.Popen(cmd).communicate()