How to download images from BeautifulSoup?
Solution 1
You need to download and write to disk:
from os.path import basename

import requests
from bs4 import BeautifulSoup

r = requests.get("xxx")
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("img")
for link in links:
    if "http" in link.get('src'):
        lnk = link.get('src')
        with open(basename(lnk), "wb") as f:
            f.write(requests.get(lnk).content)
You can also use select to filter your tags so you only get the ones with http links:
for link in soup.select("img[src^=http]"):
    lnk = link["src"]
    with open(basename(lnk), "wb") as f:
        f.write(requests.get(lnk).content)
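Filtering on http drops images with relative src values. If you want those too, a minimal sketch (the page URL below is a placeholder, not from the original answer) can resolve them against the page URL with urljoin before downloading:

```python
from os.path import basename
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

page_url = "https://example.com/page"  # placeholder URL
r = requests.get(page_url)
soup = BeautifulSoup(r.content, "html.parser")

for img in soup.find_all("img", src=True):
    # urljoin turns a relative src like "/static/a.png" into an
    # absolute URL; absolute src values pass through unchanged
    img_url = urljoin(page_url, img["src"])
    with open(basename(img_url), "wb") as f:
        f.write(requests.get(img_url).content)
```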
Solution 2
While the other answers are perfectly correct, I found them really slow when downloading really high-resolution images, with no way to see the progress. So I made this one.
from bs4 import BeautifulSoup
import requests
import subprocess

url = "https://example.site/page/with/images"
html = requests.get(url).text  # get the html
soup = BeautifulSoup(html, "lxml")  # give the html to soup

# get all the anchor links with the custom class;
# the element or the class name will change based on your case
imgs = soup.findAll("a", {"class": "envira-gallery-link"})
for img in imgs:
    imgUrl = img['href']  # get the href from the tag
    cmd = ['wget', imgUrl]  # just download it using wget
    subprocess.Popen(cmd)  # run the command to download
    # if you don't want to run it in parallel and would rather wait
    # for each image to download, add communicate() instead:
    # subprocess.Popen(cmd).communicate()
Warning: It won't work on Windows/Mac as it uses wget.
Bonus: You can see the progress of each image if you are not using communicate().
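As a cross-platform alternative (my own suggestion, not part of the original answer), you could stream the download with requests itself and write it in chunks, which also avoids holding a large image in memory at once:

```python
from os.path import basename

import requests

def download(url):
    # stream=True defers fetching the body until we iterate over it
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(basename(url), "wb") as f:
            # write the response body to disk in 8 KiB chunks
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
```

Unlike shelling out to wget, this works the same on Windows, macOS, and Linux.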
Author: Fist Heart
Updated on February 24, 2021

Comments
-
Fist Heart about 3 years
Image http://i.imgur.com/OigSBjF.png
import requests
from bs4 import BeautifulSoup

r = requests.get("xxxxxxxxx")
soup = BeautifulSoup(r.content)
for link in links:
    if "http" in link.get('src'):
        print link.get('src')
I get the printed URL but don't know how to work with it.
-
Alex Hall almost 8 years: BeautifulSoup is for parsing HTML; requests is for making requests over HTTP. Downloading falls into the latter category. requests.get that URL and then check the documentation on how to save the body of the response.
-
Padraic Cunningham almost 8 years: No worries; using the select approach would be the best way overall to filter the tags.
-
Corey Goldberg almost 6 years: launching a subprocess for wget shouldn't be faster than using a Python lib for HTTP.
-
Farhang Amaji about 3 years: I can't run this code; I get FileNotFoundError: [WinError 2] The system cannot find the file specified when running subprocess.Popen(cmd) or subprocess.Popen(cmd).communicate().