BeautifulSoup: Extract img alt data
Inside your for loop, you can get the alt text with image.get('alt', ''), which returns an empty string when the tag has no alt attribute.
This is explained in BeautifulSoup's documentation ("The attributes of Tags").
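As a minimal sketch of that advice, assuming Python 3 and the bs4 package (the question's code is Python 2), the helper below reads alt with .get and skips images that have none. The names images_with_alt and images_from_html are illustrative, not part of BeautifulSoup.

```python
def images_with_alt(tags):
    """Yield (src, alt) for tags that carry a non-empty alt attribute.

    `tags` is any iterable of objects with a dict-style .get(), such as
    the Tag objects returned by soup.find_all("img").
    """
    for image in tags:
        # .get returns the default instead of raising KeyError
        alt = (image.get("alt", "") or "").strip()
        if not alt:
            continue  # no alt text: skip this image
        yield image.get("src", ""), alt


def images_from_html(html):
    """Parse html with BeautifulSoup and return a list of (src, alt) pairs."""
    # Imported here so images_with_alt stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return list(images_with_alt(soup.find_all("img")))
```

Using image['alt'] instead would raise KeyError on the first img tag without an alt attribute, which is the failure discussed in the comments.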
Comments
-
add-semi-colons almost 2 years
I have the following image HTML and I am trying to parse the information in its alt attribute. Currently I am able to extract the images successfully.
HTML (what I currently parse):
<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1" itemprop="image" />
I construct the image name from what I parse:
Current Code
def main(url, output_folder="~/images"):
    """Download the images at url"""
    soup = bs(urlopen(url))
    parsed = list(urlparse.urlparse(url))
    count = 0
    for image in soup.findAll("img"):
        print image
        count += 1
        print count
        print "Image: %(src)s" % image
        image_url = urlparse.urljoin(url, image['src'])
        filename = image["src"].split("/")[-1].split("?")[0].replace("$",'').replace(".JPG",".jpg").replace("~~_26",str(count)).lstrip("(")
        parsed[2] = image["src"]
        outpath = os.path.join(output_folder, filename)
        urlretrieve(image_url, outpath)
What I would like to extract is
alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver"
I also want to use the alt data as the file name when I save the image.
-
larissa almost 12 years
A KeyError means that a particular img tag doesn't have an alt attribute. Are you sure every image on the page has alt text associated with it?
-
Gonzalo almost 12 years
Edited the answer; it should now work for the case @anyaMairead mentions.
-
add-semi-colons almost 12 years
Actually, some don't have one. I am trying to skip those that don't.
-
add-semi-colons almost 12 years
@GonzaloDelgado Thanks. How can I use the alt information as the filename?
-
Gonzalo almost 12 years
It depends on how you want the filename to look. You can mix it into the filename construction in your sample code, though there's plenty of room for improvement there; I'd suggest asking about that on Code Review (codereview.stackexchange.com).
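One possible way to fold the alt text into the filename is sketched below. It is only an assumption about what a reasonable filename looks like: the function name filename_from_alt, the underscore substitution, and the ".jpg" fallback extension are all hypothetical choices, not something the question or answer prescribes.

```python
import os
import re


def filename_from_alt(alt, src, count):
    """Build a filesystem-friendly name from alt text, keeping src's extension."""
    # Drop the query string, then take the extension from the URL path.
    ext = os.path.splitext(src.split("?")[0])[1].lower() or ".jpg"
    # Replace runs of characters outside a conservative safe set with "_".
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", alt).strip("_")
    if not stem:
        stem = "image"  # fallback when alt is empty or all punctuation
    # The counter keeps names unique when two images share the same alt text.
    return "%s_%d%s" % (stem, count, ext)
```

Inside the loop from the question, the result could replace the existing filename expression, for example filename = filename_from_alt(image.get('alt', ''), image['src'], count).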