BeautifulSoup: Extract img alt data


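Inside your for loop, you can get the alt text with

image.get('alt', '')

Unlike image['alt'], which raises a KeyError when a tag has no alt attribute, get() returns the default passed as its second argument (here an empty string).

This is explained in BeautifulSoup's documentation ("The attributes of Tags").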
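As a minimal sketch of the approach above, the following collects the alt text of every image and skips tags that have none. It assumes Python 3 and Beautiful Soup 4 (the `bs4` package, with `find_all` in place of the older `findAll`), which is not necessarily what the question uses; the inline HTML and the shortened `src` values are made up for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup: one image with alt text, one without.
html = """
<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="a.jpg" />
<img src="b.jpg" />
"""

soup = BeautifulSoup(html, "html.parser")

alts = []
for image in soup.find_all("img"):
    # get() returns '' instead of raising KeyError when alt is missing.
    alt = image.get("alt", "")
    if not alt:
        continue  # skip images that have no alt text
    alts.append(alt)

print(alts)
```

The same `if not alt: continue` check can be dropped into the existing loop before the file name is built.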

Author by

add-semi-colons

Updated on June 21, 2022

Comments

  • add-semi-colons
    add-semi-colons almost 2 years

    I have the following image HTML and I am trying to parse the information in its alt attribute. Currently I am able to extract the images successfully.

    HTML (what I currently parse):

    <img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1" itemprop="image" />
    

    I construct the image name from what I parse:

    Current Code

    def main(url, output_folder="~/images"):
        """Download the images at url"""
        soup = bs(urlopen(url))
        parsed = list(urlparse.urlparse(url))
        count = 0
        for image in soup.findAll("img"):
            print image
            count += 1
            print count
            print "Image: %(src)s" % image
            image_url = urlparse.urljoin(url, image['src'])
            filename = image["src"].split("/")[-1].split("?")[0].replace("$",'').replace(".JPG",".jpg").replace("~~_26",str(count)).lstrip("(")
            parsed[2] = image["src"]
            outpath = os.path.join(output_folder, filename)
            urlretrieve(image_url, outpath)
    

    What I would like to extract is

    alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver"
    

    I also want to use the alt data as the file name when I save the image.

  • larissa
    larissa almost 12 years
    A KeyError means that a particular img tag doesn't have an alt attribute. Are you sure every image on the page has alt text associated with it?
  • Gonzalo
    Gonzalo almost 12 years
    Edited the answer; it should work for the case @anyaMairead mentions.
  • add-semi-colons
    add-semi-colons almost 12 years
    Actually, some don't have one. I am trying to skip those that don't.
  • add-semi-colons
    add-semi-colons almost 12 years
    @GonzaloDelgado Thanks. How can I use the alt information as the filename?
  • Gonzalo
    Gonzalo almost 12 years
    It depends on what you want the filename to look like. You can just mix it into the filename construction in your sample code, though there's plenty of room for improvement there; I'd suggest asking about that at Code Review (codereview.stackexchange.com).
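One possible way to turn the alt text into a file name, as the last comments discuss, is sketched below. The helper `alt_to_filename` is hypothetical (it is not part of BeautifulSoup or of the original code), and the choices it makes are assumptions: unsafe characters become underscores, the extension is taken from the `src` path, and `.jpg` plus a counter-based name are the fallbacks.

```python
import posixpath
import re


def alt_to_filename(alt, src, count):
    """Build a file name from alt text (hypothetical helper).

    Falls back to 'image_<count>' when the alt text is empty.
    """
    # Take the extension from the src path, ignoring any query string.
    path = src.split("?")[0]
    ext = posixpath.splitext(path)[1].lower() or ".jpg"  # assumed default
    # Replace runs of characters other than letters, digits, '.', '_'
    # and '-' with a single underscore.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", alt).strip("_")
    if not stem:
        stem = "image_%d" % count
    return stem + ext


name = alt_to_filename(
    "Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver",
    "http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1",
    1,
)
print(name)
```

In the loop from the question, the result would replace the long `filename = ...` expression, for example `filename = alt_to_filename(image.get('alt', ''), image['src'], count)`.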