Downloading only images using curl or wget?

wget returns a non-zero exit code on error; specifically, it exits with status 8 if the server returned a 4xx or 5xx response. So you can modify your Bash loop to delete the file whenever wget does not exit successfully:

for i in {1..10}
do
    wget --accept .jpg,.jpeg --cookies=on --load-cookies=cookies.txt -p "http://www.someforum.com/attachment.php&id=${i}" -O "image${i}.jpg" || rm -f "image${i}.jpg"
done

Similarly, curl has a --fail option: with it, curl does not save the response body and returns exit status 22 when the HTTP status is >= 400.
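As a rough sketch of the curl variant, the loop body could be wrapped in a small function. The name fetch_image and the variables BASE_URL and COOKIE_FILE are assumptions made for illustration; the URL and cookie file are copied from the question. The explicit rm -f is only a safeguard in case a partial file is left behind.

```shell
#!/usr/bin/env bash
# Sketch: download attachments with curl and keep only successful responses.
# fetch_image, BASE_URL and COOKIE_FILE are hypothetical names for this example.

BASE_URL="http://www.someforum.com/attachment.php&id="
COOKIE_FILE="cookies.txt"

# fetch_image ID
# Saves attachment ID as imageID.jpg. Returns curl's exit status
# (22 when --fail sees an HTTP status >= 400) and removes any leftover file.
fetch_image() {
    local id="$1"
    local out="image${id}.jpg"
    local status=0

    # --fail: do not save the server's error page; exit non-zero instead
    # --silent --show-error: hide the progress meter but still print errors
    curl --fail --silent --show-error \
         --cookie "$COOKIE_FILE" \
         --output "$out" \
         "${BASE_URL}${id}" || status=$?

    if [ "$status" -ne 0 ]; then
        rm -f -- "$out"
    fi
    return "$status"
}

# Usage (needs network access and a valid cookies.txt):
#   for i in {1..10}; do fetch_image "$i"; done
```

Returning curl's own exit status lets the caller tell an HTTP error (22) apart from, say, a DNS or connection failure.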

Author: Ashley

Developer Productivity Engineer at Hashicorp

Updated on September 18, 2022

Comments

  • Ashley
    Ashley over 1 year

    UPDATED:

    I've found using this Bash script fixes the problem of having GIF files with a .jpg extension.


    I'm attempting to download images from a forum whose URLs use the following format:

    http://www.someforum.com/attachment.php&id=XXX

    I wrote a bash script that uses wget to retrieve these images:

    for i in {1..10}
    do
        wget --accept .jpg,.jpeg --cookies=on --load-cookies=cookies.txt -p "http://www.someforum.com/attachment.php&id=${i}" -O "image${i}.jpg"
    done

    It works and downloads the images. However, if there isn't an image, it still downloads the resulting HTML and stuffs it in imageXX.jpg.

    Curl does the same:

    for i in {1..10}
    do
        curl --cookie cookies.txt "http://www.someforum.com/attachment.php&id=${i}" -o "image${i}.jpg"
    done

    Is there any way to reject results that are not image/*? Right now I am assuming that the images are JPEG; it would be nice to detect the MIME type and use the appropriate filename.

    Finally, wget reports a 500 response code when an image isn't found, so keeping only the 200 responses may yield a solution.

    Bash, Ruby, Python answers are acceptable.

  • Ashley
    Ashley about 12 years
    This works, thank you. I'm going to leave the question open for a while to see if anybody comments regarding MIME/Content-Type, as I'm still (understandably) getting GIF files saved as .jpg.
  • dbenhur
    dbenhur about 12 years
    Ah, I didn't realize you also had the problem of files being saved with the wrong extension. I would have suggested file --mime to work out the appropriate real extension as well.
  • Ashley
    Ashley about 12 years
    I was hoping to avoid post-processing, but having looked through the entire wget manual, it seems to be the only way.
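
The post-processing step discussed in these comments might look roughly like the sketch below. It relies on file --brief --mime-type to identify each download and renames it to match. The helper names ext_for_mime and fix_extension are hypothetical, and the list of MIME types is an assumption covering only a few common image formats.

```shell
#!/usr/bin/env bash
# Sketch: rename downloaded files so the extension matches the real type.
# ext_for_mime and fix_extension are hypothetical helper names.

# ext_for_mime MIME
# Prints the extension for a few common image types; returns 1 otherwise.
ext_for_mime() {
    case "$1" in
        image/jpeg) echo "jpg" ;;
        image/gif)  echo "gif" ;;
        image/png)  echo "png" ;;
        *)          return 1 ;;
    esac
}

# fix_extension FILE
# Detects FILE's type with file(1) and renames it if the extension differs.
# Files that are not a recognised image type are left untouched (returns 1).
fix_extension() {
    local path="$1" mime ext target
    mime=$(file --brief --mime-type "$path") || return 1
    if ext=$(ext_for_mime "$mime"); then
        target="${path%.*}.${ext}"
        [ "$path" = "$target" ] || mv -- "$path" "$target"
    else
        echo "skipping ${path}: ${mime} is not a known image type" >&2
        return 1
    fi
}

# Usage, after the download loop has finished:
#   for f in image*.jpg; do fix_extension "$f"; done
```

Since fix_extension returns non-zero for anything that is not a recognised image, the same loop could also be used to spot stray HTML pages that were saved with a .jpg name.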