Python Script to detect broken images

11,198

Solution 1

I have added another SO answer here that extends the PIL solution to better detect broken images. I also implemented this solution in my Python script here on GitHub.

I also verified that damaged files (jpg) are frequently not 'broken' images, i.e. a damaged picture file sometimes remains a legitimate picture file: the original image is lost or altered, but you are still able to load it.

I quote the other answer for completeness:

You can use the Python Pillow (PIL) module, with most image formats, to check whether a file is a valid and intact image file.

If you also want to detect broken images, @Nadia Alramli correctly suggests the im.verify() method, but it does not detect every possible image defect; e.g., im.verify() does not detect truncated images (which most viewers load with a greyed area).

Pillow is able to detect these types of defects too, but you have to apply an image manipulation or an image decode/recode in order to trigger the check. Finally, I suggest using this code:

from PIL import Image

try:
    im = Image.open(filename)
    im.verify()  # I also run verify; I don't know if it catches other types of defects
    im.close()   # reopening is necessary: the image cannot be used after verify()
    im = Image.open(filename)
    im.transpose(Image.FLIP_LEFT_RIGHT)  # forces a full decode of the pixel data
    im.close()
except Exception:
    # manage exceptions here
    pass

In case of image defects, the calls inside the try block raise an exception. Please consider that im.verify() is about 100 times faster than performing the image manipulation (and I think that a flip is one of the cheaper transformations). With this code you can verify a set of images at about 10 MB/s (modern 2.5 GHz x86_64 CPU).
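The two-pass check described above can be wrapped in a small helper. This is only a sketch: the function name is_image_intact is made up for illustration, and it calls im.load() instead of the flip, on the assumption that forcing a full decode is what actually triggers the truncation error.

```python
from PIL import Image


def is_image_intact(path):
    """Return True if Pillow can both verify and fully decode the file.

    Hypothetical helper, not part of Pillow.
    """
    try:
        # First pass: the cheap structural check.
        with Image.open(path) as im:
            im.verify()
        # The image cannot be used after verify(), so reopen before decoding.
        with Image.open(path) as im:
            im.load()  # full decode; truncated data raises OSError here
    except Exception:
        # Any failure while opening, verifying or decoding counts as broken.
        return False
    return True
```

Calling is_image_intact(filename) then returns False for files that fail either pass, at the cost of the slower decode step for every file.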

For other formats (psd, xcf, ...) you can use Wand, the ImageMagick wrapper; the code is as follows:

import wand.image

im = wand.image.Image(filename=filename)
im.flip()  # flip() is a method and must be called; it flips the image in place
im.close()

But from my experiments, Wand does not detect truncated images; I think it loads the missing parts as a greyed area without any warning.

I read that ImageMagick has an external command, identify, that could do the job, but I have not found a way to invoke it programmatically and I have not tested this route.
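One possible way to call an external command such as identify from Python is the standard subprocess module. The sketch below is untested against real damaged files: the function name passes_external_check is hypothetical, and it assumes that the identify executable is on the PATH and exits with a non-zero status for files it cannot read. Whether it flags truncated images is not established here.

```python
import subprocess


def passes_external_check(path, command=("identify",)):
    """Run `command path` and return True if the process exits with status 0.

    Hypothetical helper. Raises FileNotFoundError if the executable
    named in `command` is not installed.
    """
    result = subprocess.run(
        [*command, path],
        stdout=subprocess.DEVNULL,  # only the exit status is of interest
        stderr=subprocess.DEVNULL,
        check=False,                # do not raise on a non-zero exit status
    )
    return result.returncode == 0
```

The command is a parameter so that extra options, or a different checker, can be tried without changing the function.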

I suggest always performing a preliminary check that the file size is not zero (or very small), which is very cheap:

import os

statfile = os.stat(filename)
filesize = statfile.st_size
if filesize == 0:
    # manage here the 'faulty image' case
    pass
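The same preliminary check can be written as a reusable predicate that also covers the "very small" case. This is a sketch only: the name has_plausible_size and the 100-byte threshold are assumptions chosen for illustration, not values taken from any image format specification.

```python
import os

MIN_PLAUSIBLE_BYTES = 100  # hypothetical threshold; tune it for your files


def has_plausible_size(path, min_bytes=MIN_PLAUSIBLE_BYTES):
    """Return True if the file exists and is at least `min_bytes` long."""
    try:
        return os.stat(path).st_size >= min_bytes
    except OSError:
        # Missing or unreadable files are treated as faulty as well.
        return False
```

Running this before the Pillow checks lets you skip the expensive decode for files that cannot possibly be images.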

Solution 2

Try the code below; it worked fine for me. It identifies the bad/corrupted images and removes them as well. If you prefer, you can just print the bad/corrupted file names and drop the final line that deletes the file.

import os
from os import listdir
from PIL import Image

for filename in listdir('/Users/ajinkyabobade/Desktop/2/'):
    if filename.endswith('.JPG'):
        try:
            img = Image.open('/Users/ajinkyabobade/Desktop/2/'+filename)  # open the image file
            img.verify()  # verify that it is, in fact, an image
        except (IOError, SyntaxError) as e:
            print(filename)  # report the bad file
            os.remove('/Users/ajinkyabobade/Desktop/2/'+filename)  # delete the bad file
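If you would rather review the list before deleting anything, detection can be separated from removal. The following is a sketch under a few assumptions: the function name find_unreadable_images and the suffix tuple are illustrative, the extension match is made case-insensitive, and only verify() is used, so truncated files may still pass.

```python
import os

from PIL import Image

IMAGE_SUFFIXES = (".jpg", ".jpeg")  # assumption: extend for other formats


def find_unreadable_images(folder):
    """Return the sorted names of image files in `folder` that Pillow rejects."""
    bad = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(IMAGE_SUFFIXES):
            continue  # skip files that do not look like images
        path = os.path.join(folder, name)
        try:
            with Image.open(path) as img:
                img.verify()
        except (OSError, SyntaxError):
            bad.append(name)
    return bad
```

The returned list can be printed, counted with len(), or passed to os.remove() one file at a time once you are satisfied with it.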

Solution 3

You are building a bad path with

img=Image.open('/Users/ajinkyabobade/Desktop/2'+filename)      

Try the following instead (by adding / to the end of the directory path)

img=Image.open('/Users/ajinkyabobade/Desktop/2/'+filename)      

or

img=Image.open(os.path.join('/Users/ajinkyabobade/Desktop/2', filename))
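
A short illustration of why the missing slash makes every file look broken. The file name photo.JPG is made up for the example. Plain concatenation produces a path that does not exist, and opening a missing file raises FileNotFoundError, which is a subclass of OSError (IOError is an alias of OSError in Python 3), so the except clause in the question catches it and counts the file as bad.

```python
import os

folder = "/Users/ajinkyabobade/Desktop/2"
name = "photo.JPG"  # hypothetical file name, for illustration only

concatenated = folder + name          # no separator is inserted
joined = os.path.join(folder, name)   # separator added for you

print(concatenated)  # /Users/ajinkyabobade/Desktop/2photo.JPG
print(joined)

# A missing file raises FileNotFoundError, which `except IOError` catches.
print(issubclass(FileNotFoundError, IOError))  # True
```

Using os.path.join avoids the problem whether or not the directory string ends with a slash.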
Author by

Ajinkya

Updated on June 04, 2022

Comments

  • Ajinkya
    Ajinkya almost 2 years

    I wrote a Python script to detect broken images and count them. The problem is that my script reports every image as broken instead of only the broken ones. How can I fix this? For my code I referred to:

    How to check if a file is a valid image file?

    My code

    import os
    from os import listdir
    from PIL import Image
    count=0
    for filename in os.listdir('/Users/ajinkyabobade/Desktop/2'):
        if filename.endswith('.JPG'):
         try:
          img=Image.open('/Users/ajinkyabobade/Desktop/2'+filename)
          img.verify()
         except(IOError,SyntaxError)as e:
             print('Bad file  :  '+filename)
             count=count+1
             print(count)