How can I identify the images with 'Possibly corrupt EXIF data'?

Solution 1

The easiest way that comes to mind is to modify your code to handle one image at a time, then iterate over the images and check which ones generate the warning.
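
A minimal sketch of that per-image loop, recording warnings with the standard `warnings` module. The names `load_exif` and `find_warning_files` are hypothetical, and it assumes that reading EXIF through Pillow's `getexif()` (Pillow 6.0 or later) is what triggers the warning for your files; swap in whatever loading call your pipeline actually uses.

```python
import warnings


def load_exif(path):
    """Open one image with Pillow and read its EXIF data."""
    from PIL import Image  # imported lazily so find_warning_files works with any loader

    with Image.open(path) as img:
        return img.getexif()


def find_warning_files(paths, load=load_exif):
    """Return (path, message) pairs for files whose loading warned about EXIF."""
    flagged = []
    for path in paths:
        with warnings.catch_warnings(record=True) as caught:
            # "always" keeps repeated warnings from being suppressed
            warnings.simplefilter("always")
            try:
                load(path)
            except Exception as err:  # unreadable files are reported as well
                flagged.append((path, f"error: {err}"))
                continue
        for w in caught:
            if "EXIF" in str(w.message):
                flagged.append((path, str(w.message)))
                break  # one entry per file is enough
    return flagged
```

Calling `find_warning_files` with a list of image paths returns only the files that warned (or failed to open), so they can be reviewed or removed.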

Solution 2

Edit: To raise warnings as errors that you can catch, see Justas's comment below.
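
One way to do this with the standard library, as a rough sketch: `warnings.filterwarnings("error", ...)` turns matching warnings into exceptions. The helper name `raises_exif_warning` and its `load` parameter are hypothetical; `load` stands for any callable that opens the image, and the message pattern is taken from the warning text quoted in the question.

```python
import warnings


def raises_exif_warning(path, load):
    """Return True if load(path) emits the 'Possibly corrupt EXIF data' warning."""
    with warnings.catch_warnings():
        # Inside this block only, matching UserWarnings are raised as exceptions.
        warnings.filterwarnings(
            "error",
            message="Possibly corrupt EXIF data",
            category=UserWarning,
        )
        try:
            load(path)
        except UserWarning:
            return True
    return False
```

Because the filter is set inside `catch_warnings()`, the global warning configuration is restored afterwards and Pillow's source stays untouched.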


Even though this question is over a year old, I want to show my solution because I ran into the same problem.

I edited the Pillow code that emits the warning. The warning output shows where that file is on your system, along with the line number. For example, I changed the following:

if len(data) != size:
    warnings.warn("Possibly corrupt EXIF data.  "
                  "Expecting to read %d bytes but only got %d."
                  " Skipping tag %s" % (size, len(data), tag))
    continue

to

if len(data) != size:
    raise ValueError('Corrupt Exif data')
    warnings.warn("Possibly corrupt EXIF data.  "
                  "Expecting to read %d bytes but only got %d."
                  " Skipping tag %s" % (size, len(data), tag))
    continue

My code to catch the ValueError is shown below. Its advantage is that PIL is interrupted instead of printing a useless message. You can also catch the exception and act on it, e.g. delete the corresponding file in the 'except' block.

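import os
from PIL import Image

imageFolder = "/Path/To/Image/Folder"  # placeholder: set this to your image folder
listImages = os.listdir(imageFolder)

for imgName in listImages:
    imgPath = os.path.join(imageFolder, imgName)

    try:
        # Keep the image object in its own variable so imgName stays the file name
        with Image.open(imgPath) as image:
            exif_data = image._getexif()  # private method; available for JPEG files
    except ValueError as err:
        print(err)
        print("Error on image: ", imgName)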

I know adding the ValueError is quick and dirty, but it's better than being confronted with all the useless warning messages.

Author by

user3768495

Updated on June 04, 2022

Comments

  • user3768495
    user3768495 almost 2 years

    I am working on an image classification Kaggle competition and downloaded some training images from Kaggle.com. I am using transfer learning with ResNet50 on these images, with Keras 2.0 on a TensorFlow backend (and Python 3).

    However, 258 of the 1281 training images produce a 'Possibly corrupt EXIF data' warning and are ignored when loaded into the ResNet model, very likely due to a Pillow issue.

    The output messages are like:

    /home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 524288 bytes but only got 0. Skipping tag 3
      "Skipping tag %s" % (size, len(data), tag))
    /home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 393216 bytes but only got 0. Skipping tag 3
      "Skipping tag %s" % (size, len(data), tag))
    /home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 33554432 bytes but only got 0. Skipping tag 4
      "Skipping tag %s" % (size, len(data), tag))
    /home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 25165824 bytes but only got 0. Skipping tag 4
      "Skipping tag %s" % (size, len(data), tag))
    /home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 131072 bytes but only got 0. Skipping tag 3
      "Skipping tag %s" % (size, len(data), tag))
    (more to come ...)
    

    Based on the output messages, I only know that these images exist, but not which ones they are...

    My question is: how can I identify these 258 images so that I can manually remove them from the data set?

  • Serġan
    Serġan about 5 years
    Doesn't work on PNG files; I get: AttributeError: 'PngImageFile' object has no attribute '_getexif'
  • Clown77
    Clown77 about 5 years
    @Serġan that's because PNG has no such information. See EXIF. You should normally find this information only in the .JPG, .TIF, and .WAV formats.
  • Justas
    Justas over 3 years
    To catch UserWarning like an Exception you can use this instead.
  • Clown77
    Clown77 over 3 years
    @Justas Thank you, I was not aware of this possibility.
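
Regarding the AttributeError on PNG files mentioned in the comments, one possible workaround is to look up the EXIF reader defensively instead of calling `_getexif()` directly. This is only a sketch: `read_exif_or_none` is a hypothetical helper, and it assumes the public `getexif()` method is present in Pillow 6.0 and later.

```python
def read_exif_or_none(img):
    """Return EXIF data from a Pillow image, or None if it offers no EXIF reader."""
    # Prefer the public getexif(); fall back to the private _getexif() on older setups.
    reader = getattr(img, "getexif", None) or getattr(img, "_getexif", None)
    if reader is None:
        return None
    return reader()
```

With this, images whose class lacks both methods are skipped quietly instead of aborting the loop.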