How to identify images with "Possibly corrupt EXIF data"

Asked: 2017-06-06 23:13:57

Tags: python-3.x tensorflow python-imaging-library keras exif

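I am taking part in an image-classification Kaggle competition and downloaded some training images from Kaggle.com. I then apply transfer learning with ResNet50 to these images, using Keras 2.0 with TensorFlow as the backend (and Python 3).

However, 258 of the 1281 training images trigger a "Possibly corrupt EXIF data" warning and are ignored when loaded into the ResNet model, most likely because of a Pillow issue.

The output messages look like this: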

/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 524288 bytes but only got 0. Skipping tag 3
  "Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 393216 bytes but only got 0. Skipping tag 3
  "Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 33554432 bytes but only got 0. Skipping tag 4
  "Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 25165824 bytes but only got 0. Skipping tag 4
  "Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data.  Expecting to read 131072 bytes but only got 0. Skipping tag 3
  "Skipping tag %s" % (size, len(data), tag))
(more to come ...)

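From the output I only know that such images exist, but not which ones they are...

My question is: how can I identify these 258 images so that I can manually remove them from the dataset?

2 Answers:

Answer 0: (score: 2)

Even though this question is more than a year old, I want to show my solution, because I ran into the same problem.

I edited the code that emits the warning. The warning output shows where that file lives on your system (PIL/TiffImagePlugin.py) and the line number. For example, I changed the following: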

if len(data) != size:
    warnings.warn("Possibly corrupt EXIF data.  "
                  "Expecting to read %d bytes but only got %d."
                  " Skipping tag %s" % (size, len(data), tag))
    continue

to:

if len(data) != size:
    raise ValueError('Corrupt Exif data')
    warnings.warn("Possibly corrupt EXIF data.  "
                  "Expecting to read %d bytes but only got %d."
                  " Skipping tag %s" % (size, len(data), tag))
    continue

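My code that catches the ValueError is shown below. The advantage is that PIL stops with an exception instead of printing a useless message. You can also catch the exception and act on it, for example by deleting the corresponding file in the "except" branch.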

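import os
from PIL import Image

imageFolder = "/Path/To/Image/Folder"  # placeholder, replace with your folder
listImages = os.listdir(imageFolder)

for fileName in listImages:
    imgPath = os.path.join(imageFolder, fileName)

    try:
        img = Image.open(imgPath)
        # Parsing the EXIF block runs the patched check in TiffImagePlugin.py
        exif_data = img._getexif()
    except ValueError as err:
        print(err)
        # Print the file path, not the Image object
        print("Error on image: ", imgPath)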

I know that adding the ValueError line is quick and dirty, but it is better than facing all those useless warning messages.
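If you would rather not edit Pillow's source, a similar effect can probably be had with the standard `warnings` module, which can turn the warning into an exception. This is a minimal sketch and not the method from this answer. The names `read_exif` and `find_failing_images` are made up for illustration, and it assumes the warning is emitted while the image is opened or its EXIF block is parsed with `Image.getexif()` (available in newer Pillow releases).

```python
import os
import warnings


def read_exif(path):
    """Open an image with Pillow and parse its EXIF block (hypothetical helper)."""
    # Imported lazily so find_failing_images can be reused with another loader.
    from PIL import Image

    with Image.open(path) as img:
        return img.getexif()


def find_failing_images(folder, loader=read_exif):
    """Return (path, message) pairs for files whose loader warns or raises."""
    failing = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with warnings.catch_warnings():
            # Inside this block every UserWarning is raised as an exception.
            warnings.simplefilter("error", UserWarning)
            try:
                loader(path)
            except (UserWarning, ValueError, OSError) as err:
                # OSError also covers files Pillow cannot identify as images.
                failing.append((path, str(err)))
    return failing
```

Calling `find_failing_images("/Path/To/Image/Folder")` should then give a list of the offending file paths together with the warning text, which you can print or use to move the files out of the dataset.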

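Answer 1: (score: 0)

The simplest approach is to modify your code so that it processes one image at a time, then iterate over the images and check which ones generate the warning.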
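One way to do that check per image is to record the warnings instead of letting them print. The following is only a sketch under assumptions: `open_image` and `images_with_warnings` are hypothetical names, and the loader assumes that Pillow emits the warning while opening the file or parsing its EXIF block. If the warning only shows up in your Keras loading call, pass that call as the `loader` instead.

```python
import os
import warnings


def open_image(path):
    """Open an image with Pillow and parse its EXIF block (hypothetical helper)."""
    from PIL import Image  # lazy import, so another loader can be swapped in

    with Image.open(path) as img:
        img.getexif()


def images_with_warnings(folder, loader=open_image):
    """Map each file path to the warning messages produced while loading it."""
    flagged = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        # record=True collects warnings in a list instead of printing them.
        with warnings.catch_warnings(record=True) as caught:
            # "always" keeps repeated warnings from being suppressed.
            warnings.simplefilter("always")
            try:
                loader(path)
            except OSError as err:
                # Files Pillow cannot read at all are reported too.
                flagged[path] = [f"unreadable: {err}"]
                continue
        if caught:
            flagged[path] = [str(w.message) for w in caught]
    return flagged
```

The returned dictionary has one entry per problematic file, so `len(images_with_warnings(folder))` can be compared against the 258 images mentioned in the question.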