Python: How to read images from zip file in memory?

13,445

Solution 1

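It turns out the problem was an extra entry in namelist() that holds no image data. The images were zipped inside a directory within the zip file, so namelist() also lists the directory itself, and that entry is what the original code tried to open as an image. Here is the full code (Python 2) that checks for that and iterates through the 100 images.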

import zipfile
from StringIO import StringIO
from PIL import Image

# Python 2 code; open the archive in binary mode
imgzip = open('100-Test.zip', 'rb')
zippedImgs = zipfile.ZipFile(imgzip)

for i, file_in_zip in enumerate(zippedImgs.namelist()):
    print "iter", i, " ",
    # skip the directory entry (and anything else that is not a .jpg)
    if file_in_zip.lower().endswith(".jpg"):
        print "Found image: ", file_in_zip, " -- ",
        data = zippedImgs.read(file_in_zip)  # raw bytes of the member
        dataEnc = StringIO(data)             # wrap the bytes as a file-like object
        img = Image.open(dataEnc)
        print img
    else:
        print ""

Thanks guys!
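To see why the directory entry breaks Image.open(), here is a small Python 3 sketch that rebuilds the situation in memory. The archive layout (a folder named 100-Test/ holding red.jpg) is made up for illustration; the point is that namelist() returns the folder as its own entry, PIL cannot identify its empty data, and filtering with ZipInfo.is_dir() leaves only the real images.

```python
import io
import zipfile
from PIL import Image

# Hypothetical test image: a 4x4 red JPEG encoded in memory.
jpg_buf = io.BytesIO()
Image.new("RGB", (4, 4), "red").save(jpg_buf, format="JPEG")

# Hypothetical archive that stores the image inside a folder, the way many
# tools do when you zip a whole directory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("100-Test/", b"")  # the directory entry itself
    zf.writestr("100-Test/red.jpg", jpg_buf.getvalue())

with zipfile.ZipFile(zip_buf) as zf:
    names = zf.namelist()
    print(names)  # ['100-Test/', '100-Test/red.jpg']

    # The directory entry has no image data, so PIL cannot identify it.
    try:
        Image.open(io.BytesIO(zf.read(names[0])))
        dir_entry_is_image = True
    except OSError:  # Pillow raises UnidentifiedImageError, an OSError subclass
        dir_entry_is_image = False

    # Skipping directory entries leaves only the real images.
    images = [Image.open(io.BytesIO(zf.read(info)))
              for info in zf.infolist() if not info.is_dir()]
    print(len(images), images[0].size)  # 1 (4, 4)
```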

Solution 2

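There is no need to use StringIO: ZipFile.open() returns a file-like object that PIL can read directly, so the image is read from the archive in memory without extracting it to disk. The following loops through all images in your .zip file: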

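import zipfile
from PIL import Image

imgzip = zipfile.ZipFile("100-Test.zip")
inflist = imgzip.infolist()

for f in inflist:
    if f.is_dir():  # skip directory entries such as "100-Test/"
        continue
    ifile = imgzip.open(f)  # file-like ZipExtFile, read straight from the archive
    img = Image.open(ifile)
    print(img)
    # display(img)  # display() is only available in Jupyter/IPython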
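A slightly more defensive version of the same idea, as a sketch: it closes each member with a context manager and calls img.load() so the pixels are decoded while the member is still open. The helper name image_sizes and the test archive are hypothetical, and it assumes every non-directory member is an image.

```python
import io
import zipfile
from PIL import Image


def image_sizes(zip_source):
    """Return {member name: (width, height)} for every file in the archive.

    Hypothetical helper: zip_source may be a path or a binary file object,
    and every non-directory member is assumed to be an image.
    """
    sizes = {}
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no image data
            # zf.open() returns a file-like object that decompresses the
            # member as PIL reads it; nothing is written to disk.
            with zf.open(info) as member, Image.open(member) as img:
                img.load()  # decode now, while the member is still open
                sizes[info.filename] = img.size
    return sizes


# Usage with a small archive built in memory (test data only).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("100-Test/", b"")
    for i, size in enumerate([(10, 20), (30, 5)]):
        jpg = io.BytesIO()
        Image.new("RGB", size, "green").save(jpg, format="JPEG")
        zf.writestr(f"100-Test/img{i}.jpg", jpg.getvalue())

print(image_sizes(buf))
# {'100-Test/img0.jpg': (10, 20), '100-Test/img1.jpg': (30, 5)}
```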

Solution 3

I had the same issue. Thanks to @alfredox; I modified the answer to use io.BytesIO instead of StringIO, because in Python 3 ZipFile.read() returns bytes.

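import io
import zipfile
from PIL import Image

z = zipfile.ZipFile(zip_file)  # zip_file: path (or binary file object) of your archive
for file_in_zip in z.namelist():
    # only .jpg/.JPG members; directory entries are skipped automatically
    if file_in_zip.lower().endswith(".jpg"):
        data = z.read(file_in_zip)   # raw bytes of the member
        dataEnc = io.BytesIO(data)   # wrap the bytes as a binary file-like object
        img = Image.open(dataEnc)
        print(img)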
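The reason for the switch: Python 2's StringIO.StringIO accepted byte strings because str was bytes there, while Python 3's io.StringIO only accepts text. A minimal sketch showing both behaviours on a hypothetical one-image archive built in memory:

```python
import io
import zipfile
from PIL import Image

# Small in-memory archive with one JPEG (hypothetical test data).
jpg = io.BytesIO()
Image.new("RGB", (8, 6), "blue").save(jpg, format="JPEG")
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("photos/a.JPG", jpg.getvalue())

with zipfile.ZipFile(archive) as z:
    data = z.read("photos/a.JPG")       # ZipFile.read() returns bytes
    try:
        io.StringIO(data)               # io.StringIO only accepts str
        stringio_ok = True
    except TypeError:
        stringio_ok = False
    img = Image.open(io.BytesIO(data))  # io.BytesIO wraps the bytes as a binary file
    print(type(data).__name__, stringio_ok, img.format, img.size)
    # bytes False JPEG (8, 6)
```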

Solution 4

If you need to work on pixel data, you can load each image from the zip file as a NumPy array that keeps the original shape (e.g. 32x32 RGB) with the following steps:

  1. use zipfile to open each member as a ZipExtFile (a file-like object)
  2. use PIL.Image.open to read the ZipExtFile into a PIL image
  3. convert the PIL image into a NumPy array

There is no need to reshape the array, because the PIL image already carries its size and mode. For a 32x32 RGB image the output is a NumPy array with shape=(32,32,3) (a grayscale image would give shape=(32,32)).

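import zipfile

import numpy as np
from PIL import Image

zip_data_path = "100-Test.zip"  # path to your archive

with zipfile.ZipFile(zip_data_path, "r") as zip_data:
    content_list = zip_data.namelist()
    for name_file in content_list:
        if name_file.endswith("/"):
            continue  # skip directory entries
        with zip_data.open(name_file) as img_bytes:       # 1: ZipExtFile
            img_data = Image.open(img_bytes)              # 2: PIL image
            # ndarray with shape=(32,32,3) for 32x32 RGB images
            image_as_array = np.array(img_data, np.uint8) # 3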
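If the goal is a training batch, the per-image arrays can be stacked into one array. This is a sketch under two assumptions: every non-directory member is an image, and all images share the same width and height. It uses img.convert("RGB") so that grayscale or palette images also come out with 3 channels; the helper name zip_to_array is hypothetical.

```python
import io
import zipfile

import numpy as np
from PIL import Image


def zip_to_array(zip_source):
    """Stack every image in the archive into one uint8 array of shape (N, H, W, 3).

    Hypothetical helper: assumes every non-directory member is an image and
    that all images have the same width and height.
    """
    arrays = []
    with zipfile.ZipFile(zip_source) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith("/"):
                continue  # skip directory entries
            with zf.open(name) as member, Image.open(member) as img:
                # convert("RGB") yields 3 channels even for grayscale or
                # palette images, so every array has shape (H, W, 3).
                arrays.append(np.asarray(img.convert("RGB"), dtype=np.uint8))
    return np.stack(arrays)


# Usage: three 32x32 test images, one of them grayscale (test data only).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i, mode in enumerate(["RGB", "RGB", "L"]):
        png = io.BytesIO()
        Image.new(mode, (32, 32)).save(png, format="PNG")
        zf.writestr(f"imgs/{i}.png", png.getvalue())

batch = zip_to_array(buf)
print(batch.shape, batch.dtype)  # (3, 32, 32, 3) uint8
```

np.stack raises a ValueError if the images differ in size, which is a reasonable early failure for a batch that must be rectangular.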
Author: alfredox

Updated on June 17, 2022

Comments

  • alfredox
    alfredox almost 2 years

    I have seen variations of this question, but not in this exact context. What I have is a file called 100-Test.zip which contains 100 .jpg images. I want to open this file in memory and process each file doing PIL operations. The rest of the code is already written, I just want to concentrate on getting from the zip file to the first PIL image. This is what the code looks like now from suggestions I've gathered from reading other questions, but it's not working. Can you guys take a look and help?

    import zipfile
    from StringIO import StringIO
    from PIL import Image
    
    imgzip = open('100-Test.zip', 'rb')
    z = zipfile.ZipFile(imgzip)
    data = z.read(z.namelist()[0])
    dataEnc = StringIO(data)
    img = Image.open(dataEnc)
    
    print img
    

    But I am getting this error when I run it:

     IOError: cannot identify image file <StringIO.StringIO instance at 0x7f606ecffab8>
    

    Alternatives: I have seen other sources saying to use this instead:

    image_file = StringIO(open("test.jpg",'rb').read())
    im = Image.open(image_file)
    

    But the problem is I'm not opening a file, it's already in memory inside the data variable. I also tried using dataEnc = StringIO.read(data) but got this error:

    TypeError: unbound method read() must be called with StringIO instance as first argument (got str instance instead)
    
  • Ilan Tal
    Ilan Tal almost 3 years
    I just tried this idea on a large compressed TIFF file (1024*768*116). While it may well be good for small files, it is FAR too SLOW for large files (>20 min vs. 6 sec for the uncompressed file). Nice idea though...