Python: How to read images from zip file in memory?
Solution 1
It turns out the problem was an extra empty element in namelist(), caused by the images being zipped inside a directory inside the zip file. Here is the full code that checks for that and iterates through the 100 images.
import zipfile
from StringIO import StringIO
from PIL import Image

imgzip = open('100-Test.zip', 'rb')
zippedImgs = zipfile.ZipFile(imgzip)

for i in xrange(len(zippedImgs.namelist())):
    print "iter", i, " ",
    file_in_zip = zippedImgs.namelist()[i]
    if ".jpg" in file_in_zip or ".JPG" in file_in_zip:
        print "Found image: ", file_in_zip, " -- ",
        data = zippedImgs.read(file_in_zip)
        dataEnc = StringIO(data)
        img = Image.open(dataEnc)
        print img
    else:
        print ""
Thanks guys!
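For reference, the stray entry comes from the folder itself: a ZIP built from a directory stores a separate entry for that directory, which then shows up in namelist(). On Python 3.6+ these can be skipped explicitly with ZipInfo.is_dir(). A minimal sketch, using an in-memory ZIP built purely for illustration (the names and placeholder bytes are not from the real 100-Test.zip):

```python
import io
import zipfile

# Build a small in-memory ZIP that mimics the layout of the archive:
# a folder entry plus files inside it (contents are placeholder bytes).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("images/", "")             # the directory entry itself
    zf.writestr("images/001.jpg", b"...")  # placeholder bytes, not a real JPEG
    zf.writestr("images/002.jpg", b"...")

with zipfile.ZipFile(buf) as zf:
    # namelist() includes the bare "images/" directory entry
    print(zf.namelist())  # ['images/', 'images/001.jpg', 'images/002.jpg']
    # Keep only real files by skipping directory entries
    files = [info.filename for info in zf.infolist() if not info.is_dir()]
    print(files)  # ['images/001.jpg', 'images/002.jpg']
```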
Solution 2
There is no need for StringIO: zipfile can read an image file in memory. The following loops through all images in your .zip file:
import zipfile
from PIL import Image

imgzip = zipfile.ZipFile("100-Test.zip")
inflist = imgzip.infolist()

for f in inflist:
    ifile = imgzip.open(f)
    img = Image.open(ifile)
    print(img)
    # display(img)
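One caveat with opening the ZipExtFile directly: Image.open is lazy, so the pixel data should be read (img.load()) before the archive member is closed, or PIL may fail when it later tries to read from a closed file. A sketch of the safe pattern, using an in-memory ZIP containing a minimal 1x1 GIF as placeholder image data (not from the original archive):

```python
import io
import zipfile
from PIL import Image

# Minimal valid 1x1 GIF used here as placeholder image data
GIF = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
       b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00"
       b"\x02\x02D\x01\x00;")

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pixel.gif", GIF)

with zipfile.ZipFile(buf) as zf:
    with zf.open("pixel.gif") as f:
        img = Image.open(f)
        img.load()  # force the lazy read while the file is still open
print(img.size)  # (1, 1)
```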
Solution 3
I had the same issue. Thanks to @alfredox, I modified the answer to use io.BytesIO instead of StringIO in Python 3.
import io
import zipfile
from PIL import Image

z = zipfile.ZipFile(zip_file)
for i in range(len(z.namelist())):
    file_in_zip = z.namelist()[i]
    if ".jpg" in file_in_zip or ".JPG" in file_in_zip:
        data = z.read(file_in_zip)
        dataEnc = io.BytesIO(data)
        img = Image.open(dataEnc)
        print(img)
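A note on the extension test used here and in Solution 1: a substring check like `".jpg" in name` also matches names such as "archive.jpg.txt" and misses other case variants like ".Jpg". A case-insensitive suffix check is more robust; a small sketch with made-up file names:

```python
names = ["a.jpg", "b.JPG", "c.Jpg", "d.png", "archive.jpg.txt"]

# Case-insensitive suffix match instead of a substring check
jpgs = [n for n in names if n.lower().endswith((".jpg", ".jpeg"))]
print(jpgs)  # ['a.jpg', 'b.JPG', 'c.Jpg']
```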
Solution 4
If you need to work on the pixel data, you can load the image stream data from the zip file as a numpy array, keeping the original data shape (i.e. 32x32 RGB), with these steps:
- use zipfile to get a ZipExtFile object
- use PIL.Image to convert the ZipExtFile into an image-like data structure
- convert the PIL.Image into a numpy array

There is no need to reshape the numpy array to the original data shape, because PIL.Image already carries that information. So the output will be a numpy array with shape=(32, 32, 3).
import numpy as np
import zipfile
from PIL import Image

with zipfile.ZipFile(zip_data_path, "r") as zip_data:
    content_list = zip_data.namelist()
    for name_file in content_list:
        img_bytes = zip_data.open(name_file)           # 1
        img_data = Image.open(img_bytes)               # 2
        # ndarray with shape=(32, 32, 3)
        image_as_array = np.array(img_data, np.uint8)  # 3
alfredox
Updated on June 17, 2022

Comments
-
alfredox almost 2 years
I have seen variations of this question, but not in this exact context. What I have is a file called 100-Test.zip which contains 100 .jpg images. I want to open this file in memory and process each file doing PIL operations. The rest of the code is already written, I just want to concentrate on getting from the zip file to the first PIL image. This is what the code looks like now from suggestions I've gathered from reading other questions, but it's not working. Can you guys take a look and help?
import zipfile
from StringIO import StringIO
from PIL import Image

imgzip = open('100-Test.zip', 'rb')
z = zipfile.ZipFile(imgzip)
data = z.read(z.namelist()[0])
dataEnc = StringIO(data)
img = Image.open(dataEnc)
print img
But I am getting this error when I run it:
IOError: cannot identify image file <StringIO.StringIO instance at 0x7f606ecffab8>
Alternatives: I have seen other sources saying to use this instead:
image_file = StringIO(open("test.jpg", 'rb').read())
im = Image.open(image_file)
But the problem is I'm not opening a file, it's already in memory inside the data variable. I also tried using
dataEnc = StringIO.read(data)
but got this error:
TypeError: unbound method read() must be called with StringIO instance as first argument (got str instance instead)
-
Ilan Tal almost 3 years
I just tried this idea on a large compressed TIFF file (1024*768*116). While it may well be good for small files, it is FAR too SLOW for large files (>20 min vs 6 sec for uncompressed). Nice idea though...
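The slowdown likely comes from seeking inside a compressed ZipExtFile: a backward seek forces re-decompression from the start of the member, which multi-page formats like TIFF trigger often. Decompressing the member once into a BytesIO trades memory for cheap random access. A sketch under that assumption, with a placeholder member standing in for the large TIFF:

```python
import io
import zipfile

# Build a small compressed archive to demonstrate the pattern.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("big.tif", b"\x00" * 100_000)  # stand-in for a large TIFF

with zipfile.ZipFile(buf) as zf:
    # Decompress the member once into memory; BytesIO then supports
    # cheap random-access seeks that ZipExtFile on compressed data does not.
    data = io.BytesIO(zf.read("big.tif"))
    data.seek(50_000)   # O(1) seek in memory
    chunk = data.read(16)
print(len(chunk))  # 16
```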