Reading csv zipped files in python
Solution 1
I used the zipfile module to read the CSV from the ZIP archive directly into a pandas DataFrame.
Suppose the file is named "intfile.csv" and it sits inside an archive named "THEZIPFILE.zip":
import pandas as pd
import zipfile
zf = zipfile.ZipFile('C:/Users/Desktop/THEZIPFILE.zip')
df = pd.read_csv(zf.open('intfile.csv'))
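If you are not sure what the CSV is called inside the archive, you can list the members before opening one. This is only a minimal sketch: the helper name find_csv_members is hypothetical, and the archive is built in memory so the example runs without any file on disk.

```python
import io
import zipfile


def find_csv_members(zf):
    """Return the names of archive members whose names end in .csv."""
    return [name for name in zf.namelist() if name.lower().endswith(".csv")]


# Build a small in-memory archive (illustrative data, not from the question).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("intfile.csv", "a,b\n1,2\n")
    zf.writestr("notes.txt", "not a CSV")

# Reopen the archive for reading and list the CSV members it contains.
with zipfile.ZipFile(buffer) as zf:
    csv_names = find_csv_members(zf)
    print(csv_names)
```

Any name returned this way can be passed to zf.open(). Opening a name that is not in the archive raises KeyError.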
Solution 2
If you aren't using pandas, it can be done entirely with the standard library. Here is Python 3.7 code:
import csv
from io import TextIOWrapper
from zipfile import ZipFile
with ZipFile('yourfile.zip') as zf:
    with zf.open('your_csv_inside_zip.csv', 'r') as infile:
        reader = csv.reader(TextIOWrapper(infile, 'utf-8'))
        for row in reader:
            # process the CSV here
            print(row)
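The same standard-library approach also works with csv.DictReader if you want each row keyed by the header line. The sketch below is one possible variant, not the answer's own code: the function name read_zipped_csv and the sample data are assumptions made for illustration.

```python
import csv
import io
import zipfile


def read_zipped_csv(source, member, encoding="utf-8"):
    """Read one CSV member of a ZIP archive into a list of dicts."""
    with zipfile.ZipFile(source) as zf:
        # zf.open() yields a binary stream, so wrap it to decode text.
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.DictReader(text))


# Build a small in-memory archive so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("your_csv_inside_zip.csv", "id,name\n1,Ada\n2,Grace\n")

rows = read_zipped_csv(buffer, "your_csv_inside_zip.csv")
print(rows)
```

Passing newline="" follows the csv module's recommendation for file objects, so that newlines embedded in quoted fields are handled by the reader itself.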
Solution 3
A quick solution is the code below:
import pandas as pd
# pandas can read a zipped CSV directly
df = pd.read_csv("/path/to/file.csv.zip")
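This works because read_csv infers the compression from the file extension by default. According to the pandas documentation, the archive must contain exactly one data file for this to work. The sketch below passes compression="zip" explicitly, which should also help when the path does not end in .zip. The temporary path and the sample data are assumptions made so the example is runnable.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Create a throwaway archive holding a single CSV (illustrative data).
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "file.csv.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("file.csv", "id,name\n1,Ada\n2,Grace\n")

# State the compression explicitly instead of relying on the extension.
df = pd.read_csv(zip_path, compression="zip")
print(df)
```

For an archive with several files, open the member you want with zipfile and pass the file object to read_csv instead.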
Solution 4
zipfile also supports the with statement.
So, adding onto Yaron's answer of using pandas:
import zipfile
import pandas as pd
with zipfile.ZipFile('file.zip') as zf:
    with zf.open('file.csv') as csv_file:
        df = pd.read_csv(csv_file)
Solution 5
I thought Yaron had the best answer, but I wanted to add code that iterates through multiple files inside a zip archive and concatenates the results:
import os
import pandas as pd
import zipfile

curDir = os.getcwd()
zf = zipfile.ZipFile(os.path.join(curDir, 'targetfolder.zip'))
text_files = zf.infolist()
list_ = []

print("Uncompressing and reading data...")

for text_file in text_files:
    print(text_file.filename)
    df = pd.read_csv(zf.open(text_file.filename))
    # do df manipulations
    list_.append(df)

# combine the per-file frames into a single DataFrame
df = pd.concat(list_)
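Note that infolist() returns every entry in the archive, which can include directory entries and non-CSV files. A minimal sketch of filtering those out before reading is shown here. The helper name iter_csv_members and the archive contents are hypothetical, and only the standard library is used so that the filtering step stands on its own.

```python
import io
import zipfile


def iter_csv_members(zf):
    """Yield ZipInfo entries that are regular files ending in .csv."""
    for info in zf.infolist():
        if info.is_dir():
            continue  # skip directory entries such as "targetfolder/"
        if info.filename.lower().endswith(".csv"):
            yield info


# Build an in-memory archive with a directory entry, two CSVs and a text file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("targetfolder/", "")
    zf.writestr("targetfolder/a.csv", "x\n1\n")
    zf.writestr("targetfolder/readme.txt", "ignore me")
    zf.writestr("targetfolder/b.csv", "x\n2\n")

with zipfile.ZipFile(buffer) as zf:
    names = [info.filename for info in iter_csv_members(zf)]
    print(names)
```

Each filename produced this way can be passed to zf.open() and then to pd.read_csv as in the loop above.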
Author by
Elyza Agosta
Updated on August 17, 2021
Comments
-
Elyza Agosta over 2 years
I'm trying to get data from a zipped csv file. Is there a way to do this without unzipping the whole files? If not, how can I unzip the files and read them efficiently?
-
PSNR over 6 years
This is the most helpful (and concise) one on this topic. Thank you!
-
Ken Ingram almost 4 years
I tried doing this not realizing that I needed io.TextIOWrapper. How could I have known?
-
Gian Arauz about 3 years
Outstanding answer! I checked that this same solution also works without the ".csv" extension:
df = pd.read_csv("/path/to/file.zip")
-
Dimitri_Fu almost 3 years
@KenIngram ZipFile.open() gives a zipfile.ZipExtFile object. The built-in open() function returns an _io.TextIOWrapper object directly.
-
Ken Ingram almost 3 years
Cool. Thanks for the info.