Reading zipped CSV files in Python

Solution 1

I used the zipfile module to read the CSV straight from the ZIP into a pandas DataFrame. Let's say the file is named "intfile.csv" and it sits inside an archive named "THEZIPFILE.zip":

import pandas as pd
import zipfile

zf = zipfile.ZipFile('C:/Users/Desktop/THEZIPFILE.zip') 
df = pd.read_csv(zf.open('intfile.csv'))

Solution 2

If you aren't using pandas, it can be done entirely with the standard library. Here is code that works on Python 3.7:

import csv
from io import TextIOWrapper
from zipfile import ZipFile

with ZipFile('yourfile.zip') as zf:
    with zf.open('your_csv_inside_zip.csv', 'r') as infile:
        # ZipFile.open() yields bytes; wrap it so csv.reader receives decoded text
        reader = csv.reader(TextIOWrapper(infile, 'utf-8', newline=''))
        for row in reader:
            # process the CSV here
            print(row)
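
Because the question asks about avoiding a full extraction, here is a minimal sketch of the same standard-library approach written as a generator, so rows are decompressed only as they are consumed. The function name iter_zipped_csv, the member name people.csv, and the in-memory archive are hypothetical and exist only to keep the example self-contained.

```python
import csv
import io
import itertools
import zipfile


def iter_zipped_csv(zip_source, member, encoding="utf-8"):
    """Yield the rows of one CSV member as dicts, reading lazily."""
    with zipfile.ZipFile(zip_source) as zf:
        with zf.open(member) as raw:
            # newline="" lets the csv module handle line endings itself
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.DictReader(text)


# Build a small archive in memory (hypothetical data, for illustration only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("people.csv", "name,age\nAda,36\nLinus,28\nGrace,45\n")

# Take only the first two rows; the rest of the member is never parsed.
first_two = list(itertools.islice(iter_zipped_csv(buffer, "people.csv"), 2))
print(first_two)
```

Passing a path such as 'yourfile.zip' instead of the in-memory buffer should work the same way, since ZipFile accepts either.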

Solution 3

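A quick solution is to pass the archive path straight to pandas, which infers the compression from the .zip extension. The archive must contain only one data file for this to work: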

import pandas as pd

# pandas can read a zipped CSV directly
df = pd.read_csv("/path/to/file.csv.zip")

Solution 4

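ZipFile objects also support the with statement, so the archive and the member are closed automatically.

So, adding onto Yaron's answer of using pandas:

import pandas as pd
import zipfile

with zipfile.ZipFile('file.zip') as zf:
    with zf.open('file.csv') as csv_file:
        df = pd.read_csv(csv_file)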

Solution 5

I thought Yaron had the best answer, but I wanted to add code that iterates through multiple files inside a zip folder. It reads each file into a DataFrame, collects them in a list, and concatenates the results at the end:

import os
import pandas as pd
import zipfile

curDir = os.getcwd()
zf = zipfile.ZipFile(curDir + '/targetfolder.zip')
text_files = zf.infolist()
list_ = []

print("Uncompressing and reading data...")

for text_file in text_files:
    print(text_file.filename)
    df = pd.read_csv(zf.open(text_file.filename))
    # do df manipulations
    list_.append(df)

df = pd.concat(list_)
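
One thing to watch with infolist() is that it also returns directory entries and any non-CSV members. The following is a rough standard-library sketch of the same loop that skips those; the function name read_all_csv_members and the sample archive contents are assumptions made up for the example, not part of the answer above.

```python
import csv
import io
import zipfile


def read_all_csv_members(zip_source, encoding="utf-8"):
    """Map each CSV member name to its list of rows (header row included)."""
    tables = {}
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            # Skip directory entries and anything that is not a .csv file.
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            with zf.open(info) as raw:
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                tables[info.filename] = list(csv.reader(text))
    return tables


# Hypothetical archive with a folder entry, two CSVs and one unrelated file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data/", "")
    zf.writestr("data/a.csv", "x,y\n1,2\n")
    zf.writestr("data/b.csv", "x,y\n3,4\n")
    zf.writestr("README.txt", "not a table")

tables = read_all_csv_members(buffer)
print(sorted(tables))
```

The same filter could be applied before calling pd.read_csv in the loop above, so that only real CSV members reach pandas.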

Author: Elyza Agosta

Updated on August 17, 2021

Comments

  • Elyza Agosta over 2 years
    I'm trying to get data from a zipped CSV file. Is there a way to do this without unzipping the whole file? If not, how can I unzip the files and read them efficiently?
  • PSNR over 6 years
    This is the most helpful (and concise) one on this topic. Thank you!
  • Ken Ingram almost 4 years
    I tried doing this not realizing that I needed io.TextIOWrapper. How could I have known?
  • Gian Arauz about 3 years
    Outstanding answer! I checked that this same solution also works without the ".csv" extension: df = pd.read_csv("/path/to/file.zip")
  • Dimitri_Fu almost 3 years
    @KenIngram ZipFile.open() gives a zipfile.ZipExtFile object, which is a binary stream. The built-in open() function returns an _io.TextIOWrapper object directly when used in its default text mode.
  • Ken Ingram almost 3 years
    Cool. Thanks for the info.
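
To make the TextIOWrapper exchange above concrete, here is a small sketch, assuming a throwaway in-memory archive with a hypothetical member demo.csv. It shows that the object returned by ZipFile.open() hands back bytes, that csv.reader rejects it as is, and that wrapping it in io.TextIOWrapper fixes the problem.

```python
import csv
import io
import zipfile

# Throwaway archive built in memory (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("demo.csv", "a,b\n1,2\n")

with zipfile.ZipFile(buffer) as zf:
    # 1. The member is a binary stream: readline() returns bytes.
    with zf.open("demo.csv") as raw:
        first_line = raw.readline()

    # 2. csv.reader expects str lines, so the binary stream is rejected.
    binary_error = None
    with zf.open("demo.csv") as raw:
        try:
            next(csv.reader(raw))
        except csv.Error as exc:
            binary_error = exc

    # 3. Wrapping the stream decodes it, and csv.reader is satisfied.
    with zf.open("demo.csv") as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.reader(text))

print(type(first_line).__name__, binary_error, rows)
```

The csv.Error raised in step 2 is usually the first hint that a text wrapper is missing.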