Reading large text files with Pandas
A solution to a similar question was posted in a related thread some time after this question was asked. In short, it suggests reading the file in chunks:
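import pandas as pd

chunksize = 10 ** 6  # number of rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):  # filename: path to your file
    process(chunk)  # process(): your own function that handles one DataFrame chunk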
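As a minimal sketch of what `process` might do, assuming the goal is an aggregate over the whole file, each chunk can be reduced to a small partial result and the partial results combined at the end. The column names, the helper `total_by_category`, and the in-memory sample data (standing in for a large file on disk) are all hypothetical, and the sketch assumes Python 3 with a recent pandas:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a large CSV file on disk.
CSV_TEXT = """category,amount
a,1
b,2
a,3
c,4
b,5
"""


def total_by_category(source, chunksize):
    """Sum 'amount' per 'category' while holding only one chunk of rows in memory."""
    partials = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Reduce each chunk to a small per-category partial sum before moving on.
        partials.append(chunk.groupby("category")["amount"].sum())
    # Combine the per-chunk partial sums into the final totals.
    return pd.concat(partials).groupby(level=0).sum()


totals = total_by_category(io.StringIO(CSV_TEXT), chunksize=2)
print(totals.to_dict())
```

The point of this design is that only the small partial results accumulate, so peak memory depends on `chunksize` (plus the number of distinct categories) rather than on the total file size.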
You should set the chunksize parameter according to your machine's capabilities (that is, make sure each chunk fits in memory and can be processed).
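One way to pick a value, sketched here under the assumption that rows are roughly uniform in size, is to read a small sample with `nrows`, measure its in-memory footprint with `DataFrame.memory_usage(deep=True)`, and scale that to a memory budget. The helper name `estimate_chunksize` and the `budget_bytes` value are hypothetical:

```python
import io

import pandas as pd


def estimate_chunksize(source, budget_bytes, sample_rows=1000):
    """Estimate how many rows fit in budget_bytes, based on a sample of rows.

    Assumes the rest of the file has rows of about the same size as the sample.
    """
    sample = pd.read_csv(source, nrows=sample_rows)
    if len(sample) == 0:
        return sample_rows  # header-only file: nothing to measure
    # deep=True counts the actual size of Python objects such as strings.
    bytes_per_row = sample.memory_usage(deep=True).sum() / len(sample)
    return max(1, int(budget_bytes // bytes_per_row))


# Hypothetical in-memory data; in practice, pass the file path so the file
# is reopened from the start for the real chunked read.
sample_csv = io.StringIO("x,y\n" + "".join(f"{i},{i * 2}\n" for i in range(100)))
rows = estimate_chunksize(sample_csv, budget_bytes=100 * 1024 * 1024, sample_rows=50)
print(rows)
```

The estimate is only a starting point: whatever `process` does with each chunk (copies, joins, intermediate columns) also needs memory, so leaving headroom below the available RAM is sensible.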
Author: marillion
Updated on June 12, 2022
Question (marillion):
I have been trying to read a few large text files (around 1.4 GB to 2 GB each) with Pandas, using the read_csv function, to no avail. These are the versions I am using:
- Python 2.7.6
- Anaconda 1.9.2 (64-bit) (default, Nov 11 2013, 10:49:15) [MSC v.1500 64 bit (AMD64)]
- IPython 1.1.0
- Pandas 0.13.1
I tried the following:
df = pd.read_csv('data.txt')
and it crashed IPython with the message:
Kernel died, restarting
Then I tried using an iterator:
tp = pd.read_csv('data.txt', iterator=True, chunksize=1000)
Again, I got the "Kernel died, restarting" error.
Any ideas? Is there another way to read big text files?
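For reference, with `iterator=True` and `chunksize`, `read_csv` returns a reader object (`TextFileReader`) rather than a DataFrame; in recent pandas versions, rows are read only as the reader is consumed, either by calling its `get_chunk()` method or by iterating over it. A minimal sketch, assuming Python 3 and a recent pandas, with hypothetical in-memory data in place of data.txt:

```python
import io

import pandas as pd

# Hypothetical stand-in for data.txt: a header plus 7 rows.
DATA = io.StringIO("id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(7)))

reader = pd.read_csv(DATA, iterator=True, chunksize=3)

# get_chunk(n) reads the next n rows as a DataFrame.
first = reader.get_chunk(2)

# Iterating continues from where get_chunk() stopped, `chunksize` rows at a time.
rest_sizes = [len(chunk) for chunk in reader]

print(first)
print(rest_sizes)
```

Each chunk is an ordinary DataFrame, so the per-chunk work happens inside the loop instead of on one DataFrame holding the whole file.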
Thank you!