Python load 2GB of text file to memory
If you use mmap, you can map the entire file into memory at once; the operating system then pages the contents in as you access them.
import mmap

with open('dump.xml', 'rb') as f:
    # A length of 0 maps the ENTIRE file
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)  # Map read-only (prot is Unix-only)
    # Proceed with your code here. Pages are served from memory once they
    # have been touched, so "readline" here will be as fast as could be
    data = m.readline()
    while data:
        # Do stuff
        data = m.readline()
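The `prot` argument used above exists only on Unix. As a minimal sketch, assuming Python 3 (where `mmap` objects can be used as context managers), the same read-only mapping can be requested with `access=mmap.ACCESS_READ`, which is also accepted on Windows. The function name `count_lines` is hypothetical and only stands in for "Do stuff":

```python
import mmap


def count_lines(path):
    """Count the lines of a non-empty file by walking a read-only memory map."""
    with open(path, "rb") as f:
        # access=ACCESS_READ requests a read-only mapping on both Unix and Windows.
        # A length of 0 maps the whole file; an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # readline() returns b"" at end of file, which stops the iterator.
            return sum(1 for _ in iter(m.readline, b""))
```

Calling `count_lines('dump.xml')` would walk the dump one line at a time without first building a 2.5GB string in Python.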
Comments
-
pckben almost 2 years
In Python 2.7, when I load all data from a text file of 2.5GB into memory for quicker processing like this:
>>> f = open('dump.xml','r')
>>> dump = f.read()
I got the following error:
Python(62813) malloc: *** mmap(size=140521659486208) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
Why did Python try to allocate 140521659486208 bytes of memory for 2563749237 bytes of data? How do I fix the code to make it load all the bytes? I have around 3GB of RAM free. The file is a Wiktionary XML dump.
-
pckben almost 12 years
My code is only the 2 lines given above for loading the data into memory; there are no other live objects for garbage collection.
-
pckben almost 12 years
I got mmap.error: [Errno 13] Permission denied for the line with m = mmap.mmap(..). How do I fix it?
-
Thomas Orozco almost 12 years
@pckben That's because the file is opened in read-only mode and mmap will try to map it read-write: add prot=mmap.PROT_READ to your mmap.mmap call, and you'll be fine.
-
Alfe almost 12 years
Nice answer if you really have to read the contents of a file completely. In this case I don't think that this is the best solution for pckben's situation.
-
Alfe almost 12 years
mmap is memory mapping of a file. Accessing the memory at the mapped location accesses the file instead. Whether the OS buffers the whole file beforehand or only on access is part of the configuration ;-)
-
Thomas Orozco almost 12 years
@pckben Using open('myfile', 'rb') opens the file in read-only mode, but mmap will try to map it read-write, which causes the error.
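
To illustrate the point made in the comments that a mapping accesses the file on demand instead of copying it up front, here is a rough sketch, assuming Python 3, that searches and slices a read-only mapping directly. The function name `find_first` and its `context` parameter are hypothetical, not part of the original answer:

```python
import mmap


def find_first(path, needle, context=40):
    """Return up to `context` bytes starting at the first `needle`, or None."""
    with open(path, "rb") as f:
        # Read-only mapping of the whole file (the file must not be empty).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            pos = m.find(needle)  # byte offset of the first match, or -1
            if pos == -1:
                return None
            # Slicing a mapping copies only the requested bytes into a bytes object.
            return m[pos:pos + context]
```

For example, `find_first('dump.xml', b'<page>')` would return the first few bytes of the first page element; only the parts of the file that are actually searched or sliced need to be paged in.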