Python: load a 2GB text file into memory

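If you use mmap, you can access the entire file as if it were in memory without loading it all at once: the OS maps the file into the process's address space and reads pages from disk only as they are accessed.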

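import mmap

with open('dump.xml', 'rb') as f:
    # Length 0 maps the ENTIRE file, but nothing is read yet: the OS pages
    # the data in from disk only when it is accessed.
    # prot=mmap.PROT_READ maps the file read-only, matching the 'rb' mode
    # (prot is Unix-only; on Windows use access=mmap.ACCESS_READ instead).
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    try:
        # readline() returns one line as bytes, and an empty bytes object
        # at the end of the file; lines are fetched on demand from the map
        data = m.readline()
        while data:
            # Do stuff with the line
            data = m.readline()
    finally:
        m.close()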
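For Python 3, a more compact variant of the loop above can be sketched as follows. This is an illustration rather than part of the original answer: the function name count_lines_mmap is hypothetical, it uses access=mmap.ACCESS_READ (the read-only option that also works on Windows, whereas prot is Unix-only), and it relies on mmap objects being context managers, which requires Python 3.2 or later.

```python
import mmap
import os
import tempfile


def count_lines_mmap(path):
    """Count lines by walking a read-only memory map of the file at `path`."""
    with open(path, "rb") as f:
        # Mapping an empty file with length 0 raises ValueError, so bail out early.
        if os.fstat(f.fileno()).st_size == 0:
            return 0
        # Length 0 maps the whole file; the OS reads pages in as they are touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # iter(callable, sentinel) keeps calling m.readline() until it returns b"".
            return sum(1 for _ in iter(m.readline, b""))


if __name__ == "__main__":
    # A tiny throwaway file stands in for a large dump such as dump.xml.
    with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
        tmp.write(b"<page>\n<title>one</title>\n</page>\n")
        path = tmp.name
    try:
        print(count_lines_mmap(path))  # prints 3
    finally:
        os.remove(path)
```

The iter(m.readline, b"") idiom replaces the manual "read, test, read again" loop while doing the same thing: it stops as soon as readline() returns an empty bytes object.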
Authored by pckben

Updated on June 15, 2022

Comments

  • pckben
    pckben almost 2 years

    In Python 2.7, when I load all the data from a 2.5GB text file into memory for quicker processing, like this:

    >>> f = open('dump.xml','r')
    >>> dump = f.read()
    

    I got the following error:

    Python(62813) malloc: *** mmap(size=140521659486208) failed (error code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    MemoryError
    

    Why did Python try to allocate 140521659486208 bytes of memory for 2563749237 bytes of data? How do I fix the code so that it loads all the bytes?

    I have around 3GB of RAM free. The file is a Wiktionary XML dump.

  • pckben
    pckben almost 12 years
    My code is only the 2 lines given above for loading the data into memory; there are no other live objects to garbage-collect.
  • pckben
    pckben almost 12 years
    I got mmap.error: [Errno 13] Permission denied on the line m = mmap.mmap(..). How do I fix it?
  • Thomas Orozco
    Thomas Orozco almost 12 years
    @pckben That's because the file is opened in read-only mode and mmap will try to map it read-write: add prot=mmap.PROT_READ to your mmap.mmap call, and you'll be fine.
  • Alfe
    Alfe almost 12 years
    Nice answer if you really have to read the complete contents of a file. In this case, though, I don't think it is the best solution for pckben's situation.
  • Alfe
    Alfe almost 12 years
    mmap is memory mapping of a file: accessing the mapped memory accesses the file instead. Whether the OS buffers the whole file beforehand or only on access is part of its configuration ;-)
  • Thomas Orozco
    Thomas Orozco almost 12 years
    @pckben Using open('myfile', 'rb') opens the file in read-only mode, but mmap will try to map it read-write, which causes the error.
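The Permission denied error discussed in the comments comes from requesting a writable mapping of a file opened with 'rb'. A minimal Python 3 sketch of the read-only alternative follows; access=mmap.ACCESS_READ is the portable keyword for this (prot=mmap.PROT_READ is Unix-only), and the hypothetical helper read_first_line_readonly also shows that a read-only map rejects writes with TypeError.

```python
import mmap
import os
import tempfile


def read_first_line_readonly(path):
    """Return (first_line, writable) for a read-only mapping of `path`.

    Assumes a non-empty file: mapping an empty file with length 0 raises ValueError.
    """
    with open(path, "rb") as f:  # 'rb' gives a read-only file descriptor
        # ACCESS_READ requests a read-only mapping, which matches the descriptor.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            first_line = m.readline()
            try:
                m[0:1] = b"X"  # attempt to modify the mapped file
                writable = True
            except TypeError:  # read-only maps refuse modification
                writable = False
    return first_line, writable


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"hello\nworld\n")
        path = tmp.name
    try:
        print(read_first_line_readonly(path))  # prints (b'hello\n', False)
    finally:
        os.remove(path)
```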
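To illustrate the point that accessing a mapping reads the file on demand, here is a sketch that pulls out title values with mmap.find() and slicing, without ever calling read() on the whole file. It assumes the &lt;title&gt; elements used in MediaWiki XML dumps; the function name extract_titles is hypothetical, and plain byte searching is no substitute for a real XML parser (it does not handle CDATA, comments, or entity unescaping).

```python
import mmap
import os
import tempfile


def extract_titles(path):
    """Return the text of every <title>...</title> pair in the file at `path`.

    Assumes a non-empty file: mapping an empty file with length 0 raises ValueError.
    """
    open_tag, close_tag = b"<title>", b"</title>"
    titles = []
    with open(path, "rb") as f:
        # The OS pages the file in as find() scans it; these clean, file-backed
        # pages can be evicted again, so the file never has to fit in RAM at once.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            pos = m.find(open_tag)
            while pos != -1:
                start = pos + len(open_tag)
                end = m.find(close_tag, start)
                if end == -1:  # unterminated tag: stop scanning
                    break
                # Slicing copies only this small range out of the mapping.
                titles.append(m[start:end].decode("utf-8"))
                pos = m.find(open_tag, end + len(close_tag))
    return titles


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
        tmp.write(b"<page><title>cat</title></page>\n<page><title>dog</title></page>\n")
        path = tmp.name
    try:
        print(extract_titles(path))  # prints ['cat', 'dog']
    finally:
        os.remove(path)
```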
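If the processing does not actually need the whole dump in memory at once (the question raised in the comments about whether mmap is the best fit), the usual alternative to f.read() is to iterate over the file object, which reads it lazily through a buffer. A minimal Python 3 sketch; the function name count_title_lines and the choice of counting lines that contain &lt;title&gt; are hypothetical stand-ins for the real processing.

```python
import os
import tempfile


def count_title_lines(path):
    """Count lines containing b"<title>" without loading the whole file.

    Iterating over a file object reads it lazily, so peak memory is bounded
    by the read buffer and the longest line, not by the size of the file.
    """
    count = 0
    with open(path, "rb") as f:
        for line in f:  # each line is a bytes object, newline included
            if b"<title>" in line:
                count += 1
    return count


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
        tmp.write(b"<page>\n<title>cat</title>\n</page>\n<title>dog</title>\n")
        path = tmp.name
    try:
        print(count_title_lines(path))  # prints 2
    finally:
        os.remove(path)
```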