Python subprocess.Popen erroring with OSError: [Errno 12] Cannot allocate memory after period of time


Solution 1

When you use Popen, you need to pass close_fds=True if you want it to close extra file descriptors.

Creating a new pipe, which occurs in the _get_handles function from the backtrace, creates two file descriptors, but your current code never closes them, so you're eventually hitting your system's maximum fd limit.

I'm not sure why the error you're getting indicates an out-of-memory condition: it should be a file descriptor error, since the return value of pipe() has an error code for this problem.
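
A minimal sketch of passing close_fds=True; a small python -c command stands in for the question's ['ps', 'aux'] so the example is self-contained:

```python
import subprocess
import sys

# close_fds=True asks Popen to close the parent's extra file descriptors
# in the child. On Python 2.x it must be passed explicitly; it became the
# default in Python 3.2+.
out = subprocess.Popen(
    [sys.executable, '-c', 'print("hello")'],  # stand-in for ['ps', 'aux']
    stdout=subprocess.PIPE,
    close_fds=True,
).communicate()[0]

print(out.strip())
```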

Solution 2

You've perhaps got a memory leak bounded by some resource limit (RLIMIT_DATA, RLIMIT_AS?) inherited by your Python script. Check your ulimit(1) settings before you run your script, and profile the script's memory usage, as others have suggested.

What do you do with the variable ps after the code snippet you show us? Do you keep a reference to it, never to be freed? Quoting the subprocess module docs:

Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.

... and ps aux can be verbose on a busy system...
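
If the output really could be large, one alternative (a sketch, not the OP's code; a two-line python -c command stands in for ps aux) is to read the child's stdout incrementally instead of buffering it all in communicate():

```python
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, '-c', 'print("a"); print("b")'],  # stand-in for ['ps', 'aux']
    stdout=subprocess.PIPE,
)

lines = []
for line in proc.stdout:      # reads one line at a time, bounded memory
    lines.append(line.strip())

proc.stdout.close()
proc.wait()                   # reap the child so it cannot linger as a zombie
print(lines)
```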

Update

You can check rlimits from within your Python script using the resource module:

import resource
print resource.getrlimit(resource.RLIMIT_DATA) # => (soft_lim, hard_lim)
print resource.getrlimit(resource.RLIMIT_AS)

If these return "unlimited" -- (-1, -1) -- then my hypothesis is incorrect and you may move on!

See also resource.getrusage, especially the ru_??rss fields, which can help you instrument memory consumption from within the Python script, without shelling out to an external program.
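
For example, a sketch of reading the peak resident set size of the current process:

```python
import resource

# Peak resident set size of this process. Units differ by platform:
# kilobytes on Linux, bytes on macOS (see getrusage(2)).
usage = resource.getrusage(resource.RUSAGE_SELF)
print(usage.ru_maxrss)
```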

Solution 3

If you're running a background process, chances are that you've redirected your process's stdin/stdout/stderr.

In that case, append the option close_fds=True to your Popen call, which will prevent the child process from inheriting your redirected descriptors. This may be the limit you're bumping into.

Solution 4

Have you watched your process over time?

  • lsof
  • ps aux | grep -i pname
  • top

All should give interesting information. I suspect the process is tying up resources that should be freed: resource handles such as memory blocks, streams, file handles, or thread and process handles; the stdin, stdout and stderr of the spawned ps; or memory from many small incremental allocations. I would be very interested to see what the above commands show for your process right after it has launched and run for the first time, and again after 24 hours of sitting there launching the sub-process regularly.
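
On Linux you can also instrument the descriptor count from inside the script by listing /proc/self/fd. This sketch shows that each new pipe, like the one Popen creates for stdout=PIPE, costs two descriptors:

```python
import os

def open_fd_count():
    """Count this process's open file descriptors (Linux: /proc/self/fd)."""
    return len(os.listdir('/proc/self/fd'))

before = open_fd_count()
r, w = os.pipe()              # each Popen(stdout=PIPE) allocates fds like this
after = open_fd_count()
os.close(r)
os.close(w)

print(after - before)         # the pipe added two descriptors
```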

Since it dies after a few days, you could have it run for only a few loops, and then restart it once a day as a workaround. That would help you in the meantime.

Jacob

Solution 5

You might want to actually wait for all of those PS processes to finish before adding swap space.

It's not at all clear what "running as a background process executing every 60 seconds" means.

But your call to subprocess.Popen is forking a new process each time.

Update

I'd guess that you're somehow leaving all those processes running or hung in a zombie state. However, the communicate method should clean up the spawned subprocesses.
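
A small sketch confirming that communicate() reaps the child (it calls wait() internally), so no zombie should remain:

```python
import subprocess
import sys

proc = subprocess.Popen([sys.executable, '-c', 'pass'],
                        stdout=subprocess.PIPE)
proc.communicate()            # waits for the child internally (calls wait())

# Once wait() has run, the exit status is recorded and the kernel has
# reaped the child; no zombie entry remains in the process table.
print(proc.returncode)
```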

davidmytton (author)

Co-founder of Console (the best tools for developers). Researching sustainable computing at Imperial College London & Uptime Institute. Previously Co-founder, Server Density (acquired by StackPath).

Updated on August 02, 2022

Comments

  • davidmytton
    davidmytton almost 2 years

    Note: This question has been re-asked with a summary of all debugging attempts here.


    I have a Python script that is running as a background process executing every 60 seconds. Part of that is a call to subprocess.Popen to get the output of ps.

    ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]
    

    After running for a few days, the call is erroring with:

    File "/home/admin/sd-agent/checks.py", line 436, in getProcesses
    File "/usr/lib/python2.4/subprocess.py", line 533, in __init__
    File "/usr/lib/python2.4/subprocess.py", line 835, in _get_handles
    OSError: [Errno 12] Cannot allocate memory
    

    However the output of free on the server is:

    $ free -m
                      total       used       free     shared     buffers    cached
    Mem:                894        345        549          0          0          0
    -/+ buffers/cache:  345        549
    Swap:                 0          0          0
    

    I have searched around for the problem and found this article which says:

Solution is to add more swap space to your server. When the kernel is forking to start the modeler or discovery process, it first ensures there's enough space available on the swap to store the new process if needed.

    I note that there is no available swap from the free output above. Is this likely to be the problem and/or what other solutions might there be?

    Update 13th Aug 09 The code above is called every 60 seconds as part of a series of monitoring functions. The process is daemonized and the check is scheduled using sched. The specific code for the above function is:

    def getProcesses(self):
        self.checksLogger.debug('getProcesses: start')
    
        # Memory logging (case 27152)
        if self.agentConfig['debugMode'] and sys.platform == 'linux2':
            mem = subprocess.Popen(['free', '-m'], stdout=subprocess.PIPE).communicate()[0]
            self.checksLogger.debug('getProcesses: memory before Popen - ' + str(mem))
    
        # Get output from ps
        try:
            self.checksLogger.debug('getProcesses: attempting Popen')
    
            ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]
    
        except Exception, e:
            import traceback
            self.checksLogger.error('getProcesses: exception = ' + traceback.format_exc())
            return False
    
        self.checksLogger.debug('getProcesses: Popen success, parsing')
    
        # Memory logging (case 27152)
        if self.agentConfig['debugMode'] and sys.platform == 'linux2':
            mem = subprocess.Popen(['free', '-m'], stdout=subprocess.PIPE).communicate()[0]
            self.checksLogger.debug('getProcesses: memory after Popen - ' + str(mem))
    
        # Split out each process
        processLines = ps.split('\n')
    
        del processLines[0] # Removes the headers
        processLines.pop() # Removes a trailing empty line
    
        processes = []
    
        self.checksLogger.debug('getProcesses: Popen success, parsing, looping')
    
        for line in processLines:
            line = line.split(None, 10)
            processes.append(line)
    
        self.checksLogger.debug('getProcesses: completed, returning')
    
        return processes
    

    This is part of a bigger class called checks which is initialised once when the daemon is started.

    The entire checks class can be found at http://github.com/dmytton/sd-agent/blob/82f5ff9203e54d2adeee8cfed704d09e3f00e8eb/checks.py with the getProcesses function defined from line 442. This is called by doChecks() starting at line 520.

  • davidmytton
    davidmytton almost 15 years
    "running as a background process executing every 60 seconds" means the code gets called every 60 seconds as part of a continually running process. If I don't call communicate() then I can't actually get the output of ps.
  • Vinay Sajip
    Vinay Sajip almost 15 years
    communicate() waits for the spawned process to terminate and spins up threads which read its stdout and stderr streams.
  • user1066101
    user1066101 almost 15 years
    @DavidM: "the code"? "gets called"? Which code? The subprocess.Popen? It forks a new process every 60 seconds? Is that what you're saying? And it never waits for a single child to finish?
  • user1066101
    user1066101 almost 15 years
    @Vinay Sajip: While communicate() allegedly waits for the subprocess, I'm not easily convinced that it's the same as the proper wait method. The application sounds like it's overrunning the system with subprocesses.
  • davidmytton
    davidmytton almost 15 years
    Yes, "the code" as in the only line of code that is in my original question. My understanding is that once ps returns, having called communicate(), the subprocess closes. Perhaps you could provide a few lines of sample code showing how you would implement this then?
  • Vinay Sajip
    Vinay Sajip almost 15 years
    @S. Lott: I checked the source code from Python 2.4.6 on Ubuntu - communicate does call self.wait(). Is that not the proper wait method?
  • user1066101
    user1066101 almost 15 years
    @Vinay Sajip: I had doubts, but you have proof. Communicate does call wait. I have no idea what's wrong. I'll delete this answer.
  • davidmytton
    davidmytton almost 15 years
    I can add the shell=True in. What does that do exactly? The documentation says "If shell is True, the specified command will be executed through the shell." but that doesn't really explain what the difference is.
  • Vinay Sajip
    Vinay Sajip almost 15 years
    When you specify shell=True, the shell program (e.g. bash on Linux, cmd.exe on Windows) is spawned which in turn runs the actual program you want to spawn. This is not suggested as a route to lower memory usage - but rather as an additional diagnostic tool to see how the behaviour changes. I would expect to see more useful input from logging memory conditions on every spawn and seeing how failed calls and successful calls correlate with the state of memory, swap etc.
  • davidmytton
    davidmytton over 14 years
    Do you have any suggestions for how to log memory usage as the script runs? I have found code.activestate.com/recipes/286222 which seems to do the job.
  • Vinay Sajip
    Vinay Sajip over 14 years
    It's not about how much memory the Python process is using - it's about logging what free -m returns for all spawns of ps. You can use subprocess to spawn free -m and log the results to a file.
  • davidmytton
    davidmytton over 14 years
    I put in a call to mem = subprocess.Popen(['free', '-m'], stdout=subprocess.PIPE).communicate()[0] and logging the output before and after each Popen call, and the memory usage seems to remain fairly constant, i.e. the memory does not slowly get depleted. It's always around 894/344/549 (total/used/free). Swap always remains 0, but apparently this is expected and there is actually swap available; it's just not shown in the free output.
  • davidmytton
    davidmytton over 14 years
    I have updated the question to include some more detail about the function call that ultimately calls the Popen. Nothing specific is done to the ps variable after the code snippet - the function returns with the processed result.
  • pilcrow
    pilcrow over 14 years
    @DavidM, thanks for the update. That pushes my question out one layer -- what then happens to processes, is it ever destroyed, etc.? I'll presently update with a more pythonic way to check resource limits...
  • davidmytton
    davidmytton over 14 years
    The rlimits showed (-1, -1) on both RLIMIT_DATA and RLIMIT_AS. processes is returned and then used to send that data back to a monitoring system. It is not destroyed. I've updated the Q with some more info about the entire daemon.
  • davidmytton
    davidmytton over 14 years
    The daemon is running with strace attached now. Will comment back when it next crashes (takes a few days).
  • Marbal
    Marbal over 14 years
    Popen.communicate() calls Popen.wait() which calls os.waitpid() for you. There is no need to call os.waitpid() manually.
  • Vinay Sajip
    Vinay Sajip over 14 years
    I think that's only to close extra descriptors while the subprocess is running. When the subprocess exits, it will close all its descriptors anyway, won't it?
  • pilcrow
    pilcrow over 14 years
    @Vinay Sajip, yes, this answer seems off-base. "close_fds" has to do with the subprocesses' inherited fds (like Perl's $^F), and the subprocess module/communicate() takes care of closing the pipe between parent and child intelligently. It also seems improbable that your ENOMEM is actually ENFILE/EMFILE in disguise.
  • Mark
    Mark over 14 years
    Looked deeper into the code, and the pipe FDs are closed correctly. When the fork occurs with close_fds=False, all the FDs from the parent process are copied into the child, in this case all the FDs of the Python process; as this code is part of some larger script, there could be lots open. According to POSIX these should be closed when the child process exits, but it's rather common for something to cause this not to occur (a quick Google search for "fd leak" will provide references). I still think fds are the problem. Could the OP confirm whether this solved the issue?
  • davidmytton
    davidmytton over 14 years
    This did not solve the problem. I reposted the question at stackoverflow.com/questions/1367373/…