Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory"
Solution 1
As a general rule (i.e. in vanilla kernels), fork
/clone
failures with ENOMEM
occur specifically because of either an honest to God out-of-memory condition (dup_mm
, dup_task_struct
, alloc_pid
, mpol_dup
, mm_init
etc. croak), or because security_vm_enough_memory_mm
failed you while enforcing the overcommit policy.
Start by checking the vmsize of the process that failed to fork, at the time of the fork attempt, and then compare to the amount of free memory (physical and swap) as it relates to the overcommit policy (plug the numbers in.)
In your particular case, note that Virtuozzo has additional checks in overcommit enforcement. Moreover, I'm not sure how much control you truly have, from within your container, over swap and overcommit configuration (in order to influence the outcome of the enforcement.)
Now, in order to actually move forward I'd say you're left with two options:
- switch to a larger instance, or
- put some coding effort into more effectively controlling your script's memory footprint
NOTE that the coding effort may be all for naught if it turns out that it's not you, but some other guy collocated in a different instance on the same server as you running amock.
Memory-wise, we already know that subprocess.Popen
uses fork
/clone
under the hood, meaning that every time you call it you're requesting once more as much memory as Python is already eating up, i.e. in the hundreds of additional MB, all in order to then exec
a puny 10kB executable such as free
or ps
. In the case of an unfavourable overcommit policy, you'll soon see ENOMEM
.
Alternatives to fork
that do not have this parent page tables etc. copy problem are vfork
and posix_spawn
. But if you do not feel like rewriting chunks of subprocess.Popen
in terms of vfork
/posix_spawn
, consider using suprocess.Popen
only once, at the beginning of your script (when Python's memory footprint is minimal), to spawn a shell script that then runs free
/ps
/sleep
and whatever else in a loop parallel to your script; poll the script's output or read it synchronously, possibly from a separate thread if you have other stuff to take care of asynchronously -- do your data crunching in Python but leave the forking to the subordinate process.
HOWEVER, in your particular case you can skip invoking ps
and free
altogether; that information is readily available to you in Python directly from procfs
, whether you choose to access it yourself or via existing libraries and/or packages. If ps
and free
were the only utilities you were running, then you can do away with subprocess.Popen
completely.
Finally, whatever you do as far as subprocess.Popen
is concerned, if your script leaks memory you will still hit the wall eventually. Keep an eye on it, and check for memory leaks.
Solution 2
Looking at the output of free -m
it seems to me that you actually do not have swap memory available. I am not sure if in Linux the swap always will be available automatically on demand, but I was having the same problem and none of the answers here really helped me. Adding some swap memory however, fixed the problem in my case so since this might help other people facing the same problem, I post my answer on how to add a 1GB swap (on Ubuntu 12.04 but it should work similarly for other distributions.)
You can first check if there is any swap memory enabled.
$sudo swapon -s
if it is empty, it means you don't have any swap enabled. To add a 1GB swap:
$sudo dd if=/dev/zero of=/swapfile bs=1024 count=1024k
$sudo mkswap /swapfile
$sudo swapon /swapfile
Add the following line to the fstab
to make the swap permanent.
$sudo vim /etc/fstab
/swapfile none swap sw 0 0
Source and more information can be found here.
Solution 3
For an easy fix, you could
echo 1 > /proc/sys/vm/overcommit_memory
if you're sure that your system has enough memory. See Linux over commit heuristic.
Solution 4
swap may not be the red herring previously suggested. How big is the python process in question just before the ENOMEM
?
Under kernel 2.6, /proc/sys/vm/swappiness
controls how aggressively the kernel will turn to swap, and overcommit*
files how much and how precisely the kernel may apportion memory with a wink and a nod. Like your facebook relationship status, it's complicated.
...but swap is actually available on demand (according to the web host)...
but not according to the output of your free(1)
command, which shows no swap space recognized by your server instance. Now, your web host may certainly know much more than I about this topic, but virtual RHEL/CentOS systems I've used have reported swap available to the guest OS.
Adapting Red Hat KB Article 15252:
A Red Hat Enterprise Linux 5 system will run just fine with no swap space at all as long as the sum of anonymous memory and system V shared memory is less than about 3/4 the amount of RAM. .... Systems with 4GB of ram or less [are recommended to have] a minimum of 2GB of swap space.
Compare your /proc/sys/vm
settings to a plain CentOS 5.3 installation. Add a swap file. Ratchet down swappiness
and see if you live any longer.
Solution 5
I continue to suspect that your customer/user has some kernel module or driver loaded which
is interfering with the clone()
system call (perhaps some obscure security enhancement,
something like LIDS but more obscure?) or is somehow filling up some of the kernel data
structures that are necessary for fork()
/clone()
to operate (process table, page
tables, file descriptor tables, etc).
Here's the relevant portion of the fork(2)
man page:
ERRORS EAGAIN fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child. EAGAIN It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered. To exceed this limit, the process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability. ENOMEM fork() failed to allocate the necessary kernel structures because memory is tight.
I suggest having the user try this after booting into a stock, generic kernel and with only a minimal set of modules and drivers loaded (minimum necessary to run your application/script). From there, assuming it works in that configuration, they can perform a binary search between that and the configuration which exhibits the issue. This is standard sysadmin troubleshooting 101.
The relevant line in your strace
is:
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory)
... I know others have talked about swap and memory availability (and I would recommend that you set up at least a small swap partition, ironically even if it's on a RAM disk ... the code paths through the Linux kernel when it has even a tiny bit of swap available have been exercised far more extensively than those (exception handling paths) in which there is zero swap available.
However I suspect that this is still a red herring.
The fact that free
is reporting 0 (ZERO) memory in use by the cache and buffers is very disturbing. I suspect that the free
output ... and possibly your application issue here, are caused by some proprietary kernel module which is interfering with the memory allocation in some way.
According to the man pages for fork()/clone() the fork() system call should return EAGAIN if your call would cause a resource limit violation (RLIMIT_NPROC) ... however, it doesn't say if EAGAIN is to be returned by other RLIMIT* violations. In any event if your target/host has some sort of weird Vormetric or other security settings (or even if your process is running under some weird SELinux policy) then it might be causing this -ENOMEM failure.
It's pretty unlikely to be a normal run-of-the-mill Linux/UNIX issue. You've got something non-standard going on there.
davidmytton
Co-founder of Console (the best tools for developers). Researching sustainable computing at Imperial College London & Uptime Institute. Previously Co-founder, Server Density (acquired by StackPath).
Updated on August 23, 2022Comments
-
davidmytton over 1 year
Note: This question was originally asked here but the bounty time expired even though an acceptable answer was not actually found. I am re-asking this question including all details provided in the original question.
A python script is running a set of class functions every 60 seconds using the sched module:
# sc is a sched.scheduler instance sc.enter(60, 1, self.doChecks, (sc, False))
The script is running as a daemonised process using the code here.
A number of class methods that are called as part of doChecks use the subprocess module to call system functions in order to get system statistics:
ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]
This runs fine for a period of time before the entire script crashing with the following error:
File "/home/admin/sd-agent/checks.py", line 436, in getProcesses File "/usr/lib/python2.4/subprocess.py", line 533, in __init__ File "/usr/lib/python2.4/subprocess.py", line 835, in _get_handles OSError: [Errno 12] Cannot allocate memory
The output of free -m on the server once the script has crashed is:
$ free -m total used free shared buffers cached Mem: 894 345 549 0 0 0 -/+ buffers/cache: 345 549 Swap: 0 0 0
The server is running CentOS 5.3. I am unable to reproduce on my own CentOS boxes nor with any other user reporting the same problem.
I have tried a number of things to debug this as suggested in the original question:
Logging the output of free -m before and after the Popen call. There is no significant change in memory usage i.e. memory is not gradually being used up as the script runs.
I added close_fds=True to the Popen call but this made no difference - the script still crashed with the same error. Suggested here and here.
I checked the rlimits which showed (-1, -1) on both RLIMIT_DATA and RLIMIT_AS as suggested here.
An article suggested the having no swap space might be the cause but swap is actually available on demand (according to the web host) and this was also suggested as a bogus cause here.
The processes are being closed because that is the behaviour of using .communicate() as backed up by the Python source code and comments here.
The entire checks can be found at on GitHub here with the getProcesses function defined from line 442. This is called by doChecks() starting at line 520.
The script was run with strace with the following output before the crash:
recv(4, "Total Accesses: 516662\nTotal kBy"..., 234, 0) = 234 gettimeofday({1250893252, 887805}, NULL) = 0 write(3, "2009-08-21 17:20:52,887 - checks"..., 91) = 91 gettimeofday({1250893252, 888362}, NULL) = 0 write(3, "2009-08-21 17:20:52,888 - checks"..., 74) = 74 gettimeofday({1250893252, 888897}, NULL) = 0 write(3, "2009-08-21 17:20:52,888 - checks"..., 67) = 67 gettimeofday({1250893252, 889184}, NULL) = 0 write(3, "2009-08-21 17:20:52,889 - checks"..., 81) = 81 close(4) = 0 gettimeofday({1250893252, 889591}, NULL) = 0 write(3, "2009-08-21 17:20:52,889 - checks"..., 63) = 63 pipe([4, 5]) = 0 pipe([6, 7]) = 0 fcntl64(7, F_GETFD) = 0 fcntl64(7, F_SETFD, FD_CLOEXEC) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory) write(2, "Traceback (most recent call last"..., 35) = 35 open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) write(2, " File \"/usr/bin/sd-agent/agent."..., 52) = 52 open("/home/admin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/bin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/lib/python24.zip/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/plat-linux2/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/lib/python2.4/lib-tk/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/lib-dynload/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/site-packages/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) write(2, " File \"/home/admin/sd-agent/dae"..., 60) = 60 open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) write(2, " File \"/usr/bin/sd-agent/agent."..., 54) = 54 open("/usr/lib/python2.4/sched.py", O_RDONLY|O_LARGEFILE) = 8 write(2, " File \"/usr/lib/python2.4/sched"..., 55) = 55 fstat64(8, {st_mode=S_IFREG|0644, st_size=4054, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000 read(8, "\"\"\"A generally useful event sche"..., 4096) = 4054 write(2, " ", 4) = 4 write(2, "void = action(*argument)\n", 25) = 25 close(8) = 0 munmap(0xb7d28000, 4096) = 0 open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) write(2, " File \"/usr/bin/sd-agent/checks"..., 60) = 60 open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory) open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) write(2, " File \"/usr/bin/sd-agent/checks"..., 64) = 64 open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8 write(2, " File \"/usr/lib/python2.4/subpr"..., 65) = 65 fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000 read(8, "# subprocess - Subprocesses with"..., 4096) = 4096 read(8, "lso, the newlines attribute of t"..., 4096) = 4096 read(8, "code < 0:\n print >>sys.st"..., 4096) = 4096 read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096 read(8, " p2cread\n # c2pread <-"..., 4096) = 4096 write(2, " ", 4) = 4 write(2, "errread, errwrite)\n", 19) = 19 close(8) = 0 munmap(0xb7d28000, 4096) = 0 open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8 write(2, " File \"/usr/lib/python2.4/subpr"..., 71) = 71 fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000 read(8, "# subprocess - Subprocesses with"..., 4096) = 4096 read(8, "lso, the newlines attribute of t"..., 4096) = 4096 read(8, "code < 0:\n print >>sys.st"..., 4096) = 4096 read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096 read(8, " p2cread\n # c2pread <-"..., 4096) = 4096 read(8, "table(self, handle):\n "..., 4096) = 4096 read(8, "rrno using _sys_errlist (or siml"..., 4096) = 4096 read(8, " p2cwrite = None, None\n "..., 4096) = 4096 write(2, " ", 4) = 4 write(2, "self.pid = os.fork()\n", 21) = 21 close(8) = 0 munmap(0xb7d28000, 4096) = 0 write(2, "OSError", 7) = 7 write(2, ": ", 2) = 2 write(2, "[Errno 12] Cannot allocate memor"..., 33) = 33 write(2, "\n", 1) = 1 unlink("/var/run/sd-agent.pid") = 0 close(3) = 0 munmap(0xb7e0d000, 4096) = 0 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x589978}, {0xb89a60, [], SA_RESTORER, 0x589978}, 8) = 0 brk(0xa022000) = 0xa022000 exit_group(1) = ?
-
pilcrow over 14 yearsYes, but we see from the OP's strace that the first syscall failure -- from clone() -- is ENOMEM as is reported. This error is preserved along python's low-memory stumble through traceback construction, even though the C-library
errno
is reset many times along the way. -
davidmytton over 14 yearsWhat's the best way to check the size of the python process? ps?
-
pilcrow over 14 yearssomething like
ps -o user,pid,vsz="Mem(Kb)" -o cmd $PYTHON_PID
, or top(1), should do it. -
davidmytton over 14 yearsThe server is running on a Media Template (dv) base which uses Virtuozzo for virtualisation.
-
Jim Dennis over 11 yearsTry searching the Virtuozzo message boards and bug tracking system and, perhaps, looking for upgrades to the Virtuozzo subsystem itself.
-
letmaik almost 10 yearsI found that running
gc.collect()
just beforesubprocess.Popen
helps in the cases when the garbage collector hadn't run for a while. -
Seán Hayes over 9 yearsI wrote a deamon to handle the helper script strategy: github.com/SeanHayes/errand-boy I'm using it in production with one of my clients and our "Cannot allocate memory" troubles are gone.
-
Dima Tisnek almost 9 yearsDid that fix the same problem or some other?
-
Dima Tisnek almost 9 yearsI'd appreciate a simple diagnostic, e.g. following
/proc/fd/maps
to determine if overcommitted memory is in fact the issue -
Ruslan Abuzant about 8 yearsThis did it for me at CentOS 6.4. Faced error while installing awstats, thanks.
-
philmaweb almost 8 yearsAlthough this let me execute the Code, it didn't really address the problem, which probably lies in a library I use.
-
sscirrus over 6 yearsYou fixed my problem. Thanks! +1
-
igolkotek about 5 yearsThank you so much! Such an easy solution, you saved my day )