How to prevent a process from being killed?


Solution 1

Why would it be killed off?

Because it's not automatic that something is killed. Once you answer that, and explain why something would be selected for destruction, you might be able to come up with a solution.

Given you're talking about Rails' rake command, I'm guessing that this is a process running on a server. That you're worried about it being killed suggests the server host is killing it for using too many resources. In cases like this, there aren't (nor should there be) ways of stopping your process from being killed.

If you have a resource-expensive task, buy more resources, run it on your own server time, or come to an arrangement with the host that allows you to run it on their dime.

Solution 2

You cannot prevent root from killing a process. Nor, for that matter, can you prevent the server from killing a process that eats up all your resources.

What you can do is fork the command so it restarts itself when killed.

Example using code (a minimal shell sketch of the idea; the rake xyz invocation is taken from the question, and the retry delay is an arbitrary choice):
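    #!/bin/sh
    # Re-run the task whenever it exits abnormally (e.g. after being
    # killed), until it finishes successfully with exit status 0.
    until rake xyz; do
        echo "rake exited with status $?; restarting in 5 seconds" >&2
        sleep 5
    done

Note that this only helps against a one-off kill; if the host is killing the task for resource abuse, the restarted copy will simply be killed again.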

Solution 3

Now, I understand this is an old question, but since both answers ignore the obvious (or at best scratch the surface), I felt prompted to write up my own. Given the wording of the question, the very first thing that popped into my mind was "the OOM killer!". One of the other answers even makes the claim "it's not automatic that something is killed", which is preposterous from the user's perspective. What is the OOM killer if not an automatism?

The OOM killer is your biggest enemy for scenarios like the one described, as the linked article will show.

Now, it depends on the exact scenario (build machine, some server ...), but in general I do want my OS to use the resources of my machine to the extent possible. That's why I purchased them in the first place.

Your question, broken down:

Is there any way to prevent a process from being killed no matter what?

No, fortunately not. For example, the kernel will kill misbehaving processes (e.g. by sending SIGSEGV). This also applies if your task misbehaves by running into resource limits (see limits.conf, getrlimit/setrlimit). That is, if something inside your rake task (which in all likelihood will use other processes to do some of the work) dereferences a null pointer, you're still out of luck: that part will fail, which subsequently may fail the whole task.
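As a quick illustration of the resource-limit side: a shell's ulimit builtin sits on top of setrlimit, so you can cap a task before starting it (the 4 GiB value below is an arbitrary example, not a recommendation):

    ulimit -a            # show all current soft limits
    ulimit -v 4194304    # cap this shell's virtual memory at 4 GiB (bash takes KiB)
    rake xyz             # the task inherits the limit and fails with allocation
                         # errors instead of starving the whole machine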

Root will also, in all likelihood, be able to send signals to your process. And even if you somehow managed to protect your process from anything related to userspace, root would still be able to load a kernel module and undermine those efforts from kernel space (perhaps with the exception of an active kernel lockdown).
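To make the signal point concrete: SIGTERM can be caught or ignored, but SIGKILL cannot, which is why root always wins. A small demonstration (run it in a throwaway shell, as the last line kills it):

    trap '' TERM     # ignore SIGTERM in this shell
    kill -TERM $$    # now has no effect
    kill -KILL $$    # SIGKILL cannot be caught, blocked, or ignored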

I know about nice, but I'm not sure if giving a task such as a long-running memory-intensive rake task the highest priority will prevent it from being killed off: [...]

It won't prevent it, but the nice value is used as one of several heuristics by the OOM killer. So yeah, the nice value will actually help ... a bit. The LWN article I already linked above gives the following heuristics:

  • if the task has nice value above zero, its score doubles
  • superuser or direct hardware access tasks (CAP_SYS_ADMIN, CAP_SYS_RESOURCE or CAP_SYS_RAWIO) have their score divided by 4. This is cumulative, i.e., a super-user task with hardware access would have its score divided by 16.
  • if OOM condition happened in one cpuset and checked task does not belong to that set, its score is divided by 8.
  • the resulting score is multiplied by two to the power of oom_adj (i.e. points <<= oom_adj when it is positive and points >>= -(oom_adj) otherwise)
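You can watch the resulting badness score of any process directly, since the kernel recomputes it whenever the corresponding proc file is read (assuming a Linux /proc filesystem):

    cat /proc/self/oom_score    # the current shell's own badness score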

Aside from the nice value, you can go further: either run the task as root (or with the capabilities listed above), or, if you are root, make sure your process won't be prone to being killed by the OOM killer by creating a cgroup (the article has the full details):

    mount -t cgroup -o oom oom /mnt/oom-killer
    mkdir /mnt/oom-killer/invincibles
    echo 0 > /mnt/oom-killer/invincibles/oom.priority
    echo <pid> > /mnt/oom-killer/invincibles/tasks

where <pid> is the process ID of your rake task ...

So there you go. You can make certain groups of processes exempt from the OOM killer's wrath.

However, I am not sure this sledgehammer method is the best first thing to try. I think you should start by tinkering with oom_adj to see whether that helps your process survive the competition with other processes. Especially if this is a server, the overall service may be more important than one particular task which may not even be vital to the service, so use this with caution. In addition, you may want to monitor for memory hogs (sysstat and friends should help). If you do that via a time-series database and plot the graphs, you may even catch on to memory leaks.
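A small sketch of that tinkering (note: on modern kernels the oom_adj knob is deprecated in favor of oom_score_adj, which ranges from -1000 to 1000; lowering it below zero requires root):

    echo -500 > /proc/$$/oom_score_adj   # children of this shell inherit the value
    rake xyz &
    cat /proc/$!/oom_score               # verify the lowered badness score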

If none of that works, you should head over to Brendan Gregg's website and start measuring different performance indicators; also see if you can grab one of his books. For example, it's possible you have something like a runaway situation with memory allocations inside your rake task: you emphasize long running and memory-intensive, but these are not necessarily connected. BPF and friends will allow you to gain insights you won't get otherwise.
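For instance, BCC's memleak tool traces allocations that are never freed (a sketch; it assumes bcc is installed, and the tool's path and name vary by distro):

    # Report every 10 seconds the call stacks of outstanding allocations
    # in the rake process (pgrep -f -o picks the oldest matching PID).
    sudo /usr/share/bcc/tools/memleak -p "$(pgrep -f -o 'rake xyz')" 10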

Author: Simpleton

Updated on September 18, 2022

Comments

  • Simpleton over 1 year

    Is there any way to prevent a process from being killed no matter what? I know about nice, but I'm not sure if giving a task such as a long-running memory-intensive rake task the highest priority will prevent it from being killed off:

    nice -n -20 rake xyz
    

    Edit: The original poster most likely wants it to stay high priority even when the server is low on resources, so much so that other processes get killed first.

  • 0xC0000022L about 4 years
    Uhm ... this answer is really ignoring a whole lot of nuances, especially the OOM killer, which comes into play for processes (quote) "such as a long running memory-intensive rake task". -1.