Faster forking of large processes on Linux?

20,180

Solution 1

Outcome: I was going to go down the early-spawned helper subprocess route suggested by other answers here, but then I came across a reference to using huge page support to improve fork performance.

Having tried it myself using libhugetlbfs to simply make all my app's mallocs allocate huge pages, I'm now getting around 2400 forks/s regardless of the process size (over the range I'm interested in anyway). Amazing.
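The answer above relied on libhugetlbfs to back malloc with huge pages. As a rough illustration of the same idea by a different, more explicit route, the sketch below requests a huge-page mapping directly with mmap and MAP_HUGETLB, falling back to normal pages when none are available. The helper name alloc_prefer_huge and the 2 MiB page size are assumptions for illustration, not something from the answer, and the huge-page path only succeeds if the kernel supports MAP_HUGETLB and has huge pages reserved.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: 2 MiB is the usual x86-64 default huge page size; check
 * Hugepagesize in /proc/meminfo on the target machine. */
#define ASSUMED_HUGE_PAGE_SIZE (2UL * 1024 * 1024)

/* Hypothetical helper: map `bytes` of anonymous memory, preferring huge pages.
 * Sets *used_huge to 1 if the mapping is huge-page backed, otherwise 0.
 * Returns NULL if no mapping could be created. */
void *alloc_prefer_huge(size_t bytes, int *used_huge)
{
    assert(bytes > 0 && used_huge != NULL);

    /* Round the length up to a whole number of (assumed) huge pages. */
    size_t len = (bytes + ASSUMED_HUGE_PAGE_SIZE - 1)
                 & ~(ASSUMED_HUGE_PAGE_SIZE - 1);
    void *p = MAP_FAILED;
    *used_huge = 0;

#ifdef MAP_HUGETLB
    /* Fewer, larger pages mean fewer page-table entries to copy at fork(). */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED)
        *used_huge = 1;
#endif

    /* Fall back to ordinary pages if huge pages are unavailable. */
    if (p == MAP_FAILED)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

Checking used_huge after the call shows whether the huge-page request actually took effect; if it is 0, the fork cost will be the same as with ordinary pages.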

Solution 2

On Linux, you can use posix_spawn(3) with the POSIX_SPAWN_USEVFORK flag to avoid the overhead of copying page tables when forking from a large process.

See "Minimizing Memory Usage for Creating Application Subprocesses" for a good summary of posix_spawn(3), its advantages and some examples.

To take advantage of vfork(2), make sure you #define _GNU_SOURCE before #include <spawn.h> and then simply call posix_spawnattr_setflags(&attr, POSIX_SPAWN_USEVFORK).
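A minimal sketch of that recipe follows. The helper name spawn_and_wait is hypothetical, and the #ifdef guard is an assumption added so the code still builds where the flag is not defined. To my knowledge, glibc 2.24 and later already use a vfork-like clone inside posix_spawn and ignore this flag, so it matters mainly on older systems such as the one in this answer.

```c
#define _GNU_SOURCE   /* must precede the includes to expose POSIX_SPAWN_USEVFORK */
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] via posix_spawn and return its exit
 * status, or -1 if spawning or waiting fails. */
int spawn_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    posix_spawnattr_t attr;
    pid_t pid;
    int status;

    if (posix_spawnattr_init(&attr) != 0)
        return -1;
#ifdef POSIX_SPAWN_USEVFORK
    /* Ask glibc to use vfork semantics instead of a full fork. */
    posix_spawnattr_setflags(&attr, POSIX_SPAWN_USEVFORK);
#endif

    int rc = posix_spawn(&pid, argv[0], NULL, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    if (rc != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling it with { "/bin/sh", "-c", "exit 7", NULL } returns the child's exit status.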

I can confirm that this works on Debian Lenny, and provides a massive speed-up when forking from a large process.

benchmarking the various spawns over 1000 runs at 100M RSS
                            user     system      total        real
fspawn (fork/exec):     0.100000  15.460000  40.570000 ( 41.366389)
pspawn (posix_spawn):   0.010000   0.010000   0.540000 (  0.970577)

Solution 3

Did you actually measure how much time forks take? Quoting the page you linked,

Linux never had this problem; because Linux used copy-on-write semantics internally, Linux only copies pages when they changed (actually, there are still some tables that have to be copied; in most circumstances their overhead is not significant)

So the number of forks per second doesn't really show how big the overhead will be. You should measure the time consumed by forks and, as general advice, only by the forks you actually perform, rather than benchmarking maximum performance.
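One way to take such a measurement is sketched below. The function name forks_per_second and its parameters are made up for illustration; it times fork() plus wait only (no exec), with an optional block of touched heap to imitate a large process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical benchmark: grow the process by `mbytes` MiB of touched heap,
 * then time `n` fork()+wait cycles. Returns forks per second, or -1.0 on error. */
double forks_per_second(size_t mbytes, int n)
{
    assert(n > 0);

    size_t bytes = mbytes << 20;
    char *ballast = NULL;
    if (bytes > 0) {
        ballast = malloc(bytes);
        if (ballast == NULL)
            return -1.0;
        /* Touch one byte per 4 KiB so the pages become resident. */
        volatile char *v = ballast;
        for (size_t i = 0; i < bytes; i += 4096)
            v[i] = 1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            free(ballast);
            return -1.0;
        }
        if (pid == 0)
            _exit(0);           /* child: leave immediately */
        waitpid(pid, NULL, 0);  /* parent: reap before the next round */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(ballast);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? n / secs : -1.0;
}
```

Comparing the result for a small and a large mbytes value on the target machine gives a direct view of how process size affects fork cost there.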

But if you do find that forking a large process is slow, you can spawn a small ancillary process, pipe the master process to its input, and have it receive the commands to exec. The small process will fork and exec these commands.
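The helper-process idea could look roughly like the sketch below. All names (struct helper, helper_start, helper_run, helper_stop) and the protocol (one shell command per line in, one wait status back) are assumptions for illustration. helper_start is meant to be called early, while the process is still small.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct helper { pid_t pid; int cmd_fd; int status_fd; };

/* Runs in the small helper: read one command per line, fork+exec it via
 * /bin/sh, and write the raw wait status back. Never returns. */
static void helper_loop(int in_fd, int out_fd)
{
    FILE *in = fdopen(in_fd, "r");
    char *line = NULL;
    size_t cap = 0;
    while (in != NULL && getline(&line, &cap, in) > 0) {
        line[strcspn(line, "\n")] = '\0';
        int status = -1;
        pid_t pid = fork();              /* cheap: the helper is small */
        if (pid == 0) {
            close(in_fd);
            close(out_fd);
            execl("/bin/sh", "sh", "-c", line, (char *)NULL);
            _exit(127);
        }
        if (pid > 0)
            waitpid(pid, &status, 0);
        if (write(out_fd, &status, sizeof status) != (ssize_t)sizeof status)
            break;
    }
    _exit(0);
}

/* Start the helper; returns 0 on success, -1 on failure. */
int helper_start(struct helper *h)
{
    assert(h != NULL);
    int cmd[2], st[2];
    if (pipe(cmd) != 0)
        return -1;
    if (pipe(st) != 0) { close(cmd[0]); close(cmd[1]); return -1; }
    h->pid = fork();
    if (h->pid < 0)
        return -1;
    if (h->pid == 0) {
        close(cmd[1]);
        close(st[0]);
        helper_loop(cmd[0], st[1]);
    }
    close(cmd[0]);
    close(st[1]);
    h->cmd_fd = cmd[1];
    h->status_fd = st[0];
    return 0;
}

/* Send one command (no embedded newlines) and return its exit status, or -1. */
int helper_run(struct helper *h, const char *cmd)
{
    assert(h != NULL && cmd != NULL);
    int status;
    size_t n = strlen(cmd);
    if (write(h->cmd_fd, cmd, n) != (ssize_t)n || write(h->cmd_fd, "\n", 1) != 1)
        return -1;
    if (read(h->status_fd, &status, sizeof status) != (ssize_t)sizeof status)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Closing the command pipe makes the helper see EOF and exit. */
void helper_stop(struct helper *h)
{
    close(h->cmd_fd);
    close(h->status_fd);
    waitpid(h->pid, NULL, 0);
}
```

The large process then pays only for a pipe write and read per command; the fork itself happens in the helper, whose page tables stay small. This version runs commands one at a time and waits for each, which a real application might not want.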

posix_spawn()

This function, as far as I understand, is implemented via fork/exec on desktop systems. However, in embedded systems (particularly those without an MMU), processes are spawned via a syscall whose interface is posix_spawn or a similar function. Quoting the informative section of the POSIX standard describing posix_spawn:

  • Swapping is generally too slow for a realtime environment.

  • Dynamic address translation is not available everywhere that POSIX might be useful.

  • Processes are too useful to simply option out of POSIX whenever it must run without address translation or other MMU services.

Thus, POSIX needs process creation and file execution primitives that can be efficiently implemented without address translation or other MMU services.

I don't think you will benefit from this function on a desktop system if your goal is to minimize time consumption.

Solution 4

If you know the number of subprocesses ahead of time, it might be reasonable to pre-fork your application on startup and then distribute the execv information via a pipe. Alternatively, if there is some sort of "lull" in your program, it might be reasonable to fork a subprocess or two ahead of time for quick turnaround later. Neither of these options directly solves the problem, but if either approach suits your app, it might allow you to side-step the issue.

Solution 5

I've come across this blog post: http://blog.famzah.net/2009/11/20/a-much-faster-popen-and-system-implementation-for-linux/

Excerpt:

The system call clone() comes to the rescue. Using clone() we create a child process which has the following features:

  • The child runs in the same memory space as the parent. This means that no memory structures are copied when the child process is created. As a result of this, any change to any non-stack variable made by the child is visible by the parent process. This is similar to threads, and therefore completely different from fork(), and also very dangerous – we don’t want the child to mess up the parent.
  • The child starts from an entry function which is being called right after the child was created. This is like threads, and unlike fork().
  • The child has a separate stack space which is similar to threads and fork(), but entirely different to vfork().
  • The most important: This thread-like child process can call exec().

In a nutshell, by calling clone in the following way, we create a child process which is very similar to a thread but still can call exec():

pid = clone(fn, stack_aligned, CLONE_VM | SIGCHLD, arg);
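The blog's one-line clone() call leaves out the child function and the stack setup. A fuller sketch of how such a call might be wrapped is given below. It is not the blog's code: the names exec_child and clone_exec_wait, the 64 KiB stack size, and the use of mmap for the stack are all assumptions for illustration.

```c
#define _GNU_SOURCE   /* needed for clone() in <sched.h> */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)  /* assumed to be ample for execv() */

/* Entry point of the clone()d child. It shares the parent's memory, so it
 * does nothing except exec (a failed execv also sets the shared errno). */
static int exec_child(void *arg)
{
    char *const *argv = arg;
    execv(argv[0], argv);
    _exit(127);  /* only reached if exec failed */
}

/* Hypothetical helper: run argv[0] in a CLONE_VM child and return its exit
 * status, or -1 on failure. */
int clone_exec_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    /* The child needs its own stack; mmap gives a page-aligned region. */
    char *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* Stacks grow downwards on x86, so pass the top of the region.
     * SIGCHLD as the termination signal lets waitpid() reap the child. */
    pid_t pid = clone(exec_child, stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, (void *)argv);

    int status = -1;
    int ok = pid > 0 && waitpid(pid, &status, 0) == pid;

    /* Safe to unmap once the child has been reaped. */
    munmap(stack, CHILD_STACK_SIZE);

    if (!ok)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Unlike vfork, the parent is not suspended here; it keeps running while the child shares its address space, which is exactly why the child should do nothing but exec.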

However, I think it may still be subject to the setuid problem:

http://ewontfix.com/7/ "setuid and vfork"

Now we get to the worst of it. Threads and vfork allow you to get in a situation where two processes are both sharing memory space and running at the same time. Now, what happens if another thread in the parent calls setuid (or any other privilege-affecting function)? You end up with two processes with different privilege levels running in a shared address space. And this is A Bad Thing.

Consider for example a multi-threaded server daemon, running initially as root, that’s using posix_spawn, implemented naively with vfork, to run an external command. It doesn’t care if this command runs as root or with low privileges, since it’s a fixed command line with fixed environment and can’t do anything harmful. (As a stupid example, let’s say it’s running date as an external command because the programmer couldn’t figure out how to use strftime.)

Since it doesn’t care, it calls setuid in another thread without any synchronization against running the external program, with the intent to drop down to a normal user and execute user-provided code (perhaps a script or dlopen-obtained module) as that user. Unfortunately, it just gave that user permission to mmap new code over top of the running posix_spawn code, or to change the strings posix_spawn is passing to exec in the child. Whoops.

Author: timday


Updated on August 07, 2020

Question (asked by timday)

    What's the fastest, best way on modern Linux of achieving the same effect as a fork-execve combo from a large process?

    My problem is that the forking process is ~500 MB in size, and a simple benchmark achieves only about 50 forks/s from it (cf. ~1600 forks/s from a minimally sized process), which is too slow for the intended application.

    Some googling turns up vfork as having been invented as the solution to this problem... but also warnings against using it. Modern Linux seems to have acquired the related clone and posix_spawn calls; are these likely to help? What's the modern replacement for vfork?

    I'm using 64-bit Debian Lenny on an i7 (the project could move to Squeeze if posix_spawn would help).