Is it safe to fork from within a thread?

c++ linux multithreading process fork

25,224

Solution 1

The problem is that fork() only copies the calling thread, and any mutexes held by other threads will be forever locked in the forked child. The pthread solution was the pthread_atfork() handlers. The idea was that you can register 3 handlers: one prefork, one parent handler, and one child handler. When fork() happens, the prefork handler is called prior to the fork and is expected to obtain all application mutexes. The parent and child handlers must then release all of those mutexes in the parent and child processes respectively.
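
As a rough illustration of the three handlers described above, here is a minimal sketch. The names app_mutex, install_fork_handlers and fork_and_use_mutex are made up for this example, and a real program would usually have more than one mutex to cover.

```cpp
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical application-wide mutex (an assumption for this sketch).
static pthread_mutex_t app_mutex = PTHREAD_MUTEX_INITIALIZER;

// Prefork handler: runs in the forking thread before fork() and takes the mutex.
static void prepare_handler() { pthread_mutex_lock(&app_mutex); }
// Parent handler: runs in the parent after fork() and releases the mutex.
static void parent_handler() { pthread_mutex_unlock(&app_mutex); }
// Child handler: runs in the child after fork() and releases the child's copy.
static void child_handler() { pthread_mutex_unlock(&app_mutex); }

// Registers the three handlers; returns 0 on success.
int install_fork_handlers() {
    return pthread_atfork(prepare_handler, parent_handler, child_handler);
}

// Forks, lets the child take and release app_mutex, and returns the child's
// exit code (0 if the mutex was usable in the child, -1 on error).
int fork_and_use_mutex() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        int rc = pthread_mutex_trylock(&app_mutex);
        if (rc == 0) pthread_mutex_unlock(&app_mutex);
        _exit(rc == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Call install_fork_handlers() once at startup, before any thread might call fork(). This only covers mutexes the application itself knows about.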

This isn't the end of the story though! Libraries call pthread_atfork to register handlers for library-specific mutexes; libc does this, for example. This is a good thing: the application can't possibly know about the mutexes held by 3rd party libraries, so each library must call pthread_atfork to ensure its own mutexes are cleaned up in the event of a fork().

The problem is that the application has no real control over the order in which pthread_atfork handlers for unrelated libraries are called (it depends on the order in which the libraries are loaded by the program). This means that a deadlock can technically happen inside a prefork handler because of a race condition.

For example, consider this sequence:

1. Thread T1 calls fork().
2. The libc prefork handlers are called in T1 (i.e. T1 now holds all libc locks).
3. Next, in thread T2, a 3rd party library A acquires its own mutex AM, and then makes a libc call which requires a mutex. This blocks, because the libc mutexes are held by T1.
4. Thread T1 runs the prefork handler for library A, which blocks waiting to obtain AM, which is held by T2.

There's your deadlock, and it's unrelated to your own mutexes or code.

This actually happened on a project I once worked on. The advice I had found at that time was to choose fork or threads but not both. But for some applications that's probably not practical.

Solution 2

It's safe to fork in a multithreaded program as long as you are very careful about the code between fork and exec. You can call only reentrant (aka async-signal-safe) functions in that span. In theory, you are not allowed to malloc or free there, although in practice the default Linux allocator is safe, and Linux libraries came to rely on it. The end result is that you must use the default allocator.
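
A minimal sketch of what "careful" can look like: everything the child needs is prepared before fork(), and the child calls nothing but execv() and _exit(), both of which are on the POSIX async-signal-safe list. The function name run_and_wait is made up for this example.

```cpp
#include <sys/wait.h>
#include <unistd.h>

// Runs args[0] with the given NULL-terminated argument vector and returns its
// exit status, or -1 on failure. The vector must be fully built before the
// call so that the child does not allocate anything.
int run_and_wait(char* const args[]) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: no malloc, no stdio, no locks, just exec.
        execv(args[0], args);
        _exit(127);  // only reached if execv failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing {"/bin/sh", "-c", "exit 3", nullptr} should make the function return 3, assuming /bin/sh exists on the system.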

Solution 3

Back at the Dawn of Time, we called threads "lightweight processes" because while they act a lot like processes, they're not identical. The biggest distinction is that threads by definition live in the same address space of one process. This has advantages: switching from thread to thread is fast, they inherently share memory so inter-thread communications are fast, and creating and disposing of threads is fast.

The distinction here is with "heavyweight processes", which are complete address spaces. A new heavyweight process is created by fork(2). As virtual memory came into the UNIX world, that was augmented with vfork(2) and some others.

A fork(2) copies the entire address space of the process, including all the registers, and puts that process under the control of the operating system scheduler; the next time the scheduler comes around, the instruction counter picks up at the next instruction -- the forked child process is a clone of the parent. (If you want to run another program, say because you're writing a shell, you follow the fork with an exec(2) call, which loads that new address space with a new program, replacing the one that was cloned.)
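
To make the "clone" idea concrete, here is a small sketch (the function name is invented for illustration). Both processes continue from the instruction after fork(), but each has its own copy of the address space, so a write in the child is not visible in the parent.

```cpp
#include <sys/wait.h>
#include <unistd.h>

// Forks, lets the child overwrite its copy of a local variable, and returns
// the value the parent still sees afterwards (or -1 if fork() failed).
int counter_after_child_write() {
    int counter = 1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        counter = 42;  // changes only the child's copy of the variable
        _exit(counter == 42 ? 0 : 1);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return counter;  // the parent's copy is untouched
}
```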

Basically, your answer is buried in that explanation: when you have a process with many threads and you fork the process, you will have two independent processes with many threads, running concurrently.

This trick is even useful: in many programs, you have a parent process that may have many threads, some of which fork new child processes. (For example, an HTTP server might do that: each connection to port 80 is handled by a thread, and then a child process for something like a CGI program could be forked; exec(2) would then be called to run the CGI program in place of the parent process clone.)

Solution 4

While you can use Linux's NPTL pthreads(7) support for your program, threads are an awkward fit on Unix systems, as you've discovered with your fork(2) question.

Since fork(2) is a very cheap operation on modern systems, you might do better to just fork(2) your process when you have more handling to perform. It depends upon how much data you intend to move back and forth. The share-nothing philosophy of forked processes is good for reducing shared-data bugs, but it does mean you either need to create pipes to move data between processes or use shared memory (shmget(2) or shm_open(3)).
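
As a sketch of the pipe option, the following forks a child that sends a short message back to the parent. The function name and the message text are placeholders for this example; a real program would send whatever results it computes.

```cpp
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that writes a message into a pipe; the parent reads until
// end-of-file and returns what it received (empty string on error).
std::string read_from_child() {
    int fds[2];
    if (pipe(fds) != 0) return "";
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }
    if (pid == 0) {
        close(fds[0]);  // the child only writes
        const char msg[] = "result from child";
        ssize_t written = write(fds[1], msg, sizeof(msg) - 1);
        _exit(written == static_cast<ssize_t>(sizeof(msg) - 1) ? 0 : 1);
    }
    close(fds[1]);  // the parent must close the write end to ever see EOF
    std::string out;
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<size_t>(n));
    }
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return out;
}
```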

But if you choose to use threading, you can fork(2) a new process, with the following hints from the fork(2) manpage:

   *  The child process is created with a single thread — the
      one that called fork().  The entire virtual address space
      of the parent is replicated in the child, including the
      states of mutexes, condition variables, and other pthreads
      objects; the use of pthread_atfork(3) may be helpful for
      dealing with problems that this can cause.
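
The consequence of that paragraph can be observed with a small experiment. This is a demo only: it assumes Linux with glibc, the names are invented, and calling pthread_mutex_trylock() in the child of a multithreaded process is outside what POSIX guarantees. A helper thread locks a mutex, the main thread forks, and the child finds the mutex still locked even though the thread that owns it does not exist there.

```cpp
#include <atomic>
#include <cerrno>
#include <pthread.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Mutex that a helper thread will be holding at the moment of fork().
static pthread_mutex_t orphan_mutex = PTHREAD_MUTEX_INITIALIZER;

// Returns true if the child saw the mutex as busy although the owning thread
// was not copied into the child.
bool child_sees_orphaned_lock() {
    std::atomic<bool> locked{false};
    std::atomic<bool> release{false};

    std::thread holder([&] {
        pthread_mutex_lock(&orphan_mutex);
        locked = true;
        while (!release) std::this_thread::yield();
        pthread_mutex_unlock(&orphan_mutex);
    });
    while (!locked) std::this_thread::yield();  // wait until the lock is held

    pid_t pid = fork();
    if (pid == 0) {
        // Child: only the forking thread exists here, nobody can ever unlock.
        int rc = pthread_mutex_trylock(&orphan_mutex);
        _exit(rc == EBUSY ? 0 : 1);
    }
    int status = 0;
    if (pid > 0) waitpid(pid, &status, 0);
    release = true;  // let the helper thread unlock and finish in the parent
    holder.join();
    return pid > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A blocking pthread_mutex_lock() in the child would simply hang at that point, which is the failure mode the manpage is warning about.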

Solution 5

My experience of fork()'ing within threads is really bad. The software generally fails pretty quickly.

I've found several solutions to the matter. You may not like them much, but I think these are generally the best way to avoid close-to-undebuggable errors.

1. Fork first

Assuming you know the number of external processes you need at the start, you can create them upfront and just have them sit there waiting for an event (e.g. a read from a blocking pipe, a wait on a semaphore, etc.).

Once you have forked enough children, you are free to use threads and communicate with those forked processes via your pipes, semaphores, etc. From the time you create a first thread, you cannot call fork() anymore. Keep in mind that if you're using 3rd party libraries which may create threads, those have to be used/initialized after the fork() calls have happened.

Note that you can then start using threads within the main and fork()'ed processes.
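
Here is a rough sketch of that "fork first" ordering. The Worker struct, the function names and the trivial doubling job are all invented for the example. The worker process is created while the program is still single-threaded, blocks on a pipe, and is only handed work later by a thread.

```cpp
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Handle for a pre-forked worker: its pid plus the parent's pipe ends.
struct Worker {
    pid_t pid;
    int to_child;
    int from_child;
};

// Forks a worker that blocks until it receives one int, replies with twice
// that value, and exits. Must be called before any thread is created.
Worker start_worker() {
    int cmd[2], res[2];
    if (pipe(cmd) != 0) return {-1, -1, -1};
    if (pipe(res) != 0) {
        close(cmd[0]); close(cmd[1]);
        return {-1, -1, -1};
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(cmd[0]); close(cmd[1]); close(res[0]); close(res[1]);
        return {-1, -1, -1};
    }
    if (pid == 0) {
        close(cmd[1]); close(res[0]);
        int value = 0;
        bool ok = read(cmd[0], &value, sizeof value) == static_cast<ssize_t>(sizeof value);
        if (ok) {
            value *= 2;
            ok = write(res[1], &value, sizeof value) == static_cast<ssize_t>(sizeof value);
        }
        _exit(ok ? 0 : 1);
    }
    close(cmd[0]); close(res[1]);
    return {pid, cmd[1], res[0]};
}

// Creates the worker first, then a thread that talks to it over the pipes.
// Returns the worker's answer, or -1 on failure.
int ask_worker_from_thread(int input) {
    Worker w = start_worker();  // no threads exist yet at this point
    if (w.pid < 0) return -1;
    int answer = -1;
    std::thread t([&] {
        int reply = 0;
        if (write(w.to_child, &input, sizeof input) == static_cast<ssize_t>(sizeof input) &&
            read(w.from_child, &reply, sizeof reply) == static_cast<ssize_t>(sizeof reply)) {
            answer = reply;
        }
    });
    t.join();
    close(w.to_child);
    close(w.from_child);
    waitpid(w.pid, nullptr, 0);
    return answer;
}
```

In this sketch ask_worker_from_thread(21) should return 42. A real program would keep the workers alive and loop over commands rather than handling a single request.
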

2. Know your state

In some circumstances, it may be possible for you to stop all of your threads to start a process and then restart your threads. This is somewhat similar to point (1) in the sense that you do not want threads running at the time you call fork(), although it requires a way for you to know about all the threads currently running in your software (something not always possible with 3rd party libraries).

Remember that "stopping a thread" using a wait is not going to work. You have to join with the thread so it is fully exited, because a wait requires a mutex and those need to be unlocked when you call fork(). You just cannot know when the wait is going to unlock/re-lock the mutex, and that's usually where you get stuck.

3. Choose one or the other

The other obvious possibility is to choose either threads or fork() and not have to worry about one interfering with the other. This is by far the simplest method, if it is at all possible in your software.

4. Create threads only when necessary

In some software, one creates one or more threads in a function, uses said threads, then joins all of them when exiting the function. This is somewhat equivalent to point (2) above, only you (micro-)manage threads as required instead of creating threads that sit around and get used when necessary. This will work too, just keep in mind that creating a thread is a costly call. It has to allocate a new task with a stack and its own set of registers... it is a complex function. However, this makes it easy to know when you have threads running, and except from within those functions, you are free to call fork().
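
A small sketch of that pattern, with invented function names and a trivial job: the threads live only inside parallel_sum(), so by the time fork() is called no other thread is running.

```cpp
#include <numeric>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>
#include <vector>

// Sums the data with two short-lived threads; both are joined before return.
long parallel_sum(const std::vector<int>& data) {
    long lo = 0, hi = 0;
    auto mid = data.begin() + static_cast<std::ptrdiff_t>(data.size() / 2);
    std::thread a([&] { lo = std::accumulate(data.begin(), mid, 0L); });
    std::thread b([&] { hi = std::accumulate(mid, data.end(), 0L); });
    a.join();
    b.join();
    return lo + hi;
}

// Computes the sum, then forks once all threads are gone. The child reports
// the low 7 bits of the sum as its exit code, which the parent returns.
int sum_then_fork(const std::vector<int>& data) {
    long total = parallel_sum(data);  // threads exist only inside this call
    pid_t pid = fork();               // no other thread is running here
    if (pid < 0) return -1;
    if (pid == 0) _exit(static_cast<int>(total & 0x7f));
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```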

In my programming, I used all of these solutions. I used point (2) because I was using the threaded version of log4cplus and needed to use fork() for some parts of my software.

As mentioned by others, if you are using fork() to then call execve(), the idea is to do as little as possible between the two calls. That is likely to work 99.999% of the time (many people use system() or popen() with fairly good success too, and these do similar things). The fact is that if you do not hit any of the mutexes held by the other threads, then this will work without issue.
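
One way to have nothing of your own between the two calls is posix_spawn(3), which one of the comments on the question also suggests. This is a different call from fork() followed by execve(), not what I used myself, and the wrapper name below is invented. A minimal sketch:

```cpp
#include <spawn.h>
#include <sys/wait.h>

extern char** environ;  // environment of the current process

// Starts args[0] with the given NULL-terminated argument vector via
// posix_spawn() and returns its exit status, or -1 on failure.
int spawn_and_wait(char* const args[]) {
    pid_t pid = 0;
    // No file actions and no spawn attributes: the child inherits defaults.
    int rc = posix_spawn(&pid, args[0], nullptr, nullptr, args, environ);
    if (rc != 0) return -1;
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```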

On the other hand, if, like me, you want to do a fork() and never call execve(), then it's not likely to work right while any thread is running.

What is actually happening?

The issue is that fork() creates a separate copy of only the current task (a process under Linux is called a task in the kernel).

Each time you create a new thread (pthread_create()), you also create a new task, but within the same process (i.e. the new task shares the process space: memory, file descriptors, ownership, etc.). However, a fork() ignores those extra tasks when duplicating the currently running task.
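
The "new task, same process" part can be checked with a Linux-specific sketch. It uses the raw SYS_gettid system call, and the struct and function names are made up for this example. A new thread reports the same process ID as the main thread but a different kernel task ID.

```cpp
#include <sys/syscall.h>
#include <sys/types.h>
#include <thread>
#include <unistd.h>

// Process ID and kernel task (thread) ID of the calling thread.
struct Ids {
    pid_t pid;
    pid_t tid;
};

static Ids current_ids() {
    return {getpid(), static_cast<pid_t>(syscall(SYS_gettid))};
}

// Returns true if a newly created thread shares the process ID of the main
// thread but has its own task ID.
bool thread_is_separate_task_in_same_process() {
    Ids main_ids = current_ids();
    Ids thread_ids{};
    std::thread t([&] { thread_ids = current_ids(); });
    t.join();
    return main_ids.pid == thread_ids.pid && main_ids.tid != thread_ids.tid;
}
```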

+-----------------------------------------------+
|                                     Process A |
|                                               |
| +----------+    +----------+    +----------+  |
| | thread 1 |    | thread 2 |    | thread 3 |  |
| +----------+    +----+-----+    +----------+  |
|                      |                        |
+----------------------|------------------------+
                       | fork()
                       |
+----------------------|------------------------+
|                      v              Process B |
|               +----------+                    |
|               | thread 2 |                    |
|               +----------+                    |
|                                               |
+-----------------------------------------------+

So in Process B, we lose thread 1 & thread 3 from Process A. This means that if either or both have a lock on mutexes or something similar, then Process B is going to lock up quickly. The locks are the worst, but any resources that either thread still has at the time the fork() happens are lost (socket connection, memory allocations, device handle, etc.). This is where point (2) above comes in. You need to know your state before the fork(). If you have a very small number of threads or worker threads defined in one place and can easily stop all of them, then it will be easy enough.


Ælex

Updated on April 26, 2022

Comments

Ælex about 2 years

Let me explain: I have already been developing an application on Linux which forks and execs an external binary and waits for it to finish. Results are communicated by shm files that are unique to the fork + process. The entire code is encapsulated within a class.

Now I am considering threading the process in order to speed things up. Having many different instances of class functions fork and execute the binary concurrently (with different parameters) and communicate results with their own unique shm files.

Is this thread safe? If I fork within a thread, apart from being safe, is there something I have to watch for? Any advice or help is much appreciated!
- ildjarn almost 13 years
  
  How would you execute code and not be within a thread?
- hammar almost 13 years
  
  The forked process will only contain a copy of the current thread, if that's what you meant.
- Lightness Races in Orbit almost 13 years
  
  @ildjarn: I think he means a child thread of the base process.
- ildjarn almost 13 years
  
  @TomalakGeretkal : I realize that, but my (facetiously-made) point is, if code is executing, you're in a thread whether you explicitly created that thread or not, which makes the question a bit silly on the surface.
- Ælex almost 13 years
  
  I should have clarified. I run a single execution path. For a population of individuals (linear genetic programming optimization), I need to execute an external binary, and thus I use fork & exec. Now this execution path can be run in parallel by using threads (boost threads, pthreads, etc). Will it be safe to do so? Meaning, have the threaded execution part fork itself and execute the binary? The shm object is unique to each fork and its executed binary, and thus unique to each thread as well. Also, am I better off using vfork instead of fork (conserving memory, minimizing copying, etc.)?
- enthusiasticgeek over 11 years
  
  It may be too late, but one may find these blogs useful linuxprogrammingblog.com/… infohost.nmt.edu/~eweiss/222_book/222_book/0201433079/…
- Ichthyo over 2 years
  
  @Ælex can you please please un-accept the misleading answer and accept the detailed and well explained one?
- Ælex over 2 years
  
  @Ichthyo the one from Kevin?
- Ichthyo over 2 years
  
  That answer has the most votes, is solid and explains well the problems and pitfalls. So yes, that's probably a good pick
- Ichthyo over 2 years
  
  Incidentally, if your intention is to fork and then immediately hand over to another executable within the child process, using some of the exec*() functions... ...then you should probably also have a look at the posix_spawn() function. This function basically encapsulates this sequence and exposes some configuration hooks man7.org/linux/man-pages/man3/posix_spawn.3.html
Ælex almost 13 years

Thank you for the detailed explanation. My question to you is this: I am interested in the main/parent application using threads (not processes), yet each thread will have to fork and replace itself with another process (execute the binary). Is it safe to do so? Furthermore, fork copies pretty much everything, but I know I do not need everything since the forked process is to be instantly replaced by executing another binary. Is there a way to avoid this, or to minimize memory copying?
Ælex almost 13 years

Thank you, I am already using forking and exec for the external process, and chose to stick with shm because I believe it will be faster, and the amount of info that needs to be passed between processes is big. shm objects are unique to each fork + process, and destroyed/unlinked after passing info. I admit you are probably right about sticking with fork, but it is my understanding that it is heavier on the CPU, and the part where I need to use threads instead of forks contains a lot of memory to share between them, so threads would be much easier to use.
Nikolai Fetissov almost 13 years

@Alex, the copying is actually done on demand - most pages are shared between parent and the child and marked copy-on-write.
vladr about 12 years

@Charlie, your statement "you will have two independent processes with many threads, running concurrently" is ambiguous or incorrect. The POSIX-specified behavior for fork() is that only the calling thread is in a non-suspended state in the child process. However, some platforms (e.g. Solaris) implement forkall().
Charlie Martin about 12 years

The sentence reads "Two independent processes ... running concurrently." That's correct. "With many threads" is a subordinate clause referring to what the processes have, not to "running concurrently". It's only ambiguous since they don't teach you young kids how to parse an English sentence any more, he said crankily.
Dima Tisnek over 10 years

CMIIAW, parent keeps its threads, child gets only one thread. Or is that only in Linux?
Keith4G over 10 years

Basically if you have a multiple-process product with any process potentially fork()ing to call exec(), no process should have multiple threads?
Kevin over 10 years

yes, but if you're willing to live dangerously, you're probably mostly fine.
Beni Cherniavsky-Paskin about 9 years

Not only Linux — POSIX says the child is a single-thread process. [On Linux fork() function actually uses clone system call but in fork-equivalent way.]
Alexis Wilke about 4 years

What he's saying is the main app has multiple threads and he wants each thread to call fork() on their own time and that can generate all sorts of problems.
Chris Dodd over 3 years

pthread_atfork handlers are specified as being called in LIFO order, which implies your deadlock scenario is a bug in library A -- because it depends on libc, it must initialize libc first so library A's prefork handler will be called first.
Kevin over 3 years

That's interesting--I ran man (on my mac just now) and see that parent and child handlers are FIFO and prefork is LIFO. Note this answer was written in 2011, about an experience I had in the 2005 timeframe, and my experience (at that time) was with Tru64 Unix. So perhaps something has changed in that time, hard to tell.