How to use fork() in unix? Why not something of the form fork(pointerToFunctionToRun)?

12,388

Solution 1

fork() says "copy the current process state into a new process and start it running from right here." Because the code is then running in two processes, it in fact returns twice: once in the parent process (where it returns the child process's process identifier) and once in the child (where it returns zero).
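To make the "returns twice" behaviour concrete, here is a minimal sketch in C. The helper name `fork_and_collect` is made up for illustration; it assumes a POSIX system and a single-threaded caller.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, let the child exit with `code`, and
 * return the exit status the parent observes (-1 on any error). */
int fork_and_collect(int code)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: fork() returned 0 in this process. */
        _exit(code);
    }
    /* Parent: fork() returned the child's process id. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both branches of the `if` run, but in different processes: the child takes the `pid == 0` branch, and the parent falls through to `waitpid()`.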

There are a lot of restrictions on what it is safe to call in the child process after fork() (see below). The expectation is that the fork() call was part one of spawning a new process running a new executable with its own state. Part two of this process is a call to execve() or one of its variants, which specifies the path to an executable to be loaded into the currently running process, the arguments to be provided to that process, and the environment variables to surround that process. (There is nothing to stop you from re-executing the currently running executable and providing a flag that will make it pick up where the parent left off, if that's what you really want.)

The UNIX fork()-exec() dance is roughly the equivalent of the Windows CreateProcess(). A newer function is even more like it: posix_spawn().
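For comparison, a rough sketch of the single-call style with posix_spawn(). The wrapper name `spawn_and_wait` is an invention for this example, and passing NULL for the file actions and attributes simply accepts the defaults.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: start the executable at `path` with `argv`,
 * wait for it, and return its exit status (-1 on failure). */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    /* NULL file actions and NULL attributes: use the defaults. */
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with `"/bin/sh"` and the argument vector `{"sh", "-c", "exit 3", NULL}` should report the shell's exit status, assuming `/bin/sh` exists on the system.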

As a practical example of using fork(), consider a shell, such as bash. fork() is used all the time by a command shell. When you tell the shell to run a program (such as echo "hello world"), it forks itself and then execs that program. A pipeline is a collection of forked processes with stdout and stdin rigged up appropriately by the parent in between fork() and exec().
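The "rigging" between fork() and exec() can be sketched roughly as follows. This is not how any particular shell is implemented; `run_capture` is a hypothetical helper that connects the child's stdout to a pipe so the parent can read what the program prints.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (looked up in PATH) with its stdout
 * connected to a pipe, copy at most cap-1 bytes of output into buf,
 * and return the program's exit status (-1 on error). */
int run_capture(char *const argv[], char *buf, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: point stdout at the pipe, then become the new program. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: close the write end, then read until end of file. */
    close(fds[1]);
    size_t len = 0;
    ssize_t n;
    while (len < cap - 1 && (n = read(fds[0], buf + len, cap - 1 - len)) > 0)
        len += (size_t)n;
    buf[len] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real pipeline repeats the same idea with one pipe between each pair of children, using dup2() on both STDIN_FILENO and STDOUT_FILENO.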

If you want to create a new thread, you should use the POSIX threads library. You create a new POSIX thread (pthread) using pthread_create(). Your CreateNewThread() example would look like this:

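#include <pthread.h>

/* Pthread start routines accept and return void *. */
static void *MyFunctionToRun(void *arg)
{
    (void)arg; /* unused */
    return NULL;
}

int main(void)
{
    pthread_t thread;
    int error = pthread_create(&thread,
            NULL /* use default thread attributes */,
            MyFunctionToRun,
            NULL /* argument */);
    if (error != 0)
        return 1; /* pthread_create() returns an error number on failure */
    return pthread_join(thread, NULL); /* wait for the thread to finish */
}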

Before threads were available, fork() was the closest thing UNIX provided to multithreading. Now that threads are available, usage of fork() is almost entirely limited to spawning a new process to execute a different executable.

About the restrictions mentioned above: they exist because fork() predates multithreading, so only the thread that calls fork() continues to execute in the child process. Per POSIX:

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called. [THR] [Option Start] Fork handlers may be established by means of the pthread_atfork() function in order to maintain application invariants across fork() calls. [Option End]

When the application calls fork() from a signal handler and any of the fork handlers registered by pthread_atfork() calls a function that is not asynch-signal-safe, the behavior is undefined.

Because any library function you call could have spawned a thread on your behalf, the paranoid assumption is that you are always limited to executing async-signal-safe operations in the child process between calling fork() and exec().

Solution 2

History aside, there are some fundamental differences between processes and threads with respect to resource ownership and lifetime.

When you fork, the new process occupies a completely separate memory space. That's a very important distinction from creating a new thread. In multi-threaded applications you have to consider how you access and manipulate shared resources. Processes that have been forked have to explicitly share resources using inter-process means such as shared memory, pipes, remote procedure calls, semaphores, etc.
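A small sketch of what "separate memory space" means in practice, assuming a single-threaded POSIX program. The names `counter` and `demo_separate_memory` are hypothetical. The child changes a variable, the parent never sees that change, and the only way the value gets back is through an explicit pipe.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 1;

/* Hypothetical demo: the child sets `counter` to 42 and sends it through
 * a pipe. Stores the child's value in *child_value and returns the
 * parent's own `counter` (or -1 on error). */
int demo_separate_memory(int *child_value)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        counter = 42; /* changes only the child's copy */
        ssize_t w = write(fds[1], &counter, sizeof counter);
        _exit(w == (ssize_t)sizeof counter ? 0 : 1);
    }

    close(fds[1]);
    ssize_t r = read(fds[0], child_value, sizeof *child_value);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (r != (ssize_t)sizeof *child_value)
        return -1;
    return counter; /* the parent's copy is untouched */
}
```

With threads, the assignment to `counter` would be visible to every thread at once, which is exactly why threads need locking and forked processes need explicit IPC.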

Another difference is that fork()'ed children can outlive their parent, whereas all threads die when the process terminates.

In a client-server architecture where very, very long uptime is expected, using fork() rather than creating threads could be a valid strategy to combat memory leaks. Rather than worrying about cleaning up memory leaks in your threads, you just fork off a new child process to process each client request, then kill the child when it's done. The only source of memory leaks would then be the parent process that dispatches events.
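A toy sketch of the fork-per-request idea, under the assumption of a single-threaded dispatcher. `handle_request` and `serve_requests` are hypothetical names, and the handler leaks on purpose to show that the leak dies with the child.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical request handler that leaks memory on purpose. */
static void handle_request(int request_id)
{
    char *scratch = malloc(1 << 20);
    if (scratch != NULL)
        memset(scratch, request_id & 0xff, 1 << 20);
    /* No free(): the kernel reclaims everything when the child exits. */
}

/* Fork one child per request and return how many exited cleanly.
 * For simplicity this waits for each child in turn; a real server
 * would keep accepting work and reap children asynchronously. */
int serve_requests(int count)
{
    int ok = 0;
    for (int i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            handle_request(i);
            _exit(0);
        }
        int status;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

The parent's memory footprint stays flat no matter how much each child leaks.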

An analogy: You can think of spawning threads as opening tabs inside a single browser window, while forking is like opening separate browser windows.

Solution 3

It would be more valid to ask why CreateNewThread doesn't just return a thread id like fork() does... after all, fork() set a precedent. Your opinion's just coloured by you having seen one before the other. Take a step back and consider that fork() duplicates the process and continues execution... what better place than at the next instruction? Why complicate things by adding a function call into the bargain (and then one that only takes void*)?

Your comment to Mike says "what I can't understand is in which contexts you'd want to use it." Basically, you use it when you want to:

  • run another process using the exec family of functions
  • do some parallel processing independently (in terms of memory usage, signal handling, resources, security, robustness), for example:
    • each process may have restrictive limits on the number of file descriptors it can manage, or, on a 32-bit system, on the amount of memory it can address: a second process can share the work while getting its own resources
    • web browsers tend to fork distinct processes because they can do some initialisation then call operating system functions to permanently reduce their privileges (e.g. change to a less-trusted user id, change the "root" directory under which they can access files, or make some memory pages read-only); most OSes don't allow the same extent of fine-grained permission-setting on a per-thread basis
    • another benefit is if a child process seg-faults or similar, the parent process can handle that and continue, whereas similar faults in multi-threaded code raise questions about whether memory has been corrupted - or locks have been held - by the crashing thread such that remaining threads are compromised
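The robustness point can be sketched like this. The crash is simulated with raise(SIGSEGV) rather than a real bad pointer, and `run_crashing_child` is a hypothetical name; the parent simply observes how the child died and carries on.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that "crashes" and return the number
 * of the signal that killed it (0 if it exited normally, -1 on error). */
int run_crashing_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGSEGV, SIG_DFL); /* make sure the default action applies */
        raise(SIGSEGV);           /* simulate a segfault in the child */
        _exit(1);                 /* not reached with the default action */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    /* The parent is unaffected and can inspect what happened. */
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

In a multi-threaded design the same fault would take down the whole process, so there would be no surviving "parent" left to make this check.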

BTW, using UNIX/Linux doesn't mean you have to give up threads for fork()ing processes... you can use pthread_create() and related functions if you're more comfortable with the threading paradigm.

Solution 4

Setting aside the difference between spawning a process and spawning a thread for a second: basically, fork() is a more fundamental primitive. While SpawnNewThread has to do some background work to get the program counter in the right spot, fork() does no such work; it just copies (or virtually copies) your program memory and continues from the current program counter.

Solution 5

Fork has been with us for a very, very, long time. Fork was thought of before the idea of 'start a thread running a particular function' was a glimmer in anyone's eye.

People don't use fork because it's 'better,' we use it because it is the one and only unprivileged user-mode process creation function that works across all variations of Linux. If you want to create a process, you have to call fork. And, for some purposes, a process is what you need, not a thread.

You might consider researching the early papers on the subject.

Author by

devoured elysium

Updated on July 27, 2022

Comments

  • devoured elysium
    devoured elysium over 1 year

    I am having some trouble understanding how to use Unix's fork(). I am used to, when in need of parallelization, spawning threads in my application. It's always something of the form

    CreateNewThread(MyFunctionToRun);
    
    void MyFunctionToRun() { ... }
    

    Now, when learning about Unix's fork(), I was given examples of the form:

    fork();
    printf("%d\n", 123);
    

    in which the code after the fork is "split up". I can't understand how fork() can be useful. Why doesn't fork() have a similar syntax to the above CreateNewThread(), where you pass it the address of a function you want to run?

    To accomplish something similar to CreateNewThread(), I'd have to be creative and do something like

    //pseudo code
    id = fork();
    
    if (id == 0) { //im the child
        FunctionToRun();
    } else { //im the parent
        wait();
    }
    

    Maybe the problem is that I am so used to spawning threads the .NET way that I can't think clearly about this. What am I missing here? What are the advantages of fork() over CreateNewThread()?

    PS: I know fork() will spawn a new process, while CreateNewThread() will spawn a new thread.

    Thanks

  • devoured elysium
    devoured elysium over 13 years
    I understand what fork does, what I can't understand is in which contexts you'd want to use it.
  • devoured elysium
    devoured elysium over 13 years
    My point is that it is very clear the reason why I am calling CreateThread. I am calling it because I want to run a function in parallel. What are the regular uses of a call to fork?
  • devoured elysium
    devoured elysium over 13 years
    So, maybe what I don't get is why I'd want to create a new process. My Windows-trained brain would think that the only interest in creating a process would be to run another executable. Am I missing something?
  • nos
    nos over 13 years
    @devoured elysium To run a new process (e.g. run a new/different program). fork() is the only way to create a new process in *nix. You'd also do it if you're doing parallel processing where the different "threads" are totally independent of each other - that way a problem/bug/crash in one of the processes doesn't affect the others, whereas with regular threads, a crash in one of them affects all the others.
  • tchrist
    tchrist over 13 years
    @devoured elysium: of course not: camel_case is abhorrent to Unix.
  • nos
    nos over 13 years
    @devoured elysium, not really. It provides exec(programPath..) and similar, but that doesn't create a new process, it replaces the existing process with the image of the programPath. CreateProcess() in win32 creates a new process and loads the program image. In *nix you do that in 2 steps, fork() to create a new process, exec() to load the program.
  • Artem
    Artem over 13 years
    Try implementing a shell or launching a daemon any other way.
  • Tony Delroy
    Tony Delroy over 13 years
    @devoured elysium: re runProgram(path) - rather, fork() is followed by one of the exec family of functions - the latter kind of overwrites the process calling it with the requested image & restarts execution. Things like popen() would use fork and exec inside.
  • Tony Delroy
    Tony Delroy over 13 years
    @devoured elysium: re "What are the regular uses of a call to fork?" - to create a second copy of the process, either to exec a different program OR to work cooperatively with the first, perhaps taking over the control/management of certain I/O streams (e.g. TCP clients), or running some time-consuming processing and coordinating results via e.g. shared memory. Do remember that threads were designed to make it easier/faster to do processing that shares data with the original thread, so fork()ing a process really can seem kind of backwards in certain comparisons.
  • tchrist
    tchrist over 13 years
    What does this Leenooks thing have to do with a standard Unix fork(2), anyway?
  • dmckee --- ex-moderator kitten
    dmckee --- ex-moderator kitten over 13 years
    Telling someone who is used to using threads for concurrency that forking gives you concurrency isn't a big win.
  • Tony Delroy
    Tony Delroy over 13 years
    "think of spawning threads as opening tabs inside a single browser window, while forking is like opening separate browser windows" - didn't hear about Chrome? ;-) +1 for a solid point re children able to outlive parents.
  • devoured elysium
    devoured elysium over 13 years
    Ah. I guess I can understand now its use. It's just that I'm too tied to win32.
  • ninjalj
    ninjalj about 10 years
    About the resource cleanup by letting the process die: you don't need it for every client request. I remember Apache 1.3 had a parameter that set the number of requests each process handled before dying.