What Happens When I Call fork() in Unix?


Solution 1

As it appears to the process, the entire memory is duplicated.

In reality, it uses a "copy-on-write" scheme: the first time either process modifies a page of memory after fork(), the kernel makes a separate copy of that page (usually 4 KB) for the process that wrote to it.

Usually the code segment of a process is not modified, in which case it remains shared.

Solution 2

Logically, a fork creates an identical copy of the original process that is largely independent of the original. For performance reasons, memory is shared with copy-on-write semantics, which means that unmodified memory (such as code) remains shared.

File descriptors are duplicated, so that the forked process could, in principle, take over a database connection on behalf of the parent (or they could even jointly communicate with the database if the programmer is a bit twisted). More commonly, this is used to set up pipes between processes so you can write find -name '*.c' | xargs grep fork.

A bunch of other state is inherited as well; the fork(2) man page lists what is and is not inherited.

One important omission is threads: the child process inherits only the thread that called fork(). This causes no end of trouble in multithreaded programs, because mutexes (and similar locks) held by other threads at the moment of the fork are copied in their locked state, with no thread left in the child to release them (and don't forget that malloc() and printf() use locks internally). The only safe thing to do in the child after fork() returns is to call async-signal-safe functions, ideally execve() as soon as possible, and even then you have to be careful with file descriptors, which stay open across execve() unless they are marked close-on-exec.

Solution 3

  1. They are separate processes, i.e., the child and the parent have separate PIDs.
  2. The child inherits all of the parent's open file descriptors.
  3. Internally, the writable pages (stack, heap, and data, unlike the read-only .text region) are shared between the parent and the child until one of them tries to modify them. At that point the kernel allocates a new page, copies the contents of the page being modified into it, and maps it into the address space of whichever process caused the write, parent or child. This is called COW (copy-on-write), as other answers here mention.
  4. When the child finishes execution, it stays in the ZOMBIE state until the parent reaps it with wait() or waitpid(). Reaping removes the child's entry from the process table and allows its PID to be reused. When a child terminates, the parent is sent a SIGCHLD signal; a common pattern is to install a SIGCHLD handler that calls wait() or waitpid().
  5. If the parent exits without reaping its running or zombie children (via wait() or waitpid()), the init process (PID 1, or on Linux a process marked as a "child subreaper") adopts these orphans and reaps them with wait() or waitpid() when they exit.


Solution 4

Yes, they are separate processes, but with some special "properties". One of them is the child-parent relation.

But more important is the sharing of memory pages in a copy-on-write (COW) manner: until one of them performs a write (to a global variable or whatever) on a page, the memory pages are shared. When a write is performed, the kernel creates a copy of that page and maps it at the same address in the process that wrote to it.

The COW magic is done in the kernel by marking the shared pages read-only and handling the page fault that a write then triggers.

Author: Rob P.

Updated on July 02, 2022

Comments

  • Rob P., almost 2 years ago

    I've tried to look this up, but I'm struggling a bit to understand the relation between the Parent Process and the Child Process immediately after I call fork().

    Are they completely separate processes, associated only by the ID/parent ID? Or do they share memory? For example, the 'code' section of each process: is it duplicated so that each process has its own identical copy, or is it 'shared' in some way so that only one exists?

    I hope that makes sense.

    In the name of full disclosure this is 'homework related'; while not a direct question from the book, I have a feeling it's mostly academic and, in practice, I probably don't need to know.

  • Linus Kleen, over 12 years ago
    +1 Nice one. Can you maybe elaborate on what happens when setsid() is called? Code segment-wise?
  • jpa, over 12 years ago
    I'm not very familiar with setsid(), but I don't think it affects the copy-on-write function.
  • R.. GitHub STOP HELPING ICE, over 12 years ago
    setsid has nothing to do with memory.
  • R.. GitHub STOP HELPING ICE, over 12 years ago
    Note that the omission of threads is not a design flaw in POSIX but a fundamental limitation of the notion of "duplicating" a process. Each lock has an owner, so after duplication, who should own it? This may not matter with process-local mutexes (both the parent and child could end up with a copy, each copy owned by the corresponding thread) but what about process-shared mutexes? The same issues apply with pre-pthreads synchronization objects (file locks, etc.) and fork. Basically the idea of forking without knowing exactly what your whole program does is just unfixably broken.