How does copy-on-write in fork() handle multiple fork?

linux c fork

40,916

Solution 1

Nothing particular happens. All processes are sharing the same set of pages and each one gets its own private copy when it wants to modify a page.

Solution 2

The behavior of fork() depends on whether the *nix system has a MMU or not. On a non-MMU system (like early PDP-11s) the fork() system call copied all of the parent's memory for each child. On a MMU-based *nix system the kernel marks all the non-stack pages as R/O and shares them between parent and child. Then when either process writes to any page, the MMU traps the attempt, the kernel then allocates a writeable page and updates the MMU page-tables to point to the now writeable page. This Copy-on-Write behavior provides a speed-up since initially only a private stack needs to be allocated and cloned for each child process.

If you execute some parent code between each fork() call then the resulting child processes will differ by the pages that have been altered by the parent. On the other hand, if the parent simply issues multiple fork() calls, e.g. in a loop, then the child processes will be almost identical. If a local loop variable is used then that will be different within each child's stack.

40,916

ssgao

Updated on September 18, 2022

Comments

ssgao over 1 year

According to Wikipedia (which could be wrong)

When a fork() system call is issued, a copy of all the pages corresponding to the parent process is created, loaded into a separate memory location by the OS for the child process. But this is not needed in certain cases. Consider the case when a child executes an "exec" system call (which is used to execute any executable file from within a C program) or exits very soon after the fork(). When the child is needed just to execute a command for the parent process, there is no need for copying the parent process' pages, since exec replaces the address space of the process which invoked it with the command to be executed.

In such cases, a technique called copy-on-write (COW) is used. With this technique, when a fork occurs, the parent process's pages are not copied for the child process. Instead, the pages are shared between the child and the parent process. Whenever a process (parent or child) modifies a page, a separate copy of that particular page alone is made for that process (parent or child) which performed the modification. This process will then use the newly copied page rather than the shared one in all future references. The other process (the one which did not modify the shared page) continues to use the original copy of the page (which is now no longer shared). This technique is called copy-on-write since the page is copied when some process writes to it.

It seems that when either of the processes tries to write to the page a new copy of the page gets allocated and assigned to the process that generated the page fault. The original page gets marked writable afterwards.

My question is: what happens if the fork() gets called multiple times before any of the processes made an attempt to write to a shared page?
- Razzlero over 11 years
  
  Wikipedia is right in this case, just more high level.
- where23 about 8 years
  
  Yes, copy-on-write is lazy copying, child process copy the page when try to write it. So basically, after a fork, almost child's memory is shared with parent. However, before any of the processes made, every child process still have some private memory, modified from parent's or new allocating. That means even without any action the forked child process has some private memory. We can verify it with pmap -XX PID or cat /proc/PID/smap.
- esb over 5 years
  
  Regarding - "The original page gets marked writable afterwards.", who will own it? Here the other process who has not tried writing it?
- ed22 over 4 years
  
  This is lovely. Let's start teaching this in kindergartens
Charles Stewart over 11 years

Right. The point is, it's the child process that is special, that has the job of copying if it tries to write to the shared page. Neither the parent nor the other children need know about the change if it is done correctly.
jlliagre over 11 years

The child process is not that special. Both the child and the parent processes have the same set of pages read-only after the fork. As far as these pages are concerned, page handling is symmetrical.
brauliobo over 9 years

which system does this? linux uses a copy-on-write implementation
Razzlero over 9 years

That is how the copy-on-write works...
Celada over 9 years

@DavidKohen that's not how copy-on-write works in any version of it that I've ever heard of. There is no "master" process. If any single process writes the shared pages, its copy gets bumped to a private one while all the other processes continue to share it.
0xC0000022L almost 9 years

I think David Kohen is right to some point. This is one way to implement copy-on-write. The gist would be that with this marking, writing to that page would trigger a page-fault handler that would then take appropriate action, i.e. copy-on-write. Unfortunately this detail (which would be system-specific) is mostly irrelevant for the question. Keep in mind that CoW has two dimensions: the one visible to the process and the one of how the kernel might implement it.