Linux system calls for creating processes and threads

Solution 1

Processes are usually created with fork, and threads (lightweight processes) are usually created with clone nowadays. However, anecdotally, there also exist 1:N thread models, which use neither.

Internally, both fork and clone map to the same kernel function, do_fork (called kernel_clone in Linux 5.10 and later). Depending on the flags passed to it, this function can create a lightweight process that shares the address space with its parent, or a separate process (among many other options). The clone syscall is more or less a direct forwarding of that kernel function and is what the higher-level threading libraries use. fork, in contrast, wraps do_fork to provide the decades-old traditional Unix fork semantics.

The important difference is that fork guarantees that a complete, separate copy of the address space is made. As Basile correctly points out, this is done with copy-on-write nowadays and is therefore not nearly as expensive as one might think.
When you create a thread, it simply reuses the original address space and the same memory.

However, one should not assume that creating processes is generally "lightweight" on Unix-like systems because of copy-on-write. It is somewhat less heavy than, for example, under Windows, but it is nowhere near free.
One reason is that although the actual pages are not copied, the new process still needs its own copy of the page tables. For processes that use large amounts of memory, this can be anywhere from several kilobytes to megabytes. Another reason is that copy-on-write, while invisible and a clever optimization, is not free and cannot do magic. When either process modifies data, which inevitably happens, every affected page takes a page fault and the kernel must copy it at that moment.

Redis is a good example of fork being anything but lightweight (it uses fork to do background saves).

Solution 2

The underlying system call to create threads is clone(2), which is Linux specific. By the way, the list of Linux system calls is in syscalls(2), and you can use the strace(1) command to see which syscalls a process or command makes. Processes are usually created with fork(2) (or vfork(2), which is rarely useful these days). However, you could also create them with a particular form of clone, and some C standard libraries might do exactly that. I guess the kernel shares some code to implement clone, fork, etc., since some functionality (e.g. management of the virtual address space) is common to both.

Indeed, process creation (and also thread creation) is generally quite fast on most Unix systems, because they use copy-on-write machinery for virtual memory; it typically takes a small fraction of a millisecond. But there are pathological cases (e.g. thrashing) that make it take much longer.

Since most C standard library implementations on Linux are free software, you could study the source code of the one on your system (often GNU glibc, but sometimes musl-libc or something else).

Author: atoMerz

Updated on July 09, 2022

Comments

  • atoMerz, almost 2 years ago

    I read in a paper that the underlying system call to create processes and threads is actually the same, and thus the cost of creating processes rather than threads is not that great.

    • First, I want to know which system call creates processes/threads (sample code or a link would help).
    • Second, is the author correct to assume that creating processes instead of threads is inexpensive?

    EDIT:
    Quoting the article:

    Replacing pthreads with processes is surprisingly inexpensive, especially on Linux where both pthreads and processes are invoked using the same underlying system call.

  • atoMerz, about 12 years ago
    So they're different system calls?
  • Basile Starynkevitch, about 12 years ago
    They are different, but AFAIU fork could be implemented with clone (though fork predates clone by decades).
  • atoMerz, about 12 years ago
    Thank you. I searched for do_fork and found its source. Is there any documentation on how to use it?
  • Damon, about 12 years ago
    Unless you write kernel code, you won't directly call do_fork at all. You probably don't want to use clone in general either (it's recommended to use the pthreads library built on top of it instead). Anyway, in case you do want to use clone, the documentation is here. Now fork on the other hand, is something you may realistically want to use, the docs are on the same site.
  • atoMerz, about 12 years ago
    I'm doing a review on this paper, I want to know how things actually work. I found the source code for fork and pthread_create. But I can't find any calls to do_fork.
  • Damon, about 12 years ago
    Unfortunately, Google Code Search has shut down, and Koders is an ordeal to work with... Here are some implementations of both sys_fork and sys_clone that I could find with a quick search (there are different ones in different arch subfolders, but they are probably not much different, if at all).
  • atoMerz, about 12 years ago
    Thanks a lot. Exactly what I was looking for.