Linux system call for creating process and thread
Solution 1
Processes are usually created with fork; threads (lightweight processes) are usually created with clone nowadays. Anecdotally, however, there are also 1:N thread models that do neither.
Both fork and clone map to the same kernel function, do_fork (called kernel_clone since Linux 5.10), internally. This function can create a lightweight process that shares the address space with the old one, or a separate process (among many other options), depending on the flags you pass to it. The clone syscall is more or less a direct forwarding of that kernel function (and is used by the higher-level threading libraries), whereas fork wraps do_fork in the semantics of the 50-year-old traditional Unix function.
The important difference is that fork guarantees that a complete, separate copy of the address space is made. As Basile correctly points out, this is done with copy-on-write nowadays and is therefore not nearly as expensive as one would think.
When you create a thread, it just reuses the original address space and the same memory.
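To make the effect of the flags concrete, here is a minimal sketch using the glibc clone() wrapper. The helper name counter_after_clone, the variable shared_counter, and the 64 KiB stack size are made up for illustration. The child writes 42 into a global; with CLONE_VM the parent should see that write, and without it the child only changes its own copy.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* arbitrary size for this sketch */

/* volatile so the parent really re-reads it after the child has run */
static volatile int shared_counter = 0;

/* Runs in the child: write to the global, then return (the child exits). */
static int child_fn(void *arg)
{
    (void)arg;
    shared_counter = 42;
    return 0;
}

/*
 * Create a child with clone() using the given extra flags, wait for it,
 * and return shared_counter as the parent sees it afterwards.
 * Returns -1 on error.
 */
int counter_after_clone(int extra_flags)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_counter = 0;

    /* The stack grows downwards on most architectures, so pass its top.
       SIGCHLD as exit signal lets a plain waitpid() collect the child. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      extra_flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited == -1)
        return -1;

    return shared_counter;
}
```

Calling counter_after_clone(CLONE_VM) behaves like a very bare thread (shared memory), while counter_after_clone(0) behaves like fork (separate copy). Real threading libraries pass many more flags (CLONE_THREAD, CLONE_SIGHAND, TLS setup and so on), so this is only an illustration, not a substitute for pthreads.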
However, one should not assume that creating processes is generally "lightweight" on Unix-like systems because of copy-on-write. It is somewhat less heavy than, for example, under Windows, but it's nowhere near free.
One reason is that although the actual pages are not copied, the new process still needs a copy of the page table. This can be several kilobytes to megabytes of memory for processes that use larger amounts of memory.
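To get a feeling for how large a process's page tables are, one can read the VmPTE field that Linux exposes in /proc/self/status. The sketch below assumes that field is present (it may be missing in some sandboxes or on very old kernels); the function name page_table_kib is made up for illustration.

```c
#include <stdio.h>

/*
 * Return the size of the calling process's page tables in kB, as reported
 * by the VmPTE line of /proc/self/status, or -1 if it cannot be determined.
 */
long page_table_kib(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[256];
    long kib = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines look like "VmPTE:      52 kB"; other lines do not match. */
        if (sscanf(line, "VmPTE: %ld kB", &kib) == 1)
            break;
    }

    fclose(f);
    return kib;
}
```

Calling this in a process with a large, fully touched heap gives a rough idea of how much page-table state fork has to deal with for that process.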
Another reason is that copy-on-write, although invisible and a clever optimization, is not free and cannot do magic. When either process modifies data, which inevitably happens, the affected pages fault and the kernel has to copy them after all.
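The faulting can be observed with getrusage. In the following sketch (the function name cow_faults_in_child is made up), the parent touches a number of pages, forks, and the child then writes one byte to each page and reports how many minor page faults that caused. On a typical Linux system one would expect roughly one fault per page, but the exact number depends on the kernel and its configuration.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Touch `pages` pages in the parent, fork, let the child write one byte to
 * each page, and return the number of minor page faults the child incurred
 * while writing. Returns -1 on error.
 */
long cow_faults_in_child(size_t pages)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    char *buf = malloc(pages * page_size);
    if (buf == NULL)
        return -1;

    /* volatile keeps the compiler from optimizing the writes away */
    volatile char *p = buf;
    for (size_t i = 0; i < pages; i++)
        p[i * page_size] = 1;          /* make sure the pages are mapped */

    int fds[2];
    if (pipe(fds) == -1) {
        free(buf);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        free(buf);
        return -1;
    }

    if (pid == 0) {                    /* child */
        struct rusage before, after;
        close(fds[0]);
        getrusage(RUSAGE_SELF, &before);
        for (size_t i = 0; i < pages; i++)
            p[i * page_size] = 2;      /* first write to a shared page faults */
        getrusage(RUSAGE_SELF, &after);
        long faults = after.ru_minflt - before.ru_minflt;
        if (write(fds[1], &faults, sizeof faults) != (ssize_t)sizeof faults)
            _exit(1);
        _exit(0);
    }

    /* parent: read the child's fault count from the pipe */
    close(fds[1]);
    long faults = -1;
    if (read(fds[0], &faults, sizeof faults) != (ssize_t)sizeof faults)
        faults = -1;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    free(buf);
    return faults;
}
```

Running it with increasing page counts shows how the cost that fork appeared to avoid comes back later, at write time.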
Redis is a good example where you can see that fork is anything but lightweight (it uses fork to do background saves).
Solution 2
The underlying system call to create threads is clone(2) (it is Linux-specific). BTW, the list of Linux system calls is in syscalls(2), and you can use the strace(1) command to see the syscalls made by a process or command. Processes are usually created with fork(2) (or vfork(2), which is not very useful these days). However, you could create them with some particular form of clone (and some C standard libraries might do that). I guess that the kernel shares some code to implement clone, fork, etc. (since some functionality, e.g. management of the virtual address space, is common).
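As an illustration of "some particular form of clone", the following sketch creates a fork-like child by invoking the clone system call directly with only SIGCHLD as flags and no new stack. It assumes an architecture such as x86-64 or AArch64 where the flags are the first raw syscall argument (the raw argument order differs on some architectures), and the function name is made up. Because this bypasses the C library's fork, the child does nothing but call _exit.

```c
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a child via the raw clone system call with fork-like semantics
 * (flags = SIGCHLD only, stack = 0 means "keep using a copy of the
 * parent's stack"). The child exits immediately with `code`; the parent
 * returns the exit status it collected, or -1 on error.
 * Assumption: flags is the first raw syscall argument (x86-64, AArch64).
 */
int fork_like_clone_exit_status(int code)
{
    long pid = syscall(SYS_clone, (unsigned long)SIGCHLD, 0UL, 0UL, 0UL, 0UL);
    if (pid == -1)
        return -1;

    if (pid == 0)
        _exit(code);                   /* child: leave without touching libc state */

    int status;
    if (waitpid((pid_t)pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running a program that calls fork() under strace on a current glibc system typically shows a clone (or clone3) call rather than fork, which is the same idea seen from the outside.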
Indeed, process creation (and also thread creation) is generally quite fast on most Unix systems (because they use copy-on-write machinery for virtual memory), typically a small fraction of a millisecond. But you could have pathological cases (e.g. thrashing) that make it take much longer.
Since most C standard library implementations are free software on Linux, you could study the source code of the one on your system (often GNU glibc, but sometimes musl-libc or something else).
Comments
-
atoMerz almost 2 years
I read in a paper that the underlying system call to create processes and threads is actually the same, and thus the cost of creating processes over threads is not that great.
- First, I wanna know what is the system call that creates processes/threads (possibly a sample code or a link?)
- Second, is the author correct to assume that creating processes instead of threads is inexpensive?
EDIT:
Quoting the article: "Replacing pthreads with processes is surprisingly inexpensive, especially on Linux where both pthreads and processes are invoked using the same underlying system call."
-
atoMerz about 12 years: So they're different system calls?
-
Basile Starynkevitch about 12 years: They are different, but AFAIU fork could be implemented with clone (but predated it by a dozen years).
-
atoMerz about 12 years: Thank you. I searched for do_fork and found its source. Is there any documentation on how to use it?
-
Damon about 12 years: Unless you write kernel code, you won't directly call do_fork at all. You probably don't want to use clone in general either (it's recommended to use the pthreads library built on top of it instead). Anyway, in case you do want to use clone, the documentation is here. Now fork, on the other hand, is something you may realistically want to use; the docs are on the same site.
-
atoMerz about 12 years: I'm doing a review of this paper and I want to know how things actually work. I found the source code for fork and pthread_create, but I can't find any calls to do_fork.
-
Damon about 12 years: Unfortunately, Google Code Search has shut down, and Koders is an ordeal to work with... Here are some implementations (there are different ones in different arch subfolders, but they probably differ little, if at all) of both sys_fork and sys_clone that I could find with a quick search.
-
atoMerz about 12 years: Thanks a lot. Exactly what I was looking for.