Threads vs Processes in Linux
Solution 1
Linux uses a 1-1 threading model, with (to the kernel) no distinction between processes and threads -- everything is simply a runnable task. *
On Linux, the system call clone() clones a task, with a configurable level of sharing, among which are:
- CLONE_FILES: share the same file descriptor table (instead of creating a copy)
- CLONE_PARENT: don't set up a parent-child relationship between the new task and the old (otherwise, child's getppid() = parent's getpid())
- CLONE_VM: share the same memory space (instead of creating a COW copy)
fork() calls clone(least sharing) and pthread_create() calls clone(most sharing). **
Calling fork() costs a tiny bit more than calling pthread_create() because of copying tables and creating COW mappings for memory, but the Linux kernel developers have tried (and succeeded) at minimizing those costs.
Switching between tasks, if they share the same memory space and various tables, will be a tiny bit cheaper than if they aren't shared, because the data may already be loaded in cache. However, switching tasks is still very fast even if nothing is shared -- this is something else that Linux kernel developers try to ensure (and succeed at ensuring).
In fact, if you are on a multi-processor system, not sharing may actually be beneficial to performance: if each task is running on a different processor, synchronizing shared memory is expensive.
* Simplified. CLONE_THREAD causes signal delivery to be shared (which needs CLONE_SIGHAND, which shares the signal handler table).
** Simplified. There exist both SYS_fork and SYS_clone syscalls, but in the kernel, sys_fork and sys_clone are both very thin wrappers around the same do_fork function, which itself is a thin wrapper around copy_process. Yes, the terms process, thread, and task are used rather interchangeably in the Linux kernel...
Solution 2
Linux (and indeed Unix) gives you a third option.
Option 1 - processes
Create a standalone executable which handles some part (or all parts) of your application, and invoke it separately for each process, e.g. the program runs copies of itself to which it delegates tasks.
Option 2 - threads
Create a standalone executable which starts up with a single thread, and create additional threads to do some tasks.
Option 3 - fork
Only available under Linux/Unix, this is a bit different. A forked process really is its own process with its own address space - there is nothing that the child can do (normally) to affect its parent's or siblings' address spaces (unlike a thread) - so you get added robustness.
However, the memory pages are not copied, they are copy-on-write, so less memory is usually used than you might imagine.
Consider a web server program which consists of two steps:
- Read configuration and runtime data
- Serve page requests
If you used threads, step 1 would be done once, and step 2 done in multiple threads. If you used "traditional" processes, steps 1 and 2 would need to be repeated for each process, and the memory to store the configuration and runtime data duplicated. If you used fork(), then you can do step 1 once, and then fork(), leaving the runtime data and configuration in memory, untouched, not copied.
So there are really three choices.
Solution 3
That depends on a lot of factors. Processes are more heavy-weight than threads, and have a higher startup and shutdown cost. Interprocess communication (IPC) is also harder and slower than interthread communication.
Conversely, processes are safer and more secure than threads, because each process runs in its own virtual address space. If one process crashes or has a buffer overrun, it does not affect any other process at all, whereas if a thread crashes, it takes down all of the other threads in the process, and if a thread has a buffer overrun, it opens up a security hole in all of the threads.
So, if your application's modules can run mostly independently with little communication, you should probably use processes if you can afford the startup and shutdown costs. The performance hit of IPC will be minimal, and you'll be slightly safer against bugs and security holes. If you need every bit of performance you can get or have a lot of shared data (such as complex data structures), go with threads.
Solution 4
Others have discussed the considerations.
Perhaps the most important difference is that on Windows processes are heavy and expensive compared to threads, while on Linux the difference is much smaller, so the equation balances at a different point.
Solution 5
Once upon a time there was Unix, and in this good old Unix there was a lot of overhead for processes. So what some clever people did was create threads, which share the same address space with their parent process and need only a reduced context switch, making context switches more efficient.
In a contemporary Linux (2.6.x) there is not much difference in performance between a context switch of a process compared to a thread (only the MMU page-table switch is additional for a process). There is the issue with the shared address space, which means that a faulty pointer in a thread can corrupt memory of the parent process or another thread within the same address space.
A process is protected by the MMU, so a faulty pointer will just cause a segmentation fault (signal 11) and no corruption.
I would in general use processes (not much context switch overhead in Linux, but memory protection thanks to the MMU), but pthreads if I needed a real-time scheduler class, which is a different cup of tea altogether.
Why do you think threads have such a big performance gain on Linux? Do you have any data for this, or is it just a myth?
user17918
Updated on March 17, 2022
Comments
-
user17918 about 2 years
I've recently heard a few people say that in Linux, it is almost always better to use processes instead of threads, since Linux is very efficient in handling processes, and because there are so many problems (such as locking) associated with threads. However, I am suspicious, because it seems like threads could give a pretty big performance gain in some situations.
So my question is, when faced with a situation that threads and processes could both handle pretty well, should I use processes or threads? For example, if I were writing a web server, should I use processes or threads (or a combination)?
-
mouviciel about 15 yearsIs there a difference with Linux 2.4?
-
MarkR about 15 yearsThe difference between processes and threads under Linux 2.4 is that threads share more parts of their state (address space, file handles etc) than processes, which usually don't. The NPTL under Linux 2.6 makes this a bit clearer by giving them "thread groups" which are a bit like "processes" in win32 and Solaris.
-
ephemient about 15 yearsYes, NPTL is nice: it makes things like kill, exec, etc. work as you would expect in a threaded program (the old LinuxThreads behaviors made sense given the implementation, but were icky). OTOH a "thread group" is just a collection of "threads", and doesn't really take up resources itself, so it's a ton lighter-weight than an NT or Solaris process.
-
neal aise almost 14 yearshttpd.apache.org/docs/2.0/mod/worker.html is the default for the Apache web server. It's a multi-process, multi-thread configuration.
-
Lutz Prechelt over 8 yearsConcurrent programming is difficult. Unless you need very high performance, the most important aspect in your tradeoff will often be the difficulty of debugging. Processes make for the much easier solution in this respect, because all communication is explicit (easy to check, to log etc.). In contrast, the shared memory of threads creates gazillions of places where one thread can erroneously impact another.
-
iankit over 8 years@LutzPrechelt - Concurrent programming can be multi-threaded as well as multi-process. I don't see why you are assuming concurrent programming is multi-threaded only. It might be because of some particular language limitations, but in general it can be both.
-
user2692263 about 7 yearsI think Lutz merely stated that concurrent programming is difficult whichever is chosen - processes or threads - but that concurrent programming using processes makes for easier debugging in many cases.
-
ephemient about 15 yearsI do have the rep to edit, but I don't quite agree. Context switches between processes on Linux are almost as cheap as context switches between threads.
-
ephemient about 15 yearsI used thread-local storage for some statistics gathering the last time I was writing a threaded network program: each thread wrote to its own counters, no locks needed, and only when messaged would each thread combine its stats into the global totals. But yeah, TLS is not very commonly used or necessary. Shared memory, on the other hand... in addition to efficiently sending data, you can also share POSIX semaphores between processes by placing them in shared memory. It's pretty amazing.
-
user17918 almost 15 yearsYes, I do have some data. I ran a test that creates 100,000 processes and a test that creates 100,000 threads. The thread version ran about 9x faster (17.38 seconds for processes, 1.93 for threads). Now this does only test creation time, but for short-lived tasks, creation time can be key.
-
Rick Ellison over 14 yearsAdam's answer would serve well as an executive briefing. For more detail, MarkR and ephemient provide good explanations. A very detailed explanation with examples may be found at cs.cf.ac.uk/Dave/C/node29.html but it does appear to be a bit dated in parts.
-
codingfreak about 13 years@user17918 - Is it possible for you to share the code you used to calculate the above-mentioned timings?
-
MarkR over 12 years@Qwertie forking is not that cool, it breaks lots of libraries in subtle ways (if you use them in the parent process). It creates unexpected behaviour which confuses even experienced programmers.
-
Saurabh about 12 yearsI think we are missing one point. If you make multiple processes for your web server, then you have to write another process to open the socket and pass 'work' to the different processes. Threading offers a single process with multiple threads - a clean design. In many situations a thread is just natural, and in other situations a new process is just natural. When the problem falls in a gray area, the other trade-offs as explained by ephemient become important.
-
ephemient about 12 years@Saurabh Not really. You can easily socket, bind, listen, fork, and then have multiple processes accept connections on the same listening socket. A process can stop accepting if it's busy, and the kernel will route incoming connections to another process (if nobody is listening, the kernel will queue or drop, depending on the listen backlog). You don't have much more control over work distribution than that, but usually that's good enough! -
Ehtesh Choudhury over 11 years@MarkR could you give some examples or a link of how forking breaks library and creates unexpected behavior?
-
MarkR over 11 yearsIf a process forks with an open mysql connection, bad things happen, as the socket is shared between two processes. Even if only one process uses the connection, the other stops it from being closed.
-
n611x007 almost 11 years"the data may already be loaded in cache" - what cache exactly?
-
c4f4t0r over 10 yearsOne big difference: with processes the kernel creates a page table for every process, while threads use only one page table, so I think it is normal that threads are faster than processes.
-
Russell Stuart about 10 yearsCyberFonic's point is true for Windows. As ephemient says, under Linux processes aren't heavier. And under Linux all the mechanisms available for communication between threads (futexes, shared memory, pipes, IPC) are also available for processes and run at the same speed.
-
Lawrence Jones about 10 yearsNaxa, the cache that's being referred to is the page table cache. COW ensures that initially the two threads will share the same memory - i.e., each thread will point to the same physical place in memory for its program data. This means the kernel hasn't had to perform any swapping/paging as the data is already there, probably already loaded into main memory.
-
Stanimirovv almost 10 yearsThere is one thing which I do not understand from this answer: if threads and processes are the same to Linux, how and when do we achieve shared resources for the threads?
-
ephemient almost 10 years@Bloodcount All processes/threads on Linux are created by the same mechanism, which clones an existing process/thread. Flags passed to clone() determine which resources are shared. A task can also unshare() resources at any later point in time. -
Karthik Balaguru over 9 yearsA single process can contain multiple threads, so how is it true that the terms process, thread, and task are used rather interchangeably in the Linux kernel? Can you please point out where exactly this is claimed in Linux?
-
Karthik Balaguru over 9 yearsAnother simple way to look at it is that the TCB is much smaller than the PCB, so it is obvious that a process context switch, which involves the PCB, will consume a bit more time than switching between threads.
-
ephemient over 9 years@KarthikBalaguru Within the kernel itself, there is a task_struct for each task. This is often called a "process" throughout the kernel code, but it corresponds to each runnable thread. There is no process_struct; if a bunch of task_structs are linked together by their thread_group list, then they're the same "process" to userspace. There's a little bit of special handling of "thread"s, e.g. all sibling threads are stopped on fork and exec, and only the "main" thread shows up in ls /proc. Every thread is accessible via /proc/pid though, whether it's listed in /proc or not. -
ephemient over 9 years@KarthikBalaguru The kernel supports a continuum of behavior between threads and processes; for example, clone(CLONE_THREAD | CLONE_VM | CLONE_SIGHAND) would give you a new "thread" that doesn't share working directory, files or locks, while clone(CLONE_FILES | CLONE_FS | CLONE_IO) would give you a "process" that does. The underlying system creates tasks by cloning; fork() and pthread_create() are just library functions that invoke clone() differently (as I wrote in this answer). -
olegst over 8 yearsWhat do you mean no benefit? How about performing heavy calculations in GUI thread? Moving them to parallel thread will be much better from a point of user experience, no matter how CPU is loaded.
-
Lie Ryan over 8 yearsThe fork() system call is specified by POSIX (which means it's available on any Unix system). If you use the underlying Linux API, the clone() system call, then you actually have even more choices in Linux than just the three.
-
Lelanthran almost 7 years@MarkR The sharing of the socket is by design. Besides, either of the processes can close the socket using linux.die.net/man/2/shutdown before calling close() on the socket.
-
batbrat over 5 yearsYou mention that not sharing may be good on multiprocessor systems. However, just using multiple processes doesn't guarantee that we will not synchronize, especially if we use shared memory and not messaging.
-
abhiarora over 4 yearsDoes this answer require modification considering latest version of Linux Kernel?
-
abhiarora over 4 yearsIPC is harder to use but what if someone uses "shared memory"?