Threads vs (Forked) Processes

process fork thread

Solution 1

The idea behind threads and processes is about the same: You fork the execution path. They differ in what the two paths share, most notably memory: each process has its own virtual address space, while threads share a single address space.

Under the hood, on Linux both threading and forking are implemented with the clone() call (man 2 clone):

Unlike fork(2), clone() allows the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers. (Note that on this manual page, "calling process" normally corresponds to "parent process". But see the description of CLONE_PARENT below.)

The main use of clone() is to implement threads: multiple threads of control in a program that run concurrently in a shared memory space.

The differences come from the flags that are passed to clone(). As you can see from the man page, fork and threading are just two predefined sets of parameters to clone(). However, you can also pass your own combination of flags.

Solution 2

Most non-Unix multiprocessing operating systems (OSes) use a "spawn()" call or something similar to generate a new OS process or control flow. Spawn() tends to be a very complex call, with lots of options and lots of overhead. One of Unix's innovations was to provide a much lower overhead way of creating processes: fork(). Unix covered the many options that spawn() needs by letting the child run arbitrary code between fork() and exec(), the other half of spawn().

As Unix and variants thereof were used more and more, low overhead process creation was found to be useful, and was used. In fact, it was used so much, that people wanted even lower overhead ways to create processes, and so the idea of "threads" was born. Originally, threads were handled completely by the originating process (and programs like the JVM may do this with "green threads"); but handling multi-thread scheduling is tricky and was frequently done incorrectly. So there's an easier, intermediate way of doing threads, where the OS handles the scheduling but some overhead is saved by (typically) sharing address space between threads.

Your question is difficult to answer because there are several different but related concepts that are all "threads," and for detail you need an adjective to describe which one you're referencing. On the other hand, understanding the differences will probably lead you to the specific answer you want. Look up things like "lightweight processes," "user threads," and "rfork()" for more info.

Solution 3

Threads and forking are actually two different concepts, both of which exist in Unix/Linux systems (and both of which can be used in C/C++).

The idea of a fork() is (very basically) the creation of a separate process which has the same execution code as the parent process, and which begins execution at the fork line. The purpose of using forks with exec functions is that exec functions close the process that called them when they end. So, you usually fork, check the return value of fork() in each process (it is always 0 in the child, and the child's PID in the parent), and make the parent wait until the child is finished executing the exec function.

Threads are used for parallelism (recall that the parent waits on the child, usually, in a forked program). A thread, such as pthread in C/C++ (do a Google search), will run in parallel to the main process, and can share global variables and global functions with the original program. Since Java threads behave similarly, I would imagine that they act more like these threads than like a forking process.

Basically, there is a difference between forking and threading. They do distinctly different things (although seeming similar). These concepts can be difficult to understand, but you can learn them through (extensive) research if you have an honest desire to understand them.

EDIT #1

Please see these examples of how forks and threads can be called and used. Please note the behavior of the exec functions and their effects on the main program.

http://www.jdembrun.com:4352/computerScience/forkVSthread.zip

Solution 4

Both the JVM and Apache MPM rely on the kernel for native threads. That is, they use the OS for scheduling them. Of course both need their own API for keeping track of stuff.

Stack Overflow already has several questions dealing with this:

JVM native threads: check out this answer for more detail.
Apache MPMs: Apache has two types of MPMs, Prefork, with one thread per process, and Worker, which handles multiple threads per process. Check out the reference to codebucket.

Solution 5

If forking uses fork + exec to spawn a process, what is the high-level version for threading? How does the JVM or the Worker MPM spawn threads?

That is platform specific, but on Linux, and I would presume on many other POSIX-compliant systems, they use the local implementation of pthreads, a userland threading API. E.g.:

#include <pthread.h>

pthread_t tid;
pthread_create(&tid, NULL, somefunc, NULL);

Starts a new thread calling somefunc as its first point of execution.

You can also create threads with the clone() system call, which is what pthreads is built on top of on Linux. Threads are distinct from forks in that they share the global and heap memory of the parent process instead of getting a duplicate copy of it, although each thread executes on an independent stack of its own.

Gregg Leventhal

Updated on September 18, 2022

Comments

Gregg Leventhal over 1 year

Linux applications generally fork then exec (with execve()), but Java applications and certain Apache MPMs use threading. If forking uses fork + exec to spawn a process, what is the high-level version for threading? How does the JVM or the Worker MPM spawn threads?
Admin about 10 years

Check out Stack Overflow. There are several Q&As there that have explained part of this.
Mat about 10 years

Fork (with or without exec) can be used for parallelism too. I'm not sure what you mean by "exec functions close the process that called them when they end"; exec is long finished by the time the process ends. Also, pthread is an API, not a thread implementation.
Bakuriu about 10 years

"handling multi-thread scheduling is tricky and was frequently done incorrectly" citation needed. Implementing user-space threads isn't a problem. The problem with user-space threads is that if one thread makes a blocking syscall, all the threads get blocked. The only way to avoid this is by using system-level threads.
jaredad7 about 10 years

On the fork thing, I'm quoting my OS teacher. According to what he has told us, yes, forking could be used to run in parallel, but, if it used an exec function, that would be the last one. As for pthread, it was meant as an example.
0xC0000022L about 10 years

Uhm? What? Please re-read just about every book on the topic, because the separate memory space for processes is kind of a big deal. Also helps "catch" code that crashes, whereas the kernel will simply kill a process where an individual thread goes haywire/trespasses.
Ruslan about 10 years

@0xC0000022L your argument doesn't contradict the answer, as it seems to me.
0xC0000022L about 10 years

@Ruslan: I beg to differ: "The idea [...] is about the same"? The idea behind threads is indeed concurrency, but for processes this is an entirely different story.
Ruslan about 10 years

Interestingly, Windows didn't include this innovation of Unix: it has CreateProcess() but nothing similar to fork().
mpez0 about 10 years

@Bakuriu - look up any of the many articles on building multiprocessing schedulers, maintaining fairness, avoiding starvation, handling priorities, etc. Implementing user-space threads is not, as you say, a problem. Scheduling non-trivial examples is difficult.
jaredad7 about 10 years

Your comments have prompted me to test these things. I have written some C++ programs which demonstrate the behavior of exec functions and their effects on programs when used in forks vs. threads. Please see the edit above.
Mat about 10 years

I'm afraid most people will not bother to download that. Also your examples don't illustrate the interesting differences between the models, which are mostly related to sharing (or not) the address space.
Izkata about 10 years

@0xC0000022L You missed the important part of V13's answer: "You fork the execution path". The question is about how threads are spawned, not what the difference between threads and processes is.
0xC0000022L about 10 years

@Izkata: not at all. I just hold that this is not a correct claim.
0xC0000022L about 10 years

@Ruslan: one can fork on Windows, it's just not part of the Win32 API. Read "The Windows NT/2000 Native API" by Nebbett. He has an implementation that mimics fork().
V13 about 10 years

@0xC0000022L to clarify things a bit: Both threading and forking share the same idea: Fork the program execution. They differ, however, in what the two execution paths share between them. In general these can share (or not) memory/VM, file descriptors, namespace(s), process ID and other stuff. All of that is customizable using the clone() call in Linux (see the man page). Threading is just a name for one predefined set of things to share, and forking is a name for another.