Are Linux kernel threads really kernel processes?
Solution 1
To the Linux kernel there is essentially no difference between a thread and a process: both are schedulable tasks. If you look at clone(2), you will see a set of flags that determine what is, and what is not, shared between the threads.
Classic processes are just threads that share nothing; under Linux you can choose which components to share.
This is not the case on other OS implementations, where there are much more substantial differences.
Solution 2
The documentation can be pretty confusing, so here is the "real" Linux model:
- inside the Linux kernel, something that can be run (& scheduled) is called a "process",
- each process has a system-unique Process ID (PID), and a Thread Group ID (TGID),
- a "normal" process has PID=TGID and no other process shares this TGID value,
- a "threaded" process is a process whose TGID value is shared by other processes,
- several processes sharing the same TGID also share, at least, the same memory space and signal handlers (sometimes more),
- if a "threaded" process has PID=TGID, it can be called "the main thread",
- calling getpid() from any process will return its TGID (= "main thread" PID),
- calling gettid() from any process will return its PID (!),
- any kind of process can be created with the clone(2) system call,
- what is shared between processes is decided by passing specific flags to clone(2),
- the numeric directory names you can list with ls /proc (as /proc/NUMBER) are TGIDs,
- the numeric directory names in /proc/TGID/task (as /proc/TGID/task/NUMBER) are PIDs,
- even though you don't see every existing PID with ls /proc, you can still do cd /proc/any_PID.
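The getpid()/gettid() behavior described in the list can be checked from user space. The following is only a rough sketch in Python (my choice of language, not something from the original answer), assuming Linux and Python 3.8+: os.getpid() wraps getpid(), threading.get_native_id() reports the kernel thread ID that gettid() would return, and collect_ids is a hypothetical helper name.

```python
import os
import threading


def collect_ids(n_threads=3):
    """Start n_threads threads; return what each saw, plus the /proc task list."""
    seen = []                                   # (getpid, gettid) pairs, one per thread
    seen_lock = threading.Lock()
    all_recorded = threading.Barrier(n_threads + 1)
    may_exit = threading.Event()

    def worker():
        with seen_lock:
            seen.append((os.getpid(), threading.get_native_id()))
        all_recorded.wait()                     # tell the main thread we have recorded
        may_exit.wait()                         # stay alive while /proc is inspected

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    all_recorded.wait()
    # Entries under /proc/<TGID>/task are the per-thread kernel PIDs.
    tasks = sorted(int(name) for name in os.listdir(f"/proc/{os.getpid()}/task"))
    may_exit.set()
    for t in threads:
        t.join()
    return seen, tasks


if __name__ == "__main__":
    seen, tasks = collect_ids()
    print("getpid() in main thread:", os.getpid())
    for pid, tid in seen:
        print(f"thread: getpid()={pid} gettid()={tid}")
    print("/proc/<TGID>/task:", tasks)
```

Every thread should report the same getpid() value (the TGID) but a distinct gettid() value, and each of those values should appear under /proc/TGID/task.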
Conclusion: from the kernel's point of view, only processes exist, each having its own unique PID, and a so-called thread is just a different kind of process (sharing, at least, the same memory space and signal handlers with one or several others).
Note: the implementation of the "thread" concept in Linux has led to some confusion in vocabulary, and if getpid() does not do what you thought, it is because its behavior follows POSIX compatibility (threads are supposed to share a common PID).
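To see the kernel's own view of the two IDs, one option is to read the Tgid and Pid fields of /proc/NUMBER/status from inside a secondary thread. This is a minimal sketch, assuming Linux with /proc mounted for the current PID namespace and Python 3.8+; read_status_ids and inspect_secondary_thread are hypothetical helper names.

```python
import os
import threading


def read_status_ids(task_id):
    """Return the (Tgid, Pid) fields from /proc/<task_id>/status."""
    fields = {}
    with open(f"/proc/{task_id}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in ("Tgid", "Pid"):
                fields[key] = int(value)
    return fields["Tgid"], fields["Pid"]


def inspect_secondary_thread():
    """Run the checks from a non-main thread and return what it observed."""
    result = {}

    def worker():
        tid = threading.get_native_id()                  # what gettid() returns
        result["tid"] = tid
        result["listed"] = str(tid) in os.listdir("/proc")   # shown by ls /proc?
        result["reachable"] = os.path.isdir(f"/proc/{tid}")  # cd /proc/<tid> works?
        result["tgid"], result["pid"] = read_status_ids(tid)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result


if __name__ == "__main__":
    print(inspect_secondary_thread())
```

On a regular Linux system the thread's ID should not be listed in /proc, yet /proc/ID should still be reachable, with Pid equal to the thread's ID and Tgid equal to what getpid() returns.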
Solution 3
Threads are processes under Linux. They are created with the clone system call, which returns a process ID that can be sent a signal via the kill system call, just like a process. Thread processes are visible in ps output. The clone call is passed flags which determine how much of the parent process's environment is shared with the thread process.
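The effect of sharing versus not sharing memory can be observed without calling clone directly. The sketch below is in Python, which does not expose clone(2) itself; it assumes that on a typical Linux/glibc system threading.Thread ends up as a task sharing the address space, while os.fork() produces a child with its own copy-on-write copy. The helper names are hypothetical.

```python
import os
import threading


def bump(state):
    """Increment the counter stored in a plain dict."""
    state["value"] += 1


def bump_in_thread(state):
    """Run bump() in a thread: memory is shared, so the caller sees the change."""
    worker = threading.Thread(target=bump, args=(state,))
    worker.start()
    worker.join()
    return state["value"]


def bump_in_forked_child(state):
    """Run bump() in a forked child: it modifies its own copy, not the caller's."""
    child = os.fork()
    if child == 0:
        bump(state)
        os._exit(0)          # leave the child immediately, skipping cleanup handlers
    os.waitpid(child, 0)
    return state["value"]


if __name__ == "__main__":
    state = {"value": 0}
    print("after thread:", bump_in_thread(state))
    print("after forked child:", bump_in_forked_child(state))
```

The thread's increment is visible to the caller; the forked child's increment is not, because the child only changed its private copy.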
Solution 4
Previous answers are excellent, pointing out that threads are processes inside the Linux kernel and that you can clone() any subset of the process state you like anyway.
But I think it's helpful to remember that it matters how much context can be shared or must be saved uniquely, and how many cycles it may take for a context switch, which may depend on how much is likely to be different, not just as far as the OS is concerned, but also in the hardware, e.g., the TLB. So it matters what is cloned and what is shared.
At the application level, a new thread (as conventionally understood, sharing the memory image, current directory, open file handles, etc.) is generally cheaper than a new process, which at best shares these things only initially. Even if the process is forked with copy-on-write, as soon as it writes, the copy does have to be made. This is why, in designing an application, it's a lot more reasonable to create 10,000 threads than 10,000 processes. The reasons to create a new process are to run a different executable or to firewall for security reasons.
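For a rough feel of the creation cost, one could time both approaches. This is only a crude sketch in Python, assuming Linux; the helper names are hypothetical, the numbers include interpreter overhead, and they will vary with the machine and with how much memory the process has mapped, so no particular result is claimed here.

```python
import os
import threading
import time


def time_threads(n):
    """Seconds taken to create, start and join n do-nothing threads, one at a time."""
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return time.perf_counter() - start


def time_forks(n):
    """Seconds taken to fork and reap n children that exit immediately."""
    start = time.perf_counter()
    for _ in range(n):
        child = os.fork()
        if child == 0:
            os._exit(0)      # child does no work at all
        os.waitpid(child, 0)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 200
    print(f"{n} threads: {time_threads(n):.4f} s")
    print(f"{n} forks:   {time_forks(n):.4f} s")
```

This measures only creation and teardown, not the context-switch or TLB effects mentioned above.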
Ellen Spertus
I am a computer science professor at Mills College. I used to work at Google on projects such as App Inventor (which I still contribute to), Blockly, and the Hour of Code. I have done research in computer architecture, compilers, artificial intelligence, information retrieval, and data mining.
Updated on September 18, 2022
Comments
-
Ellen Spertus over 1 year
I've read in many places that Linux creates a kernel thread for each user thread in a Java VM. (I see the term "kernel thread" used in two different ways:
- a thread created to do core OS work and
- a thread the OS is aware of and schedules to perform user work.
I am talking about the latter type.)
Is a kernel thread the same as a kernel process, since Linux processes support shared memory spaces between parent and child, or is it truly a different entity?
-
Totor over 7 years
The man page pthreads(7) says that for the current NPTL (Native POSIX Threads Library) implementation, "all of the threads in a process are placed in the same thread group; all members of a thread group share the same PID." In the obsolete LinuxThreads implementation, each "thread" has its own PID.
-
Totor about 5 years
Suggestion: using the word "task" may help when referring to something runnable without getting into the process/thread confusion so much.