Are Linux kernel threads really kernel processes?


Solution 1

From the kernel's point of view, there is no fundamental difference between a thread and a process on Linux. If you look at clone(2) you will see a set of flags that determine what is shared, and what is not shared, between the caller and the task it creates.

Classic processes are just threads that share nothing; under Linux you can choose which components to share.
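A rough way to see the effect of that sharing from user space, without calling clone(2) directly, is sketched below. It assumes CPython on Linux, where a threading.Thread is backed by a native thread that shares the address space and os.fork() produces a child with its own copy-on-write copy. The names counter, bump, thread_sees_update, and fork_sees_update are made up for this illustration.

```python
import os
import threading

# Module-level state that lives in this process's address space.
counter = {"value": 0}

def bump():
    counter["value"] += 1

def thread_sees_update():
    """Run bump() in a thread; memory is shared, so the caller sees the change."""
    before = counter["value"]
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter["value"] == before + 1

def fork_sees_update():
    """Run bump() in a forked child; the parent's copy should stay untouched."""
    before = counter["value"]
    pid = os.fork()
    if pid == 0:
        bump()        # modifies only the child's private copy
        os._exit(0)   # leave the child immediately, skipping cleanup handlers
    os.waitpid(pid, 0)
    return counter["value"] != before

if __name__ == "__main__":
    print("thread update visible in parent:", thread_sees_update())
    print("fork update visible in parent:", fork_sees_update())
```

The thread and the forked child are both kernel tasks; the only thing that differs here is whether the memory holding counter is shared with the creator.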

This is not the case on other OS implementations, where there are much more substantial differences.

Solution 2

The documentation can be pretty confusing, so here is the "real" Linux model:

  • inside the Linux kernel, something that can be run (& scheduled) is called a "process",
  • each process has a system-unique Process ID (PID), and a Thread Group ID (TGID),
  • a "normal" process has PID=TGID and no other process shares this TGID value,
  • a "threaded" process is a process whose TGID value is shared by other processes,
  • several processes sharing the same TGID also share, at least, the same memory space and signal handlers (sometimes more),
  • if a "threaded" process has PID=TGID, it can be called "the main thread",
  • calling getpid() from any process will return its TGID (= "main thread" PID),
  • calling gettid() from any process will return its PID (!),
  • any kind of process can be created with the clone(2) system call,
  • what is shared between processes is decided by passing specific flags to clone(2),
  • the numeric directory names you can list with ls /proc, as /proc/NUMBER, are TGIDs,
  • the numeric directory names in /proc/TGID/task, as /proc/TGID/task/NUMBER, are PIDs,
  • even though you don't see every existing PID with ls /proc, you can still do cd /proc/any_PID.
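The getpid()/gettid() and /proc points above can be checked with a small sketch. It assumes Linux with /proc mounted and Python 3.8 or later, where threading.get_native_id() reports the kernel-assigned thread ID (what gettid() returns). The function name observe_ids and the dictionary keys are hypothetical, chosen only for this example.

```python
import os
import threading

def observe_ids():
    """Collect the IDs seen by the calling thread and by one extra thread."""
    seen = {}

    def worker():
        seen["worker_getpid"] = os.getpid()                # TGID, shared by all threads
        seen["worker_gettid"] = threading.get_native_id()  # kernel PID of this thread
        task_dir = f"/proc/{os.getpid()}/task"
        # One numeric entry per thread (kernel PID) in the thread group.
        seen["task_entries"] = sorted(int(name) for name in os.listdir(task_dir))
        # The thread's own /proc/PID directory, even if ls /proc does not list it.
        seen["tid_dir_reachable"] = os.path.isdir(f"/proc/{seen['worker_gettid']}")

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    seen["caller_getpid"] = os.getpid()
    seen["caller_gettid"] = threading.get_native_id()
    return seen

if __name__ == "__main__":
    for key, value in observe_ids().items():
        print(key, value)
```

Both threads should report the same getpid() value but different thread IDs, and both thread IDs should appear under /proc/TGID/task.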

Conclusion: from the kernel's point of view, only processes exist, each with its own unique PID, and a so-called thread is just a different kind of process (one that shares, at least, the same memory space and signal handlers with one or more others).

Note: the implementation of the "thread" concept in Linux has led to confusing vocabulary, and if getpid() does not do what you thought, it is because its behavior follows POSIX (the threads of one process are supposed to share a common PID).

Solution 3

Threads are processes under Linux. They are created with the clone system call, which returns a process ID that can be sent a signal via the kill system call, just like a process. Thread processes are visible in ps output when thread display is requested (for example with ps -L). The clone call is passed flags which determine how much of the parent process's environment is shared with the thread process.
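To see thread processes without ps, one option is to read the per-task status files under /proc, as in the sketch below. It assumes Linux with /proc mounted and relies on the Tgid and Pid fields of /proc/PID/task/TID/status. The helper names read_ids, list_tasks, and demo are invented for this example.

```python
import os
import threading

def read_ids(status_path):
    """Return the Tgid and Pid fields from a /proc status file."""
    fields = {}
    with open(status_path) as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("Tgid", "Pid"):
                fields[key] = int(value.strip())
    return fields

def list_tasks():
    """Return one {'Tgid': ..., 'Pid': ...} dict per thread of this process."""
    task_dir = f"/proc/{os.getpid()}/task"
    tasks = []
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            tasks.append(read_ids(f"{task_dir}/{tid}/status"))
        except (FileNotFoundError, ProcessLookupError):
            pass  # the thread exited between listing and reading
    return tasks

def demo():
    stop = threading.Event()
    t = threading.Thread(target=stop.wait)  # keep a second thread alive
    t.start()
    try:
        return list_tasks()
    finally:
        stop.set()
        t.join()

if __name__ == "__main__":
    for task in demo():
        print(task)
```

Every entry should carry the same Tgid and a distinct Pid, which is the kernel-level picture behind the thread rows that ps can display.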

Solution 4

Previous answers are excellent, pointing out that threads are processes inside the Linux kernel and that you can clone() any subset of the process state you like anyway.

But I think it's helpful to remember that it matters how much context can be shared and how much must be saved per task. The number of cycles a context switch takes may depend on how much state differs between the two tasks, not just as far as the OS is concerned, but also in the hardware, e.g., the TLB. So it matters what is cloned and what is shared.

At the application level, a new thread (as conventionally understood, sharing the memory image, current directory, open file handles, etc.) is generally cheaper than a new process, which at best shares these things only initially. Even if the process is forked with copy-on-write, as soon as it writes, you do have to make the copy. This is why, in designing an application, it's a lot more reasonable to create 10,000 threads than 10,000 processes. The reasons to create a new process are to run a different executable or to isolate code for security reasons.
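If you want to measure the difference on your own machine rather than take it on faith, a minimal timing sketch follows. It assumes CPython on a Unix-like system with os.fork(), and it only times creating and reaping one task at a time, so interpreter overhead is included and the numbers say nothing about running 10,000 tasks concurrently. The function names time_threads and time_forks are hypothetical.

```python
import os
import threading
import time

def time_threads(n):
    """Seconds spent creating and joining n do-nothing threads, one at a time."""
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return time.perf_counter() - start

def time_forks(n):
    """Seconds spent forking and reaping n children that exit immediately."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit right away without running cleanup handlers
        os.waitpid(pid, 0)
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 1000
    print(f"{n} threads:   {time_threads(n):.3f} s")
    print(f"{n} processes: {time_forks(n):.3f} s")
```

Results will vary with the kernel, the size of the parent's address space, and the Python version, so treat any single run as indicative only.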


Author: Ellen Spertus

I am a computer science professor at Mills College. I used to work at Google on projects such as App Inventor (which I still contribute to), Blockly, and the Hour of Code. I have done research in computer architecture, compilers, artificial intelligence, information retrieval, and data mining.

Updated on September 18, 2022

Comments

  • Ellen Spertus over 1 year

    I've read in many places that Linux creates a kernel thread for each user thread in a Java VM. (I see the term "kernel thread" used in two different ways:

    1. a thread created to do core OS work and
    2. a thread the OS is aware of and schedules to perform user work.

    I am talking about the latter type.)

    Is a kernel thread the same as a kernel process, since Linux processes support shared memory spaces between parent and child, or is it truly a different entity?

  • Totor over 7 years
    The pthreads(7) man page says that for the current NPTL (Native POSIX Threads Library) implementation, "all of the threads in a process are placed in the same thread group; all members of a thread group share the same PID." In the obsolete LinuxThreads implementation, each "thread" had its own PID.
  • Totor about 5 years
    Suggestion: using the word "task" may help when referring to something runnable without getting into the process/thread confusion so much.