What is the exact difference between a "subshell" and a "child process"?


Solution 1

In POSIX terminology, a subshell environment is tied to the notion of a Shell Execution Environment.

A subshell environment is a separate shell execution environment created as a duplicate of the parent environment. That execution environment includes things like open files, umask, working directory, shell variables/functions/aliases...

Changes to that subshell environment do not affect the parent environment.
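A minimal sketch of that isolation, assuming any POSIX-style sh and that /tmp exists (the variable names are purely illustrative):

```shell
# State of the parent shell execution environment.
greeting=parent
start_dir=$(pwd)
old_umask=$(umask)

(
  # Everything in here runs in a subshell environment.
  greeting=child
  umask 077
  cd /tmp
  echo "inside:  $greeting $(umask) $(pwd)"
)

# The variable, the umask and the working directory are all unchanged here.
echo "outside: $greeting $(umask) $(pwd)"
```

Whether the shell forked a process or not to achieve that is not visible from this output, which is the point of the distinction.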

Traditionally, in the Bourne shell or ksh88, on which the POSIX specification is based, that was done by forking a child process.

The areas where POSIX requires or allows commands to run in a subshell environment are those where ksh88 traditionally forked a child shell process.

It doesn't, however, force implementations to use a child process for that.

A shell can instead choose to implement that separate execution environment any way it likes.

For instance, ksh93 does it by saving the attributes of the parent execution environment and restoring them upon termination of the subshell environment in contexts where forking can be avoided (as an optimisation as forking is quite expensive on most systems).

For instance, in:

cd /foo; pwd
(cd /bar; pwd)
pwd

POSIX does require the cd /bar to run in a separate environment and that to output something like:

/foo
/bar
/foo

It doesn't require it to run in a separate process. For instance, if stdout becomes a broken pipe, pwd run in the subshell environment could very well have the SIGPIPE sent to the one and only shell process.

Most shells, including bash, implement it by evaluating the code inside (...) in a child process (while the parent process waits for its termination). ksh93 instead does the following upon running the code inside (...), all in the same process:

  • remember that it is in a subshell environment.
  • upon cd, save the previous working directory (typically on a file descriptor opened with O_CLOEXEC), save the values of the OLDPWD and PWD variables and anything else that cd might modify, and then do the chdir("/bar").
  • upon returning from the subshell, restore the current working directory (with an fchdir() on that saved fd), along with everything else that the subshell may have modified.
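The save-and-restore idea in the steps above can be roughly imitated in plain shell code. This is only a sketch of the principle, not ksh93's actual implementation: the function name virtual_subshell_cd_pwd is hypothetical, and it restores the directory by path where ksh93 would use fchdir() on a saved file descriptor.

```shell
# Hypothetical helper: behave like "(cd DIR; pwd)" without a subshell.
virtual_subshell_cd_pwd() {
  # Save what cd is about to modify.
  saved_pwd=$PWD
  saved_oldpwd=${OLDPWD-}

  # The "subshell" body.
  cd "$1" && pwd
  status=$?

  # Restore the saved state (by path here, not via fchdir()).
  cd "$saved_pwd"
  OLDPWD=$saved_oldpwd
  return "$status"
}

virtual_subshell_cd_pwd /   # prints /
pwd                         # still the directory we started in
```

A real implementation has to do this for every attribute a subshell can change (variables, traps, umask, open files...), which is why most shells simply fork.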

There are contexts where a child process can't be avoided. ksh93 doesn't fork in:

  • var=$(subshell)
  • (subshell)

But it does in:

  • { subshell; } &
  • { subshell; } | other command

That is, the cases where things have to run in separate processes so they can run concurrently.
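One way to check what a given shell does for (...) is to compare $$ with the parent PID seen by an external command started inside the group. This is a sketch that assumes an sh is available on the PATH; the variable names are illustrative.

```shell
# $$ expands to the PID of the main shell, even inside (...).
main_pid=$$

# sh is not the last command of the group, so the shell cannot just
# exec it in place. $PPID inside that sh is therefore the PID of
# whichever process evaluated the (...) group.
sub_pid=$( (sh -c 'echo "$PPID"'; :) )

if [ "$main_pid" = "$sub_pid" ]; then
  echo "(...) was evaluated in the main shell process"
else
  echo "(...) was evaluated in a child process (PID $sub_pid)"
fi
```

With bash or dash I would expect the second message. With ksh93 the first one is plausible, since neither the command substitution nor the (...) needs a fork there.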

ksh93 optimisations go further than that. For instance, in

var=$(pwd)

most shells would fork a process, have the child run the pwd command with its stdout redirected to a pipe, have pwd write the current working directory to that pipe, and have the parent process read the result at the other end of the pipe. ksh93 virtualises all that, requiring neither the fork nor the pipe. A fork and pipe would only be used for non-builtin commands.
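The fork-and-pipe approach can be imitated by hand with a background job and a FIFO. This is only an illustration of the mechanism, not what any shell literally runs: it assumes mktemp -d is available (it is not in POSIX), and read only picks up a single line, whereas a real command substitution collects all the output and strips trailing newlines.

```shell
tmpdir=$(mktemp -d)
fifo=$tmpdir/out
mkfifo "$fifo"

# "Child": a forked process runs pwd with stdout connected to the pipe.
pwd > "$fifo" &

# "Parent": read the result at the other end, then reap the child.
IFS= read -r var < "$fifo"
wait

rm -r "$tmpdir"
echo "$var"
```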

Note that there are contexts other than subshells in which shells fork a child process. For instance, to run a command that is stored in a separate executable (and that is not a script intended for the same shell interpreter), a shell has to fork a process and run that command in it, as otherwise it wouldn't be able to run more commands after that command returns.

In:

/bin/echo "$((n += 1))"

That is not a subshell, the command will be evaluated in the current shell execution environment, the n variable of the current shell execution environment will be incremented, but the shell will fork a child process to execute that /bin/echo command in it with the expansion of $((n += 1)) as argument.
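A small sketch makes the difference observable, assuming /bin/echo exists on the system:

```shell
n=0

# The expansion is done by the current shell; only /bin/echo runs
# in a child process.
/bin/echo "$((n += 1))"     # prints 1
echo "n is now $n"          # prints: n is now 1

# Same command inside a subshell: the increment is lost afterwards.
(/bin/echo "$((n += 1))")   # prints 2
echo "n is still $n"        # prints: n is still 1
```

In both cases /bin/echo runs in a child process; what differs is the execution environment in which the arithmetic expansion is evaluated.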

Many shells implement an optimisation whereby they don't fork a child process to run that external command if it's the last command of a script or a subshell (for those subshells that are implemented as child processes). bash, however, only does it if that command is the only command of the subshell.

What that means is that, with those shells, if the last command in the subshell is an external command, the subshell does not cause an extra process to be spawned. If you compare:

a=1; /bin/echo "$a"; a=2; /bin/echo "$a"

with

a=1; /bin/echo "$a"; (a=2; /bin/echo "$a")

there will be the same number of processes created, only in the second case, the second fork is done earlier so that the a=2 is run in a subshell environment.

Solution 2

Both a subshell and a child shell are separate processes from the parent shell (both are children of the parent shell). That is, they have different PIDs. And both start with a fork (copy) of the parent shell.

A subshell is a copy of the parent shell in which variables, functions, flags and everything else are available as they were in the parent shell. Modifications of such values don't affect the parent.

A child shell starts as a fork but gets reset to the shell default values given by the startup configuration. It becomes a process used to execute some code (either a shell or a command).

A subshell could access variable values:

$ x=123; ( echo "$x")
123

A child shell could not (un-exported variables):

$ x=234; sh -c 'echo "x=$x"'
x=
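For comparison, a short sketch showing that the same child shell does see the variable once it has been exported (the out variable is just there to capture the result):

```shell
x=234
export x                      # put x in the environment of child processes
out=$(sh -c 'echo "x=$x"')
echo "$out"                   # prints: x=234
```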

Solution 3

Subshell

A child shell is also called a subshell. A subshell can be created from the parent shell or from another shell. A subshell can be created using:

1. Process List

A process list is a command grouping enclosed in parentheses. Example:

( pwd ; (echo $BASH_SUBSHELL)) 

This prints the current working directory and then the subshell nesting level (the value of bash's BASH_SUBSHELL variable). NOTE: invoking a subshell is expensive.

2. Coprocess

A coprocess spawns a subshell in the background and executes a command within that subshell.

coproc sleep 10

If you then run the jobs command:

[1]+  Running                 coproc COPROC sleep 10 &

you will see sleep running as a background process.

Forking a Child Process

A child process in computing is a process created by another process. Whenever an external command is executed, a child process is created. This action is termed forking.

$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD  
umcr7     3647  3638  0 13:54 pts/0    00:00:00 bash
umcr7     3749  3647  0 13:59 pts/0    00:00:00 ps -f

ps -f is an external command (sometimes called a filesystem command: a program that exists outside of the bash shell), so running it creates a child process whose parent process ID is that of the bash shell from which it was executed.

done
Author by

done

Updated on September 18, 2022

Comments

  • done
    done over 1 year

    According to this and this, a subshell is started by using parenthesis (…).

    ( echo "Hello" )
    

    According to this, this and this, a process is forked when the command is started with a &

    echo "Hello" &
    

    The POSIX specification uses the word subshell in this page but doesn't define it and, on the same page, doesn't define "child process" either.

    Both are using the kernel fork() function, correct?

    What is the exact difference, then, that makes some forks a "sub-shell" and some other forks a "child process"?

    • fra-san
      fra-san about 5 years
      Not clear why you are linking POSIX Rationale: Base Definitions instead of the Base Definitions themselves: 3.93 Child Process "A new process created (by fork(), posix_spawn(), or ...) by a given process"; 3.376 Subshell "A shell execution environment, distinguished from the main or current shell execution environment". So, not instances of the same kind of thing. Is this the distinction you are looking for?
    • done
      done about 5 years
      @fra-san A child process could have a distinct environment than main: Like in ( LANG=C eval 'echo "$LANG"' ). Is that child process (inside parenthesis) also a subshell (different environment)?
    • fra-san
      fra-san about 5 years
      The expression in ( ) is by definition a subshell with its own execution environment. My point is that a subshell is not required to be implemented as a child process (as Stéphane points out in his answer with the ksh93 example). It looks like subshell and child process have not to be both results of a fork() call; thus, looking for the difference between two kinds of fork doesn't seem the right point of view to me. That's why I was trying to better understand your question.
    • fra-san
      fra-san about 5 years
      Ah, now I see that the tldp page you linked to actually says that a subshell is a child process. In my opinion that definition is a possibly misleading simplification.
  • aderchox
    aderchox over 3 years
    This is the accepted answer but I'm surprised it doesn't mention child "process" at all.
  • Pavan
    Pavan almost 3 years
    @aderchox A child process is the same as a child shell in this context, child process or child shell does not get parent shell variables unless they're exported in parent shell.
  • The Quark
    The Quark over 2 years
    Why { subshell; } | other command implies that the processes need to run concurrently but not necessarily command | other command? Indeed { ps; } | ps or ps | { ps; } create an additional bash process compared to ps | ps. Especially intriguing since { list; } is supposed to execute in the current process environment according to man bash or pubs.opengroup.org/onlinepubs/9699919799/utilities/…...
  • The Quark
    The Quark over 2 years
    Is it really because { command; } | other command implies that the processes need to run concurrently that an additional shell process is started?... Isn't it rather that the creation of a new shell process is required here because | needs two processes to create a pipe AND because { ... } must be interpreted by a shell, thereby requiring a new shell process to be spawned, whereas other command can be executed in its own non-shell child process? Yet, couldn't the pipe be simply created between the current shell executing { ... } and the child process executing other command?
  • Stéphane Chazelas
    Stéphane Chazelas over 2 years
    pipeline components run concurrently in different processes, that's not only for { ...; } or other compound commands. Here I was comparing {..} | .. to {...} or (...) alone. In bash that needs to be a shell process that "runs" the compound command. In some other shells, you may find that the last command in a command group is executed in the same process as was evaluating the command group, but that would just be an optimisation.