Rule for invoking subshell in Bash?

Solution 1

The parentheses always start a subshell. What's happening is that bash detects that sleep 5 is the last command executed by that subshell, so it calls exec instead of fork+exec. The sleep command replaces the subshell in the same process.

In other words, the base case is:

  1. ( … ) creates a subshell. The original process calls fork and wait. In the subprocess, which is a subshell:
    1. sleep is an external command which requires a subprocess of the subprocess. The subshell calls fork and wait. In the subsubprocess:
      1. The subsubprocess executes the external command → exec.
      2. Eventually the command terminates → exit.
    2. wait completes in the subshell.
  2. wait completes in the original process.

The optimization is:

  1. ( … ) creates a subshell. The original process calls fork and wait. In the subprocess, which is a subshell until it calls exec:
    1. sleep is an external command, and it's the last thing this process needs to do.
    2. The subprocess executes the external command → exec.
    3. Eventually the command terminates → exit.
  2. wait completes in the original process.
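
The optimized flow can also be requested by hand with the exec builtin, which replaces the current shell process with the command. A minimal sketch, assuming bash and sh are on the PATH; the `sh -c 'echo $PPID'` probe is only an illustrative stand-in for sleep, chosen because it reports which process started it:

```shell
# Print the PID of the outer bash, then the parent PID reported by an
# external command that the subshell starts with an explicit exec.
# The assignment and the trailing ":" are only there so that nothing but
# the explicit exec decides how the command is started.
manual=$(bash -c 'echo $$; ( x=1; exec sh -c "echo \$PPID" ); :')

# Both numbers are the same: the subshell became the command, so the
# command's parent is the outer bash itself.
echo "outer bash / parent of command:" $manual
```

With exec written out, the subshell never has to fork again, regardless of what came before it inside the parentheses.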

When you add something else after the call to sleep, the subshell needs to be kept around, so this optimization can't happen.

When you add something else before the call to sleep, the optimization could be made (and ksh does it), but bash doesn't do it (it's very conservative with this optimization).
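
One way to see which of these cases applies on a given system, without pstree, is to let the external command report its own parent. This is a rough sketch, assuming bash and sh are on the PATH; the `sh -c 'echo $PPID'` probe stands in for sleep, and the variable names are illustrative. The result for the middle case may differ between bash versions, so no particular output is claimed for it:

```shell
# Each run prints two PIDs: the outer bash, then the parent of the external
# command as seen from inside it ($PPID).
# Equal PIDs mean the subshell was replaced via exec; different PIDs mean
# the subshell stayed around and forked once more.

# Shape of Snippet A: the external command is the only thing in the subshell.
only=$(bash -c 'echo $$; ( sh -c "echo \$PPID" ); :')

# Shape of Snippet B: an assignment comes first.
after_assignment=$(bash -c 'echo $$; ( x=1; sh -c "echo \$PPID" ); :')

# Something follows the external command, so the subshell must survive.
followed=$(bash -c 'echo $$; ( sh -c "echo \$PPID"; : ); :')

echo "only command:     " $only
echo "after assignment: " $after_assignment
echo "followed by ':':  " $followed
```

The trailing `:` after each parenthesized group keeps the outer bash itself from being the process that gets replaced.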

Solution 2

From the Advanced Bash-Scripting Guide:

"In general, an external command in a script forks off a subprocess, whereas a Bash builtin does not. For this reason, builtins execute more quickly and use fewer system resources than their external command equivalents."

And a little further down:

"A command list embedded between parentheses runs as a subshell."

Examples:

[root@talara test]# echo $BASHPID
10792
[root@talara test]# (echo $BASHPID)
4087
[root@talara test]# (echo $BASHPID)
4088
[root@talara test]# (echo $BASHPID)
4089

Example using the OP's code (with shorter sleeps because I am impatient):

echo $BASHPID

sleep 2
(
    echo $BASHPID
    sleep 2
    echo $BASHPID
)

The output:

[root@talara test]# bash sub_bash
6606
6608
6608
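
Whether a given command is a builtin or an external program, which is what decides whether running it needs another process, can be checked with the type builtin. A small sketch, assuming a typical system where sleep is installed as a separate program:

```shell
# type -t prints one word describing how bash resolves a command name:
# "builtin" for shell builtins, "file" for external programs on the PATH.
kinds=$(bash -c 'type -t echo; type -t sleep')

# On a typical system this prints: builtin file
echo $kinds
```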

Solution 3

An additional note to @Gilles's answer.

As Gilles said, the parentheses always start a subshell.

However, the process ID that such a subshell reports through $$ repeats, even though $BASHPID changes:

$ (echo "$BASHPID and $$"; sleep 1)
2033 and 31679
$ (echo "$BASHPID and $$"; sleep 1)
2040 and 31679
$ (echo "$BASHPID and $$"; sleep 1)
2047 and 31679

As you can see, $$ stays the same while $BASHPID changes. That is expected; the following command opens man bash at the relevant entry:

$ LESS=+/'^ *BASHPID' man bash

BASHPID
Expands to the process ID of the current bash process. This differs from $$ under certain circumstances, such as subshells that do not require bash to be re-initialized.

That is, if the shell is not re-initialized, $$ stays the same.

Or with this:

$ LESS=+/'^ *Special Parameters' man bash

Special Parameters
$ Expands to the process ID of the shell. In a () subshell, it expands to the process ID of the current shell, not the subshell.

The $$ is the ID of the current shell (not the subshell).
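
The contrast with a shell that is re-initialized can be sketched as follows, assuming bash is on the PATH. A ( ) subshell keeps the $$ of the shell it came from, while a separately started bash gets a fresh $$; the labels and the variable name are illustrative:

```shell
# Each line prints a label, then $$, then $BASHPID.
# The final ":" keeps the outer bash from replacing itself with the
# child bash via exec, which would hide the difference.
pids=$(bash -c '
  echo "top $$ $BASHPID"
  ( echo "subshell $$ $BASHPID" )
  bash -c "echo child \$\$ \$BASHPID"
  :
')

# top:      $$ and $BASHPID agree.
# subshell: $$ is inherited from top, $BASHPID is new.
# child:    a re-initialized bash, so both are new and agree again.
echo "$pids"
```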

Author by

bashful

Updated on September 18, 2022

Comments

  • bashful
    bashful over 1 year

    I seem to misunderstand the Bash rule for creating a subshell. I thought parentheses always create a subshell, which runs as its own process.

    However, this doesn't seem to be the case. In Code Snippet A (below), the second sleep command does not run in a separate shell (as determined by pstree in another terminal). However, in Code Snippet B, the second sleep command does run in a separate shell. The only difference between the snippets is that the second snippet has two commands within the parentheses.

    Could somebody please explain the rule for when subshells are created?

    CODE SNIPPET A:

    sleep 5
    (
    sleep 5
    )
    

    CODE SNIPPET B:

    sleep 5
    (
    x=1
    sleep 5
    )
    
  • bashful
    bashful almost 12 years
    Thanks for the reply Tim. I'm not sure it fully answers my question though. Since "A command list embedded between parentheses runs as a subshell", I would expect the second sleep to run in a subshell (perhaps on the subshell's process since it's a built-in, rather than a subprocess of the subshell). However, in any case, I would have expected a subshell to exist, i.e. a Bash subprocess under the parent Bash process. For Snippet B above, this doesn't seem to be the case.
  • bashful
    bashful almost 12 years
    Correction: Because sleep doesn't seem to be a built-in, I would expect the second sleep call in both snippets to run in a subprocess of the subshell process.
  • Daniel Serodio
    Daniel Serodio over 7 years
    Nice trick for opening the bash manpage in a specific section
  • haccks
    haccks about 6 years
    A subshell is created by calling fork, and a child process is created (to execute external commands) by calling fork + exec. But your first paragraph suggests that fork + exec is called for the subshell too. What am I getting wrong here?
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' about 6 years
    @haccks fork+exec is not called for the subshell, it's called for the external command. Without any optimization, there's a fork call for the subshell and another one for the external command. I've added a detailed flow description to my answer.
  • haccks
    haccks about 6 years
    Thanks a ton for the update. Now it explains things better. I can deduce from it that in the case of (...) (the base case), there may or may not be a call to exec, depending on whether the subshell has any external command to execute, while executing any external command always requires fork + exec.
  • haccks
    haccks about 6 years
    One more question: Does this optimisation work only for a subshell, or can it be done for a command like date in a shell?
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' about 6 years
    @haccks I don't understand the question. This optimization is about invoking an external command as the last thing a shell process does. It isn't restricted to subshells: compare strace -f -e clone,execve,write bash -c 'date' and strace -f -e clone,execve,write bash -c 'date; true'
  • haccks
    haccks about 6 years
    I am talking about a simple command in a shell: date. As it is an external command, the shell will first fork a new process and then use exec. But is it the case that the shell only calls exec when date is the only command to be processed by the shell?
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' about 6 years
    @haccks It depends on the shell, on what comes before the command and on what's around it. Calling exec directly is an optimization.
  • haccks
    haccks about 6 years
    Considering bash and the only command is date (nothing around it).