Collect exit codes of parallel background processes (sub shells)

Solution 1

The answer by Alexander Mills that uses handleJobs gave me a great starting point, but it also gave me this error:

warning: run_pending_traps: bad value in trap_list[17]: 0x461010

which may be a bash race-condition problem.

Instead, I just store the PID of each child, then wait for each child specifically and collect its exit code. I find this cleaner when subprocesses spawn subprocesses in functions, because it avoids the risk of waiting for a parent process when I meant to wait for a child. It is also clearer what happens, because it doesn't use the trap.

#!/usr/bin/env bash

# Note: returning the result via echo and calling the function inside $() did not
# work well (command substitution runs in a subshell), so the result is stored
# in the global EXIT_CODE instead.
function wait_and_get_exit_codes() {
    children=("$@")
    EXIT_CODE=0
    for job in "${children[@]}"; do
        echo "PID => ${job}"
        CODE=0
        wait "${job}" || CODE=$?
        if [[ "${CODE}" != "0" ]]; then
            echo "At least one test failed with exit code => ${CODE}"
            EXIT_CODE=1
        fi
    done
}

DIRN=$(dirname "$0");

commands=(
    "{ echo 'a'; exit 1; }"
    "{ echo 'b'; exit 0; }"
    "{ echo 'c'; exit 2; }"
    )

clen=$(( ${#commands[@]} - 1 ))  # number of commands, minus 1

children_pids=()
for i in $(seq 0 "$clen"); do
    (echo "${commands[$i]}" | bash) &   # run the command via bash in a subshell
    children_pids+=("$!")
    echo "command $i has been issued as a background job"
done
# wait  # it is still valid to wait for all jobs to finish before processing any exit codes, if we wanted to
#EXIT_CODE=0;  # exit code of overall script
wait_and_get_exit_codes "${children_pids[@]}"

echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"
# end

Solution 2

Use wait with a PID, which will:

Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for.

You'll need to save the PID of each process as you go:

echo "x" & X=$!
echo "y" & Y=$!
echo "z" & Z=$!

You can also enable job control in the script with set -m and use a %n jobspec, but you almost certainly don't want to - job control has a lot of other side effects.

wait will return the same code as the process finished with. You can use wait $X at any (reasonable) later point to access the final code as $? or simply use it as true/false:

echo "x" & X=$!
echo "y" & Y=$!
...
wait $X
echo "job X returned $?"

wait will pause until the command completes if it hasn't already.

If you want to avoid stalling like that, you can set a trap on SIGCHLD, count the number of terminations, and handle all the waits at once when they've all finished. You can probably get away with using wait alone almost all the time.
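As mentioned in the comments below, bash 4.3 and later also offer wait -n, which returns as soon as any one child finishes. A minimal sketch of collecting exit codes that way, with placeholder background jobs:

#!/usr/bin/env bash
# Sketch assuming bash >= 4.3 (for wait -n); the background jobs are placeholders.
( sleep 1; exit 0 ) &
( sleep 2; exit 3 ) &
( sleep 1; exit 0 ) &

jobs_left=3
overall=0
while (( jobs_left > 0 )); do
    wait -n               # returns the exit status of the next child to finish
    status=$?
    jobs_left=$(( jobs_left - 1 ))
    echo "a child finished with status ${status}"
    if (( status != 0 )); then
        overall=1
    fi
done
echo "overall => ${overall}"

Note that plain wait -n tells you an exit status but not which child it came from; if you need the PID as well, stick with wait $X per saved PID.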

Solution 3

If you had a good way to identify the commands, you could print their exit code to a tmp file and then access the specific file you're interested in:

#!/bin/bash

for i in `seq 1 5`; do
    ( sleep $i ; echo $? > /tmp/cmd__${i} ) &
done

wait

for i in `seq 1 5`; do # or even /tmp/cmd__*
    echo "process $i:"
    cat /tmp/cmd__${i}
done

Don't forget to remove the tmp files.
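A small variation on the same idea, sketched with mktemp -d so the status files land in a throwaway directory and cleanup is a single rm -rf (the sleep commands are placeholders):

#!/bin/bash
# Same technique as above, but with a temporary directory from mktemp
# so the status files are easy to clean up afterwards.
tmpdir=$(mktemp -d)

for i in $(seq 1 5); do
    ( sleep "$i"; echo "$?" > "${tmpdir}/cmd_${i}" ) &
done

wait

for i in $(seq 1 5); do
    echo "process $i: $(cat "${tmpdir}/cmd_${i}")"
done

rm -rf "${tmpdir}"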

Solution 4

Use a compound command - put the statement in parentheses:

( echo "x" ; echo X: $? ) &
( true ; echo TRUE: $? ) &
( false ; echo FALSE: $? ) &

will give the output

x
X: 0
TRUE: 0
FALSE: 1

A really different way to run several commands in parallel is by using GNU Parallel. Make a list of commands to run and put them in the file list:

cat > list
sleep 2 ; exit 7
sleep 3 ; exit 55
^D

Run all the commands in parallel and collect the exit codes in the file job.log:

cat list | parallel -j0 --joblog job.log
cat job.log

and the output is:

Seq     Host    Starttime       JobRuntime      Send    Receive Exitval Signal  Command
1       :       1486892487.325       1.976      0       0       7       0       sleep 2 ; exit 7
2       :       1486892487.326       3.003      0       0       55      0       sleep 3 ; exit 55
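If you only need to know which commands failed, the Exitval column of the job log can be filtered directly. A sketch with awk, assuming the job log is tab-separated as shown above (Exitval is the 7th field, the command the 9th):

# print every job from job.log whose exit value is non-zero
awk -F'\t' 'NR > 1 && $7 != 0 { print "failed (exit " $7 "): " $9 }' job.log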

Solution 5

script1 and script2 will be executed in parallel, and script3 will be executed only if both of them succeed:

./script1 &
process1=$!

./script2 &
process2=$!

wait $process1
rc1=$?

wait $process2
rc2=$?

if [[ $rc1 -eq 0 ]] && [[ $rc2 -eq 0 ]]; then
    ./script3
fi
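
The same pattern generalizes to any number of scripts by collecting the PIDs in an array. A minimal sketch, where the script paths are placeholders:

scripts=(./script1 ./script2)   # placeholder script paths; add as many as needed
pids=()

# start every script in the background and remember its PID
for s in "${scripts[@]}"; do
    "$s" &
    pids+=("$!")
done

# wait for each one and require all of them to succeed
all_ok=1
for pid in "${pids[@]}"; do
    wait "$pid" || all_ok=0
done

if (( all_ok )); then
    ./script3
fi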

Comments

  • Alexander Mills
    Alexander Mills over 1 year

    Say we have a bash script like so:

    echo "x" &
    echo "y" &
    echo "z" &
    .....
    echo "Z" &
    wait
    

    is there a way to collect the exit codes of the subshells / subprocesses? I'm looking for a way to do this and can't find anything. I need to run these subshells in parallel, otherwise yes, this would be easier.

    I am looking for a generic solution (I have an unknown/dynamic number of sub processes to run in parallel).

  • Alexander Mills
    Alexander Mills about 7 years
    ok thanks, is there a way to generify this? I don't just have 3 sub processes, I have Z sub processes.
  • Alexander Mills
    Alexander Mills about 7 years
    I updated the original question to reflect that I am looking for a generic solution, thanks
  • Alexander Mills
    Alexander Mills about 7 years
    ughh, sorry, I need to run these subshells in parallel, I will specify that in the question...
  • Alexander Mills
    Alexander Mills about 7 years
    nevermind, maybe this works with my setup...where does the wait command come into play in your code? I don't follow
  • Michael Homer
    Michael Homer about 7 years
    @AlexanderMills They are running in parallel. If you have a variable number of them, use an array. (as e.g. here which may thus be a duplicate).
  • Alexander Mills
    Alexander Mills about 7 years
    yes thanks I will check that out, if the wait command pertains to your answer, then please add it
  • Michael Homer
    Michael Homer about 7 years
    You run wait $X at any (reasonable) later point.
  • Alexander Mills
    Alexander Mills about 7 years
    ok thanks, sorry but what does the $X represent?
  • Alexander Mills
    Alexander Mills about 7 years
    I mean $X represents the PID of the first process, but why would you want to use that with wait?
  • hschou
    hschou about 7 years
    Looping? Do you have a fixed list of commands or is that controlled by the user? I am not sure I understand what you are trying to do but maybe PIPESTATUS is something you should check out. This seq 10 | gzip -c > seq.gz ; echo ${PIPESTATUS[@]} returns 0 0 (exit code from first and last command).
  • Alexander Mills
    Alexander Mills about 7 years
    Yeah essentially controlled by the user
  • Alexander Mills
    Alexander Mills about 7 years
    oh, I need the exit codes of all the sub processes, not just the first one
  • Alexander Mills
    Alexander Mills about 7 years
    LOL, ok I see, but it's not super clear, let me update your answer, thanks :)
  • Mathias Begert
    Mathias Begert about 7 years
    @oletange, your edit should really have taken the form of a separate answer
  • Chunko
    Chunko about 7 years
    You can also use SIGCHLD trap to process the children when they exit, such as printing out the status at that time. Or updating a progress counter: declare a function then use "trap function_name CHLD" though that may also require an option to be turned on in a non-interactive shell, such as possibly "set -m"
  • Chunko
    Chunko about 7 years
    Also "wait -n" will wait for any child and then return the exit status of that child in the $? variable. So you can print progress as each one exits. However note that unless you use the CHLD trap, you may miss some child exits that way.
  • Ole Tange
    Ole Tange about 7 years
    @iruvar I believe you mistake my edit (#3) for hschou's (#2). Mine is correcting a speling msitake and adding -j0. Hardly worth its own answer.
  • Alexander Mills
    Alexander Mills about 7 years
    @Chunko thanks! that is good info, could you maybe update the answer with something you think is best?
  • Alexander Mills
    Alexander Mills about 7 years
    thanks @Chunko, trap works better, you're right. With wait <pid>, I got fallthrough.
  • Scott - Слава Україні
    Scott - Слава Україні about 6 years
    Can you explain how and why you believe the version with the trap is better than the one without it?  (I believe that it’s no better, and therefore that it is worse, because it is more complex with no benefit.)
  • Scott - Слава Україні
    Scott - Слава Україні about 6 years
    AFAICT,  this is nothing more than a rehash of Michael Homer’s answer.
  • Alexander Mills
    Alexander Mills about 6 years
    cool, I think for job in "${children[@]}"; do should be for job in "${1}"; do though, for clarity
  • Alexander Mills
    Alexander Mills about 6 years
    Yeah, @arberg used your idea and solved it your way; I changed the accepted answer to arberg's. If you think his answer needs work, please comment on that answer below, thanks.
  • Alexander Mills
    Alexander Mills about 6 years
    the only concern I have with this script, is if children_pids+=("$!") is actually capturing the desired pid for the sub shell.
  • Alexander Mills
    Alexander Mills about 6 years
    I think @arberg's answer is better now, I may delete this answer soon
  • arberg
    arberg about 6 years
    I tested with "${1}" and it doesn't work. I'm passing an array to the function, and apparently that needs special attention in bash. $! is the pid of the last spawned job, see tldp.org/LDP/abs/html/internalvariables.html It seems to work correctly in my tests, and I'm now using in the in unRAID cache_dirs script, and it seems to do its job. I'm using bash 4.4.12.
  • Alexander Mills
    Alexander Mills about 6 years
    nice yep seems like you are correct