Collect exit codes of parallel background processes (sub shells)

Solution 1

The answer by Alexander Mills that uses handleJobs gave me a great starting point, but it also gave me this error:

warning: run_pending_traps: bad value in trap_list[17]: 0x461010

which may be a bash race-condition problem.

Instead, I just store the PID of each child, then wait for each child specifically and collect its exit code. I find this cleaner when subprocesses spawn subprocesses in functions, because it avoids the risk of waiting for a parent process when I meant to wait for a child. It is also clearer what happens, because it doesn't use the trap.

#!/usr/bin/env bash

# Note: returning the result via echo and calling the function inside $() did not
# work well (command substitution runs in a subshell), so the result is stored
# in the global EXIT_CODE instead.
function wait_and_get_exit_codes() {
    children=("$@")
    EXIT_CODE=0
    for job in "${children[@]}"; do
        echo "PID => ${job}"
        CODE=0
        wait "${job}" || CODE=$?
        if [[ "${CODE}" != "0" ]]; then
            echo "At least one test failed with exit code => ${CODE}"
            EXIT_CODE=1
        fi
    done
}

DIRN=$(dirname "$0");

commands=(
    "{ echo 'a'; exit 1; }"
    "{ echo 'b'; exit 0; }"
    "{ echo 'c'; exit 2; }"
    )

clen=$(( ${#commands[@]} - 1 ))  # number of commands, minus 1

children_pids=()
for i in $(seq 0 "$clen"); do
    (echo "${commands[$i]}" | bash) &   # run the command via bash in a subshell
    children_pids+=("$!")
    echo "command $i has been issued as a background job"
done
# wait  # it is still valid to wait for all jobs to finish before processing any exit codes, if we wanted to
#EXIT_CODE=0;  # exit code of overall script
wait_and_get_exit_codes "${children_pids[@]}"

echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"
# end

Solution 2

Use wait with a PID, which will:

Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for.

You'll need to save the PID of each process as you go:

echo "x" & X=$!
echo "y" & Y=$!
echo "z" & Z=$!

You can also enable job control in the script with set -m and use a %n jobspec, but you almost certainly don't want to - job control has a lot of other side effects.

wait will return the same code as the process finished with. You can use wait $X at any (reasonable) later point to access the final code as $? or simply use it as true/false:

echo "x" & X=$!
echo "y" & Y=$!
...
wait $X
echo "job X returned $?"

wait will pause until the command completes if it hasn't already.

If you want to avoid stalling like that, you can set a trap on SIGCHLD, count the number of terminations, and handle all the waits at once when they've all finished. You can probably get away with using wait alone almost all the time.
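As mentioned in the comments below, bash 4.3 and later also offer wait -n, which returns as soon as any one child finishes. A minimal sketch of collecting exit codes that way, with placeholder background jobs:

#!/usr/bin/env bash
# Sketch assuming bash >= 4.3 (for wait -n); the background jobs are placeholders.
( sleep 1; exit 0 ) &
( sleep 2; exit 3 ) &
( sleep 1; exit 0 ) &

jobs_left=3
overall=0
while (( jobs_left > 0 )); do
    wait -n               # returns the exit status of the next child to finish
    status=$?
    jobs_left=$(( jobs_left - 1 ))
    echo "a child finished with status ${status}"
    if (( status != 0 )); then
        overall=1
    fi
done
echo "overall => ${overall}"

Note that plain wait -n tells you an exit status but not which child it came from; if you need the PID as well, stick with wait $X per saved PID.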

Solution 3

If you had a good way to identify the commands, you could print their exit code to a tmp file and then access the specific file you're interested in:

#!/bin/bash

for i in `seq 1 5`; do
    ( sleep $i ; echo $? > /tmp/cmd__${i} ) &
done

wait

for i in `seq 1 5`; do # or even /tmp/cmd__*
    echo "process $i:"
    cat /tmp/cmd__${i}
done

Don't forget to remove the tmp files.
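A small variation on the same idea, sketched with mktemp -d so the status files land in a throwaway directory and cleanup is a single rm -rf (the sleep commands are placeholders):

#!/bin/bash
# Same technique as above, but with a temporary directory from mktemp
# so the status files are easy to clean up afterwards.
tmpdir=$(mktemp -d)

for i in $(seq 1 5); do
    ( sleep "$i"; echo "$?" > "${tmpdir}/cmd_${i}" ) &
done

wait

for i in $(seq 1 5); do
    echo "process $i: $(cat "${tmpdir}/cmd_${i}")"
done

rm -rf "${tmpdir}"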

Solution 4

Use a compound command - put the statement in parentheses:

( echo "x" ; echo X: $? ) &
( true ; echo TRUE: $? ) &
( false ; echo FALSE: $? ) &

will give the output

x
X: 0
TRUE: 0
FALSE: 1

A really different way to run several commands in parallel is by using GNU Parallel. Make a list of commands to run and put them in the file list:

cat > list
sleep 2 ; exit 7
sleep 3 ; exit 55
^D

Run all the commands in parallel and collect the exit codes in the file job.log:

cat list | parallel -j0 --joblog job.log
cat job.log

and the output is:

Seq     Host    Starttime       JobRuntime      Send    Receive Exitval Signal  Command
1       :       1486892487.325       1.976      0       0       7       0       sleep 2 ; exit 7
2       :       1486892487.326       3.003      0       0       55      0       sleep 3 ; exit 55
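If you only need to know which commands failed, the Exitval column of the job log can be filtered directly. A sketch with awk, assuming the job log is tab-separated as shown above (Exitval is the 7th field, the command the 9th):

# print every job from job.log whose exit value is non-zero
awk -F'\t' 'NR > 1 && $7 != 0 { print "failed (exit " $7 "): " $9 }' job.log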

Solution 5

script1 and script2 will be executed in parallel, and script3 will be executed only if both of them succeed:

./script1 &
process1=$!

./script2 &
process2=$!

wait $process1
rc1=$?

wait $process2
rc2=$?

if [[ $rc1 -eq 0 ]] && [[ $rc2 -eq 0 ]]; then
    ./script3
fi
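
The same pattern generalizes to any number of scripts by collecting the PIDs in an array. A minimal sketch, where the script paths are placeholders:

scripts=(./script1 ./script2)   # placeholder script paths; add as many as needed
pids=()

# start every script in the background and remember its PID
for s in "${scripts[@]}"; do
    "$s" &
    pids+=("$!")
done

# wait for each one and require all of them to succeed
all_ok=1
for pid in "${pids[@]}"; do
    wait "$pid" || all_ok=0
done

if (( all_ok )); then
    ./script3
fi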

Comments

  • Alexander Mills
    Alexander Mills over 1 year

    Say we have a bash script like so:

    echo "x" &
    echo "y" &
    echo "z" &
    .....
    echo "Z" &
    wait
    

    is there a way to collect the exit codes of the subshells / subprocesses? I'm looking for a way to do this and can't find anything. I need to run these subshells in parallel, otherwise yes, this would be easier.

    I am looking for a generic solution (I have an unknown/dynamic number of sub processes to run in parallel).

  • Alexander Mills
    Alexander Mills about 7 years
    ok thanks, is there a way to generify this? I don't just have 3 sub processes, I have Z sub processes.
  • Alexander Mills
    Alexander Mills about 7 years
    I updated the original question to reflect that I am looking for a generic solution, thanks
  • Alexander Mills
    Alexander Mills about 7 years
    ughh, sorry, I need to run these subshells in parallel, I will specify that in the question...
  • Alexander Mills
    Alexander Mills about 7 years
    nevermind, maybe this works with my setup...where does the wait command come into play in your code? I don't follow
  • Michael Homer
    Michael Homer about 7 years
    @AlexanderMills They are running in parallel. If you have a variable number of them, use an array. (as e.g. here which may thus be a duplicate).
  • Alexander Mills
    Alexander Mills about 7 years
    yes thanks I will check that out, if the wait command pertains to your answer, then please add it
  • Michael Homer
    Michael Homer about 7 years
    You run wait $X at any (reasonable) later point.
  • Alexander Mills
    Alexander Mills about 7 years
    ok thanks, sorry but what does the $X represent?
  • Alexander Mills
    Alexander Mills about 7 years
    I mean $X represents the PID of the first process, but why would you want to use that with wait?
  • hschou
    hschou about 7 years
    Looping? Do you have a fixed list of commands or is that controlled by the user? I am not sure I understand what you are trying to do but maybe PIPESTATUS is something you should check out. This seq 10 | gzip -c > seq.gz ; echo ${PIPESTATUS[@]} returns 0 0 (exit code from first and last command).
  • Alexander Mills
    Alexander Mills about 7 years
    Yeah essentially controlled by the user
  • Alexander Mills
    Alexander Mills about 7 years
    oh, I need the exit codes of all the sub processes, not just the first one
  • Alexander Mills
    Alexander Mills about 7 years
    LOL, ok I see, but it's not super clear, let me update your answer, thanks :)
  • Mathias Begert
    Mathias Begert about 7 years
    @oletange, your edit should really have taken the form of a separate answer
  • Chunko
    Chunko about 7 years
    You can also use SIGCHLD trap to process the children when they exit, such as printing out the status at that time. Or updating a progress counter: declare a function then use "trap function_name CHLD" though that may also require an option to be turned on in a non-interactive shell, such as possibly "set -m"
  • Chunko
    Chunko about 7 years
    Also "wait -n" will wait for any child and then return the exit status of that child in the $? variable. So you can print progress as each one exits. However note that unless you use the CHLD trap, you may miss some child exits that way.
  • Ole Tange
    Ole Tange about 7 years
    @iruvar I believe you mistake my edit (#3) for hschou's (#2). Mine is correcting a speling msitake and adding -j0. Hardly worth its own answer.
  • Alexander Mills
    Alexander Mills about 7 years
    @Chunko thanks! that is good info, could you maybe update the answer with something you think is best?
  • Alexander Mills
    Alexander Mills about 7 years
    thanks @Chunko, trap works better, you're right. With wait <pid>, I got fallthrough.
  • Scott - Слава Україні
    Scott - Слава Україні about 6 years
    Can you explain how and why you believe the version with the trap is better than the one without it?  (I believe that it’s no better, and therefore that it is worse, because it is more complex with no benefit.)
  • Scott - Слава Україні
    Scott - Слава Україні about 6 years
    AFAICT,  this is nothing more than a rehash of Michael Homer’s answer.
  • Alexander Mills
    Alexander Mills about 6 years
    cool, I think for job in "${children[@]}"; do should be for job in "${1}"; do though, for clarity
  • Alexander Mills
    Alexander Mills about 6 years
    Yeah, @arberg used your idea and solved it your way; I changed the accepted answer to arberg's. If you think his answer needs work, please comment on that answer below, thanks.
  • Alexander Mills
    Alexander Mills about 6 years
    the only concern I have with this script, is if children_pids+=("$!") is actually capturing the desired pid for the sub shell.
  • Alexander Mills
    Alexander Mills about 6 years
    I think @arberg's answer is better now, I may delete this answer soon
  • arberg
    arberg about 6 years
    I tested with "${1}" and it doesn't work. I'm passing an array to the function, and apparently that needs special attention in bash. $! is the pid of the last spawned job, see tldp.org/LDP/abs/html/internalvariables.html It seems to work correctly in my tests, and I'm now using in the in unRAID cache_dirs script, and it seems to do its job. I'm using bash 4.4.12.
  • Alexander Mills
    Alexander Mills about 6 years
    nice yep seems like you are correct