Collect exit codes of parallel background processes (subshells)
Solution 1
The answer by Alexander Mills that uses handleJobs gave me a great starting point, but it also gave me this error:
warning: run_pending_traps: bad value in trap_list[17]: 0x461010
which may be a bash race-condition problem.
Instead, I store the PID of each child and then wait on each child specifically to get its exit code. I find this cleaner when subprocesses spawn subprocesses in functions, because it avoids the risk of waiting on a parent process where I meant to wait on a child. It is also clearer what happens, because it does not use the trap.
#!/usr/bin/env bash
# Note: returning a value from the function via echo and calling it inside
# $(...) does not work here: $(...) runs in a subshell, and a subshell
# cannot wait for children of the parent shell.
wait_and_get_exit_codes() {
    children=("$@")
    EXIT_CODE=0
    for job in "${children[@]}"; do
        echo "PID => ${job}"
        CODE=0
        wait "${job}" || CODE=$?
        if [[ "${CODE}" != "0" ]]; then
            echo "At least one test failed with exit code => ${CODE}"
            EXIT_CODE=1
        fi
    done
}
DIRN=$(dirname "$0")
commands=(
    "{ echo 'a'; exit 1; }"
    "{ echo 'b'; exit 0; }"
    "{ echo 'c'; exit 2; }"
)
children_pids=()
for i in "${!commands[@]}"; do
    (echo "${commands[$i]}" | bash) &   # run the command via bash in a subshell
    children_pids+=("$!")
    echo "command $i has been issued as a background job"
done
# wait   # it is still valid to wait for all subshells to finish before processing any exit codes, if we wanted to
# EXIT_CODE=0   # exit code of the overall script
wait_and_get_exit_codes "${children_pids[@]}"
echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"
# end
Solution 2
Use wait with a PID, which will:
Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for.
You'll need to save the PID of each process as you go:
echo "x" & X=$!
echo "y" & Y=$!
echo "z" & Z=$!
You can also enable job control in the script with set -m and use a %n jobspec, but you almost certainly don't want to: job control has a lot of other side effects.
wait will return the same code as the process finished with. You can use wait $X at any (reasonable) later point to access the final code as $?, or simply use it as true/false:
echo "x" & X=$!
echo "y" & Y=$!
...
wait $X
echo "job X returned $?"
wait will pause until the command completes, if it hasn't already.
If you want to avoid stalling like that, you can set a trap on SIGCHLD, count the number of terminations, and handle all the waits at once when they've all finished. You can probably get away with using wait alone almost all the time.
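A sketch of that trap-based idea (the sleep durations and the `finished` counter are illustrative, not part of the original answer; bash runs a CHLD trap as background children terminate, and remembers each child's exit status for a later per-PID wait):

```shell
#!/usr/bin/env bash
# Sketch: count terminations with a SIGCHLD trap so the script can report
# progress instead of blocking silently in wait.
finished=0
trap 'finished=$((finished + 1))' CHLD

pids=()
( sleep 0.2; exit 3 ) & pids+=("$!")
( sleep 0.1; exit 0 ) & pids+=("$!")

# A plain wait returns >128 when the trap interrupts it, and 0 once every
# child has been reaped, so loop until it returns 0.
while ! wait; do
    echo "children finished so far: $finished"
done

# bash remembers each child's status, so per-PID wait still works here.
codes=()
for pid in "${pids[@]}"; do
    wait "$pid"
    codes+=("$?")
done
echo "exit codes: ${codes[*]}"
```

Even if the trap coalesces signals, the loop still terminates, because a plain wait returns 0 once no children remain.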
Solution 3
If you had a good way to identify the commands, you could print their exit code to a tmp file and then access the specific file you're interested in:
#!/bin/bash
for i in `seq 1 5`; do
    ( sleep $i ; echo $? > /tmp/cmd__${i} ) &
done
wait
for i in `seq 1 5`; do   # or even /tmp/cmd__*
    echo "process $i:"
    cat /tmp/cmd__${i}
done
Don't forget to remove the tmp files.
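If you go this route, a safer variant (a sketch; the `bash -c "exit $i"` commands are placeholders for real work) uses mktemp so concurrent runs of the script don't clobber each other's files, plus an EXIT trap that does the cleanup for you:

```shell
#!/bin/bash
# Sketch: same tmp-file idea, but with a private directory and automatic cleanup.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

for i in 1 2 3; do
    ( bash -c "exit $i" ; echo $? > "$tmpdir/cmd__$i" ) &
done
wait

for i in 1 2 3; do
    echo "process $i exited with $(cat "$tmpdir/cmd__$i")"
done
```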
Solution 4
Use a compound command: put the statement in parentheses:
( echo "x" ; echo X: $? ) &
( true ; echo TRUE: $? ) &
( false ; echo FALSE: $? ) &
will give the output
x
X: 0
TRUE: 0
FALSE: 1
A really different way to run several commands in parallel is with GNU Parallel. Make a list of commands to run and put them in the file list:
cat > list
sleep 2 ; exit 7
sleep 3 ; exit 55
^D
Run all the commands in parallel and collect the exit codes in the file job.log:
cat list | parallel -j0 --joblog job.log
cat job.log
and the output is:
Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
1 : 1486892487.325 1.976 0 0 7 0 sleep 2 ; exit 7
2 : 1486892487.326 3.003 0 0 55 0 sleep 3 ; exit 55
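If you then want to act on those exit codes programmatically, the Exitval column (field 7) of the job log can be checked with awk. A sketch, run here against a saved copy of the log shown above so it does not require parallel to be installed:

```shell
#!/usr/bin/env bash
# Sketch: count failed jobs by reading the Exitval column of a --joblog file.
joblog=$(mktemp)
cat > "$joblog" <<'EOF'
Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
1 : 1486892487.325 1.976 0 0 7 0 sleep 2 ; exit 7
2 : 1486892487.326 3.003 0 0 55 0 sleep 3 ; exit 55
EOF

# Skip the header line; field 7 is Exitval.
failed=$(awk 'NR > 1 && $7 != 0 { n++ } END { print n + 0 }' "$joblog")
echo "$failed job(s) failed"
rm -f "$joblog"
```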
Solution 5
script3
will be executed only if script1
and script2
are successful and script1
and script2
will be executed in parallel:
./script1 &
process1=$!
./script2 &
process2=$!
wait $process1
rc1=$?
wait $process2
rc2=$?
if [[ $rc1 -eq 0 ]] && [[ $rc2 -eq 0 ]]; then
./script3
fi
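The same pattern generalizes to any number of jobs by keeping the PIDs in an array (a sketch; `bash -c "exit …"` stands in for ./script1, ./script2, and friends):

```shell
#!/usr/bin/env bash
# Sketch: run N commands in parallel, then gate a follow-up step on all of
# them having succeeded. The commands here are placeholders.
cmds=( "exit 0" "exit 0" "exit 1" )

pids=()
for cmd in "${cmds[@]}"; do
    bash -c "$cmd" &
    pids+=("$!")
done

all_ok=1
for pid in "${pids[@]}"; do
    wait "$pid" || all_ok=0
done

if (( all_ok )); then
    echo "all jobs succeeded"   # this is where ./script3 would run
else
    echo "at least one job failed"
fi
```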
Alexander Mills
Updated on September 18, 2022

Comments
-
Alexander Mills over 1 year
Say we have a bash script like so:
echo "x" & echo "y" & echo "z" & ..... echo "Z" & wait
Is there a way to collect the exit codes of the subshells / subprocesses? I'm looking for a way to do this and can't find anything. I need to run these subshells in parallel, otherwise yes this would be easier.
I am looking for a generic solution (I have an unknown/dynamic number of subprocesses to run in parallel).
-
Alexander Mills about 7 years: ok thanks, is there a way to generify this? I don't just have 3 sub processes, I have Z sub processes.
-
Alexander Mills about 7 years: I updated the original question to reflect that I am looking for a generic solution, thanks
-
Alexander Mills about 7 years: ughh, sorry, I need to run these subshells in parallel, I will specify that in the question...
-
Alexander Mills about 7 years: nevermind, maybe this works with my setup... where does the wait command come into play in your code? I don't follow
-
Michael Homer about 7 years: @AlexanderMills They are running in parallel. If you have a variable number of them, use an array. (as e.g. here, which may thus be a duplicate).
-
Alexander Mills about 7 years: yes thanks, I will check that out; if the wait command pertains to your answer, then please add it
-
Michael Homer about 7 years: You run wait $X at any (reasonable) later point.
-
Alexander Mills about 7 years: ok thanks, sorry but what does the $X represent?
-
Alexander Mills about 7 years: I mean $X represents the PID of the first process, but why would you want to use that with wait?
-
hschou about 7 years: Looping? Do you have a fixed list of commands, or is that controlled by the user? I am not sure I understand what you are trying to do, but maybe PIPESTATUS is something you should check out. This: seq 10 | gzip -c > seq.gz ; echo ${PIPESTATUS[@]} returns 0 0 (the exit codes of the first and last commands).
-
Alexander Mills about 7 years: Yeah essentially controlled by the user
-
Alexander Mills about 7 years: oh, I need the exit codes of all the sub processes, not just the first one
-
Alexander Mills about 7 years: LOL, ok I see, but it's not super clear, let me update your answer, thanks :)
-
Mathias Begert about 7 years: @oletange, your edit should really have taken the form of a separate answer
-
Chunko about 7 years: You can also use a SIGCHLD trap to process the children when they exit, such as printing out the status at that time, or updating a progress counter: declare a function, then use "trap function_name CHLD", though that may also require an option to be turned on in a non-interactive shell, such as possibly "set -m".
-
Chunko about 7 years: Also, "wait -n" will wait for any child and then return the exit status of that child in the $? variable, so you can print progress as each one exits. However, note that unless you use the CHLD trap, you may miss some child exits that way.
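The wait -n idea from this comment can be sketched like so (requires bash 4.3 or newer; the sleep durations and the `failures` counter are illustrative):

```shell
#!/usr/bin/env bash
# Sketch: reap children in completion order with wait -n (bash 4.3+),
# reporting each exit status as it arrives.
( sleep 0.2; exit 5 ) &
( sleep 0.1; exit 0 ) &

failures=0
for _ in 1 2; do
    wait -n            # returns when any one child exits
    code=$?
    if (( code != 0 )); then
        failures=$((failures + 1))
    fi
    echo "a child exited with status $code"
done
echo "failures: $failures"
```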
-
Ole Tange about 7 years: @iruvar I believe you mistake my edit (#3) for hschou's (#2). Mine is correcting a speling msitake and adding -j0. Hardly worth its own answer.
-
Alexander Mills about 7 years: @Chunko thanks! that is good info, could you maybe update the answer with something you think is best?
-
Alexander Mills about 7 years: thanks @Chunko, trap works better, you're right. With wait <pid>, I got fallthrough.
-
Scott - Слава Україні about 6 years: Can you explain how and why you believe the version with the trap is better than the one without it? (I believe that it's no better, and therefore that it is worse, because it is more complex with no benefit.)
-
Scott - Слава Україні about 6 years: AFAICT, this is nothing more than a rehash of Michael Homer's answer.
-
Alexander Mills about 6 years: cool, I think for job in "${children[@]}"; do should be for job in "${1}"; do though, for clarity
-
Alexander Mills about 6 years: Yeah, @arberg used your idea and solved it your way, so I changed the accepted answer to arberg's. If you think his answer needs work, please comment on that answer below, thanks.
-
Alexander Mills about 6 years: the only concern I have with this script is whether children_pids+=("$!") is actually capturing the desired pid for the subshell.
-
Alexander Mills about 6 years: I think @arberg's answer is better now, I may delete this answer soon
-
arberg about 6 years: I tested with "${1}" and it doesn't work. I'm passing an array to the function, and apparently that needs special attention in bash. $! is the pid of the last spawned job; see tldp.org/LDP/abs/html/internalvariables.html. It seems to work correctly in my tests, and I'm now using it in the unRAID cache_dirs script, where it seems to do its job. I'm using bash 4.4.12.
-
Alexander Mills about 6 years: nice, yep, seems like you are correct