Forking / Multi-Threaded Processes | Bash
Solution 1
In non-interactive bash scripts, job control is disabled by default, so job-control commands such as fg and bg do not work.
Here is what works well for me:
#!/bin/bash
set -m # Enable job control
for i in $(seq 30); do # Start 30 jobs in parallel
    sleep 3 &
done
# Wait for all parallel jobs to finish
while true; do fg 2> /dev/null; [ $? -eq 1 ] && break; done
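The last line uses "fg" to bring a background job into the foreground. It does this in a loop until fg returns 1 (exit status 1), which it does when there are no more background jobs. Because fg returns the exit status of the job it foregrounds, the loop also stops early if a job itself exits with status 1.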
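If you only need to block until every background job has finished, a minimal sketch using the wait builtin, which needs no job control, might look like the following. The results temp file and the finished counter are illustrative names, not part of the original answer.

```bash
#!/bin/bash
# Illustrative sketch: 'results' and 'finished' are hypothetical names.
results=$(mktemp)

for i in 1 2 3 4 5; do
    # Each background subshell does some "work" and records that it finished.
    ( sleep 0.2; echo "job $i done" >> "$results" ) &
done

# With no arguments, wait blocks until all background children have exited.
wait

# Count the lines written by the jobs, then clean up the temp file.
finished=$(( $(wc -l < "$results") ))
rm -f "$results"
echo "finished jobs: $finished"
```

Unlike the fg loop, this does not depend on the exit status of the individual jobs.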
Solution 2
I don't know of any explicit fork call in bash. What you probably want to do is append & to a command that you want to run in the background. You can also use & on functions that you define within a bash script:
do_something_with_line()
{
    line=$1
    foo
    foo2
    foo3
}

# Read the file line by line and process each line in the background
while IFS= read -r line
do
    do_something_with_line "$line" &
done < file
EDIT: to put a limit on the number of simultaneous background processes, you could try something like this:
while IFS= read -r line
do
    # Pause while 50 or more background jobs are still listed
    while [ "$(jobs | wc -l)" -ge 50 ]
    do
        sleep 5
    done
    do_something_with_line "$line" &
done < file
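The polling loop above wakes up only every 5 seconds. As a sketch of an alternative, assuming bash 4.3 or later, wait -n returns as soon as any one background job exits, so a free slot can be refilled right away. The names max_jobs, work, and out below are made up for illustration.

```bash
#!/bin/bash
# Sketch only: requires bash >= 4.3 for 'wait -n'.
# max_jobs, work, and out are hypothetical names.
max_jobs=3
out=$(mktemp)

work() {
    sleep 0.2                 # stand-in for foo; foo2; foo3
    echo "$1" >> "$out"
}

for i in $(seq 1 10); do
    # While all slots are busy, block until any one job exits.
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
        wait -n
    done
    work "$i" &
done

wait                          # let the last jobs finish
processed=$(( $(wc -l < "$out") ))
rm -f "$out"
echo "processed: $processed"
```

The same pattern should scale to something like 1000 items at 50 at a time by changing the seq range and max_jobs.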
Solution 3
I don't like using wait because it blocks until the process exits, which is not ideal when there are multiple processes to wait on: I can't get a status update until the current process is done. I prefer to use a combination of kill -0 and sleep to do this.
Given an array of pids to wait on, I use the waitPids() function below to get continuous feedback on which pids are still pending.
declare -a pids
waitPids() {
    while [ ${#pids[@]} -ne 0 ]; do
        echo "Waiting for pids: ${pids[@]}"
        local range=$(eval echo {0..$((${#pids[@]}-1))})
        local i
        for i in $range; do
            if ! kill -0 ${pids[$i]} 2> /dev/null; then
                echo "Done -- ${pids[$i]}"
                unset pids[$i]
            fi
        done
        pids=("${pids[@]}") # Expunge nulls created by unset.
        sleep 1
    done
    echo "Done!"
}
When I start a process in the background, I immediately add its pid to the pids array using this utility function:
addPid() {
    local desc=$1
    local pid=$2
    echo "$desc -- $pid"
    pids=(${pids[@]} $pid)
}
Here is a sample that shows how to use it:
for i in {2..5}; do
    sleep $i &
    addPid "Sleep for $i" $!
done
waitPids
And here is how the feedback looks:
Sleep for 2 -- 36271
Sleep for 3 -- 36272
Sleep for 4 -- 36273
Sleep for 5 -- 36274
Waiting for pids: 36271 36272 36273 36274
Waiting for pids: 36271 36272 36273 36274
Waiting for pids: 36271 36272 36273 36274
Done -- 36271
Waiting for pids: 36272 36273 36274
Done -- 36272
Waiting for pids: 36273 36274
Done -- 36273
Waiting for pids: 36274
Done -- 36274
Done!
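One thing kill -0 polling does not report is whether each process succeeded. As a hedged sketch: once kill -0 says a pid is gone, bash can still return that child's exit status through wait, so the two can be combined. This variant also iterates over "${!pids[@]}" (the array's indices) instead of building a range with eval. The failed counter and the subshells that exit with fixed codes are invented for the demonstration.

```bash
#!/bin/bash
# Sketch only: 'failed' and the demo subshells are hypothetical.
declare -a pids=()
failed=0

# Start three children; the second one exits with a non-zero status.
for code in 0 3 0; do
    ( sleep 0.2; exit "$code" ) &
    pids+=("$!")
done

while [ "${#pids[@]}" -ne 0 ]; do
    # "${!pids[@]}" expands to the indices that are still set.
    for i in "${!pids[@]}"; do
        pid=${pids[$i]}
        if ! kill -0 "$pid" 2> /dev/null; then
            # The process is gone; wait returns its saved exit status.
            if ! wait "$pid"; then
                failed=$((failed + 1))
            fi
            unset 'pids[i]'
        fi
    done
    sleep 0.1
done

echo "failed: $failed"
```

Because the loop walks the existing indices directly, the array does not need to be repacked after each unset.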
Solution 4
With GNU Parallel you can do:
cat file | parallel 'foo {}; foo2 {}; foo3 {}'
This runs one job per CPU core. To run 50 jobs at a time:
cat file | parallel -j 50 'foo {}; foo2 {}; foo3 {}'
Watch the intro videos to learn more:
http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
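Where GNU Parallel is not installed, a rough equivalent can be sketched with xargs and its -P option (available in GNU and BSD xargs, though not required by POSIX). This is a different tool from the one recommended above, and the input lines and the "processed" message are placeholders.

```bash
#!/bin/bash
# Sketch only: xargs -P is a swapped-in alternative to GNU Parallel.
# Run up to 2 jobs at a time, passing one input line to each job.
output=$(printf '%s\n' alpha beta gamma |
    xargs -n 1 -P 2 sh -c 'echo "processed $1"' _)

# Count how many jobs reported back; the order of lines may vary.
count=$(printf '%s\n' "$output" | grep -c '^processed ')
echo "$output"
echo "jobs run: $count"
```

Here -n 1 hands one argument to each invocation and -P 2 caps the number of simultaneous processes, which plays the role of parallel's -j.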
Solution 5
haridsv's approach is great: it gives the flexibility to run a processor-slot setup, where a fixed number of processes is kept running and new jobs are submitted as earlier ones complete, keeping the overall load up. Here are my modifications to haridsv's code for an n-slot processor working through a 'grid' of ngrid jobs (I use it for grids of simulation models), followed by test output for 8 jobs run 3 at a time, with running totals of jobs running, submitted, completed, and remaining.
#!/bin/bash
########################################################################
# see haridsv on forking-multi-threaded-processes-bash
# loop over grid, submitting jobs in the background.
# As jobs complete new ones are set going to keep the number running
# up to n as much as possible, until it tapers off at the end.
#
# 8 jobs
ngrid=8
# 3 at a time
n=3
# running counts
running=0
completed=0
# previous values
prunning=0
pcompleted=0
#
########################################################################
# process monitoring functions
#
declare -a pids
#
function checkPids() {
    echo ${#pids[@]}
    if [ ${#pids[@]} -ne 0 ]
    then
        echo "Checking for pids: ${pids[@]}"
        local range=$(eval echo {0..$((${#pids[@]}-1))})
        local i
        for i in $range; do
            if ! kill -0 ${pids[$i]} 2> /dev/null; then
                echo "Done -- ${pids[$i]}"
                unset pids[$i]
                completed=$(expr $completed + 1)
            fi
        done
        pids=("${pids[@]}") # Expunge nulls created by unset.
        running=$((${#pids[@]}))
        echo "#PIDS :"$running
    fi
}
#
function addPid() {
    desc=$1
    pid=$2
    echo " ${desc} - "$pid
    pids=(${pids[@]} $pid)
}
########################################################################
#
# Loop and report when job changes happen,
# keep going until all are completed.
#
idx=0
while [ $completed -lt ${ngrid} ]
do
    #
    if [ $running -lt $n ] && [ $idx -lt ${ngrid} ]
    then
        ####################################################################
        #
        # submit a new process if less than n
        # are running and we haven't finished...
        #
        # get desc for process
        #
        name="job_"${idx}
        # background execution
        sleep 3 &
        addPid $name $!
        idx=$(expr $idx + 1)
        #
        ####################################################################
        #
    fi
    #
    checkPids
    # if something changes...
    if [ ${running} -gt ${prunning} ] || \
       [ ${completed} -gt ${pcompleted} ]
    then
        remain=$(expr $ngrid - $completed)
        echo " Running: ${running} Submitted: ${idx} Completed: ${completed} Remaining: ${remain}"
    fi
    # save counts to prev values
    prunning=${running}
    pcompleted=${completed}
    #
    sleep 1
    #
done
#
########################################################################
Test output:
job_0 - 75257
1
Checking for pids: 75257
#PIDS :1
Running: 1 Submitted: 1 Completed: 0 Remaining: 8
job_1 - 75262
2
Checking for pids: 75257 75262
#PIDS :2
Running: 2 Submitted: 2 Completed: 0 Remaining: 8
job_2 - 75267
3
Checking for pids: 75257 75262 75267
#PIDS :3
Running: 3 Submitted: 3 Completed: 0 Remaining: 8
3
Checking for pids: 75257 75262 75267
Done -- 75257
#PIDS :2
Running: 2 Submitted: 3 Completed: 1 Remaining: 7
job_3 - 75277
3
Checking for pids: 75262 75267 75277
Done -- 75262
#PIDS :2
Running: 2 Submitted: 4 Completed: 2 Remaining: 6
job_4 - 75283
3
Checking for pids: 75267 75277 75283
Done -- 75267
#PIDS :2
Running: 2 Submitted: 5 Completed: 3 Remaining: 5
job_5 - 75289
3
Checking for pids: 75277 75283 75289
#PIDS :3
Running: 3 Submitted: 6 Completed: 3 Remaining: 5
3
Checking for pids: 75277 75283 75289
Done -- 75277
#PIDS :2
Running: 2 Submitted: 6 Completed: 4 Remaining: 4
job_6 - 75298
3
Checking for pids: 75283 75289 75298
Done -- 75283
#PIDS :2
Running: 2 Submitted: 7 Completed: 5 Remaining: 3
job_7 - 75304
3
Checking for pids: 75289 75298 75304
Done -- 75289
#PIDS :2
Running: 2 Submitted: 8 Completed: 6 Remaining: 2
2
Checking for pids: 75298 75304
#PIDS :2
2
Checking for pids: 75298 75304
Done -- 75298
#PIDS :1
Running: 1 Submitted: 8 Completed: 7 Remaining: 1
1
Checking for pids: 75304
Done -- 75304
#PIDS :0
Running: 0 Submitted: 8 Completed: 8 Remaining: 0
Matt Nguyen
Updated on July 28, 2022
Comments
-
Matt Nguyen almost 2 years
I would like to make a section of my code more efficient. I'm thinking of making it fork off into multiple processes and have them execute 50/100 times at once, instead of just once.
For example (pseudo):
for line in file; do foo; foo2; foo3; done
I would like this for loop to run multiple times. I know this can be done with forking. Would it look something like this?
while(x <= 50) parent(child pid) { fork child() } child { do foo; foo2; foo3; done return child_pid() }
Or am I thinking about this the wrong way?
Thanks!
-
Michael Krelin - hacker over 14 years
You've miscapitalized the do_something… name ;-)
-
Matt Nguyen over 14 years
Got it. What about when I want to make sure I am only running 50 instances at a time? And when one of those processes is done, make sure one more is spawned.
-
Matt Nguyen over 14 years
Ah yes, I didn't see the last line of your answer. Thank you very much. I'll get to work.
-
Michael Krelin - hacker over 14 years
I've added the line after you asked, so it's alright that you failed to read my mind ;-) (just as I failed to read yours before you asked :)). BTW, man bash is a great source of information on job control. Once you've gone down this path you may have many questions ;-)
-
NVRAM over 14 years
For the value of a variable, you must use $varname; otherwise you'll just pass the literal string "varname".
-
NVRAM over 14 years
You probably also want $file (or use while read line).
-
Rafael Xavier almost 12 years
In bash scripts, you are able to use wait, e.g.: sleep 3 & WAITPID=$!; wait $WAITPID, or concatenate the pids this way: WAITPIDS="$WAITPIDS "$!;...; wait $WAITPIDS
-
cape1232 about 11 years
+1 mob. I modified this into a function that you add to your file of commands after anything you put in the background. Then you can run some commands in the file sequentially and only run some of them in the background: #!/bin/bash waitpid(){ while [[ `jobs | wc -l` -ge $1 ]]; do sleep 1; done; }
-
labyrinth almost 10 years
I would add that parallel is already installed on most systems. My OS X 10.8.5 machine has it. It's time for me to dust the cobwebs off my shell scripting and update my for loops to parallel...
-
Kaushik Ghose over 9 years
For pointing out that functions in a bash script can be parallelized (never would have guessed that!), this should be the chosen answer.
-
chovy over 8 years
How would I do 1000 things, 50 at a time, in a loop of, say, $(seq 1 1000)?
-
μολὼν.λαβέ about 8 years
This seems to get really messy when using search/replace with characters that must be escaped.
-
Vinícius Ferrão about 8 years
I've tried to use this on FreeBSD with /bin/sh but it got stuck in the while loop.
-
elwarren almost 8 years
The seq command supports step increments: $(seq 1 50 1000), but it's up to you to do 50 things in each loop @chovy
-
Szymon Sadło over 7 years
#!/bin/sh produced ./test.sh: 10: [: 2: unexpected operator and got stuck in an infinite loop. Use #!/bin/bash to fix that.
-
OstermanA about 3 years
Very minor, inconsequential improvement: using local range=$(eval echo {0..$((${#pids[@]}-1))}) is much slower than using the built-in for i in ${!pids[@]}. The ${! form is more common on an associative array, but on a basic array it will spit out all of the indices just fine, at least as far back as Bash 4.1.