Forking / Multi-Threaded Processes | Bash


Solution 1

In non-interactive bash scripts, job control is disabled by default, so job-control commands such as fg and bg do not work.

Here is what works well for me:

#!/bin/bash

set -m # Enable Job Control

for i in `seq 30`; do # start 30 jobs in parallel
  sleep 3 &
done

# Wait for all parallel jobs to finish
while [ 1 ]; do fg 2> /dev/null; [ $? == 1 ] && break; done

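The last line uses "fg" to bring a background job into the foreground. It does this in a loop until fg returns 1 ($? == 1), which it does when no background jobs remain. Note that fg otherwise returns the exit status of the job it brought to the foreground, so a job that itself exits with status 1 would also end the loop early.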
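
If bringing jobs to the foreground is not actually needed, the wait builtin is a simpler route that does not depend on job control. The following is a minimal sketch, assuming bash; the worker function and the failures counter are hypothetical names used only for illustration. It records each background PID from $! and waits on them one by one so that every exit status can be inspected.

```bash
#!/usr/bin/env bash

# Hypothetical worker: sleeps briefly, then fails for job number 3 only.
worker() {
  sleep 1
  [ "$1" -ne 3 ]
}

pids=()
for i in 1 2 3 4 5; do
  worker "$i" &        # run the function in the background
  pids+=("$!")         # remember the PID of the job just started
done

failures=0
for pid in "${pids[@]}"; do
  # wait returns the exit status of the given child process
  wait "$pid" || failures=$((failures + 1))
done

echo "failed jobs: $failures"
```

Waiting on each PID in turn still takes only as long as the slowest job, because all jobs are already running while the loop waits.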

Solution 2

I don't know of any explicit fork call in bash. What you probably want to do is append & to a command that you want to run in the background. You can also use & on functions that you define within a bash script:

do_something_with_line()
{
  line=$1
  foo
  foo2
  foo3
}

# "file" is the name of the input file; read it line by line
while IFS= read -r line
do
  do_something_with_line "$line" &
done < file

EDIT: to put a limit on the number of simultaneous background processes, you could try something like this:

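# "file" is the name of the input file; read it line by line
while IFS= read -r line
do
  # jobs -r lists only the jobs that are still running
  while [ "$(jobs -r | wc -l)" -ge 50 ]
  do
    sleep 5
  done
  do_something_with_line "$line" &
done < file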
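
One way to avoid the fixed sleep 5 polling interval is wait -n, which returns as soon as any single background job exits. This is a minimal sketch that assumes bash 4.3 or newer (older versions, such as the bash 3.2 shipped with macOS, do not support wait -n). The names max_jobs, process_item, and the scratch file are hypothetical and only stand in for the real limit and workload.

```bash
#!/usr/bin/env bash
# Assumes bash 4.3 or newer for `wait -n`.

max_jobs=3                      # hypothetical concurrency limit
results=$(mktemp)               # scratch file: one line per finished item

process_item() {                # hypothetical stand-in for the real work
  sleep 1
  echo "item $1 done" >> "$results"
}

for i in $(seq 1 8); do
  # When the limit is reached, block until any one background job exits.
  while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do
    wait -n
  done
  process_item "$i" &
done

wait                            # wait for the jobs that are still running
finished=$(( $(wc -l < "$results") ))
echo "finished: $finished"
rm -f "$results"
```

The same loop shape covers the "1000 things, 50 at a time" case by changing the seq range and max_jobs.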

Solution 3

I don't like using wait because it blocks until the process exits, which is not ideal when there are multiple processes to wait on: I can't get a status update until the current process is done. I prefer to use a combination of kill -0 and sleep for this.

Given an array of pids to wait on, I use the waitPids() function below to get continuous feedback on which pids are still pending.

declare -a pids
waitPids() {
    while [ ${#pids[@]} -ne 0 ]; do
        echo "Waiting for pids: ${pids[@]}"
        local range=$(eval echo {0..$((${#pids[@]}-1))})
        local i
        for i in $range; do
            if ! kill -0 ${pids[$i]} 2> /dev/null; then
                echo "Done -- ${pids[$i]}"
                unset pids[$i]
            fi
        done
        pids=("${pids[@]}") # Expunge nulls created by unset.
        sleep 1
    done
    echo "Done!"
}

When I start a process in the background, I immediately add its pid to the pids array using the utility function below:

addPid() {
    local desc=$1
    local pid=$2
    echo "$desc -- $pid"
    pids=(${pids[@]} $pid)
}

Here is a sample that shows how to use it:

for i in {2..5}; do
    sleep $i &
    addPid "Sleep for $i" $!
done
waitPids

And here is how the feedback looks:

Sleep for 2 -- 36271
Sleep for 3 -- 36272
Sleep for 4 -- 36273
Sleep for 5 -- 36274
Waiting for pids: 36271 36272 36273 36274
Waiting for pids: 36271 36272 36273 36274
Waiting for pids: 36271 36272 36273 36274
Done -- 36271
Waiting for pids: 36272 36273 36274
Done -- 36272
Waiting for pids: 36273 36274
Done -- 36273
Waiting for pids: 36274
Done -- 36274
Done!
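
As a commenter on this page points out, "${!pids[@]}" expands to the array's indices directly, which removes the need for the eval-based range. The sketch below is a hedged variant along those lines, not the author's original code. It also calls wait on each finished pid to pick up its exit status; the finished array is a hypothetical addition for illustration.

```bash
#!/usr/bin/env bash

declare -a pids=()
declare -a finished=()          # hypothetical log of "pid:status" pairs

waitPids() {
  local i
  while [ "${#pids[@]}" -ne 0 ]; do
    # "${!pids[@]}" expands to the current indices; the list is fixed
    # before the loop body runs, so unsetting entries inside is safe.
    for i in "${!pids[@]}"; do
      if ! kill -0 "${pids[$i]}" 2> /dev/null; then
        wait "${pids[$i]}"              # collect the exit status of the finished child
        finished+=("${pids[$i]}:$?")
        unset "pids[$i]"
      fi
    done
    sleep 1
  done
}

for i in 1 2 3; do
  ( sleep "$i"; exit "$i" ) &   # each job exits with a distinct status
  pids+=("$!")
done
waitPids
printf '%s\n' "${finished[@]}"
```

Because the loop iterates over real indices, the array can stay sparse after unset and does not need to be rebuilt on every pass.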

Solution 4

With GNU Parallel you can do:

cat file | parallel 'foo {}; foo2 {}; foo3 {}'

By default this runs one job per CPU core. To run 50 jobs at a time:

cat file | parallel -j 50 'foo {}; foo2 {}; foo3 {}'

Watch the intro videos to learn more:

http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
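
Where GNU Parallel is not installed, xargs with the -P option can serve as a rough substitute. Note that this is a different tool, not GNU Parallel, and that -P is supported by GNU and BSD xargs but is not required by POSIX. The sketch below assumes bash; handle_line is a hypothetical stand-in for foo, foo2, and foo3, and the three sample lines replace the real input file.

```bash
#!/usr/bin/env bash

# Hypothetical per-line handler; stands in for `foo; foo2; foo3`.
handle_line() {
  echo "handled: $1"
}
export -f handle_line           # make the function visible to child bash processes

# Feed three sample lines and process up to 2 at a time.
output=$(printf '%s\n' alpha beta gamma |
  xargs -P 2 -I {} bash -c 'handle_line "$1"' _ {})

# Completion order is not deterministic, so sort before printing.
sorted=$(printf '%s\n' "$output" | sort)
echo "$sorted"
```

Passing the line as a positional argument ($1) instead of splicing {} into the command string avoids most quoting problems with special characters in the input.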

Solution 5

haridsv's approach is great: it makes it possible to run a fixed set of processor slots, keeping a set number of processes running and submitting new jobs as earlier ones complete, so the overall load stays up. Here are my modifications to haridsv's code for an n-slot processor working through a 'grid' of ngrid jobs (I use it for grids of simulation models), followed by test output for 8 jobs run 3 at a time, with running totals of jobs running, submitted, completed, and remaining.

#!/bin/bash
########################################################################
# see haridsv on forking-multi-threaded-processes-bash
# loop over grid, submitting jobs in the background.
# As jobs complete new ones are set going to keep the number running
# up to n as much as possible, until it tapers off at the end.
#
# 8 jobs
ngrid=8
# 3 at a time
n=3
# running counts
running=0
completed=0
# previous values
prunning=0
pcompleted=0
#
########################################################################
# process monitoring functions
#
declare -a pids
#
function checkPids() {
echo  ${#pids[@]}
if [ ${#pids[@]} -ne 0 ]
then
    echo "Checking for pids: ${pids[@]}"
    local range=$(eval echo {0..$((${#pids[@]}-1))})
    local i
    for i in $range; do
        if ! kill -0 ${pids[$i]} 2> /dev/null; then
            echo "Done -- ${pids[$i]}"
            unset pids[$i]
            completed=$(expr $completed + 1)
        fi
    done
    pids=("${pids[@]}") # Expunge nulls created by unset.
    running=$((${#pids[@]}))
    echo "#PIDS :"$running
fi
}
#
function addPid() {
    desc=$1
    pid=$2
    echo " ${desc} - "$pid
    pids=(${pids[@]} $pid)
}
########################################################################
#
# Loop and report when job changes happen,
# keep going until all are completed.
#
idx=0
while [ $completed -lt ${ngrid} ]
do
#
    if [ $running -lt $n ] && [ $idx -lt ${ngrid} ]
    then
####################################################################
#
# submit a new process if less than n
# are running and we haven't finished...
#
# get desc for process
#
        name="job_"${idx}
# background execution
        sleep 3 &
        addPid $name $!
        idx=$(expr $idx + 1)
#
####################################################################
#
    fi
#
    checkPids
# if something changes...
    if [ ${running} -gt ${prunning} ] || \
       [ ${completed} -gt ${pcompleted} ]
    then
        remain=$(expr $ngrid - $completed)
        echo  " Running: "${running}" Submitted: "${idx}\
              " Completed: "$completed" Remaining: "$remain
    fi
# save counts to prev values
    prunning=${running}
    pcompleted=${completed}
#
    sleep 1
#
done
#
########################################################################

Test output:

 job_0 - 75257
1
Checking for pids: 75257
#PIDS :1
 Running: 1 Submitted: 1  Completed: 0 Remaining: 8
 job_1 - 75262
2
Checking for pids: 75257 75262
#PIDS :2
 Running: 2 Submitted: 2  Completed: 0 Remaining: 8
 job_2 - 75267
3
Checking for pids: 75257 75262 75267
#PIDS :3
 Running: 3 Submitted: 3  Completed: 0 Remaining: 8
3
Checking for pids: 75257 75262 75267
Done -- 75257
#PIDS :2
 Running: 2 Submitted: 3  Completed: 1 Remaining: 7
 job_3 - 75277
3
Checking for pids: 75262 75267 75277
Done -- 75262
#PIDS :2
 Running: 2 Submitted: 4  Completed: 2 Remaining: 6
 job_4 - 75283
3
Checking for pids: 75267 75277 75283
Done -- 75267
#PIDS :2
 Running: 2 Submitted: 5  Completed: 3 Remaining: 5
 job_5 - 75289
3
Checking for pids: 75277 75283 75289
#PIDS :3
 Running: 3 Submitted: 6  Completed: 3 Remaining: 5
3
Checking for pids: 75277 75283 75289
Done -- 75277
#PIDS :2
 Running: 2 Submitted: 6  Completed: 4 Remaining: 4
 job_6 - 75298
3
Checking for pids: 75283 75289 75298
Done -- 75283
#PIDS :2
 Running: 2 Submitted: 7  Completed: 5 Remaining: 3
 job_7 - 75304
3
Checking for pids: 75289 75298 75304
Done -- 75289
#PIDS :2
 Running: 2 Submitted: 8  Completed: 6 Remaining: 2
2
Checking for pids: 75298 75304
#PIDS :2
2
Checking for pids: 75298 75304
Done -- 75298
#PIDS :1
 Running: 1 Submitted: 8  Completed: 7 Remaining: 1
1
Checking for pids: 75304
Done -- 75304
#PIDS :0
 Running: 0 Submitted: 8  Completed: 8 Remaining: 0
Author: Matt Nguyen

Updated on July 28, 2022

Comments

  • Matt Nguyen
    Matt Nguyen almost 2 years

    I would like to make a section of my code more efficient. I'm thinking of making it fork off into multiple processes and have them execute 50/100 times at once, instead of just once.

    For example (pseudo):

    for line in file;
    do 
    foo;
    foo2;
    foo3;
    done
    

    I would like this for loop to run multiple times. I know this can be done with forking. Would it look something like this?

    while(x <= 50)
    parent(child pid)
    {
       fork child()
    }
    child
    {
       do 
       foo; foo2; foo3; 
       done
       return child_pid()
    }
    

    Or am I thinking about this the wrong way?

    Thanks!

  • Michael Krelin - hacker
    Michael Krelin - hacker over 14 years
    You've miscapitalized the do_something… name ;-)
  • Matt Nguyen
    Matt Nguyen over 14 years
    Got it. What about when I want to make sure I am only running 50 instances at a time? And, when one of those processes is done, make sure 1 more is spawned.
  • Matt Nguyen
    Matt Nguyen over 14 years
    Ah yes - I didn't see the last line of your answer. Thank you very much. I'll get to work.
  • Michael Krelin - hacker
    Michael Krelin - hacker over 14 years
    I've added the line after you asked, so it's alright that you failed to read my mind ;-) (just as I failed to read yours before you asked:)). BTW, man bash is one great source for information on job control. Once you've gone down this path you may have many questions ;-)
  • NVRAM
    NVRAM over 14 years
    For the value of a variable, you must use $varname -- otherwise you'll just pass the literal (string) "varname"
  • NVRAM
    NVRAM over 14 years
    You probably also want $file (or use while read line).
  • Rafael Xavier
    Rafael Xavier almost 12 years
    In bash scripts, you are able to use wait, e.g.: sleep 3 & WAITPID=$!; wait $WAITPID, or concatenate the pids this way: WAITPIDS="$WAITPIDS "$!;...; wait $WAITPIDS
  • cape1232
    cape1232 about 11 years
    +1 mob. I modified this into a function that you add to your file of commands after anything you put in the background. Then you can run some commands in the file sequentially and only run some of them in the background: #!/bin/bash waitpid(){ while [[ $(jobs | wc -l) -ge $1 ]]; do sleep 1; done; }
  • labyrinth
    labyrinth almost 10 years
    I would add that parallel is already installed on most systems. My OS X 10.8.5 machine has it. It's time for me to dust the cobwebs off my shell scripting and update my for loops to parallel...
  • Kaushik Ghose
    Kaushik Ghose over 9 years
    For pointing out that functions in a bash script can be parallelized - never would have guessed that! - this should be the chosen answer
  • chovy
    chovy over 8 years
    how would i do 1000 things, 50 at a time? in a loop of say $(seq 1 1000)
  • μολὼν.λαβέ
    μολὼν.λαβέ about 8 years
    this seems to get real messy when using search/replace that have characters that must be escaped.
  • Vinícius Ferrão
    Vinícius Ferrão about 8 years
    I've tried to use this on FreeBSD with /bin/sh but it got stuck in the while loop.
  • elwarren
    elwarren almost 8 years
    The seq command supports step increments: $(seq 1 50 1000) but it's up to you to do 50 things in each loop @chovy
  • Szymon Sadło
    Szymon Sadło over 7 years
    #!/bin/sh produced ./test.sh: 10: [: 2: unexpected operator and got stuck in an infinite loop. Use #!/bin/bash to fix that
  • OstermanA
    OstermanA about 3 years
    Very minor, inconsequential improvement: Using local range=$(eval echo {0..$((${#pids[@]}-1))}) is much slower than using the built-in for i in ${!pids[@]}. Using the ${! is more common on an associative array, but on a basic array will spit out all of the indices just fine, at least as far back as Bash 4.1.