Parallelizing a for loop

Solution 1

#!/bin/bash
# set -x # debug version
N=${1:-123}
n=${2:-45}
workers=${workers:-${3:-10}}
((workers < 1)) && ((workers = 1))
((workers > 20)) && ((workers = 20))

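((min=100000000000000))  # start min above any value a.out is expected to print

# Each worker receives one batch of i values as its arguments.
# Note: min is not reset per i, so each echoed line is the running
# minimum over everything this worker has processed so far.
work() {
  for i in "$@"; do
    for (( j=1; j<=n; j++ )); do
      val=$(/path/to/a.out)
      val2=$(echo "${val}" | bc)
      (( val2 < min )) && (( min = val2 ))
    done
    echo ${min}
    # # debug version
    # echo ${i} ${j} ${min}
  done
}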

# --
arr=($(
  seq ${N} | xargs -n$((N/workers + 1)) | {
    # wait must run in the same subshell that started the background jobs
    while read i; do
      work ${i} &
    done
    wait
  }
))
echo ${arr[*]}
# --

# # debug version
# seq ${N} | xargs -t -n$((N/workers + 1)) | {
#   while read i; do
#     work ${i} &
#   done
#   wait
# }

Always cap the worker count when the number of spawned processes comes from a parameter, so that a large argument cannot spawn an unbounded number of processes. Here workers is clamped to the range 1 to 20.

xargs -n | while read is a simple means of iterating lists in batches.

  • seq creates a list of numbers from 1 to N.
  • xargs -n breaks that list into batches of N/workers+1 numbers each.
    • e.g. N=100 workers=10 produces 10 lines of up to 11 numbers from 1 to 100.
  • while read i reads each line of numbers.
  • work ${i} & just calls the work function with ${i} batch of numbers.
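The batching step on its own can be sketched as follows. The function name batches is hypothetical and not part of the script above; it only reproduces the seq | xargs -n arithmetic so the batch layout can be inspected without running any workers.

```bash
#!/bin/bash
# batches (hypothetical helper): print the numbers 1..N, one batch per line,
# using the same batch size as the script: N/workers + 1 (integer division).
batches() {
  local N=$1 workers=$2
  # xargs with no command defaults to echo, so each batch becomes one line.
  seq "$N" | xargs -n "$((N / workers + 1))"
}

batches 10 3
# batch size is 10/3 + 1 = 4, so this prints:
# 1 2 3 4
# 5 6 7 8
# 9 10
```

Because of the +1, the number of lines can be smaller than workers: here 3 workers were requested and 3 batches result, but batches 10 5 would produce only 4 lines of up to 3 numbers.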

To debug, I've added commented-out debug code. Replace the echo with its debug version and the code between the # -- markers with its debug version to see how it works on batches. Uncomment set -x for more detailed debug output, which you might want to redirect to a file.

Just run the debug version with different parameters to watch how it operates:

parallel.sh 223 5 1
parallel.sh 223 5 5
parallel.sh 223 5 10
parallel.sh 223 5 20

Disclaimer: This code doesn't synchronize the min value between worker processes, so each worker only reports the minima it saw itself. Obtaining the overall minimum isn't a horrible exercise. This'll probably do:

parallel.sh 223 5 20 | tr ' ' '\n' | sort -n | head -1

Or just add the same to the script itself:

echo ${arr[*]} | tr ' ' '\n' | sort -n | head -1
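For experimenting without the real binary, here is a minimal self-contained sketch of the same batching approach. It rests on several assumptions: fake_a_out is a made-up deterministic stand-in for /path/to/a.out, run_batches is a hypothetical wrapper name, and, unlike the script above, min is reset for every i (as in the question) and each result is tagged with its i so the output can be put back in order.

```bash
#!/bin/bash
# fake_a_out: hypothetical stand-in for /path/to/a.out (assumption).
# Prints a deterministic integer derived from i and j.
fake_a_out() { echo "$(( ($1 * 7 + $2 * 13) % 101 ))"; }

# work n i1 i2 ...: for each i, print "i min" where min is the smallest
# value over j = 1..n. min is reset per i, as in the question's loop.
work() {
  local n=$1; shift
  local i j val min
  for i in "$@"; do
    min=100000000000000
    for (( j = 1; j <= n; j++ )); do
      val=$(fake_a_out "$i" "$j")
      (( val < min )) && (( min = val ))
    done
    echo "$i $min"
  done
}

# run_batches N n workers: split 1..N into batches, run one background
# worker per batch, then restore the order of i and keep only the minima.
run_batches() {
  local N=$1 n=$2 workers=$3
  seq "$N" | xargs -n "$((N / workers + 1))" | {
    while read -r batch; do
      work "$n" $batch &   # unquoted on purpose: one argument per number
    done
    wait                   # same subshell as the jobs, so this really waits
  } | sort -n | cut -d' ' -f2
}

run_batches 6 3 2
# prints 20, 27, 34, 41, 48, 55 (one per line)
```

Piping the result through sort -n | head -n 1 then yields the overall minimum. Tagging each line with i matters because workers finish in arbitrary order, so untagged output cannot be matched back to its i.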

Solution 2

Using GNU Parallel:

#!/bin/bash

N=$1
n=$2

arr=($(
# Generate all combinations of 1..n and 1..N
parallel -k --tag /path/to/a.out {1} {2} '|' bc :::: <(seq $N) <(seq $n) |
  perl -ane 'BEGIN{$min=1e30} $last||=$F[0]; if($F[0] != $last) {print $min,"\n";$min=1e30;$last=$F[0]} $min = $F[2]<$min ? $F[2] : $min; END {print $min,"\n"}'
))
echo ${arr[*]}

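This will run /path/to/a.out in parallel, by default one job per CPU core. Because of --tag, each output line is prefixed with the arguments of the job that produced it, so the output will look like: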

1 1 19269
1 2 6158
1 3 2794
1 4 25104
2 1 13160
2 2 32683
2 3 12535
2 4 15197
3 1 8228
3 2 7673
3 3 8428
3 4 24463

The perl script groups consecutive lines by the first column and, for each group, prints the minimum of the 3rd column.
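The same grouping logic can be sketched in plain bash, which may be easier to follow than the one-liner. The function name group_min is hypothetical, and the sketch assumes integer values in the third column and input already grouped by the first column (which -k provides).

```bash
#!/bin/bash
# group_min (hypothetical helper): read "i j value" lines from stdin and
# print the minimum value for each run of lines sharing the same i.
group_min() {
  local key _ val last="" min=""
  while read -r key _ val; do
    # A new first-column value closes the previous group.
    if [[ -n $last && $key != "$last" ]]; then
      echo "$min"
      min=""
    fi
    # First value of a group, or a new smaller value.
    if [[ -z $min ]] || (( val < min )); then
      min=$val
    fi
    last=$key
  done
  # Flush the final group, if there was any input.
  [[ -n $last ]] && echo "$min"
}

printf '%s\n' '1 1 19269' '1 2 6158' '1 3 2794' '1 4 25104' \
              '2 1 13160' '2 2 32683' '2 3 12535' '2 4 15197' | group_min
# prints:
# 2794
# 12535
```

In the script it would take the place of the perl command after the pipe, assuming a.out only ever prints integers; for fractional output the perl version is the safer choice since bash arithmetic is integer-only.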

You can install GNU Parallel simply by:

wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel

Watch the intro videos to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Author: RIchard Williams

Updated on September 18, 2022

Comments

  • RIchard Williams
    RIchard Williams over 1 year

    I want to parallelize the for loops of the following code. How to do this?

    #!/bin/bash
    N=$1 
    n=$2
    for (( i=1; i<=$N; i++ )); do
      min=100000000000000  # set min to some garbage value
      for (( j=1; j<=$n; j++ )); do
        val=$(/path/to/a.out)
        val2=`echo $val | bc`  
    
          if (( $val2 < $min )); then
            min=$val2; 
          fi
      done
      arr=("${arr[@]}" "$min")
    done
  • RIchard Williams
    RIchard Williams over 12 years
    Can you explain the code? It is not clear to me how you have used worker threads here.
  • nicerobot
    nicerobot over 12 years
    @prasenjit Done. I hope that's helpful.
  • Marco
    Marco over 11 years
    The link in your last paragraph links to a facebook looking page that tells the visitor that he has won an exclusive gift (advertising or scam). There are no videos. You should remove it.