Run a script in multiple folders in parallel

Solution 1

Assuming this does the right thing - only in serial:

find dir_* -type f -execdir sh for_loop.sh {} \;

Then you should be able to replace that with:

find dir_* -type f | parallel 'cd {//} && sh for_loop.sh {}'
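Note that find dir_* -type f lists every regular file, so the command above starts one job per file. Before running it for real, it can help to preview what would be executed. The following is a minimal sketch, assuming GNU Parallel is installed; the temporary demo tree and its file names are hypothetical, and the script falls back to a message if GNU Parallel is not found.

```shell
#!/bin/bash
# Demo tree; all directory and file names below are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/dir_a" "$demo/dir_b"
touch "$demo/dir_a/data1.txt" "$demo/dir_a/data2.txt" "$demo/dir_b/data3.txt"

# Only proceed if the installed `parallel` identifies itself as GNU Parallel.
if parallel --version 2>/dev/null | head -n 1 | grep -q 'GNU parallel'; then
  # --dry-run prints each command instead of executing it.
  # -print0 / -0 pass file names NUL-delimited, so spaces are safe.
  planned=$(find "$demo"/dir_* -type f -print0 |
            parallel -0 --dry-run 'cd {//} && sh for_loop.sh {}')
else
  planned="skipped: GNU Parallel not found"
fi
printf '%s\n' "$planned"
```

With GNU Parallel available this prints one cd ... && sh for_loop.sh ... line per file; removing --dry-run runs the jobs.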

To run it in multiple terminals GNU Parallel supports tmux to run each command in its own tmux pane:

find dir_* -type f | parallel --tmuxpane 'cd {//} && sh for_loop.sh {}'

It defaults to one job per CPU core. In your case you might want to run one more job than you have cores:

find dir_* -type f | parallel -j+1 --tmuxpane 'cd {//} && sh for_loop.sh {}'

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]
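The scheduling idea described here (start a new job as soon as a slot frees up) can be sketched in plain bash. This is only an illustration of the principle, not how GNU Parallel is implemented. It assumes bash 4.3 or newer for wait -n, and run_job is a hypothetical stand-in for real work.

```shell
#!/bin/bash
# Sketch: keep at most $max_jobs running and start the next job
# as soon as any running job finishes. Needs bash 4.3+ for `wait -n`.
max_jobs=4
outdir=$(mktemp -d)

# Hypothetical job: sleeps for an uneven time, then records that it ran.
run_job() {
  sleep "0.$(( $1 % 3 ))"
  echo "job $1 done" > "$outdir/job_$1"
}

running=0
for i in $(seq 1 32); do
  if [ "$running" -ge "$max_jobs" ]; then
    wait -n                    # block until any one background job exits
    running=$((running - 1))
  fi
  run_job "$i" &
  running=$((running + 1))
done
wait                           # wait for the jobs still running

files=("$outdir"/job_*)
finished=${#files[@]}
echo "finished $finished jobs"
```

Because a slot is refilled immediately, short jobs do not leave a CPU idle while a long job on another CPU is still running.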

Installation

For security reasons you should install GNU Parallel with your package manager, but if GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
   fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Solution 2

Probably the perfect tool for this is GNU Parallel:

parallel ::: dir_*/for_loop.sh

GNU Parallel not only runs the jobs in parallel, it also keeps each job's output together so the outputs of different jobs won't interfere with each other.

From its man page:

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input into blocks and pipe a block into each command in parallel.

If you use xargs and tee today you will find GNU parallel very easy to use as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel.

GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.

Solution 3

find won't do that for you.

Create a script that locates your for_loop.sh scripts and executes them, like so:

#!/bin/bash

for theScript in $(find dir_* -name for_loop.sh); do
  "$theScript" &
done

If the script has to be run inside its sub-directory, cd into that directory first, for example with cd "$(dirname "$theScript")" && . "./$(basename "$theScript")".

My examples are not tested in detail and are not error-tolerant.

Edit 1:

As Sato Katsura commented correctly, the script above breaks if there are spaces in the directory name.

So I changed the loop to read:

#!/bin/bash
find dir_* -name for_loop.sh | while IFS= read -r theScript; do
  "$theScript" &
done
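As the comments point out, newlines in file names are still not handled by a line-based read, and the loop above starts each script from the current directory. A possible variant, offered as an untested-in-production sketch: it reads NUL-delimited names, runs each script inside its own directory in a subshell, and waits for all of them. The demo tree and the file name ran_here.txt are hypothetical and exist only to make the example self-contained.

```shell
#!/bin/bash
# Demo tree (hypothetical names); one directory name contains spaces.
demo=$(mktemp -d)
mkdir -p "$demo/dir_a" "$demo/dir_b with space"
for d in "$demo/dir_a" "$demo/dir_b with space"; do
  printf '%s\n' '#!/bin/sh' 'pwd > ran_here.txt' > "$d/for_loop.sh"
done

# NUL-delimited names survive spaces and newlines. Process substitution
# keeps the loop in the current shell, so the final `wait` can see the
# background jobs started inside the loop.
while IFS= read -r -d '' theScript; do
  # The subshell confines the cd to this one job.
  ( cd "$(dirname "$theScript")" && sh "./$(basename "$theScript")" ) &
done < <(find "$demo"/dir_* -name for_loop.sh -print0)

wait   # return only after every script has finished
echo "all scripts finished"
```

Piping find into while would also work for starting the jobs, but the loop would then run in a subshell and a wait placed after it would not wait for them.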

Solution 4

You should be passing on find's output to xargs, running in parallel mode:

find dir_*/ -type f -name for_loop.sh -print0 | xargs -0 -r -n 1 -P 3 -t sh

This asks find to search recursively under the directories whose names begin with dir_ for files named for_loop.sh and to pass them to xargs, which runs sh on one file at a time (-n 1) with no more than 3 processes running at any given time (-P 3). The -t option prints each command before it is run, and -r skips running anything if find finds no files.

find prints the file names separated by the null delimiter \0 (-print0) and xargs splits its input on nulls (-0), so names containing whitespace are handled correctly.
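The command above runs each script from the current directory. If the scripts must run inside their own sub-directories, as with -execdir in the question, one option is to let xargs start a small sh -c wrapper that changes directory first. This is a sketch under that assumption; the demo tree and the file name xargs_ran.txt are hypothetical.

```shell
#!/bin/bash
# Demo tree (hypothetical names); one directory name contains a space.
demo=$(mktemp -d)
mkdir -p "$demo/dir_1" "$demo/dir_2 x"
for d in "$demo/dir_1" "$demo/dir_2 x"; do
  printf '%s\n' '#!/bin/sh' 'echo ok > xargs_ran.txt' > "$d/for_loop.sh"
done

# Each file name arrives as $1 of the inline script; the trailing `sh`
# fills $0. At most 3 wrappers run at once, and xargs waits for all of them.
find "$demo"/dir_* -type f -name for_loop.sh -print0 |
  xargs -0 -r -n 1 -P 3 sh -c 'cd "$(dirname "$1")" && sh ./for_loop.sh' sh

echo "xargs run complete"
```

Passing the file name as a positional argument, rather than splicing it into the sh -c string, keeps names with spaces or quotes from being reinterpreted by the shell.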

Author by

user233520

Updated on September 18, 2022

Comments

  • user233520
    user233520 over 1 year

    I have several sub-directories within one high-level directory. Each sub-directory has several files and a for loop shell script. The same for loop script is present in each sub-directory. I want to go into each sub-directory and run the for loop script in parallel in several terminals. I tried this, but it seems to run them serially (one after another), whereas I want to run all of them in parallel.

    find dir_* -type f -execdir sh for_loop.sh {} \;
    
  • Satō Katsura
    Satō Katsura about 7 years
    This breaks if you have spaces in the names of directories.
  • Admin
    Admin about 7 years
    It would still break on spaces. You need IFS= in the while read. And filenames with newlines are still not handled properly.
  • ChristophS
    ChristophS about 7 years
    You are right, but please note that the answer addresses the concrete question of executing the script for_loop.sh inside each directory in parallel, not of handling each file. The file handling inside the directories is up to the corresponding script. But thanks for editing anyway.
  • tripleee
    tripleee about 7 years
    Like the similar answer by ChristophS, this breaks if the output from find contains whitespace or shell wildcard characters which end up not matching themselves and only themselves.
  • M4rty
    M4rty about 7 years
    I vote up for this answer because it does not require anything and is more complete than mine
  • Nikos Alexandris
    Nikos Alexandris almost 7 years
    Off-topic: how did you produce the graphics?
  • Ole Tange
    Ole Tange almost 7 years
    LibreOffice Draw.
  • SuperSandro2000
    SuperSandro2000 over 3 years
    If your parallel does not support --tmuxpane you probably have the moreutils one. On Debian-based systems just do apt install parallel.