run a script in multiple folders in parallel
Solution 1
Assuming this does the right thing - only in serial:
find dir_* -type f -execdir sh for_loop.sh {} \;
Then you should be able to replace that with:
find dir_* -type f | parallel 'cd {//} && sh for_loop.sh {}'
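As a rough illustration of what that command line expands to, the sketch below imitates the two replacement strings with plain shell tools: {//} corresponds to dirname of the input line, and {} is the whole input line. The two file paths are made up for the example, and the loop only prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical input paths, as find might print them.
preview=$(for f in "dir_1/a.txt" "dir_2/b.txt"; do
    # {//} behaves like dirname; {} is the unchanged input line.
    printf 'cd %s && sh for_loop.sh %s\n' "$(dirname "$f")" "$f"
done)
printf '%s\n' "$preview"
```

Note that after the cd, {} is still the path relative to the starting directory. If for_loop.sh actually opens its argument, {/} (the basename) may be closer to what -execdir passes.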
To run the jobs in multiple terminals, GNU Parallel can use tmux to run each command in its own tmux pane:
find dir_* -type f | parallel --tmuxpane 'cd {//} && sh for_loop.sh {}'
It defaults to one job per CPU core. In your case you might want to run one more job than you have cores:
find dir_* -type f | parallel -j+1 --tmuxpane 'cd {//} && sh for_loop.sh {}'
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. With that fixed split, a CPU that finishes its 8 jobs early sits idle while the others are still working.
GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
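The "start a new job as soon as a slot is free" behaviour can be tried out without GNU Parallel installed. The sketch below uses xargs -P as a stand-in that schedules in the same way. The job durations are invented for the demo, and fractional sleep values are assumed to be supported, as they are on GNU and BSD systems.

```shell
#!/bin/sh
# Made-up workload: 8 jobs, two of them slow (durations in seconds).
durations="0.4 0.1 0.1 0.1 0.4 0.1 0.1 0.1"

# -P 4 keeps four slots busy: whenever one job exits, the next one
# starts at once instead of waiting for the rest of a fixed batch.
# $durations is left unquoted on purpose so it splits into 8 words.
finished=$(printf '%s\n' $durations |
    xargs -n 1 -P 4 sh -c 'sleep "$1"; echo "job of ${1}s done"' _ |
    grep -c 'done')
echo "jobs finished: $finished"
```

With a fixed split into batches, the two slow jobs could land in the same batch and hold it up. With slot-based scheduling, the short jobs fill in around them.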
Installation
For security reasons you should install GNU Parallel with your package manager, but if GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
Solution 2
Probably the perfect tool for this is GNU Parallel:
parallel ::: dir_*/for_loop.sh
GNU Parallel not only runs the jobs in parallel, it also keeps each job's output separate, so the outputs of different jobs don't get mixed together.
From its man page:
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input into blocks and pipe a block into each command in parallel.
If you use xargs and tee today you will find GNU parallel very easy to use as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel.
GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.
Solution 3
find won't do that for you.
Create a script that locates your for_loop.sh scripts and executes them in the background, like so:
#!/bin/bash
for theScript in $(find dir_* -name for_loop.sh); do
"$theScript" &
done
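If the script has to be run inside its sub-directory, cd into that directory first, for example by using this as the loop body (the parentheses keep the cd from affecting the next iteration):
( cd "$(dirname "$theScript")" && sh "$(basename "$theScript")" ) &
My examples are not tested in detail and are not error-tolerant.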
Edit 1:
As Sato Katsura commented correctly, the script above breaks if there are spaces in the directory name.
So I changed the loop to use read:
#!/bin/bash
find dir_* -name for_loop.sh | while IFS= read -r theScript; do
"$theScript" &
done
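A self-contained way to try this out, combining the read loop with the cd-into-the-directory variant, might look like the sketch below. The demo tree, its directory names, and the contents of the demo for_loop.sh files are all invented for illustration. Note that the wait sits inside the braces: the background jobs are started by the shell on the right-hand side of the pipe, so a wait placed after the pipeline would not see them. File names containing newlines are still not handled.

```shell
#!/bin/sh
# Throwaway demo tree; the directory names are made up, and one
# contains a space on purpose.
top=$(mktemp -d)
for d in "dir_1" "dir_2" "dir_3 with space"; do
    mkdir "$top/$d"
    # Each demo for_loop.sh records the directory it was started in.
    printf '%s\n' '#!/bin/sh' 'pwd > ran_here.txt' > "$top/$d/for_loop.sh"
done

(
    cd "$top" || exit 1
    find dir_* -name for_loop.sh | {
        while IFS= read -r theScript; do
            # Subshell, so the cd does not leak into the next iteration.
            ( cd "$(dirname "$theScript")" && sh ./for_loop.sh ) &
        done
        wait    # the jobs are children of this group, so wait here
    }
)

# Count the marker files before cleaning up the demo tree.
ran=$(find "$top" -name ran_here.txt | grep -c .)
echo "scripts run: $ran"
rm -rf "$top"
```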
Solution 4
You should be passing on find's output to xargs, running in parallel mode:
find dir_*/ -type f -name for_loop.sh -print0 | xargs -0 -r -n 1 -P 3 -t sh
Here we ask find to locate all files named for_loop.sh, recursively, under the directories whose names begin with dir_, and to pass them on to xargs. xargs then runs them one file at a time (-n 1) in parallel mode, with no more than 3 processes running at any given time (-P 3).
The null delimiter \0 is used both by find when printing the filenames (-print0) and by xargs when splitting them (-0).
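The command above starts each script from the current directory. If the scripts need to run inside their own directories, as -execdir would do, one possible variation is to let xargs start a small sh -c wrapper that changes directory first. This is a sketch under assumptions: the demo tree and the contents of the demo for_loop.sh files are invented, and -r is assumed to be available as in the original command (it is a GNU xargs option).

```shell
#!/bin/sh
# Throwaway demo tree with made-up directory names.
top=$(mktemp -d)
for d in dir_a dir_b dir_c dir_d; do
    mkdir "$top/$d"
    # Each demo for_loop.sh records the name of its working directory.
    printf '%s\n' '#!/bin/sh' 'basename "$(pwd)" > cwd.txt' > "$top/$d/for_loop.sh"
done

(
    cd "$top" || exit 1
    # $1 is the path handed over by xargs; "_" only fills the $0 slot.
    find dir_*/ -type f -name for_loop.sh -print0 |
        xargs -0 -r -n 1 -P 3 sh -c 'cd "$(dirname "$1")" && exec sh ./for_loop.sh' _
)

# Collect the recorded directory names before cleaning up.
cwds=$(cat "$top"/dir_*/cwd.txt | sort -u)
printf 'ran in:\n%s\n' "$cwds"
rm -rf "$top"
```

xargs returns only after all of the processes it started have exited, so no separate wait is needed here.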
user233520
Updated on September 18, 2022
Comments
-
user233520 over 1 year
I have several sub-directories within one high-level directory. Each sub-directory has several files and a for loop shell script. The same for loop script is present in each sub-directory. I want to go into each sub-directory and run the for loop script in parallel in several terminals. I tried this, but it seems to run them serially (one after another), and I want to run all of them in parallel.
find dir_* -type f -execdir sh for_loop.sh {} \;
-
Satō Katsura about 7 years
This breaks if you have spaces in the names of directories.
-
Admin about 7 years
It would still break on spaces. You need IFS= in the while read. And filenames with newlines are still not handled properly.
-
ChristophS about 7 years
You are right, but please note that the answer addresses the concrete question of executing the script for_loop.sh inside each directory in parallel, not handling each file. The file handling inside the directories is up to the corresponding script. But thanks for editing anyway.
-
tripleee about 7 years
Like the similar answer by ChristophS, this breaks if the output from find contains whitespace or shell wildcard characters which end up not matching themselves and only themselves.
-
M4rty about 7 years
I upvoted this answer because it does not require installing anything and is more complete than mine.
-
Nikos Alexandris almost 7 years
Off-topic: how did you produce the graphics?
-
Ole Tange almost 7 years
LibreOffice Draw.
-
SuperSandro2000 over 3 years
If your parallel does not support --tmuxpane, you probably have the moreutils one. On Debian-based systems just do apt install parallel.