Parallel processing using xargs


Solution 1

You may want to read the xargs manual, in particular the descriptions of the -L and -P flags.

tail -f logfile.log | grep 'patternline' |
xargs -P 4 -L 1 bash scriptname.sh

This runs at most four instances of the command at a time (-P 4) and passes one line of input to each invocation (-L 1).

Add -t to xargs to see what gets executed.
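
As a rough illustration of what -L 1 and -P do, the sketch below feeds two fixed lines to xargs instead of a live log. The inline sh -c command is a hypothetical stand-in for scriptname.sh, and show_args is a name made up for this example.

```shell
# Hypothetical stand-in for scriptname.sh: report how many arguments it received.
show_args() {
  printf '%s\n' 'one two three' 'four five' |
    xargs -P 4 -L 1 sh -c 'echo "$# args: $*"' sh |
    sort   # parallel jobs finish in no fixed order, so sort for stable output
}

show_args
# 2 args: four five
# 3 args: one two three
```

Each input line becomes one invocation, but the line is still split into separate arguments on blanks. When the input comes from tail -f, note that grep usually buffers its output when writing to a pipe; GNU and BSD grep accept --line-buffered to flush after every line.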

Solution 2

GNU Parallel will do the quoting of the argument correctly for you:

tail -f logfile.log | grep 'patternline' |
  parallel bash scriptname.sh

On top of that, it defaults to one process per CPU core and makes sure the output of parallel jobs is not mixed.
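
A small sketch of the quoting behaviour, assuming GNU Parallel is installed under the name parallel (the function skips itself otherwise). The JSON lines and the demo_parallel name are invented for illustration; -j 2 caps the number of simultaneous jobs and -k keeps the output in input order.

```shell
# Sketch: assumes GNU Parallel is installed as "parallel"; skipped otherwise.
demo_parallel() {
  if ! parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    echo 'GNU Parallel not found; skipping' >&2
    return 0
  fi
  # Each input line is passed to echo as a single, correctly quoted argument.
  printf '%s\n' '{"id": 1}' '{"id": 2}' |
    parallel -j 2 -k echo
}

demo_parallel
```

Replacing echo with bash scriptname.sh gives the form used in the answer, with each whole line arriving as one argument.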

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: Simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
   fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Author: AVJ

Updated on September 18, 2022

Comments

  • AVJ
    AVJ over 1 year

    How would I execute a bash script in parallel for each line? I will be tailing a log file and, for each matching line, I want to execute a script in the background, something like the example below:

    tail -f logfile.log | grep 'patternline' | while read -r line ; do
        bash scriptname.sh "$line" &
    done
    

    I would like to know how to perform the above task in parallel using xargs (or any other suitable method), and also how to limit the number of processes.

    Thanks in advance.

  • AVJ
    AVJ almost 8 years
    The output of the grep command will be a line of JSON. How can I enclose the input for xargs in double quotes?
  • Kusalananda
    Kusalananda almost 8 years
    @AVJ Why? Can't you do that in the script? Also, do you know about jq? stedolan.github.io/jq
  • Kusalananda
    Kusalananda almost 8 years
    @AVJ the double quoting is not needed here, is it? Just use "$*" in your script to get the complete line without word splitting.
  • Kusalananda
    Kusalananda almost 8 years
    @AVJ If you really want to have double-quoted input, that must happen before xargs. Insert sed -e 's/^/"/' -e 's/$/"/' in the pipe before xargs.
  • Kusalananda
    Kusalananda almost 8 years
    @AVJ With GNU sed you might also want to add -u to make the sed output unbuffered... The same option is -l for BSD sed.
  • AVJ
    AVJ almost 8 years
    It worked using sed before xargs, as you said. To parse JSON using jq, the input to the script should be enclosed in single or double quotes.
  • AVJ
    AVJ almost 8 years
    The same command mentioned in the answer is not working now. Here is the command: 'tail -3f logfile | grep "patternline" | sed -u -e "s/^/'/" -e "s/$/'/" | xargs -L 1 -P 5 bash scriptname.sh'.
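
The quoting issue raised in these comments can be reproduced with a short sketch. The JSON line, the report command (a stand-in for scriptname.sh), and the function names raw and wrapped are all made up for this example. It assumes the line itself contains no single quotes, since those would break the single-quote wrapping.

```shell
json_line='{"id": 1, "msg": "hi"}'   # hypothetical log line

# Hypothetical stand-in for scriptname.sh: print argument count and contents.
report='echo "$# args: $*"'

# Unquoted: xargs strips the double quotes and splits the line on blanks.
raw() { printf '%s\n' "$json_line" | xargs -L 1 sh -c "$report" sh; }

# Wrapped in single quotes first: the line arrives as one intact argument.
wrapped() {
  printf '%s\n' "$json_line" |
    sed -e "s/^/'/" -e "s/\$/'/" |
    xargs -L 1 sh -c "$report" sh
}

raw      # 4 args: {id: 1, msg: hi}
wrapped  # 1 args: {"id": 1, "msg": "hi"}
```

Wrapping in double quotes would not help for JSON, because xargs would still consume the double quotes inside the line; single quotes keep them literal.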