Parallel processing from a command queue on Linux (bash, python, ruby... whatever)


Solution 1

I would imagine you could do this using make and the make -j xx command.

Perhaps a makefile like this:

all : usera userb userc....

usera:
       imapsync usera
userb:
       imapsync userb
....

make -j 10 -f makefile
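
The comments below mention generating such a makefile from the user list; here is a minimal sketch, assuming the list lives in a file called users.txt with one name per line:

#!/bin/sh
# Emit a makefile with an "all" target and one imapsync target per user.
{
    printf 'all: %s\n\n' "$(tr '\n' ' ' < users.txt)"
    while IFS= read -r user; do
        printf '%s:\n\timapsync %s\n' "$user" "$user"
    done < users.txt
} > makefile

Then run it with make -j 10 -f makefile as above.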

Solution 2

On the shell, xargs can be used to queue parallel command processing. For example, to always have 3 sleeps running in parallel, each sleeping for 1 second, and to execute 10 sleeps in total, do

echo {1..10} | xargs -d ' ' -n1 -P3 sh -c 'sleep 1s' _

This would take 4 seconds in total: 10 jobs run at most 3 at a time, so they finish in four rounds of 1 second each (the trailing _ merely fills $0, so the numbers from the input are ignored by the command). If you have a list of names and want to pass the names to the commands being executed, again running 3 commands in parallel, do

cat names | xargs -n1 -P3 process_name

This would execute process_name alice, process_name bob, and so on.
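
For the original problem (a file of complete command lines, at most 10 running at once), the same idea works by handing each line to a shell, essentially what a commenter below suggests; commands.txt is an assumed file name:

xargs -d '\n' -n 1 -P 10 sh -c < commands.txt

Here xargs appends each line as the command string for sh -c, so full command lines with arguments, pipes, and semicolons are executed as written.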

Solution 3

GNU parallel is made exactly for this purpose.

cat userlist | parallel imapsync

One of the beauties of GNU parallel compared to other solutions is that it makes sure output from concurrent jobs is not mixed. Running traceroute under parallel, for example, works fine:

(echo foss.org.my; echo www.debian.org; echo www.freenetproject.org) | parallel traceroute
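
To cap the number of simultaneous jobs at 10, as the question asks, parallel takes the -j option:

cat userlist | parallel -j 10 imapsync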

Solution 4

PPSS (Parallel Processing Shell Script) was written for exactly this kind of job. Google the name and you will find it; I won't linkspam.

Solution 5

GNU make (and perhaps other implementations as well) has the -j argument, which governs how many jobs it will run at once. When a job completes, make will start another one.
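
So the makefile from Solution 1, run with make -j 10, already behaves like a 10-slot queue; adding -k (keep going, mentioned in the comments below) lets the remaining jobs run even if one of them fails:

make -j 10 -k -f makefile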




Comments

  • mlambie over 3 years

    I have a list/queue of 200 commands that I need to run in a shell on a Linux server.

    I only want to have a maximum of 10 processes running (from the queue) at once. Some processes will take a few seconds to complete, other processes will take much longer.

    When a process finishes I want the next command to be "popped" from the queue and executed.

    Does anyone have code to solve this problem?

    Further elaboration:

    There's 200 pieces of work that need to be done, in a queue of some sort. I want to have at most 10 pieces of work going on at once. When a thread finishes a piece of work it should ask the queue for the next piece of work. If there's no more work in the queue, the thread should die. When all the threads have died it means all the work has been done.

    The actual problem I'm trying to solve is using imapsync to synchronize 200 mailboxes from an old mail server to a new mail server. Some users have large mailboxes and take a long time to sync, others have very small mailboxes and sync quickly.

  • mlambie over 15 years
    This worked exactly as I hoped it would. I wrote some code to generate the Makefile. It ended up being over 1000 lines. Thanks!
  • Cristian Ciupitu about 15 years
    1) os.system could be replaced with the new improved subprocess module. 2) It doesn't matter that CPython has a GIL because you're running external commands, not Python code (functions).
  • jfs about 15 years
    If you replace threading.Thread by multiprocessing.Process and Queue by multiprocessing.Queue then the code will run using multiple processes.
  • Devrim over 14 years
    Awesome! Can we set it up with dynamic threading, e.g. using as many threads as 80% of CPU/RAM allows?
  • chiggsy over 13 years
    Man I love this tool. I've known about it for like 3h and I am going to use it until the very stones of the earth cry out for me to stop.
  • Yuvi almost 13 years
    Almost exactly what I was looking for. /me goes back to struggling to get it working.
  • rogerdpack over 12 years
    pssh is written in python I think
  • Kyle Simek over 12 years
    I found that if any of the commands exits with an error code, make will exit, preventing execution of future jobs. In some situations, this solution is less than ideal. Any recommendations for this scenario?
  • myroslav about 12 years
    Fedora 16 included the tool in its package repository.
  • Joseph Lisee almost 12 years
    @redmoskito If you run make with the "-k" option it will keep running even if there are errors.
  • Yauhen Yakimovich over 11 years
    if one starts thinking of make not as a "task scheduler" but as a "parallel compilation" tool... I guess the bigger picture is that make -j respects dependencies, which makes this solution mind-blowing once applied universally.
  • Warrick almost 11 years
    Wow, I use xargs all the time and never expected it would have this option!
  • Brian Minton over 10 years
    you can use the wait command for a specific child process too. It can be given any number of arguments, each of which can be a pid or job id.
  • Jonathan Leffler over 10 years
    @BrianMinton: you're right that you can list specific PIDs with wait, but you still get the 'all of them dead' behaviour, not the 'first one dead' behaviour this code really needs (bash's wait -n provides that; see the sketch after these comments).
  • Eddy almost 9 years
    For the second example you give, how do you modify the command so that process_name can take more than one argument? I want to do something like this: cat commands.txt | xargs -n1 -P3 eval where commands.txt has a bunch of commands in it (one on each line, each with multiple arguments). The problem is that eval doesn't work as it's a shell builtin command
  • staticfloat about 5 years
    @Eddy try using a shell as the program to run; this allows you to use arbitrary shell commands as the inputs. The first answer above does this with sh, but you can do it with bash as well. E.g. if your commands.txt has a bunch of lines in it that look like echo test1; sleep 1, you can use that via something like cat commands.txt | xargs -d'\n' -P3 -n1 /bin/bash -c
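
Following up on the wait discussion above, here is a pure-bash sketch of the queue the question describes, using wait -n (bash 4.3 or later) to get the "first one dead" behaviour plain wait lacks. The file name commands.txt and the limit of 10 come from the question; the rest is illustrative:

#!/usr/bin/env bash
# Keep at most 10 commands from commands.txt running at once;
# whenever one finishes, pop the next line off the queue.
max=10
running=0
while IFS= read -r cmd; do
    bash -c "$cmd" &             # start the next piece of work
    running=$((running + 1))
    if (( running >= max )); then
        wait -n                  # block until any single job exits
        running=$((running - 1))
    fi
done < commands.txt
wait                             # let the last jobs finish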