Wait for set of qsub jobs to complete


Solution 1

Launch your qsub jobs, using the -N option to give them arbitrary names (job1, job2, etc.):

qsub -N job1 -cwd ./job1_script
qsub -N job2 -cwd ./job2_script
qsub -N job3 -cwd ./job3_script

Launch your script and tell it to wait until the jobs named job1, job2 and job3 are finished before it starts:

qsub -hold_jid job1,job2,job3 -cwd ./results_script
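
If the number of jobs isn't fixed, a small loop can build the name list for -hold_jid. This is just a sketch; job${i}_script and results_script are placeholder file names:

#!/bin/bash
names=""
for i in 1 2 3; do
    # give each job a predictable name so the final job can wait on it
    qsub -N "job$i" -cwd "./job${i}_script"
    names="${names:+$names,}job$i"   # builds the comma-separated list job1,job2,job3
done
# results_script stays queued until every named job has finished
qsub -hold_jid "$names" -cwd ./results_script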

Solution 2

If all the jobs share a common pattern in their names, you can pass that pattern to -hold_jid when you submit the dependent job. https://linux.die.net/man/1/sge_types shows what patterns you can use. Example:

-hold_jid "job_name_pattern*"

Solution 3

Another alternative (from here) is as follows:

FIRST=$(qsub job1.pbs)
echo $FIRST
SECOND=$(qsub -W depend=afterany:$FIRST job2.pbs)
echo $SECOND
THIRD=$(qsub -W depend=afterany:$SECOND job3.pbs)
echo $THIRD

The insight is that qsub prints the job id to standard output. Instead of letting it scroll by, capture it in a variable ($FIRST, $SECOND, $THIRD) and pass it to the -W depend=afterany:[JOBIDs] flag when you enqueue the next job, so the dependency structure controls when each job is dequeued.
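
If you want a single final job that waits on several independent jobs rather than a chain, the same flag accepts multiple colon-separated job ids. A sketch (the .pbs file names are placeholders):

FIRST=$(qsub job1.pbs)
SECOND=$(qsub job2.pbs)
# the final job is released only after both independent jobs have ended
FINAL=$(qsub -W depend=afterany:$FIRST:$SECOND results.pbs)
echo $FINAL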

Solution 4

qsub -hold_jid job1,job2,job3 -cwd ./myscript

Solution 5

This works in bash, but the ideas should be portable. Use -terse to facilitate building up a string with job ids to wait on; then submit a dummy job that uses -hold_jid to wait on the previous jobs and -sync y so that qsub doesn't return until it (and thus all prereqs) has finished:

# example where each of three jobs just sleeps for some time:
job_ids=$(qsub -terse -b y sleep 10)
job_ids=${job_ids},$(qsub -terse -b y sleep 20)
job_ids=${job_ids},$(qsub -terse -b y sleep 30)
qsub -hold_jid ${job_ids} -sync y -b y echo "DONE"
  • -terse option makes the output of qsub just be the job id
  • -hold_jid option (as mentioned in other answers) makes a job wait on specified job ids
  • -sync y option (referenced by the OP) asks qsub not to return until the submitted job is finished
  • -b y specifies that the command is not a path to a script file (for instance, I'm using sleep 30 as the command)

See the man page for more details.
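
The same pattern with real job scripts instead of sleep (the script names below are placeholders) would look like:

job_ids=$(qsub -terse -cwd ./job1_script)
job_ids=${job_ids},$(qsub -terse -cwd ./job2_script)
job_ids=${job_ids},$(qsub -terse -cwd ./job3_script)
# this call blocks until all three jobs above have finished
qsub -hold_jid ${job_ids} -sync y -b y echo "DONE"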


Comments

  • Nupur
    Nupur about 4 years

    I have a batch script which starts off a couple of qsub jobs, and I want to trap when they are all completed.

    I don't want to use the -sync option, because I want them to be running simultaneously. Each job has a different set of command line parameters.

    I want my script to wait till when all the jobs have been completed, and do something after that. I don't want to use the sleep function e.g. to check if certain files have been generated after each 30 s, because this is a drain on resources.

    I believe Torque may have some options, but I am running SGE.

    Any ideas on how I could implement this please?

    Thanks. P.S. I did find another thread Link

    which had a response

    You can use wait to stop execution until all your jobs are done. You can even collect all the exit statuses and other running statistics (time it took, count of jobs done at the time, whatever) if you cycle around waiting for specific ids.

    but I am not sure how to use it without polling on some value. Could bash trap be used, and if so, how would I combine it with qsub? (One possible sketch using wait appears after these comments.)

  • Mick MacCallum
    Mick MacCallum almost 12 years
    To improve the quality of your post please include why/how your post solves the problem.
  • naught101
    naught101 almost 8 years
    This seems to not work if the list of jobs is too long (I have 40 jobs, the command ends up being 940 chars...)
  • naught101
    naught101 almost 8 years
    Hrm.. no, that's not the problem. It's that PBS Pro uses a different format. You need to use -W depend=afterok:<job_id>[:<job_id>:...]
  • gaitat
    gaitat about 6 years
    is there a way to pass arguments to results_script?
  • safay
    safay almost 3 years
    The shlex import does not look necessary. Also your shebang looks environment specific; #!/usr/bin/env python is more portable.
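
Regarding the question above about using the shell's wait without polling: one possibility is to run one qsub -sync y per job in the background, so the jobs still run simultaneously on the cluster while the script blocks until the whole set is done. A sketch with placeholder script names:

#!/bin/bash
# each qsub -sync y blocks until its own job completes, so run them in the background
qsub -sync y -cwd ./job1_script &
qsub -sync y -cwd ./job2_script &
qsub -sync y -cwd ./job3_script &
# wait returns once every background qsub (and therefore every job) has finished
wait
echo "all jobs done"
./results_script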