Wait for set of qsub jobs to complete
Solution 1
Launch your qsub jobs, using the -N option to give them arbitrary names (job1, job2, etc.):
qsub -N job1 -cwd ./job1_script
qsub -N job2 -cwd ./job2_script
qsub -N job3 -cwd ./job3_script
Launch your script and tell it to wait until the jobs named job1, job2 and job3 are finished before it starts:
qsub -hold_jid job1,job2,job3 -cwd ./results_script
Solution 2
If all the jobs share a common pattern in their names, you can provide that pattern to -hold_jid when you submit the dependent job. https://linux.die.net/man/1/sge_types shows you what patterns you can use. Example:
-hold_jid "job_name_pattern*"
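Putting Solution 2 together: a minimal sketch (the "stage1_" name prefix and the script names are hypothetical) that submits several jobs sharing a name prefix and then holds the results job on the wildcard pattern:

```shell
#!/usr/bin/env bash
# Sketch, assuming an SGE cluster: submit jobs whose names share the
# hypothetical prefix "stage1_", then hold a results job on that pattern.
submit_with_pattern() {
    for i in 1 2 3; do
        qsub -N "stage1_part$i" -cwd "./part${i}_script"
    done
    # Quote the pattern so the shell doesn't glob-expand it before qsub sees it:
    qsub -hold_jid "stage1_*" -cwd ./results_script
}
if command -v qsub >/dev/null; then
    submit_with_pattern
fi
```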
Solution 3
Another alternative (from here) is as follows:
FIRST=$(qsub job1.pbs)
echo $FIRST
SECOND=$(qsub -W depend=afterany:$FIRST job2.pbs)
echo $SECOND
THIRD=$(qsub -W depend=afterany:$SECOND job3.pbs)
echo $THIRD
The insight is that qsub returns the job id, which is typically dumped to standard output. Instead, capture it in a variable ($FIRST, $SECOND, $THIRD) and use the -W depend=afterany:[JOBIDs] flag when you enqueue your jobs to control the dependency structure of when they are dequeued.
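Solution 3's chain can be generalized to any number of jobs with a loop; this is a sketch assuming Torque/PBS (the .pbs script names are placeholders):

```shell
#!/usr/bin/env bash
# Sketch, assuming Torque/PBS: chain jobs so each starts only after the
# previous one ends (afterany fires regardless of exit status).
chain_jobs() {
    local prev_id="" jid script
    for script in job1.pbs job2.pbs job3.pbs; do   # placeholder names
        if [ -z "$prev_id" ]; then
            jid=$(qsub "$script")
        else
            jid=$(qsub -W depend=afterany:"$prev_id" "$script")
        fi
        echo "$jid"
        prev_id=$jid
    done
}
if command -v qsub >/dev/null; then
    chain_jobs
fi
```

Use afterok instead of afterany if the next job should run only when the previous one exited successfully.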
Solution 4
qsub -hold_jid job1,job2,job3 -cwd ./myscript
Solution 5
This works in bash, but the ideas should be portable. Use -terse to facilitate building up a string with job ids to wait on; then submit a dummy job that uses -hold_jid to wait on the previous jobs and -sync y so that qsub doesn't return until it (and thus all its prerequisites) has finished:
# example where each of three jobs just sleeps for some time:
job_ids=$(qsub -terse -b y sleep 10)
job_ids=${job_ids},$(qsub -terse -b y sleep 20)
job_ids=${job_ids},$(qsub -terse -b y sleep 30)
qsub -hold_jid ${job_ids} -sync y -b y echo "DONE"
- The -terse option makes the output of qsub just be the job id.
- The -hold_jid option (as mentioned in other answers) makes a job wait on the specified job ids.
- The -sync y option (referenced by the OP) asks qsub not to return until the submitted job is finished.
- The -b y option specifies that the command is not a path to a script file (for instance, I'm using sleep 30 as the command).
See the man page for more details.
Nupur
Updated on June 05, 2020

Comments
-
Nupur about 4 years
I have a batch script which starts off a couple of qsub jobs, and I want to trap when they are all completed.
I don't want to use the -sync option, because I want them to be running simultaneously. Each job has a different set of command line parameters.
I want my script to wait until all the jobs have completed, and then do something. I don't want to use a sleep loop, e.g. checking every 30 s whether certain files have been generated, because that is a drain on resources.
I believe Torque may have some options, but I am running SGE.
Any ideas on how I could implement this please?
Thanks. P.S. I did find another thread (Link) which had a response:
You can use wait to stop execution until all your jobs are done. You can even collect all the exit statuses and other running statistics (time it took, count of jobs done at the time, whatever) if you cycle around waiting for specific ids.
but I am not sure how to use it without polling on some value. Can bash trap be used, and if so, how would I use it with qsub?
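One way to get bash wait to work despite the concurrency concern: background each qsub -sync y call, so the jobs still run simultaneously, and wait blocks without polling. A sketch (the script names are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch, assuming SGE: each `qsub -sync y` blocks until its own job ends,
# but backgrounding the qsub calls lets the jobs run concurrently;
# `wait` then returns only after every background qsub has exited.
run_all() {
    qsub -sync y -cwd ./job1_script &
    qsub -sync y -cwd ./job2_script &
    qsub -sync y -cwd ./job3_script &
    wait
    echo "all jobs finished"
}
if command -v qsub >/dev/null; then
    run_all
fi
```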
-
Mick MacCallum almost 12 years
To improve the quality of your post please include why/how your post solves the problem.
-
naught101 almost 8 yearsThis seems to not work if the list of jobs is too long (I have 40 jobs, the command ends up being 940 chars...)
-
naught101 almost 8 years
Hrm.. no, that's not the problem. It's that PBS Pro uses a different format. You need to use -W depend=afterok:<job_id>[:<job_id>:...]
-
gaitat about 6 years
Is there a way to pass arguments to results_script?
-
safay almost 3 years
The shlex import does not look necessary. Also, your shebang looks environment-specific; #!/usr/bin/env python is more portable.