How to cancel jobs on Slurm with job ID (job number) bigger than a certain number?
Solution 1
It does not strictly answer how to cancel jobs greater than a given number, but it works for the problem @mona-jalilvand was trying to solve: cancelling jobs in a range, as described here:
scancel {1000..1050}
Much simpler than getting into bash scripting... It worked well for me.
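One caveat with the range trick: brace expansion is a bash (and zsh) feature, and bash performs it before variable expansion. As a result, something like scancel {$start..$end} is passed through literally in bash instead of becoming a range, and plain sh does no brace expansion at all. When the bounds live in variables, seq can build the list instead. Below is a minimal sketch. cancel_range and the DRY_RUN switch are hypothetical names used only for illustration, not part of Slurm:

```shell
#!/bin/sh
# Cancel every job ID from $1 to $2 inclusive.
# seq is used because bash brace expansion ({1000..1050}) does not work
# with variable bounds, and plain sh has no brace expansion at all.
# DRY_RUN=1 (a hypothetical switch for this sketch) prints the command
# instead of running it, so you can check the list first.
cancel_range() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo scancel $(seq "$1" "$2")
    else
        # Unquoted on purpose: word splitting turns seq's lines into arguments.
        scancel $(seq "$1" "$2")
    fi
}
```

For example, DRY_RUN=1 cancel_range 1000 1003 prints scancel 1000 1001 1002 1003. Expect scancel to complain about IDs in the range that no longer exist or that belong to someone else.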
Solution 2
You need to use squeue to get a list of all jobs belonging to you, then loop over that list and cancel each job that matches your criterion (job ID greater than X) with scancel.
squeue has very flexible output options: with its -o (aka --format) option you can tell it to print exactly what you want. In this case, we only want the job number. According to man squeue, that is %i in the format string:
%i Job or job step id.
In the case of job arrays, the job ID format will be of the form "<base_job_id>_<index>". By default, the job array index field size will be limited to 64 bytes. Use the environment variable SLURM_BITSTR_LEN to specify larger field sizes. (Valid for jobs and job steps) In the case of heterogeneous job allocations, the job ID format will be of the form "#+#" where the first number is the "heterogeneous job leader" and the second number the zero origin offset for each component of the job.
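Because %i is not always a plain integer, a numeric test such as [ "$j" -gt "$minjobnum" ] fails with an "integer expression expected" error on array or heterogeneous job IDs. If you would rather compare those jobs by their base ID, shell parameter expansion can strip the suffix. This is a small sketch: base_job_id is a hypothetical helper, and the 1234_[1-10] form for pending array tasks is an assumption about how squeue may display them:

```shell
#!/bin/sh
# Print the numeric base job ID for a squeue %i value.
#   1234_5       -> 1234  (job array element, <base_job_id>_<index>)
#   1234_[1-10]  -> 1234  (pending array tasks may be shown as a range)
#   1234+0       -> 1234  (heterogeneous job component, leader+offset)
#   987          -> 987   (ordinary job, unchanged)
base_job_id() {
    id="$1"
    id="${id%%_*}"   # drop everything from the first underscore on
    id="${id%%+*}"   # drop everything from the first plus sign on
    printf '%s\n' "$id"
}
```

Keep in mind that handing a base array ID back to scancel normally targets the whole array rather than a single element, so decide whether that is what you want before using these IDs.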
The following sh script uses that:
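#!/bin/sh
# Cancel all of your own Slurm jobs whose job ID is greater than $1.
if [ -z "$1" ] ; then
    echo "Minimum Job Number argument is required. Run as '$0 jobnum'" >&2
    exit 1
fi
minjobnum="$1"
myself="$(id -u -n)"
# --noheader drops the column title; --format='%i' prints only the job ID.
for j in $(squeue --user="$myself" --noheader --format='%i') ; do
    # Skip non-numeric IDs (array elements like 1234_5, het jobs like 1234+0),
    # which [ -gt ] cannot compare.
    case "$j" in
        *[!0-9]*) continue ;;
    esac
    if [ "$j" -gt "$minjobnum" ] ; then
        scancel "$j"
    fi
done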
Save it as, e.g., cancel-jobs.sh, make it executable with chmod +x cancel-jobs.sh, and run it as ./cancel-jobs.sh 50000 to cancel all your jobs with job IDs greater than 50,000.
If you want it to cancel job numbers greater than or equal to the minimum job number, change the -gt to -ge.
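The script looks up your username with id -u -n. Most shells also set $USER (as pointed out in the comments), so a tiny helper can prefer that and fall back to id only when $USER is unset or empty. This is a sketch, and current_user is a hypothetical name:

```shell
#!/bin/sh
# Print the current user name: $USER if it is set and non-empty,
# otherwise ask id for the effective user's login name.
current_user() {
    printf '%s\n' "${USER:-$(id -u -n)}"
}
```

To use it, replace the myself="$(id -u -n)" line with myself="$(current_user)".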
One minor optimisation is to avoid cancelling each job individually: build up a list of matching job numbers and then cancel them all with a single invocation of scancel.
For example, using a bash array to hold the job numbers:
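#!/bin/bash
declare -a jobs=()
if [ -z "$1" ] ; then
    echo "Minimum Job Number argument is required. Run as '$0 jobnum'" >&2
    exit 1
fi
minjobnum="$1"
myself="$(id -u -n)"
for j in $(squeue --user="$myself" --noheader --format='%i') ; do
    # Skip non-numeric IDs (job arrays, heterogeneous jobs).
    case "$j" in
        *[!0-9]*) continue ;;
    esac
    if [ "$j" -gt "$minjobnum" ] ; then
        jobs+=("$j")   # quoted so each ID becomes exactly one array element
    fi
done
# Only call scancel if at least one job matched.
if [ "${#jobs[@]}" -gt 0 ] ; then
    scancel "${jobs[@]}"
fi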
This does the same thing as the first script, but slightly more efficiently.
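Another way to get a batched call is to keep the selection logic in a small filter and let xargs hand the matching IDs to scancel. This is a sketch under a few assumptions: select_jobs_above is a hypothetical name, and xargs -r (do not run the command when there is no input) is a GNU extension that may be missing on non-GNU systems.

```shell
#!/bin/sh
# Read job IDs (one per line) on stdin and print those strictly greater
# than $1. Non-numeric IDs (array elements like 1234_5, het-job components
# like 1234+0) and empty lines are skipped, because [ -gt ] only compares integers.
select_jobs_above() {
    min="$1"
    while IFS= read -r id; do
        case "$id" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$id" -gt "$min" ]; then
            printf '%s\n' "$id"
        fi
    done
}

# Example wiring (needs Slurm): preview the list first, then cancel.
#   squeue --user="$USER" --noheader --format='%i' | select_jobs_above 50000
#   squeue --user="$USER" --noheader --format='%i' | select_jobs_above 50000 | xargs -r scancel
```

xargs packs as many IDs as fit on one command line, so this usually ends up as a single scancel call (or a few, for a very long list). Running the pipeline without the xargs stage first is a cheap way to check what would be cancelled.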
Admin
Updated on September 18, 2022
Comments
-
Admin over 1 year
I have submitted 800 jobs on Slurm. I want to cancel the jobs whose job ID/number is bigger than a certain number (since there is a mistake in them). I don't want to cancel all my jobs, because some are running and some of those in the queue are correct.
-
Alessio over 6 years
Note: your username may already be (and probably is) in a variable in your shell, probably called $USER. I wrote the script to work whether that variable exists or not. If it does exist, you don't need the $myself variable: delete the myself="$(id -u -n)" line and change $myself to $USER on the for loop line that runs squeue.
-
mugen over 3 years
I brute-forced my way out by running
scancel {70000..71000}
to cancel "all" jobs with IDs bigger than 70000. You deserve the best prize for thinking outside the box; you saved me an hour of my time, thanks!
-
Christian O'Reilly over 3 years
Thanks @mugen for your kind words! Happy it could save you some time :)