How to cancel jobs on Slurm with job ID (job number) bigger than a certain number?


Solution 1

This doesn't strictly answer how to cancel jobs greater than a given number, but it works for the problem @mona-jalilvand was trying to solve: cancelling all jobs in a range of IDs.

scancel {1000..1050}

Much simpler than getting into bash scripting... worked well for me.
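For illustration, here is a small bash sketch of what the shell does with that argument (the variable names are made up for the example). The {1000..1050} form is expanded by the shell into separate arguments before scancel ever runs, which is why this needs bash (or zsh) rather than plain sh. Brace expansion also happens before variable expansion, so if the bounds are stored in variables, seq is one way to produce the same list.

```bash
#!/bin/bash
# Brace expansion is a bash feature (plain POSIX sh leaves it untouched):
# {1000..1005} becomes six separate words before the command runs.
ids=( {1000..1005} )
echo "${#ids[@]} ids: ${ids[*]}"

# Brace expansion happens BEFORE variable expansion, so variable bounds
# do not work: this array holds the single literal word "{1000..1005}".
lo=1000 hi=1005
broken=( {$lo..$hi} )

# With bounds in variables, generate the same list with seq instead
# (unquoted on purpose so each number becomes its own array element).
from_seq=( $(seq "$lo" "$hi") )

# Preview the command; remove the leading "echo" to actually run scancel.
echo scancel "${from_seq[@]}"
```

Previewing with echo first is cheap insurance when only some of your queued jobs should go.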

Solution 2

You need to use squeue to get a list of all jobs belonging to you, then loop over that list and cancel each job that matches your criterion (job id greater than X) with scancel.

squeue has very flexible output options: with its -o (aka --format) option you can tell it to print exactly what you want. In this case, we only want the job number. According to man squeue, that is %i in the format string:

%i Job or job step id.

In the case of job arrays, the job ID format will be of the form "<base_job_id>_<index>". By default, the job array index field size will be limited to 64 bytes. Use the environment variable SLURM_BITSTR_LEN to specify larger field sizes. (Valid for jobs and job steps) In the case of heterogeneous job allocations, the job ID format will be of the form "#+#" where the first number is the "heterogeneous job leader" and the second number the zero origin offset for each component of the job.

The following sh script uses that:

#!/bin/sh

if [ -z "$1" ] ; then
    echo "Minimum Job Number argument is required.  Run as '$0 jobnum'" >&2
    exit 1
fi

minjobnum="$1"

myself="$(id -u -n)"

for j in $(squeue --user="$myself" --noheader --format='%i') ; do
  if [ "$j" -gt "$minjobnum" ] ; then
    scancel "$j"
  fi
done

Save it as, e.g., cancel-jobs.sh, make it executable with chmod +x cancel-jobs.sh, and run it as ./cancel-jobs.sh 50000 to cancel all your jobs with job ids greater than 50,000.

If you want it to cancel job numbers greater than or equal to the minimum job number, change the -gt to -ge.
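As the man page excerpt above says, %i does not always print a plain number: job array tasks and heterogeneous job components carry a suffix, and the numeric -gt test in the script fails on such values. Below is a minimal POSIX sh sketch, assuming you want to compare on the base (leading) job ID. The function name jobs_above is hypothetical; it reads %i values on stdin, strips everything from the first _ or +, and prints each distinct base ID above the threshold.

```sh
#!/bin/sh
# Hypothetical helper: read %i-style job IDs on stdin, one per line, and
# print the distinct base job IDs that are greater than the threshold $1.
#   "1234"         plain job            -> base 1234
#   "1234_7"       array task           -> base 1234
#   "1234_[8-20]"  pending array tasks  -> base 1234
#   "1234+0"       het job component    -> base 1234
jobs_above() {
  while IFS= read -r id; do
    base="${id%%[_+]*}"          # drop the "_index" or "+offset" suffix
    case "$base" in
      ''|*[!0-9]*) continue ;;   # skip anything that is not a plain number
    esac
    if [ "$base" -gt "$1" ]; then
      echo "$base"
    fi
  done | sort -un                # one line per job, in numeric order
}
```

Usage, for example: squeue --user="$USER" --noheader --format='%i' | jobs_above 50000 | xargs -r scancel. Cancelling a base array ID should cancel every task of that array, and -r (a GNU xargs option) keeps scancel from running at all when nothing matches.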


One minor optimisation is to not cancel each job individually, but to build up a list of matching job numbers and then cancel them all with one invocation of scancel.

For example, using a bash array to hold the job numbers:

#!/bin/bash

declare -a jobs=()

if [ -z "$1" ] ; then
    echo "Minimum Job Number argument is required.  Run as '$0 jobnum'" >&2
    exit 1
fi

minjobnum="$1"

myself="$(id -u -n)"

for j in $(squeue --user="$myself" --noheader --format='%i') ; do
  if [ "$j" -gt "$minjobnum" ] ; then
    jobs+=("$j")
  fi
done

# Skip scancel entirely when no job matched the criterion.
if [ "${#jobs[@]}" -gt 0 ] ; then
  scancel "${jobs[@]}"
fi

This does the same thing as the first script, but slightly more efficiently, since scancel is invoked only once.
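Because the question stresses that some queued jobs are correct, a dry run before cancelling may be worthwhile. This is a minimal sketch, assuming a hypothetical confirm_and_cancel function and a hypothetical DRY_RUN environment variable (neither is part of Slurm). It prints the matching IDs and only calls scancel after you answer y.

```sh
#!/bin/sh
# Hypothetical helper: list the job IDs passed as arguments, then ask
# before cancelling them. DRY_RUN=1 (a variable invented for this sketch)
# prints the list and returns without ever calling scancel.
confirm_and_cancel() {
  if [ "$#" -eq 0 ]; then
    echo "No matching jobs." >&2
    return 0
  fi
  echo "Would cancel $# job(s): $*"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    return 0
  fi
  printf 'Proceed? [y/N] '
  read -r answer
  case "$answer" in
    [yY]*) scancel "$@" ;;
    *) echo "Aborted." ;;
  esac
}
```

To use it, replace the scancel call at the end of the bash script above with confirm_and_cancel "${jobs[@]}"; running the script with DRY_RUN=1 then only shows what would be cancelled.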


Author by Admin

Updated on September 18, 2022

Comments

  • Admin
    Admin over 1 year

    I have submitted 800 jobs on Slurm. I want to cancel those jobs that have a job ID/number bigger than a certain number (since there is a mistake in them). I don't want to cancel all my jobs, because some are running and some of those in the queue are correct.

  • Alessio
    Alessio over 6 years
    Note: your username is probably already in a variable in your shell, most likely called $USER. I wrote the script to work whether that variable exists or not; if it does exist, you don't need the $myself variable. Delete the myself="$(id -u -n)" line and change $myself to $USER on the for loop line that runs squeue.
  • mugen
    mugen over 3 years
    I brute-forced my way out by running scancel {70000..71000} to cancel "all" jobs with an ID bigger than 70000. You deserve the best prize for thinking outside the box; this saved me an hour of my time, thanks!
  • Christian O'Reilly
    Christian O'Reilly over 3 years
    Thanks @mugen for your kind words! Happy it could save you some time :)