SLURM: Custom standard output name

Solution 1

Here is my takeaway from the previous answers:

  • %j gives job id
  • %x gives job name
  • I don't know how to get the date in the desired format. The job ID serves as a unique identifier across runs, and the file's modification date records when the job ran, for later analysis.

My SBATCH magic looks like this:

#SBATCH --output=R-%x.%j.out
#SBATCH --error=R-%x.%j.err

I prefer adding R- as a prefix so that I can easily move or remove all the R-* files.
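
Since the filename pattern has no symbol for the date, one option is to write a timestamp as the first line of the log itself. The following is a minimal sketch, assuming the job name indexing (taken from the question) and a hypothetical helper called log_header. SLURM_JOB_ID and SLURM_JOB_NAME are set by Slurm inside the job; the fallback values only matter when the script is run outside Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=indexing
#SBATCH --output=R-%x.%j.out
#SBATCH --error=R-%x.%j.err

# Hypothetical helper: print one header line with start time, job ID and job name.
log_header() {
    printf '%s %s %s\n' \
        "$(date +%Y%m%d%H%M%S)" \
        "${SLURM_JOB_ID:-nojobid}" \
        "${SLURM_JOB_NAME:-nojobname}"
}

log_header   # becomes the first line of R-<jobname>.<jobid>.out
```

The timestamp then travels with the file contents, so it survives copies that reset the modification date.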

Solution 2

You can't, at least not the way you want to do it. Those #SBATCH lines are shell comments that happen to be interpreted by the sbatch command; you can't execute shell code in them.

Also, sbatch's -o option only understands a very limited set of replacement symbols (see man page extracts below).
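
Options given on the sbatch command line, unlike #SBATCH lines, pass through the submitting shell first, so a date can be computed there while %j is left for sbatch to replace. The following is a sketch under assumptions: build_output_pattern is a hypothetical helper, the directory and job name come from the question, and script.sh stands for your batch script. Note that the timestamp is the submission time, not the time the job starts.

```shell
#!/bin/bash
# Build an --output pattern in the submitting shell. The date is expanded here,
# at submission time; the literal %j is left for sbatch to replace with the job ID.
build_output_pattern() {
    local dir=$1 jobname=$2
    printf '%s/%s-%%j-%s.txt\n' "$dir" "$(date +%Y%m%d%H%M%S)" "$jobname"
}

pattern=$(build_output_pattern /home/user/slurm indexing)
echo "sbatch -J indexing -o $pattern script.sh"

# Submit only where sbatch and the batch script are actually available.
if command -v sbatch >/dev/null 2>&1 && [ -f script.sh ]; then
    sbatch -J indexing -o "$pattern" script.sh
fi
```

Command-line options take precedence over the matching #SBATCH lines, so any -o line in the script can stay as a fallback.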

Probably the closest you can get to what you want is to run sbatch in a wrapper script that appends the job ID, job name, and current date and time to a text file (e.g. timestamp<TAB>jobid<TAB>jobname), and then use that file after the job has finished to rename the output file.
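
A wrapper along those lines might look like the sketch below. The function name submit_and_log and the log file location are hypothetical; the --parsable option makes sbatch print only the job ID (optionally followed by ";" and a cluster name), which keeps the parsing simple.

```shell
#!/bin/bash
# Hypothetical wrapper: submit a job and append "timestamp<TAB>jobid<TAB>jobname"
# to a log file. Usage: submit_and_log LOGFILE JOBNAME SBATCH_ARGS...
submit_and_log() {
    local logfile=$1 jobname=$2
    shift 2
    local jobid
    jobid=$(sbatch --parsable -J "$jobname" "$@") || return 1
    jobid=${jobid%%;*}   # drop the ";cluster" suffix if present
    printf '%s\t%s\t%s\n' "$(date +%s)" "$jobid" "$jobname" >> "$logfile"
}
```

For example, submit_and_log "$HOME/slurm/jobs.tsv" indexing script.sh submits script.sh and records one line for it.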

time_t, or seconds-since-the-epoch, i.e. date +%s, is the most useful date/time format in a script. It can easily be pretty-printed however you want without needing to parse it first.
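
As a small illustration of that pretty-printing, the sketch below converts an epoch value to the yyyymmddhhmmss form asked for in the question. It assumes GNU date (the -d @SECONDS syntax is not portable to every date implementation), and format_epoch is a hypothetical helper name.

```shell
#!/bin/bash
# Convert seconds-since-the-epoch to yyyymmddhhmmss (GNU date assumed).
format_epoch() {
    date -d "@$1" +%Y%m%d%H%M%S
}

now=$(date +%s)        # store this in the text file
format_epoch "$now"    # format it only when a readable name is needed
```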

It wouldn't be difficult to write a script that iterates through that text file and renames the output file of every job ID that is no longer running (check with squeue -t BF,CA,CD,F,NF,PR,TO) and hasn't already been renamed.
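
One possible shape for such a script is sketched below. Several things here are assumptions rather than part of the answer: the function names are hypothetical, the output files are assumed to use the default name slurm-<jobid>.out, GNU date is assumed, and "finished" is approximated as "squeue no longer lists the job", which is a simpler check than the state filter mentioned above.

```shell
#!/bin/bash
# Assumption: a job counts as finished once squeue prints nothing for it.
job_is_finished() {
    [ -z "$(squeue -h -j "$1" 2>/dev/null)" ]
}

# Rename slurm-<jobid>.out files in OUTDIR using the records in LOGFILE
# (one "timestamp<TAB>jobid<TAB>jobname" line per job).
rename_finished() {
    local logfile=$1 outdir=$2 tab ts jobid jobname src dst
    tab=$(printf '\t')
    while IFS=$tab read -r ts jobid jobname; do
        src="$outdir/slurm-$jobid.out"
        [ -f "$src" ] || continue              # already renamed, or no output yet
        job_is_finished "$jobid" || continue   # still pending or running
        dst="$outdir/$(date -d "@$ts" +%Y%m%d%H%M%S)-$jobid-$jobname.txt"
        mv -- "$src" "$dst"
    done < "$logfile"
}
```

Because a renamed file no longer matches slurm-<jobid>.out, running rename_finished repeatedly (for example from cron) skips jobs that were already handled.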

FYI, the man page for sbatch says:

-o, --output=<filename pattern>

Instruct Slurm to connect the batch script's standard output directly to the file name specified in the "filename pattern". By default both standard output and standard error are directed to the same file. For job arrays, the default file name is slurm-%A_%a.out, %A is replaced by the job ID and %a with the array index. For other jobs, the default file name is slurm-%j.out, where the %j is replaced by the job ID. See the --input option for filename specification options.

And, for -i it says:

-i, --input=<filename pattern>

Instruct Slurm to connect the batch script's standard input directly to the file name specified in the "filename pattern". By default, /dev/null is open on the batch script's standard input and both standard output and standard error are directed to a file of the name slurm-%j.out, where the %j is replaced with the job allocation number, as described below.

The filename pattern may contain one or more replacement symbols, which are a percent sign % followed by a letter (e.g. %j).

Supported replacement symbols are:

%A Job array's master job allocation number.
%a Job array ID (index) number.
%j Job allocation number.
%N Node name. Only one file is created, so %N will be replaced
   by the name of the first node in the job, which is the one
   that runs the script.
%u User name.

Solution 3

The documentation for the sbatch command provides a comprehensive list of the allowed replacement symbols (see the sbatch man page):

filename pattern: sbatch allows a filename pattern to contain one or more replacement symbols, which are a percent sign "%" followed by a letter (e.g. %j).

\\ Do not process any of the replacement symbols.

%% The character "%".

%A Job array's master job allocation number.

%a Job array ID (index) number.

%J jobid.stepid of the running job. (e.g. "128.0")

%j jobid of the running job.

%N short hostname. This will create a separate IO file per node.

%n Node identifier relative to current job (e.g. "0" is the first node of the running job) This will create a separate IO file per node.

%s stepid of the running job.

%t task identifier (rank) relative to current job. This will create a separate IO file per task.

%u User name.

%x Job name.

A number placed between the percent character and format specifier may be used to zero-pad the result in the IO filename. This number is ignored if the format specifier corresponds to non-numeric data (%N for example).

Some examples of how the format string may be used for a 4-task job step with a job ID of 128 and a step ID of 0:

job%J.out        job128.0.out
job%4j.out       job0128.out
job%j-%2t.out    job128-00.out, job128-01.out, ...
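
The zero-padding behaves much like a width in printf. The sketch below is only an analogy for previewing names in a shell; Slurm performs the real substitution itself, and the job ID and task numbers are the ones from the example above.

```shell
#!/bin/bash
jobid=128

# Same shape as job%4j.out: job ID zero-padded to width 4.
printf 'job%04d.out\n' "$jobid"

# Same shape as job%j-%2t.out: one file per task, task rank padded to width 2.
for task in 0 1 2 3; do
    printf 'job%d-%02d.out\n' "$jobid" "$task"
done
```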

Solution 4

Here's a different approach. You can't change the name of the log file while the job is running, and renaming it afterward is not ideal because it means wrapping the sbatch command in more code. Instead, you can create a hard link to the sbatch log file under the name you want. The link refers to the same data, so it updates as the sbatch log file updates. At the end of your script, just delete the sbatch log file, and the linked file will be the only one that remains. Sure, you have two log file names while the job is running, but that's not so bad.

#SBATCH -o something%j.log
[...]
ln -f something${SLURM_JOB_ID}.log coolName.log
[...]
rm something${SLURM_JOB_ID}.log
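
The same idea can be combined with a computed name. This is a sketch rather than a drop-in script: link_log is a hypothetical helper, the something prefix matches the -o pattern above, and a hard link only works when both names are on the same filesystem.

```shell
#!/bin/bash
# Hypothetical helper: give an existing log file a second name via a hard link.
link_log() {
    local slurm_log=$1 nice_name=$2
    ln -f -- "$slurm_log" "$nice_name"
}

# Inside a job, SLURM_JOB_ID and SLURM_JOB_NAME are set by Slurm.
if [ -n "${SLURM_JOB_ID:-}" ] && [ -f "something${SLURM_JOB_ID}.log" ]; then
    nice_name="$(date +%Y%m%d%H%M%S)-${SLURM_JOB_ID}-${SLURM_JOB_NAME}.txt"
    link_log "something${SLURM_JOB_ID}.log" "$nice_name"
    # ... the actual work of the job goes here ...
    rm -- "something${SLURM_JOB_ID}.log"   # the linked name keeps the contents
fi
```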

Author: mindlessgreen

Updated on September 18, 2022

Comments

  • mindlessgreen, over 1 year

    When running a SLURM job using sbatch, slurm produces a standard output file which looks like slurm-102432.out (slurm-jobid.out). I would like to customise this to (yyyymmddhhmmss-jobid-jobname.txt). How do I go about doing this?

    Or more generally, how do I include computed variables in the sbatch argument -o?

    I have tried the following in my script.sh

    #SBATCH -p core
    #SBATCH -n 6
    #SBATCH -t 1:00:00
    #SBATCH -J indexing
    #SBATCH -o "/home/user/slurm/$(date +%Y%m%d%H%M%S)-$(SLURM_JOB_ID)-indexing.txt"
    

    but that did not work. The file was written to the correct directory, but its name was the literal string $(date +%Y%m%d%H%M%S)-$(SLURM_JOB_ID)-indexing.txt.

    So, I am looking for a way to save the standard output file in a directory /home/user/slurm/ with a filename like so: 20160526093322-10453-indexing.txt