1. Slurm Job Scheduler

1.1. Command line tools

Slurm provides a complete toolbox to manage and control your jobs. Some of its commands carry out common tasks, such as submitting job scripts to the queue (sbatch) or printing information about the queue (squeue). Others, such as srun, have roles not found in a classic PBS environment.

1.1.1. Job management commands

sbatch
Submit a job script to the queue.
srun
Execute a command in parallel; supersedes mpirun.
scancel
Cancel a job or resource allocation.
salloc
Allocate resources in the cluster to be used from the current interactive shell.
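As an illustration of how these commands fit together, here is a minimal batch script sketch. The job name, time limit and output file name are arbitrary placeholders; the options themselves are standard sbatch options.

```bash
#!/bin/bash
#SBATCH --job-name=hello_slurm
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --output=hello_%j.out

# sbatch reads the #SBATCH lines above as options; for bash they are plain
# comments, so the script can also be tried without Slurm as "bash job.sh".
# In --output, %j is replaced by the job ID.
greeting="Hello from job ${SLURM_JOB_ID:-none} on ${HOSTNAME}"
echo "$greeting"
```

Save it as `job.sh` and submit it with `sbatch job.sh`. sbatch prints the ID of the new job, which you can then follow with `squeue` or cancel with `scancel <JOB_ID>`.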

1.1.2. Job monitoring commands

squeue
Monitor your jobs in the queue. In the VUB clusters you can also use mysqueue to get a more detailed view of your queue.
sinfo
Monitor the nodes and partitions in the cluster. In the VUB clusters you can also use mysinfo to get a more detailed view of the cluster.
sacct
Warning: Use with restraint and avoid including sacct or mysacct in your scripts.

Display accounting data of your current and past jobs, such as CPU time and memory used. In the VUB clusters you can also use mysacct to get a more detailed view of your jobs and slurm_jobinfo <JOB_ID> to view details of a given job. (Replace <JOB_ID> with the ID of your job.)
sattach
Attach standard input, output, and error of a current job to your shell.

1.2. Torque/Moab to Slurm migration

Users with workflows developed for Torque/Moab, which is based on a PBS environment and its q commands (qsub, qstat, qdel, etc.), have multiple options to quickly get up and running in Slurm.

1.2.1. Quick translation tables

Note

This section is meant for experienced Torque/Moab users to quickly get up and running with Slurm.

We encourage all users to convert their workflows to Slurm. The tables below provide a quick reference with translations from Torque/Moab to Slurm that can help you with the migration.

1.2.1.1. Submitting and monitoring jobs

Replace <JOB_ID> with the ID of your job.
Torque/Moab	Slurm	Description
`qsub job.sh`	`sbatch job.sh`	Submit a job with batch script `job.sh`
`qsub [resources] -I`	`srun [resources] --pty bash -l`	Start an interactive job, see Interactive jobs
`qdel <JOB_ID>`	`scancel <JOB_ID>`	Delete a job
`qstat`	`mysqueue --states=all` or `mysacct --starttime=YYYY-MM-DD`	Show job queue status
`qstat -f <JOB_ID>`	`scontrol show job <JOB_ID>`	Show details about a job
`myresources`	`mysacct`	Show resources usage
`nodestat`	`mysinfo`	Show summary of available nodes and their usage

1.2.1.2. Requesting resources and other options

Torque/Moab	Slurm	Description
`-N job_name`	`--job-name=job_name`	Set job name to `job_name`
`-l walltime=HH:MM:SS`	`--time=DD-HH:MM:SS`	Define the time limit
`-l nodes=1:ppn=1`	`--ntasks=1`	Request a single CPU core
`-l nodes=1:ppn=X`	`--ntasks=1 --cpus-per-task=X`	Request multiple cores on 1 node for Parallel non-MPI jobs
`-l nodes=X:ppn=Y`	`--ntasks=Z` or `--ntasks=Z --nodes=X` or `--nodes=X --ntasks-per-node=Y` (with Z = X×Y)	Request multiple cores on 1 or multiple nodes for Parallel MPI jobs
`-l pmem=N`	`--mem-per-cpu=N` (default unit: MB)	Request memory per CPU core, only if needed (see Memory allocation)
`-l feature=pascal`	`--partition=pascal_gpu`	Request `pascal` GPU architecture, see Slurm partitions
`-M email@example.com`	`--mail-user=email@example.com`	Send job alerts to given email address
`-m <a|b|e>`	`--mail-type=TYPE`, where TYPE is one of `BEGIN|END|FAIL|REQUEUE|ALL` or a comma-separated list of them	Conditions for sending alerts by email
`-o out_file`	`--output out_file`	Write stdout to `out_file`
`-e err_file`	`--error err_file`	Write stderr to `err_file`
`-j oe`	(default, unless `--error` is specified)	Write stdout and stderr to the same file
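For example, a Torque/Moab header for a 4-core, non-MPI job could be translated with the table above as in the sketch below. The job name, e-mail address, output file and resource values are placeholders. Note that `-m ae` (abort, end) has no one-to-one equivalent; `END,FAIL` is the closest match.

```bash
#!/bin/bash
# Original Torque/Moab header, kept here as comments for comparison:
#   #PBS -N my_job
#   #PBS -l walltime=02:00:00
#   #PBS -l nodes=1:ppn=4
#   #PBS -l pmem=2gb
#   #PBS -M email@example.com
#   #PBS -m ae
#   #PBS -o my_job.out
#   #PBS -j oe

#SBATCH --job-name=my_job
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --mail-user=email@example.com
#SBATCH --mail-type=END,FAIL
#SBATCH --output=my_job.out
# No equivalent of "-j oe" is needed: without --error, stderr also goes to my_job.out
```

`--time` also accepts HH:MM:SS without the day field, so a Torque walltime value can usually be copied as is.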

1.2.1.3. Environment variables defined by resource managers

Torque/Moab	Slurm	Description
`$PBS_JOBID`	`$SLURM_JOB_ID`	Job ID
`$PBS_O_WORKDIR`	`$SLURM_SUBMIT_DIR`	Directory where job was submitted from, see Job working directory
`$PBS_NODEFILE` (nodes file)	`$SLURM_JOB_NODELIST` or `$(scontrol show hostnames)` (nodes string)	List of nodes assigned to job
`$PBS_JOBNAME`	`$SLURM_JOB_NAME`	Job name
`$PBS_ARRAYID`	`$SLURM_ARRAY_TASK_ID`	Job array ID (index) number
`$PBS_NUM_NODES`	`$SLURM_JOB_NUM_NODES`	Number of nodes
`$PBS_NUM_PPN`	see Job variables about CPUs	Number of cores per node
`$PBS_NP`	see Job variables about CPUs	Total number of cores
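A minimal sketch of a job array script using the Slurm variables from the table. The per-task input file naming scheme is hypothetical, and the values after `:-` are only fallbacks so that the script can also be tried outside of a Slurm job.

```bash
#!/bin/bash
#SBATCH --job-name=env_demo
#SBATCH --array=1-3

# Slurm replacements for the Torque/Moab variables (fallbacks for non-Slurm runs)
job_id=${SLURM_JOB_ID:-local}            # was $PBS_JOBID
job_name=${SLURM_JOB_NAME:-env_demo}     # was $PBS_JOBNAME
task_id=${SLURM_ARRAY_TASK_ID:-1}        # was $PBS_ARRAYID
num_nodes=${SLURM_JOB_NUM_NODES:-1}      # was $PBS_NUM_NODES

# Hypothetical naming scheme: each array task reads its own input file
input_file="input_${task_id}.txt"
echo "Job ${job_id} (${job_name}), task ${task_id}, ${num_nodes} node(s): ${input_file}"

# $PBS_NODEFILE was a file; Slurm provides a compact node list string instead.
# scontrol expands it to one hostname per line, e.g. to build a node file.
if [[ -n ${SLURM_JOB_NODELIST:-} ]]; then
    scontrol show hostnames "$SLURM_JOB_NODELIST" > "nodes_${job_id}.txt"
fi
```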

1.2.1.4. Features to partitions

See Slurm partitions for more info.
Torque/Moab features	Slurm partitions
`pascal`	`pascal_gpu`

1.3. CPU cores allocation

Requesting CPU cores in Torque/Moab is done with the option -l nodes=X:ppn=Y, where it is mandatory to specify the number of nodes even for single-core jobs (-l nodes=1:ppn=1). However, the concept behind the keyword nodes differs between Torque/Moab and Slurm. While Torque/Moab nodes do not necessarily represent a single physical server of the cluster, the option --nodes in Slurm specifies the exact number of physical nodes to be used for the job, as explained in Parallel non-MPI jobs.

While in Torque/Moab the total number of CPU cores allocated to a job is always defined by the combination of nodes and processes per node ppn, in Slurm the definition of resources is more nuanced and it is mandatory to distinguish between (at least) two classes of parallel applications:

Parallel non-MPI jobs: single node jobs with a single task and multiple CPU cores per task
Parallel MPI jobs: multi-task jobs that can run on multiple nodes
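As a sketch of the second class, the header below requests 32 tasks for an MPI job, a Slurm counterpart of `-l nodes=2:ppn=16`; the task count and time limit are placeholders. Requesting only `--ntasks` leaves the placement of the tasks on nodes to Slurm, while the `##SBATCH` lines (ignored by sbatch) show how the layout could be fixed to exactly 2 nodes instead.

```bash
#!/bin/bash
# Parallel MPI job: 32 tasks (one core each), placed by Slurm on as many
# nodes as needed. Torque/Moab equivalent: #PBS -l nodes=2:ppn=16
#SBATCH --job-name=mpi_job
#SBATCH --ntasks=32
#SBATCH --time=04:00:00

# Lines starting with "##SBATCH" are ignored by sbatch. Remove one "#" on
# both lines to force exactly 2 nodes with 16 tasks each.
##SBATCH --nodes=2
##SBATCH --ntasks-per-node=16
```

In the body of such a script, the MPI program would be started with srun (for example `srun ./my_mpi_program`, where the executable name is hypothetical), which launches one process per allocated task.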

1.3.1. Job variables about CPUs

The job variables in Torque/Moab providing information about the number of allocated cores are $PBS_NP for the total and $PBS_NUM_PPN for the CPU cores per node. The equivalent variables in Slurm depend on the type of job that you are running:

Parallel non-MPI jobs: The number of cores allocated for the threads and processes of your application is given by the environment variable $SLURM_CPUS_PER_TASK.
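Parallel MPI jobs: With the default of one core per task, the total number of cores allocated to your job is given by the environment variable $SLURM_NTASKS, and the number of cores per node by $SLURM_TASKS_PER_NODE (which uses a compact notation, such as 16(x2) for two nodes with 16 tasks each).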
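A sketch for the non-MPI case, assuming an OpenMP (or similarly multithreaded) application that reads its number of threads from `OMP_NUM_THREADS`; the program name is a placeholder and the core count is arbitrary.

```bash
#!/bin/bash
#SBATCH --job-name=threads_demo
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Parallel non-MPI job: all cores belong to a single task, so
# $SLURM_CPUS_PER_TASK plays the role of $PBS_NP / $PBS_NUM_PPN.
# The fallback of 1 only applies when running outside of Slurm.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "Starting ${OMP_NUM_THREADS} threads"
# ./my_threaded_program   (hypothetical executable)
```

Deriving the thread count from the allocation, rather than hard-coding it, keeps the script correct when only the `--cpus-per-task` value is changed.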

1.4. Memory allocation

Jobs that do not define any specific memory request will get a default allocation per core, which is the total node memory divided by the number of cores on the node. In most cases, the default memory allocation is sufficient, and it is also what we recommend. If your jobs need more than the default memory, check their actual memory usage (e.g. with mysacct) to avoid allocating more resources than needed.

If your job needs a non-default amount of memory, we highly recommend specifying the memory allocation of your job with the Slurm option --mem-per-cpu=X, which sets the memory per core. It is also possible to request the total amount of memory per node with the option --mem=X. However, requesting a proper amount of memory with --mem is not trivial for multi-node jobs in which you want to leave some freedom for node allocation. In any case, these two options are mutually exclusive, so you should only use one of them.

The default memory unit is megabytes, but you can specify other units with one of the following single-letter suffixes: K, M, G or T. Slurm uses 1024-based units, so to request 2 GB per core you can use --mem-per-cpu=2048 or, equivalently, --mem-per-cpu=2G.

If your job needs more than 240 GB of memory, you have to request a high-memory node with --partition=zen5_himem. These nodes provide up to 1.5 TB.
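To make the per-core arithmetic concrete, the sketch below requests 2G per core for 4 single-core tasks, i.e. 4 × 2048 MB = 8192 MB in total. It reads the request back from `$SLURM_MEM_PER_CPU`, which sbatch sets (in megabytes) when `--mem-per-cpu` is used; the fallback values only mirror the header so the script can be tried outside of Slurm.

```bash
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=2G

# 2G is 2048 MB: Slurm uses 1024-based unit suffixes.
mem_per_cpu_mb=${SLURM_MEM_PER_CPU:-2048}
ntasks=${SLURM_NTASKS:-4}
# With one core per task, the total request is (number of cores) x (memory per core)
total_mb=$(( mem_per_cpu_mb * ntasks ))
echo "Total memory requested: ${total_mb} MB"
```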

1.5. Slurm partitions

Compute nodes in the cluster are organized in partitions based on their hardware characteristics. In most cases, specifying a partition is not necessary, as Slurm will automatically determine the partitions that are suitable for your job based on the requested resources, such as number of tasks or GPUs.

The command mysinfo provides detailed information about all partitions in the cluster. The name of the partition tells its main characteristic: GPU nodes are all in specific partitions suffixed with _gpu and nodes with a fast node interconnect are suffixed with _mpi. The name of the partition before the suffix tells the generation of the hardware in that partition. For example, ampere_gpu has Nvidia Ampere (A100) GPUs, while zen5_mpi has nodes with AMD Zen 5 CPUs and a fast interconnect.
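For instance, a job that must run on the A100 GPUs could name the partition explicitly, as in the sketch below, although requesting a GPU is usually enough for Slurm to pick a suitable `_gpu` partition by itself. `--gpus-per-node` is a standard Slurm option, but it is an assumption here that it is the preferred way to request GPUs on your cluster; check mysinfo and the cluster documentation.

```bash
#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --ntasks=1
#SBATCH --gpus-per-node=1
# Optional: pin the job to the Ampere (A100) nodes. Without this line, Slurm
# chooses a suitable partition based on the requested resources.
#SBATCH --partition=ampere_gpu
```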

1.6. Job working directory

In Torque/Moab, each job starts in the user’s $HOME directory. In Slurm, a job starts by default in the directory it was submitted from, so adding cd $SLURM_SUBMIT_DIR to the job script is not needed. You can also use the Slurm option --chdir to specify the directory in which a job should start.
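A small sketch of a job script that relies on this default; the `--chdir` path is a placeholder and is disabled (`##SBATCH`) so that sbatch ignores it unless you remove one `#`.

```bash
#!/bin/bash
#SBATCH --job-name=workdir_demo
# Remove one "#" to start in another directory (placeholder path)
##SBATCH --chdir=/path/to/run_dir

# No "cd $PBS_O_WORKDIR" or "cd $SLURM_SUBMIT_DIR" is needed: the job already
# starts in the submission directory (or in the --chdir directory, if set).
run_dir=$PWD
echo "Running in:     ${run_dir}"
echo "Submitted from: ${SLURM_SUBMIT_DIR:-$run_dir}"
```

The same option can also be given on the command line, e.g. `sbatch --chdir=/path/to/run_dir job.sh`.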