1. Slurm Job Scheduler

1.1. Command line tools

Slurm provides a complete toolbox of commands to manage and control your jobs. Some of them carry out common tasks, such as submitting job scripts to the queue (sbatch) or printing information about the queue (squeue). Others, such as srun, have new roles not found in a classic PBS environment.

Overview of commands in Slurm:

  • Job management

    • sbatch

      Submit a job script to the queue.

    • srun

      Execute a command in parallel; supersedes mpirun.

    • scancel

      Cancel a job or resource allocation.

    • salloc

      Allocate resources in the cluster to be used from the current interactive shell.

  • Job monitoring

    • squeue

      Monitor your jobs in the queue. In the VUB clusters you can also use mysqueue to get a more detailed view of your queue.

    • sinfo

      Monitor the nodes and partitions in the cluster. In the VUB clusters you can also use mysinfo to get a more detailed view of the cluster.

    • sacct

      Warning: Use with restraint and avoid including sacct or mysacct in your scripts.

      Display accounting data of your current and past jobs, such as CPU time and memory used. In the VUB clusters you can also use mysacct to get a more detailed view of your jobs.

    • sattach

      Attach standard input, output, and error of a current job to your shell.

1.2. Torque/Moab to Slurm migration

Users that have workflows developed for Torque/Moab, which is based on a PBS environment and q commands, have multiple options to quickly get up and running in Slurm.

1.2.1. Quick translation tables

Note

This section is meant for experienced Torque/Moab users to quickly get up and running with Slurm.

We encourage all users to convert their workflows to Slurm. The tables below provide a quick reference with translations from Torque/Moab to Slurm that can help you with the migration.

1.2.1.1. Submitting and monitoring jobs

Replace <JOB_ID> with the ID of your job.

Torque/Moab          Slurm                           Description
-------------------  ------------------------------  -----------------------------------------------
qsub job.sh          sbatch job.sh                   Submit a job with batch script job.sh
qsub [resources] -I  srun [resources] --pty bash -l  Start an interactive job, see Interactive jobs
qdel <JOB_ID>        scancel <JOB_ID>                Delete a job
qstat                mysqueue --states=all or        Show job queue status
                     mysacct --starttime=YYYY-MM-DD
qstat -f <JOB_ID>    scontrol show job <JOB_ID>      Show details about a job
myresources          mysacct                         Show resources usage
nodestat             mysinfo                         Show summary of available nodes and their usage
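
The submit, inspect and cancel cycle from the table above can be sketched as a small shell helper. This is only a minimal sketch: the function names parse_job_id and submit_and_show are hypothetical, and job.sh stands for your own batch script. It assumes that sbatch confirms a submission with a line of the form "Submitted batch job <JOB_ID>".

```shell
#!/bin/bash
# Sketch: submit a job script and show its details (hypothetical helper names).

# Extract the numeric job ID from sbatch's confirmation line,
# e.g. "Submitted batch job 12345" gives "12345".
parse_job_id() {
    echo "$1" | awk '/^Submitted batch job/ {print $4}'
}

# Submit a batch script, then print the scheduler's view of the new job.
submit_and_show() {
    local script="$1" output job_id
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found: run this on a Slurm login node" >&2
        return 1
    fi
    output=$(sbatch "$script") || return 1
    job_id=$(parse_job_id "$output")
    echo "Submitted job $job_id"
    scontrol show job "$job_id"    # counterpart of: qstat -f <JOB_ID>
    # scancel "$job_id"            # counterpart of: qdel <JOB_ID>
}
```

On a login node, the helper would be called as submit_and_show job.sh.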

1.2.1.2. Requesting resources and other options

Torque/Moab           Slurm                             Description
--------------------  --------------------------------  ---------------------------------------------
-N job_name           --job-name=job_name               Set job name to job_name
-l walltime=HH:MM:SS  --time=DD-HH:MM:SS                Define the time limit
-l nodes=1:ppn=1      --ntasks=1                        Request a single CPU core
-l nodes=1:ppn=X      --ntasks=1 --cpus-per-task=X      Request multiple cores on 1 node for Parallel non-MPI jobs
-l nodes=X:ppn=Y      --ntasks=X or                     Request multiple cores on 1 or multiple nodes
                      --ntasks=X --nodes=Y or           for Parallel MPI jobs
                      --nodes=Y --ntasks-per-node=Z
-l pmem=N             --mem-per-cpu=N                   Request memory per CPU core
                      default unit = MB                 Only if needed, see Memory allocation
-l feature=skylake    --partition=skylake               Request skylake CPU architecture, see Slurm partitions
-l feature=pascal     --partition=pascal_gpu            Request pascal GPU architecture, see Slurm partitions
-M email@example.com  --mail-user=email@example.com     Send job alerts to given email address
-m <a|b|e>            --mail-type=                      Conditions for sending alerts by email
                      BEGIN|END|FAIL|REQUEUE|ALL
                      select 1 or comma separated list
-o out_file           --output out_file                 Write stdout to out_file
-e err_file           --error err_file                  Write stderr to err_file
-j oe                 (default, unless --error          Write stdout and stderr to the same file
                      is specified)
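
As an illustration of the table above, the following sketch shows how the header of a small multi-threaded (non-MPI) job script might be translated. The job name, time limit, core count and file name are made-up example values, and the original #PBS lines are kept as comments only for comparison.

```shell
#!/bin/bash
# Torque/Moab original, shown for comparison only:
#   #PBS -N my_job
#   #PBS -l walltime=02:00:00
#   #PBS -l nodes=1:ppn=4
#   #PBS -M email@example.com
#   #PBS -m e
#   #PBS -o my_job.out

#SBATCH --job-name=my_job
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mail-user=email@example.com
#SBATCH --mail-type=END
#SBATCH --output=my_job.out

# Outside of a Slurm job the variable is unset, so fall back to 1 core.
ncores="${SLURM_CPUS_PER_TASK:-1}"
echo "Job ${SLURM_JOB_NAME:-my_job} running with ${ncores} core(s)"
```

Because no --error option is given, stderr ends up in the same file as stdout, which matches the behaviour of -j oe.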

1.2.1.3. Environment variables defined by resource managers

Torque/Moab     Slurm                         Description
--------------  ----------------------------  -----------------------------------------------
$PBS_JOBID      $SLURM_JOB_ID                 Job ID
$PBS_O_WORKDIR  $SLURM_SUBMIT_DIR             Directory where job was submitted from, see Job working directory
$PBS_NODEFILE   $SLURM_JOB_NODELIST or        List of nodes assigned to job
(nodes file)    $(scontrol show hostnames)
                (nodes string)
$PBS_JOBNAME    $SLURM_JOB_NAME               Job name
$PBS_ARRAYID    $SLURM_ARRAY_TASK_ID          Job array ID (index) number
$PBS_NUM_NODES  $SLURM_JOB_NUM_NODES          Number of nodes
$PBS_NUM_PPN    see Job variables about CPUs  Number of cores per node
$PBS_NP         see Job variables about CPUs  Total number of cores
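
While migrating, a script may need to run under both resource managers for some time. The sketch below reads each value from the Slurm variable and falls back to the Torque/Moab one. The helper name job_var is hypothetical, and the node list format mentioned in the comments ("node[301-302]") is only an illustrative example.

```shell
#!/bin/bash
# Sketch: read job information from Slurm, falling back to the Torque/Moab
# variable when the Slurm one is unset (helper name is hypothetical).
job_var() {
    local slurm_name="$1" pbs_name="$2"
    # ${!name} is bash indirect expansion: the value of the variable named $name.
    local value="${!slurm_name:-${!pbs_name:-}}"
    echo "${value:-unknown}"
}

echo "Job ID:     $(job_var SLURM_JOB_ID PBS_JOBID)"
echo "Job name:   $(job_var SLURM_JOB_NAME PBS_JOBNAME)"
echo "Submit dir: $(job_var SLURM_SUBMIT_DIR PBS_O_WORKDIR)"

# $PBS_NODEFILE is a file with one host per line, whereas
# $SLURM_JOB_NODELIST is a compact string such as "node[301-302]".
# scontrol expands that string into one hostname per line.
if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
    scontrol show hostnames "$SLURM_JOB_NODELIST"
fi
```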

1.2.1.4. Features to partitions

See Slurm partitions for more info.

Torque/Moab features  Slurm partitions
--------------------  ----------------------
skylake               skylake or skylake_mpi
broadwell             broadwell
ivybridge             ivybridge_mpi
pascal                pascal_gpu
himem                 broadwell_himem

1.2.2. Compatibility layer

The Slurm clusters at VUB provide a compatibility layer with Torque/Moab. It is possible to manage your jobs in the queue with the classic commands qsub, qdel and qstat. Job scripts with #PBS directives or using $PBS_* environment variables can be interpreted and passed on to Slurm. Please note that this compatibility layer does not support all possible combinations of options, as there is no direct translation for all of them between Torque/Moab and Slurm. Nonetheless, common workflows should work out of the box.

Note

In some cases, when an interactive job is started using the compatibility layer with qsub -I, the terminal width might be reduced to 80 characters. This can be easily fixed by issuing the command resize in the interactive shell.

1.3. CPU cores allocation

Requesting CPU cores in Torque/Moab is done with the option -l nodes=X:ppn=Y, where it is mandatory to specify the number of nodes even for single core jobs (-l nodes=1:ppn=1). The concept behind the keyword nodes is different between Torque/Moab and Slurm though. While Torque/Moab nodes do not necessarily represent a single physical server of the cluster, the option --nodes in Slurm specifies the exact number of physical nodes to be used for the job, as explained in Parallel non-MPI jobs.

While in Torque/Moab the total number of CPU cores allocated to a job is always defined by the combination of nodes and processes per node (ppn), in Slurm the definition of resources is more nuanced and it is mandatory to distinguish between (at least) two classes of parallel applications: parallel non-MPI jobs and parallel MPI jobs.

1.3.1. Job variables about CPUs

The job variables in Torque/Moab providing information about the number of allocated cores are $PBS_NP for the total and $PBS_NUM_PPN for CPU cores per node. The equivalent variables in Slurm depend on the type of job that you are running:

  • Parallel non-MPI jobs: The number of cores allocated for the threads and processes of your application is given by the environment variable $SLURM_CPUS_PER_TASK.

  • Parallel MPI jobs: The total number of cores allocated to your job is given by the environment variable $SLURM_NTASKS, and the number of cores per node by $SLURM_TASKS_PER_NODE.
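
The two cases above can be sketched in a job script as follows. This is an illustration under assumptions: exporting OMP_NUM_THREADS only makes sense if the application uses OpenMP threads, and the submission options in the comments are example values.

```shell
#!/bin/bash
# Sketch: pick the number of workers depending on the kind of parallel job.
# Outside of a job the Slurm variables are unset, so both default to 1.

# Parallel non-MPI job (threads or processes on one node), for example
# submitted with: --ntasks=1 --cpus-per-task=4
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$threads"   # assumes an OpenMP application
echo "non-MPI: $threads core(s) for threads/processes"

# Parallel MPI job, for example submitted with: --ntasks=8
ranks="${SLURM_NTASKS:-1}"
echo "MPI: $ranks rank(s) in total, per node: ${SLURM_TASKS_PER_NODE:-n/a}"
```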

1.4. Memory allocation

Jobs that do not define any specific memory request will get a default allocation per core, which is the total node memory divided by the number of cores on the node. In most cases, the default memory allocation is sufficient, and it is also what we recommend. If your jobs need more than the default memory, make sure to check their actual memory usage (e.g. with mysacct) to avoid allocating more resources than needed.
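
The default allocation described above is a simple division. The sketch below works it out for a made-up node with 192000 MB of memory and 40 cores; these numbers are hypothetical and do not describe an actual node of the cluster, and the function name is an illustration only.

```shell
#!/bin/bash
# Sketch: default memory per core = total node memory / number of cores.
# The node size below is a made-up example, not a real cluster specification.
default_mem_per_core() {
    local node_mem_mb="$1" node_cores="$2"
    echo $(( node_mem_mb / node_cores ))   # integer division, result in MB
}

default_mem_per_core 192000 40   # prints 4800
```

With these example numbers, a 4-core job without a memory request would get 4 x 4800 MB.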

If your job needs a non-default amount of memory, we highly recommend specifying the memory allocation of your job with the Slurm option --mem-per-cpu=X, which sets the memory per core. It is also possible to request the total amount of memory per node of your job with the option --mem=X. However, requesting a proper amount of memory with --mem is not trivial for multi-node jobs in which you want to leave some freedom for node allocation. In any case, these two options are mutually exclusive, so you should only use one of them.

The default memory unit is megabytes, but you can specify different units using one of the following one-letter suffixes: K, M, G or T. For example, to request about 2GB per core you can use --mem-per-cpu=2000 (2000 MB) or --mem-per-cpu=2G (2048 MB, as the suffixes are multiples of 1024).
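
To compare values written with different suffixes, it can help to convert them to megabytes. The following sketch does that under the assumption that the suffixes are binary multiples of 1024; the helper name mem_to_mb is hypothetical and values in K are rounded down.

```shell
#!/bin/bash
# Sketch: convert a Slurm-style memory value (optional K, M, G or T suffix)
# to megabytes, assuming binary multiples of 1024 (helper name is hypothetical).
mem_to_mb() {
    local spec="$1"
    local number="${spec%[KMGT]}"   # strip one trailing unit letter, if any
    local unit="${spec#"$number"}"  # what is left is the unit (may be empty)
    case "${unit:-M}" in
        K) echo $(( number / 1024 )) ;;
        M) echo "$number" ;;
        G) echo $(( number * 1024 )) ;;
        T) echo $(( number * 1024 * 1024 )) ;;
    esac
}

mem_to_mb 2000   # prints 2000
mem_to_mb 2G     # prints 2048
```

Under that assumption, --mem-per-cpu=2G requests slightly more memory than --mem-per-cpu=2000.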

If your job needs more than 240GB memory, you have to specify the high-memory node with --partition=broadwell_himem. This node provides up to 1.4TB.

1.5. Slurm partitions

In Torque/Moab, specific hardware resources can be requested with features. In Slurm, we provide this functionality with partitions. In most cases, specifying a partition is not necessary, as Slurm will automatically determine the partitions that are suitable for your job.

The command mysinfo provides detailed information about all partitions in the cluster. The name of a partition indicates its main characteristic: GPU nodes are all in specific partitions suffixed with _gpu, and nodes with a fast node interconnect are in partitions suffixed with _mpi. The part of the name before the suffix indicates the generation of the hardware in that partition. For instance, ampere_gpu has Nvidia Ampere (A100) GPUs, while skylake_mpi has nodes with Intel Skylake CPUs and a fast interconnect.

You can submit your jobs to specific partitions if needed. It’s also possible to request a comma-separated list of partitions. For example, to indicate that your job may run in partitions skylake or broadwell you can use --partition=skylake,broadwell. Note, however, that a job will only run in a single partition. Slurm will decide the partition based on priority and availability.
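
A job script that accepts either of the two partitions from the example above might look like the sketch below. The job name is a made-up value. Inside a job, Slurm exposes the chosen partition in $SLURM_JOB_PARTITION; outside of a job that variable is unset, so the script falls back to a placeholder text.

```shell
#!/bin/bash
#SBATCH --job-name=partition_demo
#SBATCH --ntasks=1
#SBATCH --partition=skylake,broadwell

# Slurm picks one partition from the list; inside the job its name is
# available in $SLURM_JOB_PARTITION (unset outside of a job).
partition="${SLURM_JOB_PARTITION:-none (not inside a Slurm job)}"
echo "Running in partition: $partition"
```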

1.6. Job working directory

In Torque/Moab, each job starts in the user’s $HOME directory. In Slurm, by default the job stays in the directory where it was submitted from. Thus, adding cd $SLURM_SUBMIT_DIR to the job script is not needed. Users can also use the Slurm option --chdir to specify in which directory a job should start.
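
The behaviour described above can be checked with a short job script such as the following sketch. The job name is a made-up value, and the fallback to the current directory is only there so that the script also runs outside of a Slurm job.

```shell
#!/bin/bash
#SBATCH --job-name=workdir_demo
#SBATCH --ntasks=1
# To start the job elsewhere, add a directive with --chdir=<directory>.

# Under Slurm the job starts in the submission directory by default,
# so no 'cd' is needed. Outside of a job, fall back to the current directory.
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"
echo "Submitted from: $submit_dir"
echo "Running in:     $PWD"
```

Without --chdir, both lines are expected to print the same directory when the script runs as a Slurm job.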