1. Slurm Job Scheduler#
1.1. Command line tools#
Slurm provides a complete toolbox to manage and control your jobs. Some of its commands carry out common tasks, such as submitting job scripts to the queue (sbatch) or printing information about the queue (squeue). Others have new roles not found in a classic PBS environment, such as srun.
1.1.1. Job management commands#
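For illustration, a minimal sketch of the basic job management workflow with the standard Slurm commands (job.sh and the job ID 123456 are placeholders):

```bash
# Submit a job script to the queue
sbatch job.sh

# Show the details of a submitted job
scontrol show job 123456

# Cancel the job if it is no longer needed
scancel 123456
```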
1.1.2. Job monitoring commands#
- squeue
Monitor your jobs in the queue. In the VUB clusters you can also use mysqueue to get a more detailed view of your queue.
- sinfo
Monitor the nodes and partitions in the cluster. In the VUB clusters you can also use mysinfo to get a more detailed view of the cluster.
- sacct
Display accounting data of your current and past jobs, such as CPU time and memory used. In the VUB clusters you can also use mysacct to get a more detailed view of your jobs and slurm_jobinfo <JOB_ID> to view details of a given job. (Replace <JOB_ID> with the ID of your job.)
Warning: use with restraint and avoid including sacct or mysacct in your scripts.
- sattach
Attach standard input, output, and error of a current job to your shell.
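As a quick illustration of the monitoring commands (the job ID 123456 is a placeholder):

```bash
# Your jobs currently in the queue
squeue -u $USER            # or: mysqueue

# State of the nodes and partitions in the cluster
sinfo                      # or: mysinfo

# CPU time and memory used by a past job
sacct -j 123456 --format=JobID,JobName,Elapsed,MaxRSS,State

# Attach your terminal to the output of a running job step
sattach 123456.0
```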
1.2. Torque/Moab to Slurm migration#
Users who have workflows developed for Torque/Moab, which is based on a PBS environment and its q* commands, have multiple options to quickly get up and running in Slurm.
1.2.1. Quick translation tables#
Note
This section is meant for experienced Torque/Moab users to quickly get up and running with Slurm.
We encourage all users to convert their workflows to Slurm. The tables below provide a quick reference with translations from Torque/Moab to Slurm that can help you with the migration.
1.2.1.1. Submitting and monitoring jobs#
Torque/Moab | Slurm | Description
---|---|---
qsub job.sh | sbatch job.sh | Submit a job with batch script job.sh
qsub -I | srun --pty bash -l | Start an interactive job, see Interactive jobs
qdel <JOB_ID> | scancel <JOB_ID> | Delete a job
qstat | mysqueue --states=all or mysacct --starttime=YYYY-MM-DD | Show job queue status
qstat -f <JOB_ID> | scontrol show job <JOB_ID> | Show details about a job
qstat -f <JOB_ID> | sstat <JOB_ID> | Show resources usage
pbsnodes | mysinfo | Show summary of available nodes and their usage
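For instance, the queue status check that used to be a plain qstat maps onto the VUB helper commands from the table above (the start date is a placeholder):

```bash
# Torque/Moab: qstat
# Slurm on the VUB clusters:
mysqueue --states=all              # all your jobs, in any state
mysacct --starttime=2024-01-01     # accounting view of your jobs since the given date
```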
1.2.1.2. Requesting resources and other options#
Torque/Moab | Slurm | Description
---|---|---
-N <name> | --job-name=<name> | Set job name to <name>
-l walltime=HH:MM:SS | --time=DD-HH:MM:SS | Define the time limit
-l nodes=1:ppn=1 | --ntasks=1 | Request a single CPU core
-l nodes=1:ppn=X | --cpus-per-task=X | Request multiple cores on 1 node for Parallel non-MPI jobs
-l nodes=X:ppn=Y | --ntasks=X or --ntasks=X --nodes=Y or --nodes=Y --ntasks-per-node=Z | Request multiple cores on 1 or multiple nodes for Parallel MPI jobs
-l pmem=N | --mem-per-cpu=N (default unit = MB) | Request memory per CPU core. Only if needed, see Memory allocation
-l gpus=X | --gpus-per-node=X | Request X GPUs per node
-l feature=<feature> | --partition=<partition> | Request nodes with a given feature, see Features to partitions
-M <email> | --mail-user=<email> | Send job alerts to given email address
-m abe (any combination of a, b, e) | --mail-type=BEGIN,END,FAIL,REQUEUE,ALL (select 1 or a comma separated list) | Conditions for sending alerts by email
-o <out_file> | --output=<out_file> | Write stdout to <out_file>
-e <err_file> | --error=<err_file> | Write stderr to <err_file>
-j oe | (default, unless --error is given) | Write stdout and stderr to the same file
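Putting several of these options together, a minimal sketch of a Slurm batch script replacing a typical #PBS header (job name, time limit, file names and email address are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=myjob              # was: #PBS -N myjob
#SBATCH --time=02:00:00               # was: #PBS -l walltime=02:00:00
#SBATCH --ntasks=1                    # single core job
#SBATCH --output=myjob.out            # was: #PBS -o myjob.out
#SBATCH --mail-user=you@example.com   # was: #PBS -M you@example.com
#SBATCH --mail-type=END,FAIL          # send an email when the job ends or fails

# the actual work of the job goes below
echo "Running on $(hostname)"
```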
1.2.1.3. Environment variables defined by resource managers#
Torque/Moab | Slurm | Description
---|---|---
$PBS_JOBID | $SLURM_JOB_ID | Job ID
$PBS_O_WORKDIR | $SLURM_SUBMIT_DIR | Directory where job was submitted from, see Job working directory
$PBS_NODEFILE (nodes file) | $SLURM_JOB_NODELIST or $(scontrol show hostnames) (nodes string) | List of nodes assigned to job
$PBS_JOBNAME | $SLURM_JOB_NAME | Job name
$PBS_ARRAYID | $SLURM_ARRAY_TASK_ID | Job array ID (index) number
$PBS_NUM_NODES | $SLURM_JOB_NUM_NODES | Number of nodes
$PBS_NUM_PPN | $SLURM_CPUS_PER_TASK or $SLURM_TASKS_PER_NODE | Number of cores per node
$PBS_NP | $SLURM_NTASKS or $SLURM_CPUS_PER_TASK, see Job variables about CPUs | Total number of cores
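The following sketch prints the Slurm counterparts of the most common PBS variables from within a job script (the job header is a placeholder):

```bash
#!/bin/bash
#SBATCH --job-name=envdemo
#SBATCH --ntasks=4

echo "Job ID:           $SLURM_JOB_ID"         # was $PBS_JOBID
echo "Submit directory: $SLURM_SUBMIT_DIR"     # was $PBS_O_WORKDIR
echo "Job name:         $SLURM_JOB_NAME"       # was $PBS_JOBNAME
echo "Number of nodes:  $SLURM_JOB_NUM_NODES"  # was $PBS_NUM_NODES
echo "Total tasks:      $SLURM_NTASKS"         # was $PBS_NP
echo "Node list:        $(scontrol show hostnames)"  # was the contents of $PBS_NODEFILE
```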
1.2.1.4. Features to partitions#
Torque/Moab features | Slurm partitions
---|---
skylake | skylake or skylake_mpi
1.2.2. Compatibility layer#
The Slurm clusters in VUB provide a compatibility layer with Torque/Moab. It is possible to manage your jobs in the queue with the classic commands qsub, qdel and qstat. Job scripts with #PBS directives or using $PBS_* environment variables can be interpreted and handed over to Slurm. Please note that this compatibility layer does not support all possible combinations of options, as there is no direct translation for all of them between Torque/Moab and Slurm. Nonetheless, common workflows should work out-of-the-box.
As of 27 November 2023, the compatibility layer has been deprecated in favor of the native Slurm commands and #SBATCH directives. We strongly suggest migrating any Torque-based jobs to native Slurm commands.
Users who still need to rely on the compatibility layer should first load the slurm-torque-wrappers module in their environment:
module load slurm-torque-wrappers
Note
In some cases, when an interactive job is started using the compatibility layer with qsub -I, the terminal width might be reduced to 80 characters. This can be easily fixed by issuing the command resize in the interactive shell.
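With the wrappers loaded, the classic commands can be used as before; a minimal sketch (job.sh is a placeholder):

```bash
module load slurm-torque-wrappers
qsub job.sh    # handed over to Slurm by the compatibility layer
qstat          # queue status through the wrapper
```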
1.3. CPU cores allocation#
Requesting CPU cores in Torque/Moab is done with the option -l nodes=X:ppn=Y, where it is mandatory to specify the number of nodes even for single core jobs (-l nodes=1:ppn=1). The concept behind the keyword nodes is different between Torque/Moab and Slurm though. While Torque/Moab nodes do not necessarily represent a single physical server of the cluster, the option --nodes in Slurm specifies the exact number of physical nodes to be used for the job, as explained in Parallel non-MPI jobs.
While in Torque/Moab the total number of CPU cores allocated to a job is always defined by the combination of nodes and processes per node (ppn), in Slurm the definition of resources is more nuanced and it is mandatory to distinguish between (at least) two classes of parallel applications (illustrated in the sketch after this list):
Parallel non-MPI jobs: single node jobs with a single task and multiple CPU cores per task
Parallel MPI jobs: multi-task jobs that can run on multiple nodes
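A minimal sketch of the two request styles, assuming a hypothetical 8-core job (only one of the two headers would be used in a given script):

```bash
# Parallel non-MPI job: 1 task with 8 CPU cores on a single node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Parallel MPI job: 8 tasks that may be spread over 2 nodes
#SBATCH --ntasks=8
#SBATCH --nodes=2
```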
1.3.1. Job variables about CPUs#
The job variables in Torque/Moab providing information about the number of allocated cores are $PBS_NP for the total and $PBS_NUM_PPN for the CPU cores per node. The equivalent variables in Slurm depend on the type of job that you are running (see the sketch below):
Parallel non-MPI jobs: The number of cores allocated for the threads and processes of your application is given by the environment variable $SLURM_CPUS_PER_TASK.
Parallel MPI jobs: The total number of cores allocated to your job is given by the environment variable $SLURM_NTASKS, and the number of cores per node by $SLURM_TASKS_PER_NODE.
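For instance, a sketch of how these variables are typically consumed inside a job script (the executable names are placeholders):

```bash
# Parallel non-MPI job: let the threads of the application use the allocated cores
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_threaded_app

# Parallel MPI job: srun launches one process per allocated task ($SLURM_NTASKS in total)
srun ./my_mpi_app
```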
1.4. Memory allocation#
Jobs that do not define any specific memory request will get a default allocation per core, which is the total node memory divided by the number of cores on the node. In most cases, the default memory allocation is sufficient, and it is also what we recommend. If your jobs need more than the default memory, make sure to control their memory usage (e.g. with mysacct) to avoid allocating more resources than needed.
If your job needs a non-default amount of memory, we highly recommend specifying the memory allocation of your job with the Slurm option --mem-per-cpu=X, which sets the memory per core. It is also possible to request the total amount of memory per node of your job with the option --mem=X. However, requesting a proper amount of memory with --mem is not trivial for multi-node jobs in which you want to leave some freedom for node allocation. In any case, these two options are mutually exclusive, so you should only use one of them.
The default memory unit is megabytes, but you can specify different units using one of the following one letter suffixes: K, M, G or T. For example, to request 2GB per core you can use --mem-per-cpu=2000 or --mem-per-cpu=2G.
If your job needs more than 240GB memory, you have to specify the high-memory node with --partition=broadwell_himem. This node provides up to 1.4TB.
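As an illustration, a sketch of a job header requesting non-default memory (the core count is a placeholder):

```bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G    # 2GB per core, 8GB in total for this job

# jobs needing more than 240GB would instead request the high-memory node:
##SBATCH --partition=broadwell_himem
```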
1.5. Slurm partitions#
Compute nodes in the cluster are organized in partitions based on their hardware characteristics. In most cases, specifying a partition is not necessary, as Slurm will automatically determine the partitions that are suitable for your job based on the requested resources, such as number of tasks or GPUs.
The command mysinfo provides detailed information about all partitions in the cluster. The name of the partition tells its main characteristic: GPU nodes are all in specific partitions suffixed with _gpu and nodes with a fast interconnect are suffixed with _mpi. The part of the name before the suffix tells the generation of the hardware in that partition. For instance, ampere_gpu has Nvidia Ampere (A100) GPUs, while skylake_mpi has nodes with Intel Skylake CPUs and a fast interconnect.
See also
The full hardware description of the partitions in Hydra can be found in the VSC docs: Hydra Hardware.
You can submit your jobs to specific partitions if needed. It's also possible to request a comma-separated list of partitions. For example, to indicate that your job may run in either the skylake or the zen4 partition, you can use --partition=skylake,zen4. Note, however, that a job will only run in a single partition; Slurm will decide the partition based on priority and availability.
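For example (job.sh is a placeholder):

```bash
mysinfo                                  # inspect the available partitions
sbatch --partition=skylake,zen4 job.sh   # the job will run in only one of the two
```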
1.6. Job working directory#
In Torque/Moab, each job starts in the user's $HOME directory. In Slurm, by default the job stays in the directory where it was submitted from. Thus, adding cd $SLURM_SUBMIT_DIR to the job script is not needed. Users can also use the Slurm option --chdir to specify in which directory a job should start.
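A minimal sketch using this option (the directory is a placeholder):

```bash
#SBATCH --chdir=/path/to/workdir    # hypothetical directory where the job should start

echo "Job running in $PWD"
```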