1. Slurm Workload Manager#

1.1. Command line tools#

Slurm provides a complete toolbox to manage and control your jobs. Some of these tools carry out common tasks, such as submitting job scripts to the queue (sbatch) or printing information about the queue (squeue), while others take on roles not found in a classic PBS environment, such as srun.

1.1.1. Job management tools#

  • sbatch

    Submit a job script to the queue.

  • srun

    Execute a command in parallel; supersedes mpirun (see the example script after this list).

  • scancel

    Cancel a job or resource allocation.

  • salloc

    Allocate resources in the cluster to be used from the current interactive shell.
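
As an illustration of how these tools fit together, below is a minimal sketch of a batch script. The job name, the resource values, the program ./my_program and the file name job.sh are placeholders, not defaults.

  #!/bin/bash
  #SBATCH --job-name=example_job    # name shown by squeue/mysqueue
  #SBATCH --time=01:00:00           # time limit (HH:MM:SS)
  #SBATCH --ntasks=4                # number of parallel tasks

  # srun launches the tasks of the allocation, taking the role of mpirun
  srun ./my_program

Submit the script with sbatch job.sh and cancel it with scancel <JOB_ID>. A comparable interactive allocation can be obtained with salloc --ntasks=4 --time=01:00:00.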

1.1.2. Slurm monitoring tools#

1.1.2.1. Monitoring jobs#

With the squeue command you can monitor your jobs in the queue.

In the VUB clusters you can also use mysqueue to get a more detailed view of your queue: it shows an overview of your jobs currently in the queue, either PENDING to start or already RUNNING.

Example mysqueue output#
  JOBID PARTITION   NAME           USER     STATE       TIME TIME_LIMIT NODES CPUS MIN_MEMORY NODELIST(REASON)
1125244 ampere_gpu  gpu_job01  vsc10000   RUNNING 3-01:55:38 5-00:00:00     1   16      7810M node404
1125245 ampere_gpu  gpu_job02  vsc10000   PENDING       0:00 5-00:00:00     1   16     10300M (Priority)
1125246 zen5_mpi    my_job01   vsc10000   RUNNING 2-19:58:16 4-23:59:00     2   32         8G node[710,719]
1125247 pascal_gpu  gpu_job03  vsc10000   PENDING       0:00 3-00:00:00     1   12       230G (Resources)

Each row in the table corresponds to one of your running or pending jobs, or to an individual running job from one of your Job arrays. You can check the PARTITION where each job is running or trying to start, as well as the resources (TIME, NODES, CPUS, MIN_MEMORY) that are or will be allocated to it.

Note

The command mysqueue -t all shows all your jobs from the last 24 hours.

The column NODELIST(REASON) will either show the list of nodes used by a running job or the reason behind the pending state of a job. The most common reason codes are the following:

Priority

The job is waiting for other pending jobs ahead of it in the queue to be processed.

Resources

The job is at the front of the queue, but there are no available nodes with the requested resources.

ReqNodeNotAvail

The requested partition/nodes are not available. This usually happens during a scheduled maintenance.

See also

Full list of reason tags for pending jobs.

1.1.2.2. Monitoring nodes and partitions#

With the sinfo command you can monitor the nodes and partitions in the cluster.

In the VUB clusters you can also use mysinfo to get a more detailed view of the cluster. It shows a real-time overview of the available hardware resources for each partition in the cluster, including cores, memory and GPUs, as well as their current load and running state.

Example mysinfo output#
 CLUSTER: hydra
 PARTITION       STATE [NODES x CPUS]   CPUS(A/I/O/T)     CPU_LOAD   MEMORY MB  GRES                GRES_USED
 ampere_gpu      resv  [    2 x 32  ]       0/64/0/64    0.01-0.03   246989 MB  gpu:a100:2(S:1)     gpu:a100:0(IDX:N/A)
 ampere_gpu      mix   [    3 x 32  ]      66/30/0/96  13.92-19.47   257567 MB  gpu:a100:2(S:0-1)   gpu:a100:2(IDX:0-1)
 ampere_gpu      alloc [    3 x 32  ]       96/0/0/96   3.27-32.00   257567 MB  gpu:a100:2(S:0-1)   gpu:a100:2(IDX:0-1)
 zen5_himem      alloc [    1 x 128 ]     128/0/0/128        59.54  1547679 MB  (null)              (null)
 zen5_mpi        mix   [   10 x 128 ]  817/463/0/1280  0.00-124.32  773536+ MB  (null)              (null)
 [...]
 zen4            mix   [   13 x 64  ]   346/486/0/832   0.02-50.06   386510 MB  (null)              (null)
 zen4            alloc [    7 x 64  ]     448/0/0/448   2.03-74.64   386510 MB  (null)              (null)

Tip

The command mysinfo -N shows a detailed overview per node.

1.1.2.3. Monitoring job accounting data#

Warning

Use with restraint; avoid including sacct or mysacct in your scripts.

With the sacct command you can display accounting data of your current and past jobs (and job steps), such as CPU time and memory used. In the VUB clusters you can also use mysacct to get a more detailed view of your jobs, and slurm_jobinfo <JOB_ID> to view the details of a given job (replace <JOB_ID> with the ID of your job).

Tip

Use --starttime and --endtime to specify a time range.
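
For example, a sketch of such a query (the dates are arbitrary and the --format fields are a common selection, not the default output of mysacct):

  sacct --starttime=2024-03-01 --endtime=2024-03-31 \
        --format=JobID,JobName,Partition,Elapsed,MaxRSS,State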

1.1.2.4. Monitoring running jobs#

With the sattach command you can attach the standard input, output, and error streams of a currently running job (step) to your shell.
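
For example, assuming job 1125244 from the mysqueue output above has started a step 0 with srun, you could follow its output with:

  sattach 1125244.0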

1.2. Job working directory#

In Slurm, a job starts by default in the directory from which it was submitted, so adding cd $SLURM_SUBMIT_DIR to the job script is not needed. You can also use the Slurm option --chdir to specify the directory in which a job should start.
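
For instance, either of the following would start the job in a hypothetical directory /path/to/workdir instead of the submission directory:

  sbatch --chdir=/path/to/workdir job.sh

or, inside the job script:

  #SBATCH --chdir=/path/to/workdir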

1.3. Torque/Moab to Slurm migration#

Experienced Torque/Moab users can use the translation tables below to quickly get up and running in Slurm.

1.3.1. Submitting and monitoring jobs#

Replace <JOB_ID> with the ID of your job.#

Torque/Moab            Slurm                            Description
qsub job.sh            sbatch job.sh                    Submit a job with batch script job.sh
qsub [resources] -I    srun [resources] --pty bash -l   Start an interactive job, see Interactive jobs
qdel <JOB_ID>          scancel <JOB_ID>                 Delete a job
qstat                  mysqueue --states=all or         Show job queue status
                       mysacct --starttime=YYYY-MM-DD
qstat -f <JOB_ID>      scontrol show job <JOB_ID>       Show details about a job
myresources            mysacct                          Show resources usage
nodestat               mysinfo                          Show summary of available nodes and their usage
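
As a concrete example of the interactive-job row above, requesting 4 cores on a single node for 1 hour (values chosen purely for illustration) could look like:

  # Torque/Moab
  qsub -l nodes=1:ppn=4,walltime=1:00:00 -I

  # Slurm
  srun --ntasks=1 --cpus-per-task=4 --time=01:00:00 --pty bash -l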

1.3.2. Requesting resources and other options#

Torque/Moab            Slurm                                     Description
-N job_name            --job-name=job_name                       Set job name to job_name
-l walltime=HH:MM:SS   --time=DD-HH:MM:SS                        Define the time limit
-l nodes=1:ppn=1       --ntasks=1                                Request a single CPU core
-l nodes=1:ppn=X       --ntasks=1 --cpus-per-task=X              Request multiple cores on 1 node for Parallel non-MPI jobs
-l nodes=X:ppn=Y       --ntasks=X or                             Request multiple cores on 1 or multiple nodes for Parallel MPI jobs
                       --ntasks=X --nodes=Y or
                       --nodes=Y --ntasks-per-node=Z
-l pmem=N              --mem-per-cpu=N (default unit = MB)       Request memory per CPU core (only if needed, see Memory allocation)
-l feature=pascal      --partition=pascal_gpu                    Request pascal GPU architecture, see Slurm partitions
-M email@example.com   --mail-user=email@example.com             Send job alerts to given email address
-m <a|b|e>             --mail-type=BEGIN|END|FAIL|REQUEUE|ALL    Conditions for sending alerts by email
                       (select 1 or comma separated list)
-o out_file            --output out_file                         Write stdout to out_file
-e err_file            --error err_file                          Write stderr to err_file
-j oe                  (default, unless --error is specified)    Write stdout and stderr to the same file
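
Putting several of these options together, below is a sketch of a Slurm job script header with the corresponding Torque/Moab options shown as comments (the job name and all values are placeholders):

  #!/bin/bash
  #SBATCH --job-name=my_analysis           # -N my_analysis
  #SBATCH --time=0-02:00:00                # -l walltime=02:00:00
  #SBATCH --ntasks=1                       # -l nodes=1:ppn=8
  #SBATCH --cpus-per-task=8                #   (8 cores on 1 node, non-MPI)
  #SBATCH --mail-user=email@example.com    # -M email@example.com
  #SBATCH --mail-type=END,FAIL             # -m ea (mail on end or abort)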

1.3.3. Environment variables defined by resource managers#

Torque/Moab        Slurm                           Description
$PBS_JOBID         $SLURM_JOB_ID                   Job ID
$PBS_O_WORKDIR     $SLURM_SUBMIT_DIR               Directory where job was submitted from, see Job working directory
$PBS_NODEFILE      $SLURM_JOB_NODELIST or          List of nodes assigned to job
(nodes file)       $(scontrol show hostnames)
                   (nodes string)
$PBS_JOBNAME       $SLURM_JOB_NAME                 Job name
$PBS_ARRAYID       $SLURM_ARRAY_TASK_ID            Job array ID (index) number
$PBS_NUM_NODES     $SLURM_JOB_NUM_NODES            Number of nodes
$PBS_NUM_PPN       see Job variables about CPUs    Number of cores per node
$PBS_NP            see Job variables about CPUs    Total number of cores
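
For instance, where a Torque script would read the allocated nodes from the file $PBS_NODEFILE, a Slurm script holds them as a compact string that can be expanded into one hostname per line (a sketch):

  echo "Job ${SLURM_JOB_ID} runs on: ${SLURM_JOB_NODELIST}"    # e.g. node[710,719]
  scontrol show hostnames "${SLURM_JOB_NODELIST}"              # prints node710 and node719 on separate lines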

1.3.4. Features to partitions#

See Slurm partitions for more info.#

Torque/Moab features    Slurm partitions
pascal                  pascal_gpu
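
For example, where a Torque job would request -l feature=pascal, the equivalent Slurm job is submitted to the corresponding partition (a sketch, leaving the rest of the resource request unchanged):

  sbatch --partition=pascal_gpu job.sh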