1. Slurm Workload Manager#

1.1. Command line tools#

Slurm provides a complete toolbox to manage and control your jobs. Some of these tools carry out common tasks, such as submitting job scripts to the queue (sbatch) or printing information about the queue (squeue), while others take on roles not found in a classic PBS environment, such as srun.

1.1.1. Job management tools#

  • sbatch

    Submit a job script to the queue.

  • srun

    Execute a command in parallel; supersedes mpirun (see the example script after this list).

  • scancel

    Cancel a job or resource allocation.

  • salloc

    Allocate resources in the cluster to be used from the current interactive shell.
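
As an illustration of how these tools fit together, below is a minimal sketch of a batch script. The job name, the resource values, the program ./my_program and the file name job.sh are placeholders, not defaults.

  #!/bin/bash
  #SBATCH --job-name=example_job    # name shown by squeue/mysqueue
  #SBATCH --time=01:00:00           # time limit (HH:MM:SS)
  #SBATCH --ntasks=4                # number of parallel tasks

  # srun launches the tasks of the allocation, taking the role of mpirun
  srun ./my_program

Submit the script with sbatch job.sh and cancel it with scancel <JOB_ID>. A comparable interactive allocation can be obtained with salloc --ntasks=4 --time=01:00:00.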

1.1.2. Slurm monitoring tools#

1.1.2.1. Monitoring jobs#

With the squeue command you can monitor your jobs in the queue.

In the VUB clusters you can also use mysqueue to get a more detailed view of your queue: it shows an overview of your jobs currently in the queue, either PENDING to start or already RUNNING.

Example mysqueue output#
  JOBID PARTITION   NAME           USER     STATE       TIME TIME_LIMIT NODES CPUS MIN_MEMORY NODELIST(REASON)
1125244 ampere_gpu  gpu_job01  vsc10000   RUNNING 3-01:55:38 5-00:00:00     1   16      7810M node404
1125245 ampere_gpu  gpu_job02  vsc10000   PENDING       0:00 5-00:00:00     1   16     10300M (Priority)
1125246 zen5_mpi    my_job01   vsc10000   RUNNING 2-19:58:16 4-23:59:00     2   32         8G node[710,719]
1125247 pascal_gpu  gpu_job03  vsc10000   PENDING       0:00 3-00:00:00     1   12       230G (Resources)

Each row in the table corresponds to one of your running or pending jobs, or to an individual running job from one of your Job arrays. You can check the PARTITION where each job is running or trying to start, as well as the resources (TIME, NODES, CPUS, MIN_MEMORY) that are or will be allocated to it.

Note

The command mysqueue -t all shows all your jobs from the last 24 hours.

The column NODELIST(REASON) will either show the list of nodes used by a running job or the reason behind the pending state of a job. The most common reason codes are the following:

Priority

The job is waiting for other pending jobs ahead of it in the queue to be processed.

Resources

The job is at the front of the queue, but there are no available nodes with the requested resources.

ReqNodeNotAvail

The requested partition/nodes are not available. This usually happens during a scheduled maintenance.

See also

Full list of reason tags for pending jobs.

1.1.2.2. Monitoring nodes and partitions#

With the sinfo command you can monitor the nodes and partitions in the cluster.

In the VUB clusters you can also use mysinfo to get a more detailed view of the cluster. It shows a real-time overview of the available hardware resources for each partition in the cluster, including cores, memory and GPUs, as well as their current load and running state.

Example mysinfo output#
 CLUSTER: hydra
 PARTITION       STATE [NODES x CPUS]   CPUS(A/I/O/T)     CPU_LOAD   MEMORY MB  GRES                GRES_USED
 ampere_gpu      resv  [    2 x 32  ]       0/64/0/64    0.01-0.03   246989 MB  gpu:a100:2(S:1)     gpu:a100:0(IDX:N/A)
 ampere_gpu      mix   [    3 x 32  ]      66/30/0/96  13.92-19.47   257567 MB  gpu:a100:2(S:0-1)   gpu:a100:2(IDX:0-1)
 ampere_gpu      alloc [    3 x 32  ]       96/0/0/96   3.27-32.00   257567 MB  gpu:a100:2(S:0-1)   gpu:a100:2(IDX:0-1)
 zen5_himem      alloc [    1 x 128 ]     128/0/0/128        59.54  1547679 MB  (null)              (null)
 zen5_mpi        mix   [   10 x 128 ]  817/463/0/1280  0.00-124.32  773536+ MB  (null)              (null)
 [...]
 zen4            mix   [   13 x 64  ]   346/486/0/832   0.02-50.06   386510 MB  (null)              (null)
 zen4            alloc [    7 x 64  ]     448/0/0/448   2.03-74.64   386510 MB  (null)              (null)

Tip

The command mysinfo -N shows a detailed overview per node.

1.1.2.3. Monitoring job accounting data#

Warning

Use with restraint; avoid including sacct or mysacct in your scripts.

With the sacct command you can display accounting data of your current and past jobs (and job steps), such as CPU time and memory used. In the VUB clusters you can also use mysacct to get a more detailed view of your jobs, and slurm_jobinfo <JOB_ID> to view the details of a given job (replace <JOB_ID> with the ID of your job).

Tip

Use --starttime and --endtime to specify a time range.
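
For example, a sketch of such a query (the dates are arbitrary and the --format fields are a common selection, not the default output of mysacct):

  sacct --starttime=2024-03-01 --endtime=2024-03-31 \
        --format=JobID,JobName,Partition,Elapsed,MaxRSS,State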

1.1.2.4. Monitoring running jobs#

With the sattach command you can attach the standard input, output, and error streams of a currently running job (step) to your shell.
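
For example, assuming job 1125244 from the mysqueue output above has started a step 0 with srun, you could follow its output with:

  sattach 1125244.0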

1.2. Job working directory#

In Slurm, a job starts by default in the directory from which it was submitted, so adding cd $SLURM_SUBMIT_DIR to the job script is not needed. You can also use the Slurm option --chdir to specify the directory in which a job should start.
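
For instance, either of the following would start the job in a hypothetical directory /path/to/workdir instead of the submission directory:

  sbatch --chdir=/path/to/workdir job.sh

or, inside the job script:

  #SBATCH --chdir=/path/to/workdir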

1.3. Torque/Moab to Slurm migration#

Experienced Torque/Moab users can use the translation tables below to quickly get up and running in Slurm.

1.3.1. Submitting and monitoring jobs#

Replace <JOB_ID> with the ID of your job.#

Torque/Moab            Slurm                            Description
qsub job.sh            sbatch job.sh                    Submit a job with batch script job.sh
qsub [resources] -I    srun [resources] --pty bash -l   Start an interactive job, see Interactive jobs
qdel <JOB_ID>          scancel <JOB_ID>                 Delete a job
qstat                  mysqueue --states=all or         Show job queue status
                       mysacct --starttime=YYYY-MM-DD
qstat -f <JOB_ID>      scontrol show job <JOB_ID>       Show details about a job
myresources            mysacct                          Show resources usage
nodestat               mysinfo                          Show summary of available nodes and their usage
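
As a concrete example of the interactive-job row above, requesting 4 cores on a single node for 1 hour (values chosen purely for illustration) could look like:

  # Torque/Moab
  qsub -l nodes=1:ppn=4,walltime=1:00:00 -I

  # Slurm
  srun --ntasks=1 --cpus-per-task=4 --time=01:00:00 --pty bash -l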

1.3.2. Requesting resources and other options#

Torque/Moab            Slurm                                     Description
-N job_name            --job-name=job_name                       Set job name to job_name
-l walltime=HH:MM:SS   --time=DD-HH:MM:SS                        Define the time limit
-l nodes=1:ppn=1       --ntasks=1                                Request a single CPU core
-l nodes=1:ppn=X       --ntasks=1 --cpus-per-task=X              Request multiple cores on 1 node for Parallel non-MPI jobs
-l nodes=X:ppn=Y       --ntasks=X or                             Request multiple cores on 1 or multiple nodes for Parallel MPI jobs
                       --ntasks=X --nodes=Y or
                       --nodes=Y --ntasks-per-node=Z
-l pmem=N              --mem-per-cpu=N (default unit = MB)       Request memory per CPU core (only if needed, see Memory allocation)
-l feature=pascal      --partition=pascal_gpu                    Request pascal GPU architecture, see Slurm partitions
-M email@example.com   --mail-user=email@example.com             Send job alerts to given email address
-m <a|b|e>             --mail-type=BEGIN|END|FAIL|REQUEUE|ALL    Conditions for sending alerts by email
                       (select 1 or comma separated list)
-o out_file            --output out_file                         Write stdout to out_file
-e err_file            --error err_file                          Write stderr to err_file
-j oe                  (default, unless --error is specified)    Write stdout and stderr to the same file
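
Putting several of these options together, below is a sketch of a Slurm job script header with the corresponding Torque/Moab options shown as comments (the job name and all values are placeholders):

  #!/bin/bash
  #SBATCH --job-name=my_analysis           # -N my_analysis
  #SBATCH --time=0-02:00:00                # -l walltime=02:00:00
  #SBATCH --ntasks=1                       # -l nodes=1:ppn=8
  #SBATCH --cpus-per-task=8                #   (8 cores on 1 node, non-MPI)
  #SBATCH --mail-user=email@example.com    # -M email@example.com
  #SBATCH --mail-type=END,FAIL             # -m ea (mail on end or abort)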

1.3.3. Environment variables defined by resource managers#

Torque/Moab        Slurm                           Description
$PBS_JOBID         $SLURM_JOB_ID                   Job ID
$PBS_O_WORKDIR     $SLURM_SUBMIT_DIR               Directory where job was submitted from, see Job working directory
$PBS_NODEFILE      $SLURM_JOB_NODELIST or          List of nodes assigned to job
(nodes file)       $(scontrol show hostnames)
                   (nodes string)
$PBS_JOBNAME       $SLURM_JOB_NAME                 Job name
$PBS_ARRAYID       $SLURM_ARRAY_TASK_ID            Job array ID (index) number
$PBS_NUM_NODES     $SLURM_JOB_NUM_NODES            Number of nodes
$PBS_NUM_PPN       see Job variables about CPUs    Number of cores per node
$PBS_NP            see Job variables about CPUs    Total number of cores
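
For instance, where a Torque script would read the allocated nodes from the file $PBS_NODEFILE, a Slurm script holds them as a compact string that can be expanded into one hostname per line (a sketch):

  echo "Job ${SLURM_JOB_ID} runs on: ${SLURM_JOB_NODELIST}"    # e.g. node[710,719]
  scontrol show hostnames "${SLURM_JOB_NODELIST}"              # prints node710 and node719 on separate lines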

1.3.4. Features to partitions#

See Slurm partitions for more info.#

Torque/Moab features    Slurm partitions
pascal                  pascal_gpu
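
For example, where a Torque job would request -l feature=pascal, the equivalent Slurm job is submitted to the corresponding partition (a sketch, leaving the rest of the resource request unchanged):

  sbatch --partition=pascal_gpu job.sh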