3. Main Job Types
Jobs in VUB clusters start with a clean environment to guarantee the reproducibility of your simulations. This means that the commands available at job start are the system defaults, without any software modules loaded and without commands from software installed in your user directory (i.e. the $PATH environment variable is reset). All jobs have to explicitly load/enable any software from modules or from the user directory. This approach was already applied in Hydra with the Torque/Moab scheduler and continues with Slurm.
Note
If you look for information about Slurm elsewhere, be aware that by default Slurm copies the environment from the submission shell into the job. Users who prefer this mode of operation can find information on how to enable it in How can I copy the login shell environment to my jobs?
Job scripts are submitted to the queue in Slurm with the sbatch command, which is equivalent to qsub in Torque/Moab. Job resources and other job options can be set either on the command line or in the header of your job script with the #SBATCH keyword. Below we provide basic examples of job scripts for a variety of common use cases to illustrate their usage.
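For instance, the walltime limit of a job can be given on the command line at submission time instead of in the job script header (the script name job_script.sh is just a placeholder):
sbatch --time=04:00:00 job_script.sh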
3.1. Serial jobs
These jobs run in serial and only require one CPU core.
Example
Scripts in R, Python or MATLAB are serial by default (these environments can do parallel execution, but it has to be enabled with extra modules/packages).
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00

module load somemodule/1.1.1

<more-commands>
Submit your job script with sbatch; no extra options are needed. The job will get a single core and a default amount of memory, which is sufficient for most serial jobs.
3.2. Jobs for GPUs
The different types of jobs for GPUs are covered in their own page GPU Job Types.
3.3. Parallel non-MPI jobs
Applications that use threads or subprocesses are capable of parallelizing the computation on multiple CPU cores in a single node. Typically, such software uses specific tools/libraries that organize the parallel execution locally on the node, without any network communication.
Example
Code in C/C++ using OpenMP or Pthreads. R scripts using doParallel. Python scripts using the multiprocessing module or other modules supporting parallel execution, e.g. TensorFlow and PyTorch.
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=X

module load somemodule/1.1.1

<more-commands>
Submit your job script with sbatch. The job script will request a single task with --ntasks=1 and X cores in the same physical server for this task with --cpus-per-task=X. Requesting 1 task always implies --nodes=1, ensuring that all cores will be allocated in a single node.
Warning
Always check the documentation of the software used in your jobs on how to run it in parallel. Some applications require explicitly setting the number of threads or processes via a command line option. In such a case, we recommend setting those options with the environment variable ${SLURM_CPUS_PER_TASK:-1}, which corresponds to the --cpus-per-task of your job.
The goal is to maximise performance by using all CPU cores allocated to your job and executing 1 thread or process on each CPU core. See the FAQ My jobs run slower than expected, what can I do? for more information.
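For instance, a job script for a threaded program could pass the core count of the job to the application as follows (a minimal sketch; the option --threads of <your-program> is hypothetical, and OMP_NUM_THREADS only applies to software using OpenMP):
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}  # for applications parallelized with OpenMP
<your-program> --threads ${SLURM_CPUS_PER_TASK:-1}  # for applications with a thread count option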
3.4. Parallel MPI jobs
Applications using MPI to handle parallel execution can distribute the workload among any number of cores located in any number of nodes. MPI manages the communication over the network between the running processes.
Example
Fortran, C/C++ applications with support for MPI: VASP, NAMD, CP2K, GROMACS, ORCA, OpenFOAM, CESM. Python scripts using mpi4py.
Note
Applications supporting MPI can also be used in single-node setups if needed.
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00
#SBATCH --ntasks=X

module load somemodule/1.1.1

srun <mpi-program>
Submit your job script with sbatch. The job script will request --ntasks=X, where X is the total number of cores you need. By default, each task gets allocated one CPU core, and CPU cores will be allocated in one or more nodes, using as few nodes as possible.
Warning
Always check the documentation of the MPI software used in your jobs on how to run it in parallel. Some applications require explicitly setting the number of tasks via a command line option. In such a case, we recommend setting those options with the environment variable $SLURM_NTASKS, which corresponds to the --ntasks of your job.
The goal is to maximise performance by using all CPU cores allocated to your job and executing 1 task on each CPU core. See the FAQ My jobs run slower than expected, what can I do? for more information.
We highly recommend launching your MPI applications in your job script with srun to ensure optimal run conditions for your job, although mpirun can still be used.
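If you do use mpirun, the number of processes can be taken from the job environment (a sketch; recent MPI libraries usually detect the Slurm allocation on their own, making the -np option redundant):
mpirun -np $SLURM_NTASKS <mpi-program>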
If you want to keep some extra control on how the tasks will be distributed in the cluster, it is possible to specify the number of nodes with the option --nodes=Y. For example, minimizing the number of nodes assigned to the job can lead to better performance if the interconnect between nodes is not very fast. Imagine a cluster composed of nodes with 24 cores where you want to submit a job using 72 cores, but using precisely 3 full nodes. You can do so by asking for --ntasks=72 and adding the extra option --nodes=3, as shown below.
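The resource request of that example would look as follows in the job script header (a sketch using the values from the example above):
#SBATCH --ntasks=72
#SBATCH --nodes=3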
If you want to fix the number of cores per node, use the options --nodes=Y --ntasks-per-node=Z, where X=Y*Z and X is the total number of cores. This is useful if you don’t want full nodes, but the cores must be evenly spread over the allocated nodes.
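For instance, 48 cores evenly spread over 3 nodes could be requested as follows (a sketch; the values are arbitrary and 16 tasks per node does not fill the 24-core nodes of the earlier example):
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=16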
If you want to provide some flexibility to the resources allocated to your job, it is also possible to provide a range of values to --nodes, for example with --nodes=3-5. In such a case, cores will be allocated on any number of nodes in the range (although Slurm will try to allocate as many nodes as possible). It could still end up allocated on just 3 full nodes, but also in other possible combinations, e.g. two full nodes with their 24 cores plus three other nodes with 8 cores each.
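Continuing with the 72-core example, such a flexible request would look like this (a sketch):
#SBATCH --ntasks=72
#SBATCH --nodes=3-5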
3.5. Hybrid MPI-threading jobs
Some MPI applications support a hybrid approach, combining MPI and multithreading parallelism, which may outperform pure MPI or pure multithreading.
Example
CP2K, GROMACS
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00
#SBATCH --ntasks=X
#SBATCH --cpus-per-task=Y

module load somemodule/1.1.1

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun --cpus-per-task=$SLURM_CPUS_PER_TASK <mpi-threading-program>
Submit your job script with sbatch. The job script will request --ntasks=X, where X is the number of tasks (MPI ranks). Each task gets allocated Y CPU cores (threads) in the same node as the task. The total number of cores is X * Y, and tasks will be allocated in one or more nodes, using as few nodes as possible.
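For instance, a hybrid job with 4 MPI ranks and 8 threads per rank (32 cores in total) could be requested as follows (arbitrary values for illustration; the optimal ratio depends on your application):
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8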
Note
The performance of hybrid jobs depends on the application, system, number of MPI ranks, and the ratio between MPI ranks and threads. Make sure to run benchmarks to get the best setup for your case.
3.6. Task farming
Jobs can request multiple cores in the cluster, even distributed over multiple nodes, to run a collection of independent tasks. This is called task farming and the tasks carried out in such jobs can be any executable.
Example
Embarrassingly parallel workload with 3 tasks
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00
#SBATCH --ntasks=3

module load parallel/20210622-GCCcore-10.3.0
module load some-other-module/1.1.1-foss-2021a

parallel -j $SLURM_NTASKS srun -N 1 -n 1 -c 1 --exact <command> ::: *.input
Submit your job script with sbatch. The job script will request --ntasks=X, where X is the total number of cores you need (3 in the example). By default, each task gets allocated one CPU core, and CPU cores will be allocated in one or more nodes, using as few nodes as possible.
Execute each task with parallel from GNU parallel in combination with srun. You can launch as many executions as needed and parallel will manage them in an orderly manner using the tasks allocated to your job. srun is the tool from Slurm that will allocate the resources for each execution of <command> with the input files matching *.input. In the example above, srun will only use 1 task with 1 core per execution (-N 1 -n 1 -c 1), but it is also possible to run commands with multiple cores. The option --exact is necessary to ensure that as many tasks as possible can run in parallel.
See also
The command parallel is very powerful: it can be used to submit a batch of different commands from a list, submit the same command with multiple input files (as in the example above) or with multiple parameters. Check the tutorial from GNU Parallel for more information.
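For instance, the same scheme can sweep over a list of parameter values instead of input files (a sketch; the option --param of <command> is hypothetical):
parallel -j $SLURM_NTASKS srun -N 1 -n 1 -c 1 --exact <command> --param {} ::: 0.1 0.2 0.5 1.0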
3.7. Job arrays
Job arrays allow submitting multiple jobs with a single job script. All jobs in the array are identical with the exception of their input parameters and/or file names. The array ID of an individual job is given by the environment variable $SLURM_ARRAY_TASK_ID.
Note
Job arrays should only be used for jobs that take longer than ~10 minutes to avoid overloading the job scheduler. If your jobs take less time, please consider using Task farming instead.
The following example job script executes 10 jobs with array IDs 1 to 10. Each job uses a command that takes as input a file named infile-<arrayID>.dat:
#!/bin/bash
#SBATCH --job-name=myarrayjob
#SBATCH --time=1:0:0
#SBATCH --array=1-10

<your-command> infile-$SLURM_ARRAY_TASK_ID.dat
Slurm has support for managing the job array as a whole or each individual array ID:
# Kill the entire job array:
scancel <jobID>
# Kill a single array ID or a range of array IDs:
scancel <jobID>_<range_of_arrayIDs>
# Show summarized status info for pending array IDs:
squeue
# Show individual status info for pending array IDs:
squeue --array
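For example, assuming the array was submitted as job 123456 (a hypothetical job ID), array IDs 3 to 5 could be cancelled with:
scancel 123456_[3-5]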
See also
The Job array documentation by SchedMD, the Slurm developers.
3.8. Interactive jobs
Interactive jobs are regular jobs that run on the compute nodes but open an interactive shell instead of executing a job script or any other command. These jobs are useful to carry out compute-intensive tasks not suited for the login nodes and are commonly used to test regular job scripts in the environment of the compute nodes.
Interactive jobs are started with the command srun using the option --pty to launch a new Bash shell. The resources for interactive jobs are requested with srun in the same way as for non-interactive jobs.
srun --cpus-per-task=X [resources] --pty bash -l
Interactive jobs for parallel workflows with more than 1 CPU core are subject to the same considerations described in CPU cores allocation.
srun --ntasks=X [--nodes=Y] [resources] --pty bash -l
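For example, an interactive session with 4 CPU cores on a single node for 2 hours could be requested as follows (arbitrary values for illustration):
srun --ntasks=1 --cpus-per-task=4 --time=2:00:00 --pty bash -l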
3.9. Data management
All the HPC clusters in the VSC network have multiple storage partitions with different characteristics. The most basic and common situation across all clusters is that there is a slow but very reliable Data storage partition to save important data (e.g. $VSC_DATA) and a fast but less reliable Scratch storage to run jobs with optimal performance (e.g. $VSC_SCRATCH). We encourage all users to check the section HPC Data Storage for a detailed description of the available storage partitions in the clusters of VUB-HPC.
Managing the data between VSC_SCRATCH and VSC_DATA can be cumbersome for users who run many jobs regularly or who work with large datasets. The solution is to automate as much as possible in your job scripts. The section Data in jobs contains an example job script for a simple case that can be used as a starting point in your workflow.
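As a minimal sketch of such automation (assuming an input directory myproject/input under $VSC_DATA and a program <your-command>; adapt the paths to your own setup, the example in Data in jobs is more complete):
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00

# create a working directory on the scratch partition and copy the input data into it
WORKDIR="$VSC_SCRATCH/$SLURM_JOB_ID"
mkdir -p "$WORKDIR"
cp -a "$VSC_DATA/myproject/input/." "$WORKDIR/"

# run the calculation on scratch
cd "$WORKDIR"
<your-command>

# copy the results back to the data partition
mkdir -p "$VSC_DATA/myproject/output"
cp -a "$WORKDIR/." "$VSC_DATA/myproject/output/"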