3. Main Job Types
Jobs in VUB clusters start with a clean environment to guarantee the reproducibility of your simulations. This means that the commands available at job start are the system defaults, without any software modules loaded and without commands from software installed in your user directory (i.e. the $PATH environment variable is reset). All jobs have to explicitly load/enable any software from modules or from the user directory. This approach was already applied in Hydra with the Torque/Moab scheduler and continues with Slurm.
Note
If you look for information about Slurm elsewhere, be aware that by default Slurm copies the environment from the submission shell into the job. Users who prefer this mode of operation can find information on how to enable it in How can I copy the login shell environment to my jobs?
Job scripts are submitted to the queue in Slurm with the sbatch command, which is equivalent to qsub in Torque/Moab. Job resources and other job options can be set either on the command line or in the header of your job script with the #SBATCH keyword. Below we provide basic examples of job scripts for a variety of common use cases to illustrate their usage.
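For instance, the walltime limit of a job can be given on the command line at submission time instead of in the job script header (the script name job_script.sh is just a placeholder):
sbatch --time=04:00:00 job_script.sh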
3.1. Serial jobs
These jobs run in serial and only require one CPU core.
Example
Scripts in R, Python or MATLAB are serial by default (these environments can do parallel execution, but it has to be enabled with extra modules/packages).
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00

module load somemodule/1.1.1

<more-commands>
Submit your job script with sbatch; no extra options are needed. The job will get a single core and a default amount of memory, which is sufficient for most serial jobs.
3.2. Jobs for GPUs
The different types of jobs for GPUs are covered in their own page GPU Job Types.
3.3. Parallel non-MPI jobs
Applications that use threads or subprocesses are capable of parallelizing the computation on multiple CPU cores in a single node. Typically, such software uses specific tools/libraries that organize the parallel execution locally on the node, without any network communication.
Example
Code in C/C++ using OpenMP or Pthreads. R scripts using doParallel. Python scripts using the multiprocessing module or other modules supporting parallel execution, e.g. TensorFlow and PyTorch.
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=X

module load somemodule/1.1.1

<more-commands>
Submit your job script with sbatch. The job script will request a single task with --ntasks=1 and X cores in the same physical server for this task with --cpus-per-task=X. Requesting 1 task always implies --nodes=1, ensuring that all cores will be allocated in a single node.
Warning
Always check the documentation of the software used in your jobs on how to run it in parallel. Some applications require explicitly setting the number of threads or processes via a command line option. In such a case, we recommend setting those options with the environment variable ${SLURM_CPUS_PER_TASK:-1}, which corresponds to the --cpus-per-task of your job.
The goal is to maximise performance by using all CPU cores allocated to your job and executing 1 thread or process on each CPU core. See the FAQ My jobs run slower than expected, what can I do? for more information.
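For instance, a job script for a threaded program could pass the core count of the job to the application as follows (a minimal sketch; the option --threads of <your-program> is hypothetical, and OMP_NUM_THREADS only applies to software using OpenMP):
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}  # for applications parallelized with OpenMP
<your-program> --threads ${SLURM_CPUS_PER_TASK:-1}  # for applications with a thread count option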
3.4. Parallel MPI jobs
Applications using MPI to handle parallel execution can distribute the workload among any number of cores located in any number of nodes. MPI manages the communication over the network between the running processes.
Example
Fortran, C/C++ applications with support for MPI: VASP, NAMD, CP2K, GROMACS, ORCA, OpenFOAM, CESM. Python scripts using mpi4py.
Note
Applications supporting MPI can also be used in single-node setups if needed.
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00
#SBATCH --ntasks=X

module load somemodule/1.1.1

srun <mpi-program>
Submit your job script with sbatch. The job script will request --ntasks=X, where X is the total number of cores you need. By default, each task gets allocated one CPU core, and CPU cores will be allocated in one or more nodes, using as few nodes as possible.
Warning
Always check the documentation of the MPI software used in your jobs on how to run it in parallel. Some applications require explicitly setting the number of tasks via a command line option. In such a case, we recommend setting those options with the environment variable $SLURM_NTASKS, which corresponds to the --ntasks of your job.
The goal is to maximise performance by using all CPU cores allocated to your job and executing 1 task on each CPU core. See the FAQ My jobs run slower than expected, what can I do? for more information.
We highly recommend launching your MPI applications in your job script with srun to ensure optimal run conditions for your job, although mpirun can still be used.
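If you do use mpirun, the number of processes can be taken from the job environment (a sketch; recent MPI libraries usually detect the Slurm allocation on their own, making the -np option redundant):
mpirun -np $SLURM_NTASKS <mpi-program>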
If you want to keep some extra control on how the tasks will be distributed in the cluster, it is possible to specify the number of nodes with the option --nodes=Y. For example, minimizing the number of nodes assigned to the job can lead to better performance if the interconnect between nodes is not very fast. Imagine a cluster composed of nodes with 24 cores where you want to submit a job using 72 cores, but using precisely 3 full nodes. You can do so by asking for --ntasks=72 and adding the extra option --nodes=3, as shown below.
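The resource request of that example would look as follows in the job script header (a sketch using the values from the example above):
#SBATCH --ntasks=72
#SBATCH --nodes=3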
If you want to fix the number of cores per node, use the options --nodes=Y --ntasks-per-node=Z, where X=Y*Z and X is the total number of cores. This is useful if you don’t want full nodes, but the cores must be evenly spread over the allocated nodes.
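For instance, 48 cores evenly spread over 3 nodes could be requested as follows (a sketch; the values are arbitrary and 16 tasks per node does not fill the 24-core nodes of the earlier example):
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=16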
If you want to provide some flexibility to the resources allocated to your job, it is also possible to provide a range of values to --nodes, for example with --nodes=3-5. In such a case, cores will be allocated on any number of nodes in the range (although Slurm will try to allocate as many nodes as possible). It could still end up allocated on just 3 full nodes, but also in other possible combinations, e.g. two full nodes with their 24 cores plus three other nodes with 8 cores each.
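Continuing with the 72-core example, such a flexible request would look like this (a sketch):
#SBATCH --ntasks=72
#SBATCH --nodes=3-5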
3.5. Hybrid MPI-threading jobs
Some MPI applications support a hybrid approach, combining MPI and multithreading parallelism, which may outperform pure MPI or pure multithreading.
Example
CP2K, GROMACS
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00
#SBATCH --ntasks=X
#SBATCH --cpus-per-task=Y

module load somemodule/1.1.1

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun --cpus-per-task=$SLURM_CPUS_PER_TASK <mpi-threading-program>
Submit your job script with sbatch. The job script will request --ntasks=X, where X is the number of tasks (MPI ranks). Each task gets allocated Y CPU cores (threads) in the same node as the task. The total number of cores is X * Y, and tasks will be allocated in one or more nodes, using as few nodes as possible.
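For instance, a hybrid job with 4 MPI ranks and 8 threads per rank (32 cores in total) could be requested as follows (arbitrary values for illustration; the optimal ratio depends on your application):
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8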
Note
The performance of hybrid jobs depends on the application, system, number of MPI ranks, and the ratio between MPI ranks and threads. Make sure to run benchmarks to get the best setup for your case.
3.6. Task farming
Jobs can request multiple cores in the cluster, even distributed over multiple nodes, to run a collection of independent tasks. This is called task farming and the tasks carried out in such jobs can be any executable.
Example
Embarrassingly parallel workload with 3 tasks
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00
#SBATCH --ntasks=3

module load parallel/20210622-GCCcore-10.3.0
module load some-other-module/1.1.1-foss-2021a

parallel -j $SLURM_NTASKS srun -N 1 -n 1 -c 1 --exact <command> ::: *.input
Submit your job script with sbatch. The job script will request --ntasks=X, where X is the total number of cores you need (3 in the example). By default, each task gets allocated one CPU core, and CPU cores will be allocated in one or more nodes, using as few nodes as possible.
Execute each task with parallel from GNU parallel in combination with srun. You can launch as many executions as needed and parallel will manage them in an orderly manner using the tasks allocated to your job. srun is the tool from Slurm that will allocate the resources for each execution of <command> with the input files matching *.input. In the example above, srun will only use 1 task with 1 core per execution (-N 1 -n 1 -c 1), but it is also possible to run commands with multiple cores. The option --exact is necessary to ensure that as many tasks as possible can run in parallel.
See also
The command parallel is very powerful: it can be used to submit a batch of different commands from a list, submit the same command with multiple input files (as in the example above) or with multiple parameters. Check the tutorial from GNU Parallel for more information.
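For instance, the same scheme can sweep over a list of parameter values instead of input files (a sketch; the option --param of <command> is hypothetical):
parallel -j $SLURM_NTASKS srun -N 1 -n 1 -c 1 --exact <command> --param {} ::: 0.1 0.2 0.5 1.0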
3.7. Job arrays
Job arrays allow submitting multiple jobs with a single job script. All jobs in the array are identical with the exception of their input parameters and/or file names. The array ID of an individual job is given by the environment variable $SLURM_ARRAY_TASK_ID.
Note
Job arrays should only be used for jobs that take longer than ~10 minutes to avoid overloading the job scheduler. If your jobs take less time, please consider using Task farming instead.
The following example job script executes 10 jobs with array IDs 1 to 10. Each job uses a command that takes as input a file named infile-<arrayID>.dat:
#!/bin/bash
#SBATCH --job-name=myarrayjob
#SBATCH --time=1:0:0
#SBATCH --array=1-10

<your-command> infile-$SLURM_ARRAY_TASK_ID.dat
Slurm has support for managing the job array as a whole or each individual array ID:
# Kill the entire job array:
scancel <jobID>
# Kill a single array ID or a range of array IDs:
scancel <jobID>_<range_of_arrayIDs>
# Show summarized status info for pending array IDs:
squeue
# Show individual status info for pending array IDs:
squeue --array
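For example, assuming the array was submitted as job 123456 (a hypothetical job ID), array IDs 3 to 5 could be cancelled with:
scancel 123456_[3-5]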
See also
The Job array documentation by SchedMD, the Slurm developers.
3.8. Interactive jobs
Interactive jobs are regular jobs that run on the compute nodes but open an interactive shell instead of executing a job script or any other command. These jobs are useful to carry out compute-intensive tasks not suited for the login nodes and are commonly used to test regular job scripts in the environment of the compute nodes.
Interactive jobs are started with the command srun using the option --pty to launch a new Bash shell. The resources for interactive jobs are requested with srun in the same way as for non-interactive jobs.
srun --cpus-per-task=X [resources] --pty bash -l
Interactive jobs for parallel workflows with more than 1 CPU core are subject to the same considerations described in CPU cores allocation.
srun --ntasks=X [--nodes=Y] [resources] --pty bash -l
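For example, an interactive session with 4 CPU cores on a single node for 2 hours could be requested as follows (arbitrary values for illustration):
srun --ntasks=1 --cpus-per-task=4 --time=2:00:00 --pty bash -l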
3.9. Data management
All the HPC clusters in the VSC network have multiple storage partitions with different characteristics. The most basic and common situation across all clusters is that there is a slow but very reliable Data storage partition to save important data (e.g. $VSC_DATA) and a fast but less reliable Scratch storage to run jobs with optimal performance (e.g. $VSC_SCRATCH). We encourage all users to check the section HPC Data Storage for a detailed description of the available storage partitions in the clusters of VUB-HPC.
Managing the data between VSC_SCRATCH and VSC_DATA can be cumbersome for users who run many jobs regularly or who work with large datasets. The solution is to automate as much as possible in your job scripts. The section Data in jobs contains an example job script for a simple case that can be used as a starting point in your workflow.
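As a minimal sketch of such automation (assuming an input directory myproject/input under $VSC_DATA and a program <your-command>; adapt the paths to your own setup, the example in Data in jobs is more complete):
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=04:00:00

# create a working directory on the scratch partition and copy the input data into it
WORKDIR="$VSC_SCRATCH/$SLURM_JOB_ID"
mkdir -p "$WORKDIR"
cp -a "$VSC_DATA/myproject/input/." "$WORKDIR/"

# run the calculation on scratch
cd "$WORKDIR"
<your-command>

# copy the results back to the data partition
mkdir -p "$VSC_DATA/myproject/output"
cp -a "$WORKDIR/." "$VSC_DATA/myproject/output/"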