6. GPU Job Types

GPU jobs are no different from standard non-GPU jobs. They get a certain number of CPU cores and memory, as described in the previous sections. GPUs are an extra resource on top of those, and their allocation is controlled by their own family of options. The job scheduler Slurm automatically identifies jobs requesting GPUs and sends them to nodes with GPU accelerators.

Warning

Not all software modules can be used on GPUs. Only those modules with CUDA in their version string support offloading onto GPUs.
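
As a minimal sketch of this rule, the check below looks for the string CUDA in a module name. The function name is_gpu_module and the module name CoolCPUSoftware are hypothetical and only serve as illustration; they are not part of Slurm or of the module system.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm or the module system):
# succeeds when the module name carries CUDA in its version string.
is_gpu_module() {
    case "$1" in
        *CUDA*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Placeholder module names, used for illustration only.
for mod in "CoolGPUSoftware/x.y.z-foss-2024a-CUDA-12.6.0" \
           "CoolCPUSoftware/x.y.z-foss-2024a"; do
    if is_gpu_module "$mod"; then
        echo "$mod: can offload to GPUs"
    else
        echo "$mod: CPU only"
    fi
done
```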

Slurm provides several options to request GPUs. You might find the following common ones in the Slurm documentation or other sources of information:

--gpus=X sets the total number of GPUs allocated to the job to X
--gpus-per-node=X allocates X GPUs for each node where the job runs
--gpus-per-task=X allocates X GPUs for each task requested by the job
--gpus-per-socket=X allocates X GPUs for each CPU socket used by the job
--gres=gpu:X older option that allocates X GPUs per node (equivalent to --gpus-per-node)

6.1. GPU generation

Jobs can request a specific GPU generation or model with the following options:

-p pascal_gpu for the Nvidia P100
-p ampere_gpu for the Nvidia A100

For instance, you might need a specific GPU type to reproduce previous results, or because your job needs more GPU memory than is available in older GPU models. The characteristics of our GPUs are listed in VSCdoc: Hydra Hardware. Keep in mind that more specific job requests will probably have to wait longer in the queue.
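
The sketch below shows one way to turn a GPU model into the matching partition directive. The helper partition_for_gpu is hypothetical; only the two partition names come from the list above, and --partition is the long form of -p.

```shell
#!/bin/bash
# Hypothetical helper: map a GPU model to the partition names listed above.
partition_for_gpu() {
    case "$1" in
        P100) echo "pascal_gpu" ;;
        A100) echo "ampere_gpu" ;;
        *)    echo "unknown GPU model: $1" >&2; return 1 ;;
    esac
}

# Print the sbatch directive that sends a job to the A100 nodes.
echo "#SBATCH --partition=$(partition_for_gpu A100)"
```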

6.2. Memory settings of GPU jobs

The amount of system memory assigned to your job automatically scales with the number of CPU cores requested and follows the same rules as for non-GPU jobs.

Alternatively, you can use --mem-per-gpu=X to define the amount of system memory as a function of the number of GPUs allocated to your job. This setting is not related to the memory of the GPU cards though; it only affects the system memory available to the CPUs.
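
To illustrate how --mem-per-gpu scales, the sketch below multiplies the per-GPU value by the number of GPUs. The helper total_system_mem_gb and the values (2 GPUs, 32 GB per GPU) are assumptions chosen for the example, not recommended settings.

```shell
#!/bin/bash
# Hypothetical helper: system (CPU-side) memory in GB that results from
# --mem-per-gpu, given the number of GPUs and the per-GPU value in GB.
total_system_mem_gb() {
    local gpus=$1 mem_per_gpu_gb=$2
    echo $(( gpus * mem_per_gpu_gb ))
}

gpus=2             # assumed value for --gpus
mem_per_gpu_gb=32  # assumed value for --mem-per-gpu, in GB

echo "#SBATCH --gpus=${gpus}"
echo "#SBATCH --mem-per-gpu=${mem_per_gpu_gb}G"
echo "system memory requested: $(total_system_mem_gb "$gpus" "$mem_per_gpu_gb") GB"
```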

6.3. Single GPU jobs

Recommended: use --gpus in your single GPU jobs.

All GPU options in Slurm work well for single GPU jobs. We recommend requesting a single GPU with --gpus=1 for simplicity. The option --gpus does not need any other considerations beyond the number of requested GPUs.

Basic multi-core, single-GPU Slurm batch script

#!/bin/bash
#SBATCH --job-name=mygpujob
#SBATCH --time=04:00:00
#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16

module load CoolGPUSoftware/x.y.z-foss-2024a-CUDA-12.6.0

<cool-gpu-program>

Applications executed on GPUs still need some CPU power to work. By default, all jobs only get 1 task with 1 CPU core. If your software executes more than 1 process in parallel, or multiple independent tasks on the GPUs, you can use the option --ntasks to set a number of tasks larger than 1. Remember to adjust --cpus-per-task accordingly, so that the total number of cores used in your job is equal to the number available to the GPU.

Important

It is not allowed to request more cores per GPU than are available to it. For nodes with 2 GPUs, that is half the cores of the node. Our hardware specifications show the number of cores available in the nodes of our clusters.
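
The limit above can be sketched as a small calculation. Both helpers are hypothetical, and the value of 16 cores per GPU is the one quoted on this page for the ampere_gpu partition; other partitions may differ.

```shell
#!/bin/bash
# Hypothetical helper: largest --cpus-per-task that keeps
# ntasks_per_gpu * cpus_per_task within the cores available to one GPU.
max_cpus_per_task() {
    local cores_per_gpu=$1 ntasks_per_gpu=$2
    echo $(( cores_per_gpu / ntasks_per_gpu ))
}

# Hypothetical helper: succeeds if the request fits in the cores of one GPU.
fits_cores_per_gpu() {
    local cores_per_gpu=$1 ntasks_per_gpu=$2 cpus_per_task=$3
    [ $(( ntasks_per_gpu * cpus_per_task )) -le "$cores_per_gpu" ]
}

cores_per_gpu=16   # ampere_gpu nodes: 16 CPU cores per GPU
echo "1 task per GPU  -> --cpus-per-task=$(max_cpus_per_task "$cores_per_gpu" 1)"
echo "4 tasks per GPU -> --cpus-per-task=$(max_cpus_per_task "$cores_per_gpu" 4)"
fits_cores_per_gpu "$cores_per_gpu" 2 16 || echo "2 tasks x 16 cores exceeds the limit"
```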

6.4. Multi GPU jobs

Recommended: use --gpus-per-node combined with --ntasks-per-gpu in your multi-GPU jobs.

Jobs can request as many GPUs as are available in each GPU partition of the cluster (they are not limited to a single node). In this case, we recommend requesting the number of nodes with --nodes=N and setting how many GPUs the job will use on each node with --gpus-per-node=G. Hence, the total number of GPUs for your job will be N × G.

In the example below, the job requests 4 GPUs in total (2 GPUs on each of 2 nodes) and 1 task on each GPU (4 tasks in total), each task with 16 CPU cores. Check the hardware specifications to see the distribution of GPUs and nodes in each partition of the cluster.

Important

Not all software supports using multiple GPUs in different nodes. In case of doubt, check the documentation of your software or contact VUB-HPC Support.

Example Slurm batch script with 4 GPUs in 2 nodes

#!/bin/bash
#SBATCH --job-name=mygpujob
#SBATCH --time=04:00:00
#SBATCH --nodes=2
#SBATCH --gpus-per-node=2
#SBATCH --ntasks-per-gpu=1
#SBATCH --cpus-per-task=16

module load CoolGPUSoftware/x.y.z-foss-2024a-CUDA-12.6.0

srun -n 1 --exact <cool-gpu-program> <input_1> &
srun -n 1 --exact <cool-gpu-program> <input_2> &
srun -n 1 --exact <cool-gpu-program> <input_3> &
srun -n 1 --exact <cool-gpu-program> <input_4> &
wait
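
The totals of the script above can be checked with a short calculation. This is a minimal sketch: job_totals is a hypothetical helper, not a Slurm command, and its arguments mirror the values of --nodes, --gpus-per-node, --ntasks-per-gpu and --cpus-per-task.

```shell
#!/bin/bash
# Hypothetical helper: derive job totals from the values given to
# --nodes, --gpus-per-node, --ntasks-per-gpu and --cpus-per-task.
job_totals() {
    local nodes=$1 gpus_per_node=$2 ntasks_per_gpu=$3 cpus_per_task=$4
    local gpus=$(( nodes * gpus_per_node ))      # N x G
    local tasks=$(( gpus * ntasks_per_gpu ))     # one set of tasks per GPU
    local cores=$(( tasks * cpus_per_task ))     # CPU cores over all nodes
    echo "gpus=${gpus} tasks=${tasks} cores=${cores}"
}

# Values taken from the batch script above.
job_totals 2 2 1 16
```

With these values the job uses 32 cores per node, which matches 2 GPUs with 16 cores each on the ampere_gpu nodes.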

Avoid setting job tasks with either --ntasks or --ntasks-per-node in multi-GPU jobs. Those can result in unbound task distributions, where there is no restriction on which GPUs a single task can use. Hence, all tasks in a node can potentially run on the same GPU if your software application is not properly handling that situation.

The option --gpus also works well for multi-GPU jobs, as long as it is combined with --ntasks-per-gpu.
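
One way to inspect the resulting binding is to let every task print the GPUs it can see. The sketch below assumes that Slurm exports SLURM_PROCID and CUDA_VISIBLE_DEVICES to each task of a job step on the GPU nodes; report_gpu_binding is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: report which GPUs the current task can see.
# SLURM_PROCID and CUDA_VISIBLE_DEVICES are expected to be set inside a
# Slurm job step; the fallbacks keep the function usable outside a job.
report_gpu_binding() {
    local task=${SLURM_PROCID:-0}
    local gpus=${CUDA_VISIBLE_DEVICES:-none}
    echo "task ${task} on $(uname -n): visible GPUs = ${gpus}"
}

report_gpu_binding
```

Saved as a script and launched with srun inside the job, each task prints its own line. With a bound distribution each task should list a single GPU, whereas an unbound one should list all GPUs allocated on the node.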

6.5. Advanced: task distribution in GPUs

Slurm provides many options to configure the request of GPU resources for your jobs. We have detected that those options can result in different outcomes depending on which other options are used in your job. The allocation of GPUs and distribution of tasks among GPUs of single GPU jobs is very consistent across the board, but these different options can impact jobs executing multiple tasks on multiple GPUs.

Note

Our recommendations for Single GPU jobs and Multi GPU jobs are based on the results shown in the tables below.

The following tables show how various options distribute tasks among the GPUs allocated to the job. The resulting task distribution is classified into 4 main outcomes:

N – N: Correct task distribution. Tasks are evenly distributed among GPUs and each task can access a single GPU.
2N – 2N: Undefined task distribution. Tasks are correctly distributed among the CPUs bound to each GPU, but they can access all GPUs allocated to the job on that node. This outcome is not necessarily bad; it is up to the software application used in the job to pick the correct GPU for each task or process.
I – J: Wrong task distribution. Each task is assigned to a single GPU, but the distribution does not follow the configuration set in the job. This outcome will hinder performance, as the distribution of tasks is not what was intended for the job.
error: Bad configuration. The job will not start, either due to errors or due to the wrong binding of CPU/GPU resources, as tasks would be placed on the wrong CPU socket for the allocated GPU.
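
As a reading aid for the tables, the first three outcomes can be expressed as a small classifier. This is only a sketch under stated assumptions: classify_distribution is a hypothetical helper, it considers a single node with 2 allocated GPUs, and it treats any uneven split as the wrong outcome.

```shell
#!/bin/bash
# Hypothetical helper: classify the task distribution on one node with
# 2 allocated GPUs, following the outcomes listed above.
#   $1 = tasks running on the node
#   $2 = tasks that can access GPU 0
#   $3 = tasks that can access GPU 1
classify_distribution() {
    local tasks=$1 gpu0=$2 gpu1=$3
    if [ "$gpu0" -eq "$gpu1" ] && [ $(( gpu0 + gpu1 )) -eq "$tasks" ]; then
        echo "correct"      # N – N: each task accesses a single GPU
    elif [ "$gpu0" -eq "$tasks" ] && [ "$gpu1" -eq "$tasks" ]; then
        echo "undefined"    # 2N – 2N: every task can access both GPUs
    else
        echo "wrong"        # I – J: assumed here to be any uneven split
    fi
}

classify_distribution 8 4 4    # 8 tasks, 4 on each GPU
classify_distribution 8 8 8    # 8 tasks, all of them see both GPUs
```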

Note

The following results were obtained with Slurm version 24.05.4.

6.5.1. Option --gpus

Distribution of tasks across requested GPUs using the --gpus option of sbatch. Examples carried out on the nodes of the ampere_gpu partition with 2 GPUs per node and 16 CPU cores per GPU.

Distribution of tasks with `--gpus` and `--ntasks`
`--ntasks`	1 GPU in 1 node `--gpus=1 --nodes=1`	2 GPUs in 1 node `--gpus=2 --nodes=1`	2 GPUs in 2 nodes `--gpus=2 --nodes=2`
Total Tasks	GPU: 0 – 1	GPU: 0 – 1	GPU: 0 – 1 GPU: 0 – 1
2	2 – 0	2 – 2	disallowed
8	8 – 0	8 – 8	4 – 4 4 – 4
16	16 – 0	16 – 16	8 – 8 8 – 8
24	disallowed	24 – 24	12 – 12 12 – 12
32	disallowed	32 – 32	16 – 16 16 – 16

Distribution of tasks with `--gpus` and `--ntasks-per-node`
`--ntasks-per-node`	1 GPU in 1 node `--gpus=1 --nodes=1`	2 GPUs in 1 node `--gpus=2 --nodes=1`	2 GPUs in 2 nodes `--gpus=2 --nodes=2`
Total Tasks	GPU: 0 – 1	GPU: 0 – 1	GPU: 0 – 1 GPU: 0 – 1
2	2 – 0	2 – 2	disallowed
8	8 – 0	8 – 8	4 – 4 4 – 4
16	16 – 0	16 – 16	8 – 8 8 – 8
24	disallowed	24 – 24	12 – 12 12 – 12
32	disallowed	32 – 32	16 – 16 16 – 16

Distribution of tasks with `--gpus` and `--ntasks-per-gpu`
`--ntasks-per-gpu`	1 GPU in 1 node `--gpus=1 --nodes=1`	2 GPUs in 1 node `--gpus=2 --nodes=1`	2 GPUs in 2 nodes `--gpus=2 --nodes=2`
Total Tasks	GPU: 0 – 1	GPU: 0 – 1	GPU: 0 – 1 GPU: 0 – 1
2	2 – 0	1 – 1	disallowed
8	8 – 0	4 – 4	2 – 2 2 – 2
16	16 – 0	8 – 8	4 – 4 4 – 4
24	disallowed	12 – 12	6 – 6 6 – 6
32	disallowed	16 – 16	8 – 8 8 – 8

6.5.2. Option --gpus-per-node

Distribution of tasks across requested GPUs using the --gpus-per-node option of sbatch. Examples carried out on the nodes of the ampere_gpu partition with 2 GPUs per node and 16 CPU cores per GPU.

Distribution of tasks with `--gpus-per-node` and `--ntasks`
`--ntasks`	1 GPU in 1 node `--gpus-per-node=1` `--nodes=1`	2 GPUs in 1 node `--gpus-per-node=2` `--nodes=1`	2 GPUs in 2 nodes `--gpus-per-node=1` `--nodes=2`
Total Tasks	GPU: 0 – 1	GPU: 0 – 1	GPU: 0 – 1 GPU: 0 – 1
2	2 – 0	2 – 2	disallowed
8	8 – 0	8 – 8	4 – 4 4 – 4
16	16 – 0	16 – 16	8 – 8 8 – 8
24	disallowed	24 – 24	12 – 12 12 – 12
32	disallowed	32 – 32	16 – 16 16 – 16

Distribution of tasks with `--gpus-per-node` and `--ntasks-per-node`
`--ntasks-per-node`	1 GPU in 1 node `--gpus-per-node=1` `--nodes=1`	2 GPUs in 1 node `--gpus-per-node=2` `--nodes=1`	2 GPUs in 2 nodes `--gpus-per-node=1` `--nodes=2`
Total Tasks	GPU: 0 – 1	GPU: 0 – 1	GPU: 0 – 1 GPU: 0 – 1
2	2 – 0	2 – 2	disallowed
8	8 – 0	8 – 8	4 – 4 4 – 4
16	16 – 0	16 – 16	8 – 8 8 – 8
24	disallowed	24 – 24	12 – 12 12 – 12
32	disallowed	32 – 32	16 – 16 16 – 16

Distribution of tasks with `--gpus-per-node` and `--ntasks-per-gpu`
`--ntasks-per-gpu`	1 GPU in 1 node `--gpus-per-node=1` `--nodes=1`	2 GPUs in 1 node `--gpus-per-node=2` `--nodes=1`	2 GPUs in 2 nodes `--gpus-per-node=1` `--nodes=2`
Total Tasks	GPU: 0 – 1	GPU: 0 – 1	GPU: 0 – 1 GPU: 0 – 1
2	2 – 0	1 – 1	disallowed
8	8 – 0	4 – 4	2 – 2 2 – 2
16	16 – 0	8 – 8	4 – 4 4 – 4
24	disallowed	12 – 12	6 – 6 6 – 6
32	disallowed	16 – 16	8 – 8 8 – 8

6.5.3. Option --gpus-per-task

Distribution of tasks across requested GPUs using the --gpus-per-task option of sbatch. Examples carried out on the nodes of the ampere_gpu partition with 2 GPUs per node and 16 CPU cores per GPU.

Distribution of tasks with `--gpus-per-task` and `--ntasks`
`--ntasks`	1 GPU in 1 node `--gpus-per-task=1` `--nodes=1`	2 GPUs in 1 node `--gpus-per-task=2` `--nodes=1`	2 GPUs in 2 nodes `--gpus-per-task=1` `--nodes=2`
Total Tasks	GPU: 0 – 1	GPU: 0 – 1	GPU: 0 – 1 GPU: 0 – 1
1	1 – 0	error	disallowed
2	1 – 1	error	1 – 0 1 – 0
4	disallowed	error	1 – 1 1 – 1

Distribution of tasks with `--gpus-per-task` and `--ntasks-per-node`
`--ntasks-per-node`	1 GPU in 1 node `--gpus-per-task=1` `--nodes=1`	2 GPUs in 1 node `--gpus-per-task=2` `--nodes=1`	2 GPUs in 2 nodes `--gpus-per-task=1` `--nodes=2`
Total Tasks	GPU: 0 – 1	GPU: 0 – 1	GPU: 0 – 1 GPU: 0 – 1
1	1 – 0	error	disallowed
2	1 – 1	error	1 – 0 1 – 0
4	disallowed	error	1 – 1 1 – 1

6.6. Shared GPUs on Anansi

The Anansi cluster is meant for testing or debugging jobs. It is smaller than Hydra, but its resources can be shared between multiple jobs, which leads to shorter queueing times. The characteristics of our Anansi GPUs are listed in VSCdoc: Anansi Hardware.

On Anansi, it is not possible to request whole GPUs for your job. Instead, you can request parts of a GPU, also called shards. Shards can be requested, for example, with:

--gres=shard:N to request N GPU shards.
--tres-per-task=gres/shard:N to request N GPU shards per task.

Note that you can request at most 4 shards per job, which corresponds to a full GPU.
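
The shard limit can be sketched as a simple validation step before submitting. Both helpers below are hypothetical, and the percentage assumes that each of the 4 shards nominally stands for a quarter of a GPU.

```shell
#!/bin/bash
# Hypothetical helper: validate a shard request against the 4-shard limit
# stated above and print the matching sbatch directive.
shard_directive() {
    local shards=$1 max_shards=4
    if [ "$shards" -lt 1 ] || [ "$shards" -gt "$max_shards" ]; then
        echo "invalid request: ${shards} shards (allowed: 1-${max_shards})" >&2
        return 1
    fi
    echo "#SBATCH --gres=shard:${shards}"
}

# Hypothetical helper: nominal share of one GPU, in percent, for N shards.
shard_gpu_percent() {
    echo $(( $1 * 100 / 4 ))
}

shard_directive 2
echo "2 shards nominally cover $(shard_gpu_percent 2)% of a GPU"
```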