3. GPU Job Types#

GPU jobs are no different from standard non-GPU jobs: they get a certain number of CPU cores and memory as described in the previous sections. GPUs are an extra resource on top of CPU cores and memory, and their allocation is controlled by its own family of options. The job scheduler automatically identifies jobs requesting GPUs and sends them to nodes with GPU accelerators.

Warning

Not all software modules can be used on GPUs. Only those modules with CUDA, fosscuda, goolfc or gompic in their name support offloading onto GPUs.

Slurm provides several options to request GPUs, but we strongly recommend always using --gpus-per-node in our HPC clusters to avoid any issues. You might find the following common options in the Slurm documentation or other sources of information:

  • --gpus-per-node sets the number of GPUs per node:

    Good option for all GPU jobs. Combined with --ntasks-per-gpu, all tasks in your job are guaranteed to have access to a single GPU and to use CPU cores located in the same socket as that GPU. This is the optimal configuration for performance (see the sketch after this list).

  • --gpus-per-task sets the number of GPUs per task:

    Not recommended, as it must be combined with --ntasks-per-socket to ensure that the CPU cores allocated to each task are in the same socket as their corresponding GPU. Instead, we recommend using --gpus-per-node with --ntasks-per-gpu for jobs executing multiple tasks on GPUs.

  • --gpus-per-socket sets the number of GPUs per CPU socket:

    Not very useful given that all GPU nodes in Hydra currently have 1 GPU per socket, which is the default.

  • --gres gpu:X sets the number of GPUs per node:

    Older option that is equivalent to --gpus-per-node.

  • --gpus sets the total number of GPUs of the job:

    Good for single GPU jobs but not recommended for multi-GPU jobs, as there is no control over which tasks get assigned to each GPU. Instead, we recommend using --gpus-per-node with --ntasks-per-gpu for jobs executing multiple tasks on several GPUs.
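For illustration, a minimal sketch of the recommended combination could look as follows; the GPU count and the program name are placeholders and should be adapted to the actual GPU nodes (see the hardware specifications):

#!/bin/bash
#SBATCH --nodes=1             # keep all resources in a single node
#SBATCH --gpus-per-node=2     # 2 GPUs in that node
#SBATCH --ntasks-per-gpu=1    # 1 task bound to each GPU and its CPU socket

srun <cool-gpu-program>       # Slurm launches one task per allocated GPU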

3.1. GPU generation#

Jobs can request a specific GPU generation or model with the following options:

  • -p pascal_gpu for the Nvidia P100

  • -p ampere_gpu for the Nvidia A100

For instance, you might need a specific GPU type to reproduce previous results, or because your job needs more GPU memory than is available in older GPU models. The characteristics of our GPUs are listed in VSC Docs: Hydra Hardware. Keep in mind that more specific job requests will probably have to wait longer in the queue.
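For example, to run a job on the A100 GPUs you could add the partition on the sbatch command line, or equivalently as an #SBATCH --partition=ampere_gpu directive inside the job script; the script name below is just a placeholder:

sbatch -p ampere_gpu mygpujob.sh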

3.2. Memory settings of GPU jobs#

The amount of system memory assigned to your job automatically scales with the number of CPU cores requested and follows the same rules as for non-GPU jobs.

Alternatively, you can use --mem-per-gpu=X to define the amount of system memory depending on the number of GPUs allocated to your job. This setting is not related to the memory of the GPU cards though; it only affects the memory available to the CPUs.
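As a hypothetical example, if your application needs roughly 16 GB of system memory per allocated GPU (the value is only an assumption, adapt it to your software), the request could look like this:

#SBATCH --gpus-per-node=2
#SBATCH --mem-per-gpu=16G     # 2 GPUs x 16 GB = 32 GB of system (CPU) memory, not GPU memory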

3.3. Single GPU jobs#

Submit your job script with sbatch and request GPUs with --gpus-per-node=X, where X is the number of GPUs on each requested node. In this case, the option --nodes=1 limits the number of nodes to 1 and hence, all GPUs will be physically located in the same node. You can check the maximum number of GPUs per node in the hardware specifications.

Basic multi-core, single-GPU Slurm batch script#
#!/bin/bash
#SBATCH --job-name=mygpujob
#SBATCH --time=04:00:00
#SBATCH --nodes=1                # all resources in a single node
#SBATCH --gpus-per-node=1        # 1 GPU in that node
#SBATCH --cpus-per-gpu=16        # 16 CPU cores for the tasks on the GPU

module load CoolGPUSoftware/x.y.z-foss-2021a-CUDA-11.3.1

<cool-gpu-program>
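Assuming the script above is saved in a file called mygpujob.sh (the file name is arbitrary), it is submitted as usual with:

sbatch mygpujob.sh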

Applications executed on GPUs still need some amount of CPU power to work. By default, all jobs only get 1 task with 1 CPU core. If your software executes more than 1 process in parallel or multiple independent tasks on the GPUs, then you can use the option --ntasks-per-gpu to set the number of tasks and/or --cpus-per-gpu to set the number of CPU cores for the tasks on each GPU.

Important

Never request more cores per GPU than C/G, where C is the number of cores in the node and G is the number of GPUs in the node.
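As a hypothetical example, on a GPU node with 32 CPU cores and 2 GPUs, C/G = 32/2 = 16, so a job should request at most 16 cores per GPU (check the hardware specifications for the real values of C and G on our GPU nodes):

#SBATCH --gpus-per-node=2
#SBATCH --ntasks-per-gpu=1
#SBATCH --cpus-per-gpu=16     # at most C/G = 32/2 = 16 cores per GPU on this hypothetical node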

3.4. Multi GPU jobs#

Jobs can also request as many GPUs as are available in each GPU partition of the cluster (they are not limited to a single node). In this case, request the number of nodes with --nodes=N and adjust how many GPUs will be used on each node with --gpus-per-node=G. Hence, the total number of GPUs of your job will be N × G. In the example below, the job requests 4 GPUs in total (2 GPUs per node) and 1 task on each GPU with 16 CPU cores. The hardware specifications show the distribution of GPUs and nodes in each partition.

Important

Not all software supports using multiple GPUs in different nodes. In case of doubt, check the documentation of your software or contact VUB-HPC Support.

Example Slurm batch script with 4 GPUs in 2 nodes#
#!/bin/bash
#SBATCH --job-name=mygpujob
#SBATCH --time=04:00:00
#SBATCH --nodes=2                # 2 nodes
#SBATCH --gpus-per-node=2        # 2 GPUs per node, 4 GPUs in total
#SBATCH --ntasks-per-gpu=1       # 1 task bound to each GPU
#SBATCH --cpus-per-gpu=16        # 16 CPU cores for the task on each GPU

module load CoolGPUSoftware/x.y.z-foss-2021a-CUDA-11.3.1

# launch one task on each GPU, each with its own input
srun -n 1 --exact <cool-gpu-program> <input_1> &
srun -n 1 --exact <cool-gpu-program> <input_2> &
srun -n 1 --exact <cool-gpu-program> <input_3> &
srun -n 1 --exact <cool-gpu-program> <input_4> &
wait                             # wait for all tasks to finish before ending the job

3.5. Advanced: task distribution in GPUs#

Slurm provides many options to configure the GPU resources requested by your jobs. As shown above, those options can behave differently depending on which other options are used in the job. Usually, these differences do not manifest in single-GPU jobs, but they can impact jobs executing multiple tasks on multiple GPUs.

The following tables show how various options distribute tasks among the GPUs allocated to the job. The resulting task distribution falls into 4 main outcomes:

  • N – N : Correct task distribution

    Tasks are evenly distributed among GPUs; each task can access a single GPU.

  • 2N – 2N : Undefined task distribution

    Tasks are not distributed among GPUs; all tasks have access to all GPUs in the node. This outcome is not necessarily bad; it is up to the software executed in the job to do the right thing.

  • I – J : Wrong task distribution

    Tasks are randomly distributed among GPUs; each task can access a single GPU. This outcome will hinder performance, as the distribution of tasks is not what was intended for the job.

  • × – × : Bad binding

    Tasks are placed in the wrong CPU socket for their allocated GPU. This outcome can result in the job not starting due to the wrong binding of GRES resources.

Our recommendations outlined at the beginning of this page are based on these results:

  • single GPU jobs: --gpus

  • multi GPU jobs: --gpus-per-node + --ntasks-per-gpu
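If you want to check the task distribution of your own job, a simple sketch (not an official tool) is to print the GPUs visible to each task from within the job script; with a correct distribution, each task reports a single, distinct GPU:

srun bash -c 'echo "task ${SLURM_PROCID} on $(hostname): CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"'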

3.5.1. Option --gpus#

Distribution of tasks with --gpus and --ntasks#

--ntasks          1 GPU in 1 node       2 GPUs in 1 node
(Total Tasks)     --gpus=1 --nodes=1    --gpus=2 --nodes=1
                  GPU0 – GPU1           GPU0 – GPU1
--------------    ------------------    ------------------
2                 2 – 0                 2 – 2
8                 8 – 0                 8 – 8
16                16 – 0                16 – 16
24                24 – 0                24 – 24
32                32 – 0                32 – 32

Distribution of tasks with --gpus and --ntasks-per-node#

--ntasks-per-node    1 GPU in 1 node       2 GPUs in 1 node      2 GPUs in 2 nodes
(Total Tasks)        --gpus=1 --nodes=1    --gpus=2 --nodes=1    --gpus=2 --nodes=2
                     GPU0 – GPU1           GPU0 – GPU1           GPU0 – GPU1
-----------------    ------------------    ------------------    ------------------
2                    2 – 0                 × – ×                 1 – 1
8                    8 – 0                 × – ×                 4 – 4
16                   16 – 0                × – ×                 8 – 8
24                   24 – 0                24 – 24               12 – 12
32                   32 – 0                32 – 32               16 – 16

3.5.2. Option --gpus-per-node#

Distribution of tasks with --gpus-per-node and --ntasks#

--ntasks         1 GPU in 1 node                2 GPUs in 1 node               2 GPUs in 2 nodes
(Total Tasks)    --gpus-per-node=1 --nodes=1    --gpus-per-node=2 --nodes=1    --gpus-per-node=1 --nodes=2
                 GPU0 – GPU1                    GPU0 – GPU1                    GPU0 – GPU1
-------------    ---------------------------    ---------------------------    ---------------------------
2                2 – 0                          2 – 2                          1 – 1
8                8 – 0                          8 – 8                          1 – 7
16               16 – 0                         16 – 16                        1 – 15
24               × – ×                          24 – 24                        8 – 16
32               × – ×                          32 – 32                        16 – 16

Distribution of tasks with --gpus-per-node and --ntasks-per-node#

--ntasks-per-node    1 GPU in 1 node                2 GPUs in 1 node               2 GPUs in 2 nodes
(Total Tasks)        --gpus-per-node=1 --nodes=1    --gpus-per-node=2 --nodes=1    --gpus-per-node=1 --nodes=2
                     GPU0 – GPU1                    GPU0 – GPU1                    GPU0 – GPU1
-----------------    ---------------------------    ---------------------------    ---------------------------
2                    2 – 0                          2 – 2                          1 – 1
8                    8 – 0                          8 – 8                          4 – 4
16                   16 – 0                         16 – 16                        8 – 8
24                   × – ×                          24 – 24                        12 – 12
32                   × – ×                          32 – 32                        16 – 16

Distribution of tasks with --gpus-per-node and --ntasks-per-gpu#

--ntasks-per-gpu    1 GPU in 1 node                2 GPUs in 1 node               2 GPUs in 2 nodes
(Total Tasks)       --gpus-per-node=1 --nodes=1    --gpus-per-node=2 --nodes=1    --gpus-per-node=1 --nodes=2
                    GPU0 – GPU1                    GPU0 – GPU1                    GPU0 – GPU1
----------------    ---------------------------    ---------------------------    ---------------------------
2                   2 – 0                          1 – 1                          1 – 1
8                   8 – 0                          4 – 4                          1 – 7
16                  16 – 0                         8 – 8                          1 – 15
24                  × – ×                          12 – 12                        12 – 12
32                  × – ×                          16 – 16                        16 – 16

3.5.3. Option --gpus-per-task#

Distribution of tasks with --gpus-per-task and --ntasks#

--ntasks         1 GPU in 1 node                2 GPUs in 1 node
(Total Tasks)    --gpus-per-task=1 --nodes=1    --gpus-per-task=2 --nodes=1
                 GPU0 – GPU1                    GPU0 – GPU1
-------------    ---------------------------    ---------------------------
1                1 – 0
2                × – ×

Distribution of tasks with --gpus-per-task and --ntasks-per-node#

--ntasks-per-node    1 GPU in 1 node                2 GPUs in 1 node               2 GPUs in 2 nodes
(Total Tasks)        --gpus-per-task=1 --nodes=1    --gpus-per-task=2 --nodes=1    --gpus-per-task=1 --nodes=2
                     GPU0 – GPU1                    GPU0 – GPU1                    GPU0 – GPU1
-----------------    ---------------------------    ---------------------------    ---------------------------
1                    1 – 0
2                    × – ×                          1 – 1