4. Advanced Questions#

4.1. How can I copy the login shell environment to my jobs?#

Jobs in VUB clusters start with a clean environment to guarantee the reproducibility of your simulations. This means that the commands available at the start of the job are those of the default system environment, without any software modules loaded and without software installed in your user directory (i.e. the $PATH environment variable is reset).

Note

If you look for information about Slurm elsewhere, be aware that by default Slurm copies the environment from the submission shell into the job.

We recommend that your job scripts explicitly set up the environment from scratch every time (rather than setting it by default for all your jobs in ~/.bashrc or ~/.bash_profile). To minimize the probability of errors, each job should load the minimum number of software modules it needs to function and add only the minimum amount of software from your user directory to its environment. This usually boils down to the following steps (see the sketch below the list):

  1. Load the needed software modules

  2. Add to the environment ($PATH) only the needed software from your user directory

  3. Declare any other variables used in the job
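
As an illustration, a minimal job script following these three steps could look like the sketch below. Everything in angle brackets is a placeholder for your own software and settings, and the resource requests and the use of $VSC_DATA are just examples:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# 1. load the needed software modules
module load <module-name>

# 2. add only the needed software from your user directory to $PATH
export PATH="$VSC_DATA/<your-software>/bin:$PATH"

# 3. declare any other variables used in the job
export <my-variable>=<value>

<more-commands>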

Users with complex workflows who prefer their jobs to carbon-copy the environment of the shell from which the job is submitted can enable this behaviour by submitting their job with sbatch --export=ALL. Be aware that exporting the entire environment can have unexpected and unintended side effects. Such jobs will only work in the skylake and skylake_mpi partitions, which share the same hardware as the login nodes.

4.2. How can I request multiple GPUs in the same node?#

Check the section GPU Job Types for a detailed description and examples on how to use multiple GPUs in your jobs.

4.3. Using GPUs in multi-processing jobs#

In Hydra, the GPU cards operate by default in process shared mode, meaning that an unlimited number of processes can connect to the GPU. However, at any moment in time only a single process can actively use it. With the Multi-Process Service (MPS), multiple processes can access (parts of) the GPU at the same time, which may greatly improve performance.

To use MPS, launch the nvidia-cuda-mps-control daemon at the beginning of your job script. The daemon will automatically handle the multiple processes and coordinate access to the GPUs:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# store the MPS pipe and log directories in the job's temporary directory
export CUDA_MPS_PIPE_DIRECTORY=$TMPDIR/nvidia-mps-pipe
export CUDA_MPS_LOG_DIRECTORY=$TMPDIR/nvidia-mps-log
# start the MPS control daemon in the background
nvidia-cuda-mps-control -d

<more-commands>
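
If all GPU work in your job finishes well before the job itself does (for example before a long CPU-only post-processing step), the MPS control daemon can optionally be shut down at that point. A minimal sketch, sending the quit command to nvidia-cuda-mps-control:

# stop the MPS control daemon once the GPU processes are done (optional)
echo quit | nvidia-cuda-mps-control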

4.4. How can I run Python in parallel?#

Python without any additional modules can only use a single core and will not parallelize. However, several Python modules can run in parallel:

  • Popular scientific computing packages such as numpy or scipy will automatically use multi-threading with OpenMP

  • Some Python applications, such as TensorFlow or PyTorch, also support more advanced parallelization methods

  • You can implement your own parallelization in your scripts with the Python module multiprocessing

For optimal parallel scaling performance of CPU-bound tasks, the general rule of thumb is to assign one CPU core to each process. However, the aforementioned parallelization options in Python operate independently from one another and can produce an explosion of processes in the job if they are combined. For instance, a script that starts 8 worker processes with multiprocessing, each calling numpy routines that use 8 OpenMP threads, runs 64 threads even if the job only requested 8 cores. This usually translates into jobs inadvertently spawning hundreds of processes and severely degrading performance.

Jobs using Python modules that run in parallel require special attention:

  • Check all modules used in your Python scripts from beginning to end. Combining multiple modules that run in parallel can result in an explosion of processes. For instance, we frequently see users running Python scripts in parallel with the module multiprocessing that also execute numpy or PyTorch under the hood.

  • Calculate how many processes will be created by your code and check the number of cores requested by your job. It is usually not the same as the total number of cores in the node (unless you specifically request a full node) and you should take into account all parts in your code that run in parallel.

Job scripts using multiprocessing in Python can control the number of processes created with the following settings:

  1. In your job script, disable OpenMP parallelization to avoid over-parallelizing with numpy or any other module that uses OpenMP:

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=X

    # set number of threads equal to 1
    export OMP_NUM_THREADS=1

    <load-modules>
    python <path-to-script.py>
    
  2. In your Python script, create as many processes with multiprocessing as cores allocated to your job:

    import multiprocessing
    import os

    # obtain number of cores allocated to the job
    ncore = len(os.sched_getaffinity(0))

    # set number of processes equal to number of cores
    with multiprocessing.Pool(processes=ncore) as p:
        <parallel-code>
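
As a quick sanity check, you can print the number of cores that Python detects from within your job script and compare it with the value of --cpus-per-task. A minimal sketch, assuming the Python module used by your script is already loaded:

# print the number of cores available to Python in this job
python -c "import os; print(len(os.sched_getaffinity(0)))"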
    

4.5. How can I run R in parallel?#

R without any additional packages can only use a single core and will not parallelize. However, there are some options to run R in parallel:

  • Some R packages such as stats and data.table will automatically use multi-threading with OpenMP

  • You can implement your own parallelization in your scripts with the R package doParallel

For optimal parallel scaling performance of CPU-bound tasks, the general rule of thumb is to assign one CPU core to each process. However, the aforementioned parallelization options in R operate independently from one another and can produce an explosion of processes in the job if they are combined. This usually translates into jobs inadvertently spawning hundreds of processes and severely degrading performance.

Jobs using R scripts that run in parallel require special attention:

  • Check all packages used in your R scripts from beginning to end. Combining multiple packages that run in parallel can result in an explosion of processes.

  • Calculate how many processes will be created by your code and check the number of cores requested by your job. It is usually not the same as the total number of cores in the node (unless you specifically request a full node) and you should take into account all parts in your code that run in parallel.

Job scripts using doParallel in R can control the number of processes created with the following settings:

  1. In your job script, disable OpenMP parallelization to avoid over-parallelizing with any R package that uses OpenMP:

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=X

    # set number of threads equal to 1
    export OMP_NUM_THREADS=1

    <load-modules>
    Rscript <path-to-script.R>
    
  2. In your R code, create as many processes with doParallel as cores allocated to your job:

    library(doParallel)

    # obtain number of cores allocated to the job (requires the future package)
    ncore <- future::availableCores()

    # set number of processes equal to number of cores
    registerDoParallel(ncore)

    <parallel-code>
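
Similarly, you can check from within your job script how many cores R will detect. A minimal sketch, assuming the R module used by your script is already loaded and that the future package is installed:

# print the number of cores available to R in this job
Rscript -e 'cat(future::availableCores(), "\n")'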
    

4.6. How can I share a VO subdirectory with the other VO members?#

Suppose that we have created a directory shared_subdir in $VSC_SCRATCH_VO, and we want to give the other VO members read/write permissions to this subdirectory. This can be done by changing the group of the subdirectory:

cd $VSC_SCRATCH_VO
# replace bvoxxxxx with your VO ID
chown -R :bvoxxxxx shared_subdir
chmod g+s shared_subdir
ls -hld shared_subdir

The chmod g+s command ensures that any new subdirectories or files added to shared_subdir will belong to the same group. If everything went well, the ls -hld command should show your VO group in the 4th field:

output#
drwxrwsr-x 266 vsc1xxxx bvoxxxxx 16K Sep  1 15:16 shared_subdir
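
Note that chmod g+s only affects files and directories created after it is set, while chown -R changes the group ownership of the existing contents but not their permissions. If shared_subdir already contains data that the other VO members should be able to read and write, you may also need to adjust its permissions. A possible sketch, to be adapted to the level of access you actually want to grant:

# give the group read/write access to existing files, plus access to
# directories and execute permission on already-executable files
chmod -R g+rwX shared_subdir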