5. FAQ: Advanced Questions#
5.1. How can I copy the login shell environment to my jobs?#
Jobs in VUB clusters start with a clean environment to guarantee the reproducibility of your simulations. This means that the commands available at job start are the system defaults: no software modules are loaded and commands from software installed in your user directory are not available (i.e. the $PATH environment variable is reset).
Note
If you look for information about Slurm elsewhere, be aware that by default Slurm copies the environment from the submission shell into the job.
We recommend that your job scripts explicitly set up the environment from scratch every time (rather than setting it by default for all your jobs in ~/.bashrc or ~/.bash_profile). To minimize the probability of errors, each job should load the minimum number of software modules needed to function and enable the minimum amount of software from the user directory. This usually boils down to the following steps (see the sketch after the list):
- Load the needed software modules
- Add to the environment ($PATH) only the needed software from your user directory
- Declare any other variables used in the job
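For example, a minimal job script following these steps might look as follows; the module name, the tools directory and the input variable are placeholders for illustration:

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# load only the software modules needed by this job (example module name)
module load Python/3.11.3-GCCcore-12.3.0

# add to $PATH only the needed software from the user directory (example path)
export PATH="$HOME/my-tools/bin:$PATH"

# declare any other variables used in the job (example variable)
export INPUT_FILE="data/input.csv"

<more-commands>
```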
Users with complex workflows who prefer their jobs to carbon-copy the environment of the shell from which the job is submitted can enable this behaviour by submitting their jobs with sbatch --export=ALL. Be aware that exporting the entire environment can have unexpected and unintended side effects. Such jobs will only work in the skylake and skylake_mpi partitions, which share the same hardware as the login nodes.
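For example, a hypothetical submission that copies the full login environment could look like this (the job script name is a placeholder):

```bash
sbatch --export=ALL --partition=skylake job_script.sh
```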
5.2. How can I request multiple GPUs in the same node?#
Check the section GPU Job Types for a detailed description and examples on how to use multiple GPUs in your jobs.
5.3. Using GPUs in multi-processing jobs#
In Hydra, by default the GPU cards operate in process shared mode, meaning that an unlimited number of processes can be assigned to the same GPU. However, their work is serialized: at any moment in time only a single process can execute on the GPU. Using the Multi-Process Service (MPS), multiple processes can access (parts of) the GPU at the same time, which may greatly improve performance.
To use MPS, launch the nvidia-cuda-mps-control daemon at the beginning of your job script. The daemon will automatically handle the multiple processes and coordinate their access to the GPUs:
```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

export CUDA_MPS_PIPE_DIRECTORY=$TMPDIR/nvidia-mps-pipe
export CUDA_MPS_LOG_DIRECTORY=$TMPDIR/nvidia-mps-log
nvidia-cuda-mps-control -d

<more-commands>
```
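If you need to stop the MPS daemon before the end of the job script, it can be shut down by sending it the quit command:

```bash
echo quit | nvidia-cuda-mps-control
```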
5.4. How to run Python in parallel?#
Python without any additional modules can only use a single core and will not parallelize. However, several Python modules can run in parallel:
- Popular scientific computing packages such as numpy or scipy will automatically use multi-threading with OpenMP
- Some Python packages, such as TensorFlow or PyTorch, support more advanced parallelization methods
- You can implement your own parallelization in your scripts with the Python module multiprocessing
For optimal parallel scaling performance of CPU-bound tasks, the general rule of thumb is to assign one CPU core to each process. However, the aforementioned parallelization options in Python operate independently from one another and can produce an explosion of processes in the job if they are combined. Usually this translates to jobs inadvertently spawning hundreds of processes and tanking performance.
Jobs using Python modules that run in parallel require special attention:
- Check all modules used in your Python scripts from beginning to end. Combining multiple modules that run in parallel can result in an explosion of processes. For instance, we frequently see users running Python scripts in parallel with the module multiprocessing that under the hood also execute numpy or PyTorch.
- Calculate how many processes will be created by your code and check the number of cores requested by your job. It is usually not the same as the total number of cores in the node (unless you specifically request a full node) and you should take into account all parts of your code that run in parallel.
Job scripts using multiprocessing in Python can control the number of generated processes with the following settings:
- In your job script, disable OpenMP parallelization to avoid over-parallelizing with numpy or any other module that uses OpenMP:

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=X

# set number of threads equal to 1
export OMP_NUM_THREADS=1

<load-modules>
python <path-to-script.py>
```
- In your Python script, create as many processes with multiprocessing as cores allocated to your job:

```python
import os
import multiprocessing

# obtain number of cores allocated to the job
ncore = len(os.sched_getaffinity(0))

# set number of processes equal to number of cores
with multiprocessing.Pool(processes=ncore) as p:
    <parallel-code>
```
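As a complete minimal sketch of this pattern, assuming a hypothetical CPU-bound function square as stand-in for your own workload:

```python
import os
import multiprocessing

def square(x):
    # hypothetical CPU-bound task; replace with your own workload
    return x * x

if __name__ == "__main__":
    # obtain number of cores allocated to the job
    ncore = len(os.sched_getaffinity(0))

    # distribute the inputs over one worker process per core
    with multiprocessing.Pool(processes=ncore) as p:
        results = p.map(square, range(100))

    print(results[:10])
```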
5.5. How can I run R in parallel?#
R without any additional packages can only use a single core and will not parallelize. However, there are some options to run R in parallel:
- Some R packages such as stats and data.table will automatically use multi-threading with OpenMP
- You can implement your own parallelization in your scripts with the R package doParallel
For optimal parallel scaling performance of CPU-bound tasks, the general rule of thumb is to assign one CPU core to each process. However, the aforementioned parallelization options in R operate independently from one another and can produce an explosion of processes in the job if they are combined. Usually this translates to jobs inadvertently spawning hundreds of processes and tanking performance.
Jobs using R scripts that run in parallel require special attention:
- Check all packages used in your R scripts from beginning to end. Combining multiple packages that run in parallel can result in an explosion of processes.
- Calculate how many processes will be created by your code and check the number of cores requested by your job. It is usually not the same as the total number of cores in the node (unless you specifically request a full node) and you should take into account all parts of your code that run in parallel.
Job scripts using doParallel in R can control the number of generated processes with the following settings:
- In your job script, disable OpenMP parallelization to avoid over-parallelizing with any other R package that uses OpenMP:

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=X

# set number of threads equal to 1
export OMP_NUM_THREADS=1

<load-modules>
Rscript <path-to-script.R>
```
- In your R code, create as many processes with doParallel as cores allocated to your job:

```r
library(doParallel)

# obtain number of cores allocated to the job
ncore = future::availableCores()

# set number of processes equal to number of cores
registerDoParallel(ncore)

<parallel-code>
```
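As a usage sketch, the parallel code can be a foreach loop executed with the %dopar% operator from doParallel's companion package foreach; the loop body here is a hypothetical stand-in for your own workload:

```r
library(doParallel)

# obtain number of cores allocated to the job
ncore <- future::availableCores()

# set number of processes equal to number of cores
registerDoParallel(ncore)

# run 100 hypothetical tasks distributed over the workers
results <- foreach(i = 1:100, .combine = c) %dopar% {
    i * i
}

# release the workers when done
stopImplicitCluster()

print(results[1:10])
```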