4. Advanced Questions#

4.1. How can I copy the login shell environment to my jobs?#

Jobs in VUB clusters start with a clean environment to guarantee the reproducibility of your simulations. This means that the commands available at the start of the job are those of the default system environment, without any software modules loaded and without software installed in your user directory (i.e. the $PATH environment variable is reset).

Note

If you look for information about Slurm elsewhere, be aware that by default Slurm copies the environment from the submission shell into the job.

We recommend that your job scripts explicitly set up the environment from scratch every time (rather than setting it by default for all your jobs in ~/.bashrc or ~/.bash_profile). To minimize the probability of errors, each job should load the minimum number of software modules it needs to function and add only the minimum amount of software from your user directory to its environment. This usually boils down to the following steps (see the sketch below the list):

  1. Load the needed software modules

  2. Add to the environment ($PATH) only the needed software from your user directory

  3. Declare any other variables used in the job
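
As an illustration, a minimal job script following these three steps could look like the sketch below. Everything in angle brackets is a placeholder for your own software and settings, and the resource requests and the use of $VSC_DATA are just examples:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# 1. load the needed software modules
module load <module-name>

# 2. add only the needed software from your user directory to $PATH
export PATH="$VSC_DATA/<your-software>/bin:$PATH"

# 3. declare any other variables used in the job
export <my-variable>=<value>

<more-commands>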

Users with complex workflows who prefer their jobs to carbon-copy the environment of the shell from which the job is submitted can enable this behaviour by submitting their job with sbatch --export=ALL. Be aware that exporting the entire environment can have unexpected and unintended side effects. Such jobs will only work in the skylake and skylake_mpi partitions, which share the same hardware as the login nodes.

4.2. How can I request multiple GPUs in the same node?#

Check the section GPU Job Types for a detailed description and examples on how to use multiple GPUs in your jobs.

4.3. Using GPUs in multi-processing jobs#

In Hydra, the GPU cards operate by default in process shared mode, meaning that an unlimited number of processes can connect to the GPU. However, at any moment in time only a single process can actively use it. With the Multi-Process Service (MPS), multiple processes can access (parts of) the GPU at the same time, which may greatly improve performance.

To use MPS, launch the nvidia-cuda-mps-control daemon at the beginning of your job script. The daemon will automatically handle the multiple processes and coordinate access to the GPUs:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# store the MPS pipe and log directories in the job's temporary directory
export CUDA_MPS_PIPE_DIRECTORY=$TMPDIR/nvidia-mps-pipe
export CUDA_MPS_LOG_DIRECTORY=$TMPDIR/nvidia-mps-log
# start the MPS control daemon in the background
nvidia-cuda-mps-control -d

<more-commands>
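
If all GPU work in your job finishes well before the job itself does (for example before a long CPU-only post-processing step), the MPS control daemon can optionally be shut down at that point. A minimal sketch, sending the quit command to nvidia-cuda-mps-control:

# stop the MPS control daemon once the GPU processes are done (optional)
echo quit | nvidia-cuda-mps-control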

4.4. How can I run Python in parallel?#

Python without any additional modules can only use a single core and will not parallelize. However, several Python modules can run in parallel:

  • Popular scientific computing packages such as numpy or scipy will automatically use multi-threading with OpenMP

  • Some Python applications, such as TensorFlow or PyTorch, also support more advanced parallelization methods

  • You can implement your own parallelization in your scripts with the Python module multiprocessing

For optimal parallel scaling performance of CPU-bound tasks, the general rule of thumb is to assign one CPU core to each process. However, the aforementioned parallelization options in Python operate independently from one another and can produce an explosion of processes in the job if they are combined. For instance, a script that starts 8 worker processes with multiprocessing, each calling numpy routines that use 8 OpenMP threads, runs 64 threads even if the job only requested 8 cores. This usually translates into jobs inadvertently spawning hundreds of processes and severely degrading performance.

Jobs using Python modules that run in parallel require special attention:

  • Check all modules used in your Python scripts from beginning to end. Combining multiple modules that run in parallel can result in an explosion of processes. For instance, we frequently see users running Python scripts in parallel with the module multiprocessing that also execute numpy or PyTorch under the hood.

  • Calculate how many processes will be created by your code and check the number of cores requested by your job. It is usually not the same as the total number of cores in the node (unless you specifically request a full node) and you should take into account all parts in your code that run in parallel.

Job scripts using multiprocessing in Python can control the number of processes created with the following settings:

  1. In your job script, disable OpenMP parallelization to avoid over-parallelizing with numpy or any other module that uses OpenMP:

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=X

    # set number of threads equal to 1
    export OMP_NUM_THREADS=1

    <load-modules>
    python <path-to-script.py>
    
  2. In your Python script, create as many processes with multiprocessing as cores allocated to your job:

    import multiprocessing
    import os

    # obtain number of cores allocated to the job
    ncore = len(os.sched_getaffinity(0))

    # set number of processes equal to number of cores
    with multiprocessing.Pool(processes=ncore) as p:
        <parallel-code>
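
As a quick sanity check, you can print the number of cores that Python detects from within your job script and compare it with the value of --cpus-per-task. A minimal sketch, assuming the Python module used by your script is already loaded:

# print the number of cores available to Python in this job
python -c "import os; print(len(os.sched_getaffinity(0)))"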
    

4.5. How can I run R in parallel?#

R without any additional packages can only use a single core and will not parallelize. However, there are some options to run R in parallel:

  • Some R packages such as stats and data.table will automatically use multi-threading with OpenMP

  • You can implement your own parallelization in your scripts with the R package doParallel

For optimal parallel scaling performance of CPU-bound tasks, the general rule of thumb is to assign one CPU core to each process. However, the aforementioned parallelization options in R operate independently from one another and can produce an explosion of processes in the job if they are combined. This usually translates into jobs inadvertently spawning hundreds of processes and severely degrading performance.

Jobs using R scripts that run in parallel require special attention:

  • Check all packages used in your R scripts from beginning to end. Combining multiple packages that run in parallel can result in an explosion of processes.

  • Calculate how many processes will be created by your code and check the number of cores requested by your job. It is usually not the same as the total number of cores in the node (unless you specifically request a full node) and you should take into account all parts in your code that run in parallel.

Job scripts using doParallel in R can control the number of processes created with the following settings:

  1. In your job script, disable OpenMP parallelization to avoid over-parallelizing with any R package that uses OpenMP:

    #!/bin/bash
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=X

    # set number of threads equal to 1
    export OMP_NUM_THREADS=1

    <load-modules>
    Rscript <path-to-script.R>
    
  2. In your R code, create as many processes with doParallel as cores allocated to your job:

    library(doParallel)

    # obtain number of cores allocated to the job (requires the future package)
    ncore <- future::availableCores()

    # set number of processes equal to number of cores
    registerDoParallel(ncore)

    <parallel-code>
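
Similarly, you can check from within your job script how many cores R will detect. A minimal sketch, assuming the R module used by your script is already loaded and that the future package is installed:

# print the number of cores available to R in this job
Rscript -e 'cat(future::availableCores(), "\n")'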
    

4.6. How can I share a VO subdirectory with the other VO members?#

Suppose that we have created a directory shared_subdir in $VSC_SCRATCH_VO, and we want to give the other VO members read/write permissions to this subdirectory. This can be done by changing the group of the subdirectory:

cd $VSC_SCRATCH_VO
# replace bvoxxxxx with your VO ID
chown -R :bvoxxxxx shared_subdir
chmod g+s shared_subdir
ls -hld shared_subdir

The chmod g+s command ensures that any new subdirectories or files added to shared_subdir will belong to the same group. If everything went well, the ls -hld command should show your VO group in the 4th field:

output#
drwxrwsr-x 266 vsc1xxxx bvoxxxxx 16K Sep  1 15:16 shared_subdir
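
Note that chmod g+s only affects files and directories created after it is set, while chown -R changes the group ownership of the existing contents but not their permissions. If shared_subdir already contains data that the other VO members should be able to read and write, you may also need to adjust its permissions. A possible sketch, to be adapted to the level of access you actually want to grant:

# give the group read/write access to existing files, plus access to
# directories and execute permission on already-executable files
chmod -R g+rwX shared_subdir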