Software

Hydra

1. Where can I find installed software?

Most end-user software on Hydra is available via modules. To obtain a full list of installed software, type:

module av

To get a list of available versions of a given software package (for example, Python), type:

module spider Python

To get a list of extensions included in a given software package (for example, Python 3.6), type:

module show Python/3.6.6-foss-2018b

Modules can be loaded as follows (in this example Python 3.6 is loaded):

module load Python/3.6.6-foss-2018b

You can check which modules are currently loaded with:

module list

Unloading all currently loaded modules can be done like this:

module purge

If you need software that is not yet installed, please contact us at hpc@vub.ac.be.

More information on module usage can be found in the VSC docs on using the module system and in the HPC training slides.

2. The toolchain of software packages

The name of a software package such as Python/3.6.6-foss-2018b not only contains the name of the package (Python) and its version (3.6.6), but also the term foss-2018b, which is known as the toolchain. The toolchain is the set of tools used to build that package, and it has its own version (2018b). A given software package can be made available with different toolchains. The most common ones are:

  • GCCcore: The toolchain based on the GNU Compiler Collection, which includes front ends for C, C++, Objective-C, Fortran, Java, and Ada.

  • GCC: The GCCcore toolchain extended with the binutils tools for building and handling binaries.

  • foss: The Free and open-source software (FOSS) toolchain is based on GCCcore and also includes support for OpenMPI, OpenBLAS, FFTW and ScaLAPACK.

  • intel: The Intel compilers, Intel MPI and Intel Math Kernel Library (MKL).

  • iomkl: The Intel C/C++ and Fortran compilers, Intel MKL & OpenMPI.

  • fosscuda: The foss toolchain with support for CUDA.

Some toolchains offer improved performance on specialised hardware: for instance, certain software may be faster with the intel toolchain on compute nodes with Intel hardware. Other toolchains offer additional features, such as fosscuda, which is needed to execute code on GPUs.

After the toolchain specification, the name of a software package may contain additional information about other dependencies of the package, such as required interpreters. For instance, the package R-bundle-Bioconductor/3.7-foss-2018b-R-3.5.1 corresponds to Bioconductor version 3.7 built with the foss-2018b toolchain, and it requires R version 3.5.1.
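As a sketch of how these names decompose, the fields of a module name can be split with standard shell string operations (the parse_module helper below is purely illustrative and not part of the module system):

```shell
#!/bin/bash
# Illustrative helper (not part of the module system): split a module name
# such as Python/3.6.6-foss-2018b into its package, version and toolchain.
parse_module() {
  local name=$1
  local package=${name%%/*}    # before the first '/'  -> Python
  local rest=${name#*/}        # after the first '/'   -> 3.6.6-foss-2018b
  local version=${rest%%-*}    # up to the first '-'   -> 3.6.6
  local toolchain=${rest#*-}   # after the first '-'   -> foss-2018b
  echo "$package $version $toolchain"
}

parse_module Python/3.6.6-foss-2018b   # prints: Python 3.6.6 foss-2018b
```

Note that for packages with extra suffixes, such as R-bundle-Bioconductor/3.7-foss-2018b-R-3.5.1, the last field also carries the dependency information after the toolchain.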

It is important to only load modules built with a common toolchain (including its version), otherwise conflicts may occur. Hence, R-bundle-Bioconductor/3.7-foss-2018b-R-3.5.1 can be loaded along with other packages built with the foss-2018b toolchain, but not with any other. The only exceptions to this rule are packages built with the GCCcore toolchain, which is compatible with both the foss and intel toolchains. Note that GCCcore is not the same as GCC, which is incompatible with the intel toolchain. Toolchain compatibility is version dependent; the most recent versions are:

  • GCCcore-8.2.0 is compatible with foss-2019a and intel-2019a

  • GCCcore-7.3.0 is compatible with foss-2018b and intel-2018b

  • GCCcore-6.4.0 is compatible with foss-2018a, foss-2017b, intel-2018a and intel-2017b

  • GCCcore-6.3.0 is compatible with foss-2017a and intel-2017a

If you cannot find a compatible set of modules that provides the software required for your work, please contact us at hpc@vub.ac.be.

3. How can I build/install additional software/packages?

You should first check if the needed software or package is already available on Hydra, either with its own module or as part of another module. If the software or package you need is not available, the HPC team recommends requesting its installation at hpc@vub.ac.be. This has several advantages:

  1. The HPC team will optimize the compilation for each CPU architecture present in Hydra, guaranteeing that your software/package runs efficiently on all nodes (and usually much faster than installations made by the users).

  2. Free software will be available to all users of Hydra and licensed software can be made available to specific groups of users.

  3. The package will be built in a reproducible way with EasyBuild: important for scientific reproducibility.

  4. Different versions of the software can be installed alongside each other.

If you still want to install additional software/packages yourself, there are several resources available:

  • Compiling and testing your software on the HPC

    We highly recommend loading a suitable buildenv module before compiling your software. For example:

    module load buildenv/default-foss-2019b
    

    The buildenv module:

    • loads a compiler toolchain, including math and MPI libraries

    • defines compiler flags for optimal performance: CFLAGS, FFLAGS, CXXFLAGS, LIBBLAS, LIBLAPACK, LIBFFT, …

    • defines flags and paths to make sure the build system finds the right libraries: LIBRARY_PATH, LD_LIBRARY_PATH, LDFLAGS, …

    Depending on the build system of your software, you can load additional build tools compatible with the buildenv module, such as:

    CMake/3.15.3-GCCcore-8.3.0
    Autotools/20180311-GCCcore-8.3.0
    pkg-config/0.29.2-GCCcore-8.3.0
    Ninja/1.9.0-GCCcore-8.3.0
    Meson/0.51.2-GCCcore-8.3.0-Python-3.7.4
    

    Users compiling their own software should be aware that software compiled on the login nodes may fail on older compute nodes if full hardware optimization is used. The CPU microarchitecture of the login nodes (Skylake) has some instruction sets not available in Hydra’s older compute nodes (Ivy Bridge and Broadwell). Therefore, there are two options to compile your own software:

    • Best performance: compile on the login node (with -march=native). The resulting binaries can only run on Skylake nodes, but they offer the best performance on those nodes. Jobs can be restricted to run on Skylake nodes with -l feature=skylake.

    • Best compatibility: compile on any Ivy Bridge node. Log in to an Ivy Bridge node with qsub -I -l feature=ivybridge and compile your code there. The resulting binaries can run on any node on Hydra with good performance. Alternatively, users who know how to set up the compilation can compile on the login node with -march=ivybridge -mtune=skylake.

    For more information, see the VSC docs on software development.

    Please contact us at hpc@vub.ac.be in case of problems or questions.

  • How to install additional Python packages

    see the VSC docs on Python package management

  • How to install additional Perl packages

    see the VSC docs on Perl package management

  • How to install additional R packages

    see our documentation below or the VSC docs on R package management

5. How can I run MATLAB?

MATLAB is available as a module; however, running intensive MATLAB calculations on Hydra is not recommended: its performance is not optimal and parallel execution is not fully supported.

First check which MATLAB versions are available:

module spider MATLAB

Next load a suitable version, for example (take the most recent version for new projects):

module load MATLAB/2019b

It is possible to run MATLAB in console mode for quick tests. For example, with a MATLAB script called ‘testmatlab.m’, type:

matlab -batch "run('testmatlab.m');"

MATLAB scripts should be submitted to the queue in a job script. Before submitting, however, we highly recommend first compiling your script with the MATLAB compiler mcc (this can be done on the login node):

mcc -m testmatlab.m

This will generate a testmatlab binary file, as well as a ‘run_testmatlab.sh’ shell script (and a few other files). You can ignore the ‘run_testmatlab.sh’ file.

Now you can submit your MATLAB calculation as a batch job. Your job script should look like this:

#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1

module load MATLAB/2019b

cd $PBS_O_WORKDIR
./testmatlab > testmatlab.out 2>&1

The advantage of running a compiled MATLAB binary is that it does not require a license. We have only a limited number of MATLAB licenses that can be used at the same time, so this way you can run your simulation even if all licenses are in use.

More information on using the MATLAB compiler can be found here:

https://nl.mathworks.com/help/mps/ml_code/mcc.html

6. How can I use R?

Depending on your needs, there are different methods to use R in Hydra

  • Interactive sessions for light workloads can be performed in the login node

    1. Log in to Hydra

    2. Load the following R module:

      module load R/3.5.1-foss-2018b
      
    3. Start R:

      R
      

    Note

    If you need a different version of R, make sure to load one with foss in the name. Those versions are based on the GNU open source toolchain and can be used in the login node.

  • Interactive sessions for heavy workloads must be performed in the compute nodes

    1. Log in to Hydra

    2. Start an interactive session in a compute node with the following command:

      qsub -I
      
    3. Load your R module of choice (preferably a recent version):

      module load R/3.5.1-intel-2018b
      
    4. Start R:

      R
      
  • Scripts written in R can be executed with the command Rscript. A minimal job script for R only requires loading the R module and executing your scripts with Rscript:

    #!/bin/bash
    #PBS -l walltime=01:00:00
    #PBS -l nodes=1:ppn=1
    
    module load R/3.5.1-intel-2018b
    
    cd $PBS_O_WORKDIR
    Rscript <path-to-script.R>
    

The quality of the graphics generated by R can be improved by changing the graphical backend to Cairo. Add the following lines in the file ~/.Rprofile to make these changes permanent for your user (create the file ~/.Rprofile if it does not exist):

# Use cairo backend for graphics device
setHook(packageEvent("grDevices", "onLoad"),
    function(...) grDevices::X11.options(type='cairo'))

# Use cairo backend for bitmaps
options(bitmapType='cairo')

7. Packages included in the R library in Hydra

There are already many packages included in the library of R in Hydra. The complete list can be looked up from the shell in Hydra with the following commands

  1. Log in to Hydra

  2. List the contents of any R module, for instance R/3.5.1-foss-2018b:

    module show R/3.5.1-foss-2018b
    

R packages missing from the library may be provided with their own module. In that case, use the module command to search the repository of Hydra. Please see 1. Where can I find installed software?

R packages not available in Hydra can be requested for installation through our support service. Please send an email to hpc@vub.ac.be and the HPC team will proceed with the installation. See 3. How can I build/install additional software/packages? for more details.

Developers can compile and install R packages in the local R library of their home directory. However, it is important to note that the CPU microarchitecture varies between Hydra’s nodes, which must be taken into account when testing self-compiled R packages on the nodes.

Note

The packages of a local R library can potentially cause errors due to conflicts with the global R library or due to a version change of R after the installation of local R packages. If you experience errors running R scripts that are related to a failed load of a package, it is helpful to check your script in a clean R environment without a local R library.

  1. Remove all modules and load R:

    module purge
    module load R/3.5.1-foss-2018b
    
  2. Enter into a clean R environment (not loading previous workspace):

    R --no-restore
    
  3. Inside the R shell, or at the beginning of your R script:

    .libPaths('')
    <your R code ...>
    

8. How can I run Gaussian jobs?

The available modules for Gaussian can be listed with the command:

ml spider Gaussian

We recommend using the module Gaussian/G16.A.03-intel-2017b for general use because its performance has been thoroughly optimized for Hydra. More recent modules, such as Gaussian/G16.B.01, should be used if you need any of their new features.

Gaussian jobs can use significantly more memory than the value specified by %mem in the input file or with g16 -m on the command line. Therefore, it is recommended to submit Gaussian jobs requesting a total memory that is at least 20% larger than the memory value defined in the calculation.
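The 20% margin is simple arithmetic; as an illustration, this hypothetical helper (not part of Gaussian) prints the job memory request for a given %mem value in GB:

```shell
# Hypothetical helper: add a 20% safety margin to the Gaussian %mem value
gaussian_mem_request() {
  echo "$(( $1 * 12 / 10 ))GB"
}

gaussian_mem_request 80   # prints: 96GB
```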

Gaussian G16 should automatically manage the available resources and parallelization. However, it is known to under-perform in some circumstances and not use all cores allocated to your job. In Hydra, the command myresources will report the actual use of resources of your jobs. If any of your Gaussian calculations is not using all available cores, it is possible to force the total number of cores used by Gaussian G16 with the option g16 -p or by adding the Gaussian directive %nprocshared to the top of the input file.

The following job script is an example for Gaussian calculations running on 1 node with multiple cores. In this case we run a g16 calculation with 80GB of memory (-m=80GB) while requesting a total of 96GB of memory (20% more). Additionally, we request 20 cores for this job and automatically pass this setting to g16 with the option -p=${PBS_NP}, where ${PBS_NP} is an environment variable containing the number of cores allocated to your job:

#!/bin/bash
#PBS -l nodes=1:ppn=20
#PBS -l mem=96GB

ml Gaussian/G16.A.03-intel-2017b

cd $PBS_O_WORKDIR
g16 -p=${PBS_NP} -m=80GB < input_file.com > output_file.log

9. How can I use GaussView?

GaussView is a graphical interface used with the computational chemistry program Gaussian. GaussView is installed in Hydra and can be used alongside Gaussian to enable all property visualizations.

  1. Log in to Hydra with X11 forwarding enabled. Linux and macOS users can do so by adding the option -Y to the ssh command used for login:

    ssh -Y <username>@login.hpc.vub.be
    
  2. Load the modules of GaussView

    • GaussView 6 with Gaussian/G16.A.03:

      module load GaussView/6.0.16
      
    • GaussView 6 with Gaussian/G16.B.01:

      module load Gaussian/G16.B.01
      module load GaussView/6.0.16
      
  3. Launch GaussView:

    gview.sh
    

Keep in mind that using a graphical interface in Hydra is currently rather slow. Thus, for regular visualization tasks, the HPC team recommends installing GaussView on your personal computer. Binary packages of GaussView are available for Linux, Mac, and Windows users and are provided upon request.

Installation of GaussView on Mac:

  1. Untar G16 and GaussView to /Applications (two new directories, g16 and gv, will be created)

  2. Create a file ~/Library/LaunchAgents/env.GAUSS_EXEDIR.plist and paste the following content into it:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
    <key>Label</key>
    <string>env.GAUSS_EXEDIR</string>
    <key>ProgramArguments</key>
    <array>
    <string>launchctl</string>
    <string>setenv</string>
    <string>GAUSS_EXEDIR</string>
    <string>/Applications/g16/bsd:/Applications/g16</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    </dict>
    </plist>
    
  3. Issue the following command (only once) (or, alternatively, restart the machine):

    launchctl load ~/Library/LaunchAgents/env.GAUSS_EXEDIR.plist
    

10. How can I use matplotlib with a graphical interface?

The HPC environment is optimized for the execution of non-interactive applications in job scripts. Therefore, matplotlib is configured with a non-GUI backend (Agg) that saves the resulting plots in a variety of image file formats. The generated image files can be copied to your own computer for visualization or further editing.

If you need to work interactively with matplotlib and visualize its output from within Hydra, you can do so with the following steps

  1. Log in to Hydra with X11 forwarding enabled. Linux and macOS users can use the command appropriate for their institution:

    ssh -Y username@hydra.hpc.vub.be
    ssh -Y username@hydra.hpc.ulb.be
    
  2. Enable the TkAgg backend at the very beginning of your Python script:

    import matplotlib
    matplotlib.use('TkAgg')
    

Note

The function matplotlib.use() must be called before importing matplotlib.pyplot. Changing the backend parameter in your matplotlibrc file will not have any effect, as the system-wide configuration takes precedence over it.

11. How can I use CESM/CIME?

The dependencies required to run CESM in Hydra are provided by the module CESM-deps. This module also contains the XML configuration files for CESM with the specification of machines, compiler and batch system of Hydra. Once CESM-deps is loaded, the configuration files can be found in ${EBROOTCESMMINDEPS}/machines.

The following steps show an example setup of a CESM/CIME case

  1. Log in to Hydra

  2. Load the module CESM-deps:

    module load CESM-deps/2-foss-2019a
    
  3. Data files for CESM have to be placed in $VSC_SCRATCH/cesm. Users needing data located elsewhere (e.g. in /projects) can create symlinks in their $VSC_SCRATCH to the corresponding location:

    mkdir $VSC_SCRATCH/cesm
    
    # In case of available data elsewhere
    ln -sf /projects/our_project/cesm $VSC_SCRATCH/cesm
    
  4. Create the following folder structure for your CESM/CIME cases in $VSC_SCRATCH:

    mkdir $VSC_SCRATCH/cime
    mkdir $VSC_SCRATCH/cime/cases
    mkdir $VSC_SCRATCH/cime/output
    
  5. Download the source code of CESM/CIME into $VSC_SCRATCH/cime:

    cd $VSC_SCRATCH/cime
    git clone -b release-cesm2.1.3 https://github.com/ESCOMP/cesm.git cesm-2.1.3
    cd $VSC_SCRATCH/cime/cesm-2.1.3
    ./manage_externals/checkout_externals
    
  6. Add the configuration settings for Hydra and Breniac to your CESM/CIME source code:

    cd $VSC_SCRATCH/cime/cesm-2.1.3
    update-cesm-machines cime/config/cesm/machines/ $EBROOTCESMMINDEPS/machines/ machines compilers batch
    
  7. (Optional) Add support for iRODS. Determine your version of CIME and apply the patches for the closest version in $EBROOTCESMMINDEPS/irods. For instance:

    $ cd $VSC_SCRATCH/cime/cesm-2.1.3
    $ git -C cime/ describe --tags
    cime5.6.32
    $ git apply $EBROOTCESMMINDEPS/irods/cime-5.6.32/*.patch
    
  8. The creation of a case follows the usual procedure for CESM:

    cd $VSC_SCRATCH/cime/cesm-2.1.3/cime/scripts
    ./create_newcase --case $VSC_SCRATCH/cime/cases/name_of_case --res f19_g17 --compset I2000Clm50BgcCrop
    
  9. Your CESM case in $VSC_SCRATCH/cime/cases/name_of_case can now be set up, built and executed. We provide a job script called case.job that performs all these steps automatically on the compute nodes, minimizing wait times in the queue and ensuring that the nodes building and running the case are compatible. Copy the template from $EBROOTCESMMINDEPS/scripts/case.job into your case directory and modify it with any specific settings needed by adding xmlchange or any other commands. Once the script is adapted to your needs, submit it to the queue with qsub as usual:

    cd $VSC_SCRATCH/cime/cases/name_of_case
    cp $EBROOTCESMMINDEPS/scripts/case.job $VSC_SCRATCH/cime/cases/name_of_case/
    
    # Edit case.job if needed
    $EDITOR case.job
    
    # Adjust requested resources as needed
    qsub -l nodes=2:ppn=20 -l walltime=24:00:00 case.job
    

The module CESM-tools provides a set of tools commonly used to analyse and visualize CESM data. Nonetheless, CESM-tools cannot be loaded at the same time as CESM-deps because their packages have incompatible dependencies. Once you obtain the results of your case, unload any modules with module purge and load CESM-tools/2-foss-2019a to post-process the data of your case.

12. How can I use GAP?

The GAP shell has a strong focus on being used interactively, whereas on Hydra the preferred way to run calculations is by submitting job scripts. Nonetheless, it is possible to use the interactive shell of GAP in our compute nodes with the following steps

  1. Request an interactive job session and wait for it to be allocated:

    $ qsub -I -l nodes=1:ppn=4 -l walltime=3:00:00
    qsub: waiting for job 3036036.master01.hydra.brussel.vsc to start
    qsub: job 3036036.master01.hydra.brussel.vsc ready
    vsc10xxx@node361 ~ $
    
  2. Load the module of GAP and start its shell as usual:

    vsc10xxx@node361 ~ $ module load gap/4.11.0-foss-2019a
    vsc10xxx@node361 ~ $ gap
    ********* GAP 4.11.0 of 29-Feb-2020
    * GAP * https://www.gap-system.org
    ********* Architecture: x86_64-pc-linux-gnu-default64-kv7
    [...]
    gap>
    

Submitting a job script using GAP is also possible and requires preparing two scripts: the usual job script that is submitted to the queue, and a second script with the commands for GAP.

  • The job script is a standard job script requesting the resources needed by your calculation, loading the required modules and executing the script with the code for GAP. Example:

    #!/bin/bash
    #PBS -l walltime=01:00:00
    #PBS -l nodes=1:ppn=4
    
    module load gap/4.11.0-foss-2019a-modisomTob
    
    cd $PBS_O_WORKDIR
    ./gap-script.sh
    
  • The script gap-script.sh is a shell script that executes GAP and passes your code to it. It is necessary to execute GAP with the -A option and only load the required GAP packages at the beginning of your script to avoid issues. For example:

    #!/bin/bash
    gap -A -r -b -q << EOI
    LoadPackage( "Example" );
    2+2;
    EOI
    

Note

Remember to make gap-script.sh executable with the command chmod +x gap-script.sh

13. How can I use Mathematica?

First check which Mathematica versions are available:

module spider Mathematica

Next load a suitable version, for example (take the most recent version for new projects):

module load Mathematica/12.0.0

Running Mathematica in console mode in the terminal for quick tests:

wolframscript

Mathematica scripts (Wolfram Language Scripts) should be submitted to the queue in a job script. In the following example, we run the Mathematica script testmath.wls:

#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1

module load Mathematica/12.0.0

cd $PBS_O_WORKDIR
wolframscript -file testmath.wls

Note

Mathematica code is generally not optimized for performance. However, Mathematica supports several levels of interfacing with C/C++. For example, you can speed up your compute-intensive functions by compiling them with a C compiler from inside your Mathematica script.

14. How can I use Stata?

First check which Stata versions are available:

module spider Stata

Next load a suitable version, for example (take the most recent version for new projects):

module load Stata/16-legacy

Running Stata in console mode in the terminal for quick tests:

stata

Stata do-files should be submitted to the queue in a job script. In the following example, we run the Stata program teststata.do:

#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1

module load Stata/16-legacy

cd $PBS_O_WORKDIR
stata-se -b do teststata

Upon execution, Stata will by default write its output to the log file teststata.log.

Note

The recommended version of Stata in batch mode is stata-se, because it can handle larger datasets.

15. How can I use GROMACS?

To get good parallel performance, GROMACS must be launched differently depending on the requested resources (#nodes, #cores, and #GPUs). In the example job scripts given below, a molecular dynamics simulation is launched with run input file ‘example.tpr’:

  • single-node, multi-core:

    #!/bin/bash
    #PBS -l walltime=01:00:00
    #PBS -l nodes=1:ppn=4
    
    module load GROMACS/2020.1-foss-2020a-Python-3.8.2
    
    cd $PBS_O_WORKDIR
    gmx mdrun -nt $PBS_NP -s example.tpr
    
  • multi-node:

    #!/bin/bash
    #PBS -l walltime=01:00:00
    #PBS -l nodes=2:ppn=4
    
    module load GROMACS/2020.1-foss-2020a-Python-3.8.2
    
    cd $PBS_O_WORKDIR
    mpirun -np $PBS_NP gmx_mpi mdrun -ntomp 1 -s example.tpr
    
  • single-GPU, single-node, multi-core:

    #!/bin/bash
    #PBS -l walltime=01:00:00
    #PBS -l nodes=1:ppn=4:gpus=1
    
    module load GROMACS/2019.3-fosscuda-2019a
    
    cd $PBS_O_WORKDIR
    gmx mdrun -nt $PBS_NP -s example.tpr
    
  • multi-GPU, single-node, multi-core:

    #!/bin/bash
    #PBS -l walltime=01:00:00
    #PBS -l nodes=1:ppn=8:gpus=2
    
    module load GROMACS/2019.3-fosscuda-2019a
    
    cd $PBS_O_WORKDIR
    gmx mdrun -nt $PBS_NP -s example.tpr
    

Note

  • GROMACS supports two threading models, which can be used together:
    • OpenMP threads

    • thread-MPI threads: MPI-based threading model implemented as part of GROMACS, incompatible with process-based MPI models such as OpenMPI

  • There are two variants of the GROMACS executable:
    • gmx: recommended for all single-node jobs, supports both OpenMP threads and thread-MPI threads

    • gmx_mpi: for multi-node jobs; must be used with mpirun and only supports OpenMP threads

  • The number of threads must always be specified, as GROMACS sets it incorrectly on Hydra:
    • gmx: use option -nt to let GROMACS determine optimal numbers of OpenMP threads and thread-MPI threads

    • gmx_mpi: use option -ntomp (not -ntmpi or -nt), and set number of threads equal to 1.

  • When running on 1 or more GPUs, by default GROMACS will:
    • detect the number of available GPUs, create 1 thread-MPI thread for each GPU, and evenly divide the available CPU cores between the GPUs using OpenMP threads. Therefore, ppn should be a multiple of gpus. Always check in the log file that the correct number of GPUs is indeed detected.

    • optimally partition the force field terms between the GPU(s) and the CPU cores, depending on the number of GPUs and CPU cores and their respective performances.

For more info on running GROMACS efficiently, see: http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html

16. How can I use CP2K?

To get good parallel performance with CP2K in Hydra, it is important to disable multi-threading. Below is an example job script which runs the CP2K input file ‘example.inp’:

#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=4

module load CP2K/6.1-intel-2018a
export OMP_NUM_THREADS=1

cd $PBS_O_WORKDIR
mpirun cp2k.popt -i example.inp -o example.out

Vega

(TODO)

VSC Tier-1

(TODO)