Software

Hydra

1. Where can I find installed software?

Most end-user software on Hydra is available via modules. To obtain a full list of installed software, type:

module av

To get a list of available versions of a given software package (for example, Python), type:

module spider Python

To get a list of extensions included in a given software package (for example, Python 3.6), type:

module show Python/3.6.6-foss-2018b

Modules can be loaded as follows (in this example Python 3.6 is loaded):

module load Python/3.6.6-foss-2018b

You can check which modules are currently loaded with:

module list

Unloading all currently loaded modules can be done like this:

module purge

If you need software that is not yet installed, please contact us at hpc@vub.ac.be.

More information on module usage can be found in the VSC docs on using the module system and in the HPC training slides.

2. The toolchain of software packages

The name of a software package such as Python/3.6.6-foss-2018b contains not only the name of the package (Python) and its version (3.6.6), but also the term foss-2018b, which identifies the toolchain. The toolchain is the set of tools used to build the package, and it has its own version (2018b). Any given software can be made available with different toolchains. The most common ones are:

  • GCCcore: The toolchain based on the GNU Compiler Collection, which includes front ends for C, C++, Objective-C, Fortran, Java, and Ada.

  • GCC: The GCCcore toolchain combined with binutils (assembler, linker and related tools).

  • foss: The Free and open-source software (FOSS) toolchain is based on GCCcore and also includes support for OpenMPI, OpenBLAS, FFTW and ScaLAPACK.

  • intel: The Intel compilers, Intel MPI and Intel Math Kernel Library (MKL).

  • iomkl: The Intel C/C++ and Fortran compilers, Intel MKL & OpenMPI.

  • fosscuda: The foss toolchain with support for CUDA.

Some toolchains offer improved performance on specialised hardware: for instance, certain software may be faster with the intel toolchain when run on compute nodes with Intel hardware. Other toolchains offer additional features, such as fosscuda, which is needed to execute code on GPUs.

The name of a software package may contain, after the toolchain specification, additional information about other dependencies of the package, such as required interpreters. For instance, the package R-bundle-Bioconductor/3.7-foss-2018b-R-3.5.1 corresponds to Bioconductor version 3.7 built with the foss toolchain 2018b and requires R version 3.5.1.
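As an illustration of this naming scheme, the following bash sketch splits a module name into its parts. The function name parse_module and the output format are made up for this example, and the toolchain is assumed to be one of those named in this section; module names that follow a different pattern are not recognised.

```shell
#!/bin/bash
# Sketch: split <name>/<version>-<toolchain>-<toolchain version>[-<suffix>].
# Assumption: the toolchain is one of those listed in this section.
parse_module() {
    local re='^([^/]+)/(.+)-(GCCcore|GCC|foss|intel|iomkl|fosscuda)-([^-]+)(-(.+))?$'
    if [[ "$1" =~ $re ]]; then
        echo "name=${BASH_REMATCH[1]}"
        echo "version=${BASH_REMATCH[2]}"
        echo "toolchain=${BASH_REMATCH[3]}-${BASH_REMATCH[4]}"
        echo "suffix=${BASH_REMATCH[6]}"
    else
        echo "unrecognised module name: $1" >&2
        return 1
    fi
}

parse_module R-bundle-Bioconductor/3.7-foss-2018b-R-3.5.1
```

For the Bioconductor module this reports the name R-bundle-Bioconductor, version 3.7, toolchain foss-2018b and suffix R-3.5.1. A module without extra dependencies in its name, such as Python/3.6.6-foss-2018b, yields an empty suffix.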

It is important to only load modules built with a common toolchain (including its version); otherwise conflicts may occur. Hence, R-bundle-Bioconductor/3.7-foss-2018b-R-3.5.1 can be loaded along with other packages built with the foss-2018b toolchain, but not with any other. The only exception to this rule is packages built with the GCCcore toolchain, which is compatible with both the foss and intel toolchains. Note that GCCcore is not the same as GCC, which is incompatible with the intel toolchain. The compatibility between toolchains depends on their versions. The most recent combinations are:

  • GCCcore-8.2.0 is compatible with foss-2019a and intel-2019a

  • GCCcore-7.3.0 is compatible with foss-2018b and intel-2018b

  • GCCcore-6.4.0 is compatible with foss-2018a, foss-2017b, intel-2018a and intel-2017b

  • GCCcore-6.3.0 is compatible with foss-2017a and intel-2017a
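The compatibility list above can be encoded in a small lookup, which may be handy for checking a combination of toolchains before loading modules. This is only a sketch: the function names gcccore_for and toolchains_compatible are hypothetical, and the table covers only the versions listed here.

```shell
#!/bin/bash
# GCCcore version paired with each foss/intel toolchain (from the list above).
gcccore_for() {
    case "$1" in
        foss-2019a|intel-2019a) echo "GCCcore-8.2.0" ;;
        foss-2018b|intel-2018b) echo "GCCcore-7.3.0" ;;
        foss-2018a|foss-2017b|intel-2018a|intel-2017b) echo "GCCcore-6.4.0" ;;
        foss-2017a|intel-2017a) echo "GCCcore-6.3.0" ;;
        *) return 1 ;;
    esac
}

# Succeed when two toolchains can be loaded together: either they are
# identical, or one of them is the GCCcore version paired with the other.
toolchains_compatible() {
    local a="$1" b="$2"
    [ "$a" = "$b" ] && return 0
    [ "$(gcccore_for "$a")" = "$b" ] && return 0
    [ "$(gcccore_for "$b")" = "$a" ] && return 0
    return 1
}

if toolchains_compatible foss-2018b GCCcore-7.3.0; then
    echo "foss-2018b and GCCcore-7.3.0 can be combined"
fi
```

Note that foss-2018b and intel-2018b share the same GCCcore version but are still reported as incompatible with each other, in line with the rule that only GCCcore modules may be mixed with another toolchain.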

If you cannot find a compatible set of modules that provides the software required for your work, please contact us at hpc@vub.ac.be.

3. How can I build/install additional software/packages?

You should first check whether the software or package you need is already available on Hydra, either with its own module or as part of another module. If it is not available, the HPC team recommends requesting its installation at hpc@vub.ac.be. This has several advantages:

  1. The HPC team will optimize the compilation for each CPU architecture present in Hydra, guaranteeing that your software/package runs efficiently on all nodes (and usually much faster than installations made by the users).

  2. Free software will be available to all users of Hydra and licensed software can be made available to specific groups of users.

  3. The package will be built in a reproducible way with EasyBuild: important for scientific reproducibility.

  4. Different versions of the software can be installed alongside each other.

If you still want to install additional software/packages yourself, there are several resources available:

  • Compiling and testing your software on the HPC

    We highly recommend loading a suitable buildenv module before compiling your software. For example:

    module load buildenv/default-foss-2019b
    

    The buildenv module:

    • loads a compiler toolchain, including math and MPI libraries

    • defines compiler flags for optimal performance: CFLAGS, FFLAGS, CXXFLAGS, LIBBLAS, LIBLAPACK, LIBFFT, …

    • defines flags and paths to make sure the build system finds the right libraries: LIBRARY_PATH, LD_LIBRARY_PATH, LDFLAGS, …

    Depending on the build system of your software, if needed you can load additional tools compatible with the buildenv module, such as:

    CMake/3.15.3-GCCcore-8.3.0
    Autotools/20180311-GCCcore-8.3.0
    pkg-config/0.29.2-GCCcore-8.3.0
    Ninja/1.9.0-GCCcore-8.3.0
    Meson/0.51.2-GCCcore-8.3.0-Python-3.7.4
    

    Users compiling their own software should be aware that software compiled on the login nodes may fail on older compute nodes if full hardware optimization is used. The CPU microarchitecture of the login nodes (Skylake) has some instruction sets that are not available in Hydra’s older compute nodes (Ivy Bridge and Broadwell). Therefore, there are two options to compile your own software:

    • Best performance: compile on the login node (with -march=native). The resulting binaries can only run on Skylake nodes, but they offer the best performance on those nodes. Jobs can be restricted to run on Skylake nodes with -l feature=skylake.

    • Best compatibility: compile on any Ivy Bridge node. Log in to an Ivy Bridge node with qsub -I -l feature=ivybridge and compile your code on it. The resulting binaries can run on any node on Hydra with good performance. Alternatively, users who know how to set up the compilation can compile on the login node with -march=ivybridge -mtune=skylake.

    For more information, see the VSC docs on software development.

    Please contact us at hpc@vub.ac.be in case of problems or questions.

  • How to install additional Python packages

    see the VSC docs on Python package management

  • How to install additional Perl packages

    see the VSC docs on Perl package management

  • How to install additional R packages

    see our documentation below or the VSC docs on R package management
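The two compilation strategies described under "Compiling and testing your software on the HPC" can be captured in a small helper that selects the architecture flags. This is a minimal sketch, assuming a GCC-compatible compiler; the function name arch_flags and the strategy names performance and compatibility are invented for this example.

```shell
#!/bin/bash
# Print the architecture flags for a given build strategy.
arch_flags() {
    case "$1" in
        # Optimized for the node you compile on (Skylake on the login node).
        performance)   echo "-march=native" ;;
        # Runs on all nodes, tuned for Skylake.
        compatibility) echo "-march=ivybridge -mtune=skylake" ;;
        *) echo "usage: arch_flags performance|compatibility" >&2; return 1 ;;
    esac
}

# Example: append the flags to CFLAGS before calling the build system.
CFLAGS="${CFLAGS:-} $(arch_flags compatibility)"
echo "CFLAGS=$CFLAGS"
```

Binaries built with the performance strategy on the login node should only be submitted to Skylake nodes, for example with -l feature=skylake.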

5. How can I run MATLAB?

MATLAB is available as a module. However, it is not recommended to run intensive MATLAB calculations on Hydra: its performance is not optimal and parallel execution is not fully supported.

First check which MATLAB versions are available:

module spider MATLAB

Next load a suitable version, for example (take the most recent version for new projects):

module load MATLAB/2019b

It is possible to run MATLAB in console mode for quick tests. For example, with a MATLAB script called ‘testmatlab.m’, type:

matlab -batch "run('testmatlab.m');"

MATLAB scripts should be submitted to the queue in a job script. Before submitting, however, we highly recommend first compiling your script with the MATLAB compiler mcc (this can be done on the login node):

mcc -m testmatlab.m

This will generate a testmatlab binary file, as well as a ‘run_testmatlab.sh’ shell script (and a few other files). You can ignore the ‘run_testmatlab.sh’ file.

Now you can submit your MATLAB calculation as a batch job. Your job script should look like this:

#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1

module load MATLAB/2019b

cd $PBS_O_WORKDIR
./testmatlab > testmatlab.out 2>&1

The advantage of running a compiled MATLAB binary is that it does not require a license. Only a limited number of MATLAB licenses can be in use at the same time, so in this way you can run your simulation even if all licenses are in use.

More information on using the MATLAB compiler can be found here:

https://nl.mathworks.com/help/mps/ml_code/mcc.html
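The job script above sends the output of the compiled program to a file. When both standard output and standard error should end up in that file, the order of the redirections matters in bash: the file redirection has to come before 2>&1. The sketch below demonstrates this with a stand-in function called noisy (hypothetical, used here in place of the compiled binary) and a temporary directory.

```shell
#!/bin/bash
# Stand-in for a program that writes to both output streams.
noisy() {
    echo "result"
    echo "warning" >&2
}

tmpdir="$(mktemp -d)"

# File first, then 2>&1: both streams are written to the file.
noisy > "$tmpdir/both.out" 2>&1

# 2>&1 first, then file: only standard output reaches the file, because
# standard error was pointed at the previous destination of standard output.
noisy 2>&1 > "$tmpdir/stdout_only.out"

echo "lines in both.out:        $(grep -c . "$tmpdir/both.out")"
echo "lines in stdout_only.out: $(grep -c . "$tmpdir/stdout_only.out")"
```

In a batch job, anything that is not redirected to a file ends up in the output and error files written by the scheduler.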

6. How can I use R?

Depending on your needs, there are different ways to use R on Hydra:

  • Interactive sessions for light workloads can be performed in the login node

    1. Log in to Hydra

    2. Load the following R module

      module load R/3.5.1-foss-2018b
      
    3. Start R

      R
      

    Note

    If you need a different version of R, make sure to load one with foss in the name. Those versions are based on the GNU open source toolchain and can be used in the login node.

  • Interactive sessions for heavy workloads must be performed in the compute nodes

    1. Log in to Hydra

    2. Start an interactive session in a compute node with the following command

      qsub -I
      
    3. Load your R module of choice (preferably a recent version)

      module load R/3.5.1-intel-2018b
      
    4. Start R

      R
      
  • Scripts written in R can be executed with the command Rscript. A minimal job script for R only requires loading the R module and executing your scripts with Rscript

    #!/bin/bash
    #PBS -l walltime=01:00:00
    #PBS -l nodes=1:ppn=1
    
    module load R/3.5.1-intel-2018b
    
    cd $PBS_O_WORKDIR
    Rscript <path-to-script.R>
    

The quality of the graphics generated by R can be improved by changing the graphical backend to Cairo. Add the following lines to the file ~/.Rprofile to make this change permanent for your user (create the file ~/.Rprofile if it does not exist):

# Use cairo backend for graphics device
setHook(packageEvent("grDevices", "onLoad"),
    function(...) grDevices::X11.options(type='cairo'))

# Use cairo backend for bitmaps
options(bitmapType='cairo')

7. Packages included in the R library in Hydra

There are already many packages included in the library of R in Hydra. The complete list can be looked up from the shell in Hydra with the following commands

  1. Log in to Hydra

  2. List the contents of any R module, for instance R/3.5.1-foss-2018b

    module show R/3.5.1-foss-2018b
    

R packages missing in the library may be provided with their own module. In that case use the module command to search in the repository of Hydra. Please see 1. Where can I find installed software?

Unavailable R packages in Hydra can be requested for installation at our support service. Please send an email to hpc@vub.ac.be and the HPC team will proceed with the installation. See 3. How can I build/install additional software/packages? for more details.

Developers can compile and install R packages in the local R library of their home directory. However, the CPU microarchitecture differs between Hydra’s nodes, and this needs to be taken into account when testing self-compiled R packages on the compute nodes.

Note

The packages of a local R library can potentially cause errors due to conflicts with the global R library or due to a version change of R after the installation of local R packages. If you experience errors running R scripts that are related to a failed load of a package, it is helpful to check your script in a clean R environment without a local R library.

  1. Remove all modules and load R

    module purge
    module load R/3.5.1-foss-2018b
    
  2. Enter into a clean R environment (not loading previous workspace)

    R --no-restore
    
  3. Inside the shell of R or at the beginning of your R script

    .libPaths('')
    <your R code ...>
    

8. How can I run Gaussian jobs?

The available modules for Gaussian can be listed with the command:

ml spider Gaussian

We recommend using the module Gaussian/G16.A.03-intel-2017b for general use because its performance has been thoroughly optimized for Hydra. More recent modules, such as Gaussian/G16.B.01, should be used if you need any of their new features.

Gaussian jobs can use significantly more memory than the value specified by %mem in the input file or with g16 -m in the execution command. Therefore, it is recommended to submit Gaussian jobs requesting a total memory that is at least 20% larger than the memory value defined in the calculation.

Gaussian G16 should automatically manage the available resources and parallelization. However, it is known to under-perform in some circumstances and not use all cores allocated to your job. In Hydra, the command myresources will report the actual use of resources of your jobs. If any of your Gaussian calculations is not using all available cores, it is possible to force the total number of cores used by Gaussian G16 with the option g16 -p or by adding the Gaussian directive %nprocshared to the top of the input file.

The following job script is an example to be used for Gaussian calculations running on 1 node with multiple cores. In this case we are running a g16 calculation with 80GB of memory (-m=80GB), but requesting a total of 96GB of memory (20% more). Additionally, we are requesting 20 cores for this job and automatically passing this setting to g16 with the option -p=${PBS_NP}, where ${PBS_NP} is an environment variable that contains the number of cores allocated to your job.

#!/bin/bash
#PBS -l nodes=1:ppn=20
#PBS -l mem=96GB

ml Gaussian/G16.A.03-intel-2017b

cd $PBS_O_WORKDIR
g16 -p=${PBS_NP} -m=80GB < input_file.com > output_file.log
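The 20% memory headroom used in the job script above is simple arithmetic and can be computed instead of worked out by hand. A minimal sketch follows; the function name pbs_mem_for_gaussian is hypothetical and the result is rounded up to a whole number of GB.

```shell
#!/bin/bash
# Given the Gaussian memory in GB (the value of %mem or g16 -m), print the
# memory to request from the scheduler: 20% more, rounded up to a whole GB.
pbs_mem_for_gaussian() {
    local gaussian_gb="$1"
    echo "$(( (gaussian_gb * 120 + 99) / 100 ))GB"
}

pbs_mem_for_gaussian 80   # prints 96GB, the value used in the job script above
```

The printed value is what would go into the #PBS -l mem= line, while the original value is passed to g16 -m.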

9. How can I use GaussView?

GaussView is a graphical interface used with the computational chemistry program Gaussian. GaussView is installed in Hydra and can be used alongside Gaussian to enable all property visualizations.

  1. Log in to Hydra with X11 forwarding enabled. Linux and macOS users can do so by adding the option -Y to the ssh command used for login:

    ssh -Y <username>@login.hpc.vub.be
    
  2. Load the modules of GaussView

    • GaussView 6 with Gaussian/G16.A.03

      module load GaussView/6.0.16
      
    • GaussView 6 with Gaussian/G16.B.01

      module load Gaussian/G16.B.01
      module load GaussView/6.0.16
      
  3. Launch GaussView

    gview.sh
    

Keep in mind that using a graphical interface in Hydra is currently rather slow. Thus, for regular visualization tasks, the HPC team recommends installing GaussView on your personal computer. Binary packages of GaussView are available for Linux, Mac, and Windows users and are provided upon request.

Installation of GaussView on Mac:

  1. Untar G16 and GaussView to /Applications (two new directories, g16 and gv, will be created)

  2. Create a file ~/Library/LaunchAgents/env.GAUSS_EXEDIR.plist and paste the following content into it:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
    <key>Label</key>
    <string>env.GAUSS_EXEDIR</string>
    <key>ProgramArguments</key>
    <array>
    <string>launchctl</string>
    <string>setenv</string>
    <string>GAUSS_EXEDIR</string>
    <string>/Applications/g16/bsd:/Applications/g16</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    </dict>
    </plist>
    
  3. Run the following command once (or, alternatively, restart the machine):

    launchctl load ~/Library/LaunchAgents/env.GAUSS_EXEDIR.plist
    

10. How can I use matplotlib with a graphical interface?

The HPC environment is optimized for the execution of non-interactive applications in job scripts. Therefore, matplotlib is configured with a non-GUI backend (Agg) that can save the resulting plots in a variety of image file formats. The generated image files can be copied to your own computer for visualization or further editing.

If you need to work interactively with matplotlib and visualize its output from within Hydra, you can do so with the following steps

  1. Log in to Hydra with X11 forwarding enabled. Linux and macOS users can use whichever of the following commands is appropriate for their institution

    ssh -Y username@hydra.hpc.vub.be
    ssh -Y username@hydra.hpc.ulb.be
    
  2. Enable the TkAgg backend at the very beginning of your Python script

    import matplotlib
    matplotlib.use('TkAgg')
    

Note

The function matplotlib.use() must be called before importing matplotlib.pyplot. Changing the backend parameter in your matplotlibrc file will not have any effect, as the system-wide configuration takes precedence over it.

11. How can I use CESM?

The dependencies required to run CESM in Hydra are provided by the module CESM-deps. This module also contains the XML configuration files for CESM with the specification of machines, compiler and batch system of Hydra. Once CESM-deps is loaded, the configuration files can be found in ${EBROOTCESMMINDEPS}/machines.

The following steps show an example setup of a CESM case

  1. Log in to Hydra

  2. Load the module CESM-deps

    module load CESM-deps/2-foss-2019a
    
  3. Data files for CESM have to be placed in $VSC_SCRATCH/cesm. Users needing data located elsewhere (e.g. in /projects) can create symlinks in their $VSC_SCRATCH to the corresponding locations.

    mkdir $VSC_SCRATCH/cesm
    
    # Alternatively, if the data is available elsewhere, create a symlink instead of the directory
    ln -sf /projects/our_project/cesm $VSC_SCRATCH/cesm
    
  4. Create the following folder structure for your CESM cases in $VSC_SCRATCH

    mkdir ${VSC_SCRATCH}/cime
    mkdir ${VSC_SCRATCH}/cime/cases
    mkdir ${VSC_SCRATCH}/cime/output
    
  5. The creation of a case follows the usual procedure for CESM. Just remember to always copy the configuration files for Hydra found in ${EBROOTCESMMINDEPS}/machines to the source code of CESM and create your case with ./create_newcase --machine hydra

    cd ${VSC_SCRATCH}/cime
    git clone -b release-cesm2.1.1 https://github.com/ESCOMP/cesm.git cesm-2.1.1
    
    cd ${VSC_SCRATCH}/cime/cesm-2.1.1
    ./manage_externals/checkout_externals
    cp ${EBROOTCESMMINDEPS}/machines/2.1/config_{machines,compilers,batch}.xml cime/config/cesm/machines/
    
    cd ${VSC_SCRATCH}/cime/cesm-2.1.1/cime/scripts
    ./create_newcase --machine hydra --case ${VSC_SCRATCH}/cime/cases/name_of_case --res f19_g17 --compset I2000Clm50BgcCrop
    
  6. Your CESM case can now be set up, built and executed. This is done in Hydra with a single job script called case.job. This job script sets the CESM environment, configures the case, compiles it and runs the simulation in one go, minimizing wait times in the queue. Copy the template available in ${EBROOTCESMMINDEPS}/scripts/case.job to the directory of your case. You can modify case.job with any specific settings for your case, through xmlchange commands or any other means. Once the script is adapted to your needs, submit it to the queue with qsub as usual

    cp ${EBROOTCESMMINDEPS}/scripts/case.job ${VSC_SCRATCH}/cime/cases/name_of_case/
    
    # Move to the case directory and edit case.job if needed
    cd ${VSC_SCRATCH}/cime/cases/name_of_case
    $EDITOR case.job
    
    # Adjust requested resources as needed
    qsub -l nodes=2:ppn=20 -l walltime=24:00:00 case.job
    

Note

If your case is already built and you only want to submit it to the queue, it is still strongly recommended to use the case.job script. Just remove the commands ./case.setup and ./case.build from case.job.
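Steps 3 and 4 above can be scripted so that they are safe to run more than once. The following is a sketch under assumptions: the function name prepare_cesm_dirs is hypothetical, the scratch location is passed as an argument (on Hydra this would be $VSC_SCRATCH), and an optional second argument points to existing data that should be linked instead of creating an empty cesm directory.

```shell
#!/bin/bash
# Create the CESM data location and the cime folder structure.
#   $1: scratch directory (on Hydra: $VSC_SCRATCH)
#   $2: optional directory with existing CESM data to link as <scratch>/cesm
prepare_cesm_dirs() {
    local scratch="$1" data="${2:-}"
    if [ -n "$data" ] && [ ! -e "$scratch/cesm" ]; then
        ln -s "$data" "$scratch/cesm"   # link the existing data
    else
        mkdir -p "$scratch/cesm"        # does nothing if it already exists
    fi
    mkdir -p "$scratch/cime/cases" "$scratch/cime/output"
}

# Demonstration in a temporary directory instead of the real scratch.
demo="$(mktemp -d)"
prepare_cesm_dirs "$demo"
ls "$demo/cime"
```

Using mkdir -p avoids errors when the folders already exist, and linking only when cesm is absent avoids creating a nested link inside an existing directory.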

The module CESM-tools provides a set of tools commonly used to analyse and visualize CESM data. Nonetheless, CESM-tools cannot be loaded at the same time as CESM-deps because their packages have incompatible dependencies. Once you obtain the results of your case, unload any modules with module purge and load CESM-tools/2-foss-2019a to post-process the data of your case.

12. How can I use GAP?

The GAP shell has a strong focus on being used interactively, whereas on Hydra the preferred way to run calculations is by submitting job scripts. Nonetheless, it is possible to use the interactive shell of GAP in our compute nodes with the following steps

  1. Request an interactive job session and wait for it to be allocated:

    $ qsub -I -l nodes=1:ppn=4 -l walltime=3:00:00
    qsub: waiting for job 3036036.master01.hydra.brussel.vsc to start
    qsub: job 3036036.master01.hydra.brussel.vsc ready
    vsc10xxx@node361 ~ $
    
  2. Load the module of GAP and start its shell as usual

    vsc10xxx@node361 ~ $ module load gap/4.11.0-foss-2019a
    vsc10xxx@node361 ~ $ gap
    ********* GAP 4.11.0 of 29-Feb-2020
    * GAP * https://www.gap-system.org
    ********* Architecture: x86_64-pc-linux-gnu-default64-kv7
    [...]
    gap>
    

Submitting a job script using GAP is also possible and requires preparing two scripts. One is the usual job script to be submitted to the queue and the second one is the script with the commands for GAP.

  • The job script is a standard job script requesting the resources needed by your calculation, loading the required modules and executing the script with the code for GAP. Example

    #!/bin/bash
    #PBS -l walltime=01:00:00
    #PBS -l nodes=1:ppn=4
    
    module load gap/4.11.0-foss-2019a-modisomTob
    
    cd $PBS_O_WORKDIR
    ./gap-script.sh
    
  • The script gap-script.sh is a shell script that executes GAP and passes your code to it. It is necessary to execute GAP with the -A option and only load the required GAP packages at the beginning of your script to avoid issues. For example:

    #!/bin/bash
    gap -A -r -b -q << EOI
    LoadPackage( "Example" );
    2+2;
    EOI
    

Note

Remember to make gap-script.sh executable with the command chmod +x gap-script.sh

13. How can I use Mathematica?

First check which Mathematica versions are available:

module spider Mathematica

Next load a suitable version, for example (take the most recent version for new projects):

module load Mathematica/12.0.0

Running Mathematica in console mode in the terminal for quick tests:

wolframscript

Mathematica scripts (Wolfram Language Scripts) should be submitted to the queue in a job script. In the following example, we run the Mathematica script testmath.wls:

#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1

module load Mathematica/12.0.0

cd $PBS_O_WORKDIR
wolframscript -file testmath.wls

Note

Mathematica code is not optimized for performance. However, it supports several levels of interfacing to C/C++. For example, you can speed up your compute-intensive functions by compiling them with a C compiler from inside your Mathematica script.

14. How can I use Stata?

First check which Stata versions are available:

module spider Stata

Next load a suitable version, for example (take the most recent version for new projects):

module load Stata/16-legacy

Running Stata in console mode in the terminal for quick tests:

stata

Stata do-files should be submitted to the queue in a job script. In the following example, we run the Stata program teststata.do:

#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1

module load Stata/16-legacy

cd $PBS_O_WORKDIR
stata-se -b do teststata

Upon execution, Stata will by default write its output to the log file teststata.log.

Note

The recommended version of Stata in batch mode is stata-se, because it can handle larger datasets.

Vega

(TODO)

VSC Tier-1

(TODO)