4. Installing additional software#

Before you start building or installing your own software or packages, first check whether the software is already available in Hydra, either as its own module or as part of another module. If the software package you need is not available, we strongly recommend requesting its installation from VUB-HPC Support. Installations carried out by VUB-HPC have several advantages:

  1. The HPC team will optimize the compilation for each CPU architecture present in Hydra, guaranteeing that your software/package runs efficiently on all nodes (and usually much faster than installations made by the users).

  2. Free software will be available to all users of Hydra and licensed software can be made available to specific groups of users.

  3. The package will be built in a reproducible way with EasyBuild: important for scientific reproducibility.

  4. Different versions of the software can be installed alongside each other.

If you still want to install additional software/packages yourself, you can find guidance for specific development environments in the sections below.

4.1. Compiling and testing your software on the HPC#

We strongly recommend using a suitable buildenv module to compile software in Hydra. A buildenv module loads pre-defined collections of build tools and compilers, ensuring that you work in a controlled and reproducible environment.

Example loading a build environment for toolchain foss/2023a#
module load buildenv/default-foss-2023a

The buildenv module:

  • loads the compiler and any math and/or MPI libraries that may be included in the respective toolchain

  • defines compiler flags and library variables for optimal performance: CFLAGS, FFLAGS, CXXFLAGS, LIBBLAS, LIBLAPACK, LIBFFT, …

  • defines flags and paths to make sure the build system finds the right libraries: LIBRARY_PATH, LD_LIBRARY_PATH, LDFLAGS, …
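To see what a loaded buildenv module has actually exported, it can help to print the variables listed above before starting a build. The helper below is a minimal sketch: the function name `show_build_env` is hypothetical, and the variable list is limited to the names mentioned in this section.

```shell
# Print each build-related variable, or "(not set)" when it is absent.
# Variable names are taken from the list above; the function name is illustrative.
show_build_env() {
  for name in CFLAGS CXXFLAGS FFLAGS LDFLAGS LIBBLAS LIBLAPACK LIBFFT \
              LIBRARY_PATH LD_LIBRARY_PATH; do
    # printenv only sees exported variables, which is also what a build system sees
    value=$(printenv "$name") || value="(not set)"
    printf '%s=%s\n' "$name" "$value"
  done
}
```

Running `show_build_env` before and after `module load buildenv/default-foss-2023a` makes the effect of the module visible.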

If needed you can load additional development tools compatible with the buildenv module. For instance, the following modules can be loaded alongside buildenv/default-foss-2023a:

  • CMake/3.26.3-GCCcore-12.3.0

  • Autotools/20220317-GCCcore-12.3.0

  • pkgconf/1.9.5-GCCcore-12.3.0

  • Ninja/1.11.1-GCCcore-12.3.0

  • Meson/1.1.1-GCCcore-12.3.0
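All modules in the list above share the suffix GCCcore-12.3.0, which suggests that compatibility with buildenv/default-foss-2023a can be recognised from the toolchain suffix of the module name. Under that assumption, the following sketch filters a list of module names by suffix. The function name `filter_by_toolchain` is hypothetical.

```shell
# Keep only the module names that end in the given toolchain suffix.
# Reads one module name per line on standard input.
filter_by_toolchain() {
  suffix=$1    # e.g. GCCcore-12.3.0
  while IFS= read -r name; do
    case $name in
      *-"$suffix") printf '%s\n' "$name" ;;
    esac
  done
}
```

The function can be fed from any list of module names, for instance a text file with candidates or, assuming your module tool offers a terse one-name-per-line listing, the output of that listing.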

Users compiling their own software should be aware that software compiled on the login nodes may fail on older compute nodes if full hardware optimization is used. The CPU microarchitecture of the login nodes (Skylake) supports some instruction sets that are not available on Hydra’s older compute nodes (e.g. Broadwell). Therefore, there are two options to compile your own software:

Best performance

Compile on the login node (with -march=native). The resulting binaries can only run on Skylake nodes, but they offer the best performance on those nodes. Jobs can be restricted to run on Skylake nodes with the Slurm option --partition=skylake.

Best compatibility

Compile on any Broadwell node. Log in to a Broadwell node with

srun --partition=broadwell --pty bash -l

and compile your code on it. The resulting binaries can run on any node in Hydra with decent performance. Alternatively, users who know how to set up the compilation can compile on the login node with -march=broadwell -mtune=skylake.
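The two options can be captured in a small helper that returns the matching architecture flags, so that a build script only has to name its goal. This is a sketch under the assumption that the compiler is GCC. The function name `arch_flags` and the goal names are hypothetical.

```shell
# Map a build goal to the architecture flags described above.
#   performance   -> instruction set of the node doing the compilation
#   compatibility -> Broadwell instruction set, scheduling tuned for Skylake
arch_flags() {
  case $1 in
    performance)   printf '%s\n' '-march=native' ;;
    compatibility) printf '%s\n' '-march=broadwell -mtune=skylake' ;;
    *) echo "usage: arch_flags performance|compatibility" >&2; return 2 ;;
  esac
}
```

A compile line could then read `gcc -O2 $(arch_flags compatibility) -o mycode mycode.c`, where mycode.c stands for your own source file.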

Helpdesk The environment in the HPC might differ significantly from your development system. If you run into problems or have questions about moving your development to the HPC, we can help.

See also

VSCdoc Software development for more information.

4.2. Installing additional Python packages#

Helpdesk There is a large number of Python packages already available in Hydra, see the question How can I find specific Python or R packages? If the package you need is not available, we can install it for you.

  • If you would like to test some new Python package before requesting its installation, you can do so by using your personal site-packages directory in your home:

    1. Load the appropriate Python module, i.e. the specific Python version to use with the package:

      module load Python/3.11.3-GCCcore-12.3.0
      
    2. Install the Python package with pip in your user account. The following command will also download and install all required dependencies. The new package and all missing dependencies will be installed by default in ~/.local/lib/pythonX.Y/site-packages:

      pip install --user <new_python_package>
      
  • Developers who need to use or test Python software that is in active development can also use pip to install their own packages in a personal site-packages directory:

    1. Load the appropriate Python module and install the Python package from the local directory containing the source code. In this case pip will also install any missing dependencies:

      module load Python/3.11.3-GCCcore-12.3.0
      pip install --user /path/to/your/source/code
      
    2. Optional Updating the installation can be done at any time with the command:

      pip install --user --no-deps --ignore-installed /path/to/your/source/code
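Because the personal site-packages directory depends on the Python version (the X.Y in ~/.local/lib/pythonX.Y/site-packages), packages installed with one Python module are not visible to a module with a different minor version. The sketch below derives the directory from a module name. The function name `user_site_for_module` is hypothetical, and the path layout is the default one stated above.

```shell
# Derive the per-user site-packages directory from a Python module name,
# following the ~/.local/lib/pythonX.Y/site-packages layout mentioned above.
user_site_for_module() {
  version=${1#Python/}      # "3.11.3-GCCcore-12.3.0"
  version=${version%%-*}    # "3.11.3"
  printf '%s/.local/lib/python%s/site-packages\n' "$HOME" "${version%.*}"
}
```

For example, `ls "$(user_site_for_module Python/3.11.3-GCCcore-12.3.0)"` lists what pip has placed in your account for that Python version.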
      

4.3. Python virtual environments#

A virtual environment is an isolated Python environment in which you can safely install Python packages, independent from those installed in the system or in other virtual environments. For Python developers, using virtual environments is very convenient as it allows working on multiple software projects at the same time. On the other hand, as explained in section Additional Software, it is highly recommended to use the software modules as much as possible.

In this section, we show how you can combine modules with virtual environments in the HPC to get the best of both worlds.

  1. Select the cluster partition that you want to use with the new virtual environment and start an interactive shell in it

    Warning

    Virtual environments are tied to the cluster partition used for their creation. The login nodes of Hydra can be used to create virtual environments that will run on the skylake partitions of the cluster.

    Example command to start an interactive shell in the broadwell partition#
    srun --partition=broadwell --pty bash -l
    
  2. Load a Python module as base of the virtual environment. Choose a Python version that is suitable for the additional Python packages that will be installed in the virtual environment:

    module load Python/3.11.3-GCCcore-12.3.0
    
  3. Optional Load modules with additional Python packages

    Python modules in the HPC include a limited list of Python packages, but many other modules are also available. A common module is SciPy-bundle, a bundle of data science packages such as numpy, pandas, and scipy:

    module load SciPy-bundle/2023.07-gfbf-2023a
    
  4. Create a virtual environment with venv

    Example command to create a new virtual environment in the directory myenv#
    python -m venv myenv --system-site-packages
    

    Option --system-site-packages ensures using the Python packages already available via the loaded modules instead of installing them in the virtual environment.

  5. Before we can use the virtual environment, we must activate it

    Once the virtual environment is active, its name will be displayed in front of the prompt#
    $ source myenv/bin/activate
    (myenv) $
    
  6. We recommend always upgrading pip to the latest version:

    (myenv) $ python -m pip install pip --upgrade
    
  7. Now we can install additional Python packages, or different versions of available packages, in the virtual environment.

    Example command to install a version of the requests package that is different from the version included in the Python module#
    (myenv) $ python -m pip install requests==2.27.1 --no-cache-dir --no-build-isolation
    

    Option --no-cache-dir ensures installing the most recent compatible versions of the dependencies, ignoring the versions available in your cache.

    Option --no-build-isolation ensures using the available Cython compiler and other (build) dependencies instead of building in an isolated environment.

  8. Once you finish your work in the virtual environment, use the command deactivate to exit it

    The command deactivate will bring you to the standard shell#
    (myenv) $ deactivate
    $
    

Whenever you want to go back to any of your virtual environments make sure to:

  1. Load the same software modules that you used in the creation of the virtual environment:

    module load Python/3.11.3-GCCcore-12.3.0 SciPy-bundle/2023.07-gfbf-2023a
    
  2. Reactivate the virtual environment:

    $ source myenv/bin/activate
    (myenv) $
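Remembering which modules were loaded when the environment was created is easy to get wrong. One possible approach is to store the list inside the environment directory at creation time and read it back later. The sketch below assumes that the module system exports the colon-separated variable LOADEDMODULES. The function names and the file name modules.txt are hypothetical.

```shell
# Record the currently loaded modules inside the virtual environment directory.
# LOADEDMODULES is assumed to hold a colon-separated list of loaded modules.
save_env_modules() {
  printf '%s\n' "$LOADEDMODULES" | tr ':' '\n' > "$1/modules.txt"
}

# Print the recorded modules on one line, ready to pass to "module load".
saved_env_modules() {
  tr '\n' ' ' < "$1/modules.txt" | sed 's/ *$//'
}
```

After creating the environment, `save_env_modules myenv` stores the list. In a later session, `module load $(saved_env_modules myenv)` followed by `source myenv/bin/activate` restores the same setup. The recorded list also contains the dependencies of the modules you loaded explicitly.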
    

4.4. Installing additional R packages#

Developers can compile and install R packages in the local R library of their home directory. The R function install.packages() will specifically ask to use your personal library. Keep in mind that if your software requires code compilation beyond R, you might need a build environment as described in Compiling and testing your software on the HPC.

Handling a personal R library in Hydra can be tricky, though: it can easily break the R packages provided by software modules (e.g. the R-bundle-CRAN module). This can be due to conflicts with the global R library, issues with the multiple CPU microarchitectures in the compute nodes, or a change of R version after the installation of local R packages.

If you experience errors running R scripts that are related to a failed load of a package, it is helpful to check your script in a clean R environment without your personal R library:

  1. Remove all modules and load the desired version of R:

    module purge
    module load R/4.3.2-gfbf-2023a
    module load R-bundle-CRAN/2023.12-foss-2023a
    
  2. Disable the R library in your home directory:

    export R_LIBS_USER=''
    
  3. Enter into a clean R environment (not loading previous workspace):

    R --no-restore
    

Note

You can check the paths where R will look for the requested packages (i.e. after a call to library()) with the function .libPaths(). The paths at the beginning of the list have precedence over the rest.

4.5. Installing additional Perl packages#

See VSCdoc Perl package management

4.6. Installing additional Julia packages#

The Julia installations loaded by the Julia software modules in our HPC clusters provide a base installation of Julia that allows installing your own packages on top of it.

Recommended Software installations of Julia packages can be heavy. Installing many packages, or a single package that pulls in many dependencies, can quickly fill the storage quota of your home directory. We recommend moving your personal depot to your $VSC_SCRATCH and linking ~/.julia to it.

Move personal Julia depot to scratch storage#
$ mkdir -p ~/.julia
$ mv ~/.julia $VSC_SCRATCH/julia
$ ln -s $VSC_SCRATCH/julia ~/.julia
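The three commands above are meant to be run once. If there is a chance of running them again, for instance from a setup script, a guarded variant avoids moving an existing symlink or overwriting a depot that is already in scratch. This is a sketch; the function name `relocate_depot` is hypothetical.

```shell
# Move a depot directory to a new location and leave a symlink behind.
# Does nothing if the depot is already a symlink; refuses to overwrite a target.
relocate_depot() {
  depot=$1     # e.g. "$HOME/.julia"
  target=$2    # e.g. "$VSC_SCRATCH/julia"
  if [ -L "$depot" ]; then
    echo "$depot is already a symlink, nothing to do"
    return 0
  fi
  if [ -e "$target" ]; then
    echo "$target already exists, not overwriting" >&2
    return 1
  fi
  mkdir -p "$depot" "$(dirname "$target")"
  mv "$depot" "$target"
  ln -s "$target" "$depot"
}
```

The equivalent of the commands above is `relocate_depot "$HOME/.julia" "$VSC_SCRATCH/julia"`.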

Loading a Julia software module will enable the julia command to launch the Julia shell or execute commands in Julia. Your personal depot located in your home directory under ~/.julia or any custom project environments in your account will continue to be usable, as long as they were created with the same major and minor version of Julia as the loaded module. For instance, if you load Julia/1.9.2-linux-x86_64, all project environments for 1.9 will be usable.

You can use the Pkg.add() command as usual to install additional packages in Julia. New packages will be installed in the active project environment, which by default is the shared environment in your personal depot (~/.julia).

Installation of Julia package CSV in personal depot#
$ module load Julia/1.9.2-linux-x86_64
$ julia -e 'using Pkg; Pkg.add("CSV")'
  Installing known registries into `~/.julia`
   Updating registry at `~/.julia/registries/General.toml`
  Resolving package versions...
  Installed DataValueInterfaces ───────── v1.0.0
  Installed Parsers ───────────────────── v2.8.1
  [...]
  Installed CSV ───────────────────────── v0.10.13
   Updating /rhea/scratch/brussel/101/vsc10122/julia/environments/v1.9/Project.toml
  [336ed68f] + CSV v0.10.13
   Updating /rhea/scratch/brussel/101/vsc10122/julia/environments/v1.9/Manifest.toml
  [336ed68f] + CSV v0.10.13
  [...]
Precompiling project...
  22 dependencies successfully precompiled in 50 seconds. 1 already precompiled.
$ julia -e 'using Pkg; Pkg.status()'
Status /rhea/scratch/brussel/101/vsc10122/julia/environments/v1.9/Project.toml
  [336ed68f] CSV v0.10.13
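The output above shows that the default shared environment lives in a directory named after the major and minor version of Julia (environments/v1.9 for Julia 1.9.2). The following sketch derives that directory from a module name, which can help to check which environment a given module will pick up. The function name `default_env_dir` is hypothetical.

```shell
# Derive the default shared environment directory for a Julia module name,
# e.g. Julia/1.9.2-linux-x86_64 -> <depot>/environments/v1.9
default_env_dir() {
  depot=$1                   # e.g. "$HOME/.julia"
  version=${2#Julia/}        # "1.9.2-linux-x86_64"
  version=${version%%-*}     # "1.9.2"
  printf '%s/environments/v%s\n' "$depot" "${version%.*}"
}
```

For example, `ls "$(default_env_dir "$HOME/.julia" Julia/1.9.2-linux-x86_64)"` should show the Project.toml and Manifest.toml files updated in the installation above.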

You will also find software modules in the HPC that provide additional Julia packages. Those modules can bundle one or more Julia packages and will automatically load any dependencies needed for their correct function, including a Julia base module. Once you load a software module with Julia packages, they will become usable through the Julia command using, as usual. Moreover, your own Julia packages installed in your depot will continue to be usable as well with using.

Loading Julia package Circuitscape through software modules#
$ module load Circuitscape/5.12.3-Julia-1.9.2
$ julia -e 'using Circuitscape'
$ julia -e 'using CSV'

Warning Julia packages loaded through software modules like Circuitscape/5.12.3-Julia-1.9.2 will be usable in Julia, but they will not be part of your project environment. This means that you are free to install your own versions of any of the Julia packages provided by loaded software modules.

For instance, one of the Julia packages in Circuitscape/5.12.3-Julia-1.9.2 is AlgebraicMultigrid v0.5.1. We can upgrade that specific package to a newer version if needed:

Upgrading AlgebraicMultigrid in Circuitscape software module#
$ module help Circuitscape/5.12.3-Julia-1.9.2
[...]
Included extensions
===================
AbstractFFTs-1.2.1, Adapt-3.4.0, AlgebraicMultigrid-0.5.1, ArchGDAL-0.8.5,
ArnoldiMethod-0.2.0, Arrow_jll-10.0.0+1, boost_jll-1.76.0+1,
[...]
$ module load Circuitscape/5.12.3-Julia-1.9.2
$ julia -e 'using Pkg; Pkg.add("AlgebraicMultigrid")'
[...]
  Updating `/rhea/scratch/brussel/101/vsc10122/julia/environments/v1.9/Project.toml`
[2169fc97] + AlgebraicMultigrid v0.6.0
  Updating `/rhea/scratch/brussel/101/vsc10122/julia/environments/v1.9/Manifest.toml`
[...]
65 dependencies successfully precompiled in 110 seconds. 28 already precompiled
$ julia -e 'using Circuitscape'

On the other hand, you can also use all the packages provided by some software module as the base for a new custom project environment. In such a case, you have to manually create the new project environment by copying the environment of the loaded software module.

Creating new Julia environment based on a loaded software module#
$ module load Circuitscape/5.12.3-Julia-1.9.2
$ base_project=$(julia -E 'Base.load_path()[end]')
$ cp -r "$(dirname ${base_project:1:-1})" myNewEnv
$ julia -e 'using Pkg; Pkg.activate("myNewEnv"); Pkg.status()'
  Activating project at /vscmnt/brussel_pixiu_data/_data_brussel/vo/000/bvo00005/vsc10122/tests/julia/myNewEnv
  Status /vscmnt/brussel_pixiu_data/_data_brussel/vo/000/bvo00005/vsc10122/tests/julia/myNewEnv/Project.toml
  [...]
  [2b7a1792] Circuitscape v5.12.3 /apps/brussel/RL8/skylake-ib/software/Circuitscape/5.12.3-Julia-1.9.2/packages/Circuitscape
  [...]
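In the commands above, julia -E prints the path of the project file as a Julia string, including the surrounding double quotes. The expansion ${base_project:1:-1} drops the first and last character to remove them, which requires a reasonably recent bash. The sketch below does the same with POSIX parameter expansions. The function name `project_dir_from_repr` is hypothetical.

```shell
# Turn the quoted path printed by `julia -E` into the directory that contains it.
project_dir_from_repr() {
  path=${1#\"}      # drop the leading double quote
  path=${path%\"}   # drop the trailing double quote
  dirname "$path"
}
```

With this helper, the copy step could be written as `cp -r "$(project_dir_from_repr "$base_project")" myNewEnv`.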

Once this new environment is active, installations of new Julia packages will take into account the packages already provided by the loaded modules. For instance, the following example installs CSV on top of the packages provided by Circuitscape/5.12.3-Julia-1.9.2, which results in only 8 new packages installed, compared to the 22 installed in an empty environment.

Installation of Julia package CSV in custom environment on top of Circuitscape module#
$ module load Circuitscape/5.12.3-Julia-1.9.2
$ julia -e 'using Pkg; Pkg.activate("myNewEnv"); Pkg.add("CSV")'
  Activating project at /vscmnt/brussel_pixiu_data/_data_brussel/vo/000/bvo00005/vsc10122/tests/julia/myNewEnv
   Resolving package versions...
    Updating /vscmnt/brussel_pixiu_data/_data_brussel/vo/000/bvo00005/vsc10122/tests/julia/myNewEnv/Project.toml
  [336ed68f] + CSV v0.10.13
    Updating /vscmnt/brussel_pixiu_data/_data_brussel/vo/000/bvo00005/vsc10122/tests/julia/myNewEnv/Manifest.toml
  [336ed68f] + CSV v0.10.13
  [...]
Precompiling project...
  7 dependencies successfully precompiled in 53 seconds. 100 already precompiled.