4. Installing additional software
Before you start building or installing your own software or packages, first check whether the software is already available in Hydra, either with its own module or as part of another module. If the software package you need is not available, we strongly recommend requesting its installation from VUB-HPC Support. Installations carried out by VUB-HPC have several advantages:
The HPC team optimizes the compilation for each CPU and GPU architecture present in the cluster, guaranteeing that your software runs efficiently on all compute nodes. As explained below, this is difficult for users to do themselves due to the heterogeneous architecture of the cluster.
Free software is made available to all users, benefiting the whole community of researchers. On the other hand, licensed software is restricted to the group of users owning the license.
Software packages are built in a systematic way with EasyBuild, which is critical for scientific reproducibility.
Different versions of the same software can be installed alongside each other.
There are still valid reasons to install additional software yourself, such as doing your own development, testing, or depending on very old software. In those cases, please follow the guidance for specific development environments in the sections below.
4.1. Compiling and testing your software on the HPC
You can load so-called buildenv modules in the cluster. They provide ready-to-use development environments with a pre-defined collection of compilers and build tools that match those used in common toolchains (e.g. foss/2023a or intel/2022a). This ensures that you work in a controlled and reproducible environment.
module load buildenv/default-foss-2023a
The buildenv module:
loads the compiler and any math and/or MPI libraries that may be included in the respective toolchain
defines compiler and linker flags for optimal performance: CFLAGS, FFLAGS, CXXFLAGS, LDFLAGS, LIBBLAS, LIBLAPACK, LIBFFT, …
defines search paths to make sure the build system finds the right libraries and headers: LIBRARY_PATH, CPATH …
If needed, you can load additional development tools compatible with the buildenv module. For instance, the following modules can be loaded alongside buildenv/default-foss-2023a:
CMake/3.26.3-GCCcore-12.3.0
Autotools/20220317-GCCcore-12.3.0
pkgconf/1.9.5-GCCcore-12.3.0
Ninja/1.11.1-GCCcore-12.3.0
Meson/1.1.1-GCCcore-12.3.0
You should follow different compilation and optimization strategies depending on the performance requirements of your code:
- Best performance: Compile on the nodes that will be used by your job and fully enable the features available in the CPU with the compilation option -march=native. The resulting binaries can only run on that specific CPU model, but they offer the best performance on those nodes. Jobs can be restricted to run on a specific partition with the Slurm option --partition. We recommend organizing your binaries based on their architecture and automatically loading the correct one using the environment variables $VSC_ARCH_LOCAL and $VSC_OS_LOCAL.
- Best compatibility: Compile on any node (including login nodes) with the compilation option -march=x86-64-v4. The resulting binaries can run on any node in Hydra with decent performance.
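One possible way to organize architecture-specific binaries is sketched below. The directory layout (prefix/OS/architecture/bin), the function name arch_bin_dir and the fallback name generic are assumptions made for illustration only. The sketch relies solely on the two environment variables mentioned above.

```shell
# Print the directory holding the binaries built for the current node.
# Hypothetical layout: <prefix>/<OS>/<architecture>/bin
arch_bin_dir() {
    local prefix="$1"
    # Fall back to a "generic" directory if the variables are not set
    local os="${VSC_OS_LOCAL:-generic}"
    local arch="${VSC_ARCH_LOCAL:-generic}"
    printf '%s/%s/%s/bin\n' "$prefix" "$os" "$arch"
}

# In a job script: put the matching binaries first in PATH
PATH="$(arch_bin_dir "$HOME/myprog"):$PATH"
export PATH
```

Binaries compiled with -march=native would then be installed into the directory printed by arch_bin_dir on the node where they were built.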
Helpdesk The environment in the cluster might differ significantly from your development system. We can help you with any problems or questions that come up when moving your development to the HPC.
See also
VSCdoc: Software development for more information.
4.2. Installing additional Python packages
Helpdesk There is a large number of Python packages already available in Hydra, see the question How can I find specific Python or R packages? If the package you need is not available, we can install it for you.
If you would like to test some new Python package before requesting its installation, you can do so by using Python virtual environments as described below.
4.2.1. Python virtual environments
A Python virtual environment is an isolated environment in which you can safely install Python packages, independent from those installed in the system or in other virtual environments. For instance, using virtual environments is very convenient for Python developers as it allows working on multiple software projects at the same time.
As explained in section Additional Software, it is highly recommended to use the software modules already installed in the cluster as much as possible. They provide a robust and performant base to build your virtual environments.
In this section, we show how you can combine modules with virtual environments in the HPC to get the best of both worlds.
Warning Virtual environments are tied to the cluster partition used for their creation. The login nodes of Hydra can only be used to create virtual environments that will run on the skylake and skylake_mpi partitions of the cluster.
Start by launching an interactive shell in the cluster partition of choice:
srun --partition=zen4 --pty bash -l
Load a Python module as base of the virtual environment. Choose a Python version that is suitable for the additional Python packages that will be installed in the virtual environment:
module load Python/3.11.3-GCCcore-12.3.0
Optional Load other modules with additional Python packages:
module load SciPy-bundle/2023.07-gfbf-2023a
The Python software modules in the HPC include a very limited list of Python packages, but many other modules are also available. A common software module is SciPy-bundle, a bundle of data science packages such as numpy, pandas, and scipy.
Create your virtual environment.
Warning Avoid making your virtual environments in your home directory. The storage of your home is very small and can quickly be filled with installation files. Use a folder in your personal $VSC_SCRATCH or $VSC_DATA storage, or in your Virtual Organization if you are part of one.
python -m venv myenv --system-site-packages
Option --system-site-packages makes the virtual environment use the Python packages already available via the loaded modules instead of installing them again.
Before we can use the virtual environment, we must activate it:
$ source myenv/bin/activate
(myenv) $
Once the virtual environment is active, its name will be displayed in front of the shell prompt.
We recommend upgrading pip to the latest version:
(myenv) $ python -m pip install pip --upgrade
Now we can install additional Python packages, or different versions of available packages, inside this virtual environment:
(myenv) $ python -m pip install requests==2.27.1 --no-cache-dir --no-build-isolation
Option --no-cache-dir disables the pip cache, so that packages are downloaded afresh in their most recent compatible versions instead of being taken from your cache.
Option --no-build-isolation ensures that the Cython compiler and other (build) dependencies from the loaded modules are used instead of building in an isolated environment.
Once you finish your work in the virtual environment, use the command deactivate to exit it:
(myenv) $ deactivate
$
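If you are unsure whether an existing virtual environment was created with --system-site-packages, you can inspect the pyvenv.cfg file that venv writes into the environment directory. The helper below is a minimal sketch of such a check. The function name is hypothetical, and myenv is the example environment used in this section.

```shell
# Succeed if the virtual environment in directory $1 was created with
# --system-site-packages, according to its pyvenv.cfg file.
# Returns 2 if the directory does not contain a pyvenv.cfg file.
venv_uses_system_packages() {
    local cfg="$1/pyvenv.cfg"
    [ -f "$cfg" ] || return 2
    grep -Eq '^include-system-site-packages[[:space:]]*=[[:space:]]*true' "$cfg"
}

if venv_uses_system_packages myenv; then
    echo "myenv can use the packages from the loaded modules"
else
    echo "myenv is fully isolated or does not exist"
fi
```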
Whenever you want to go back to any of your virtual environments make sure to:
Load the same software modules that you used in the creation of the virtual environment:
module load Python/3.11.3-GCCcore-12.3.0 SciPy-bundle/2023.07-gfbf-2023a
Reactivate the virtual environment:
$ source myenv/bin/activate
(myenv) $
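Since the same modules have to be loaded every time, it can help to store their names next to the virtual environment. The following sketch shows one way to do that. The file name modules.txt and both function names are hypothetical conventions, not something provided by the cluster.

```shell
# Store the names of the modules used to create the environment in
# <envdir>/modules.txt (hypothetical file name), one per line.
save_env_modules() {
    local envdir="$1"
    shift
    printf '%s\n' "$@" > "$envdir/modules.txt"
}

# Print the "module load" command that restores those modules.
env_modules_cmd() {
    local envdir="$1"
    echo "module load $(paste -s -d ' ' "$envdir/modules.txt")"
}

# Usage, right after creating the environment:
#   save_env_modules myenv Python/3.11.3-GCCcore-12.3.0 SciPy-bundle/2023.07-gfbf-2023a
# Later, to go back to it:
#   eval "$(env_modules_cmd myenv)" && source myenv/bin/activate
```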
4.3. Installing additional R packages
Developers can compile and install R packages in the local R library of their home directory. The R function install.packages() will specifically ask to use your personal library. Keep in mind that if your software requires code compilation beyond R, you might need a build environment as described in Compiling and testing your software on the HPC.
Handling a personal R library in Hydra can be tricky though: it can easily break the rest of the R packages provided by software modules (e.g. the R-bundle-CRAN module). This can be due to conflicts with the global R library, issues with the multiple CPU micro-architectures in the compute nodes, or a version change of R after the installation of local R packages.
If you experience errors running R scripts that are related to a failed load of a package, it is helpful to check your script in a clean R environment without your personal R library:
Remove all modules and load the desired version of R:
module purge
module load R/4.3.2-gfbf-2023a
module load R-bundle-CRAN/2023.12-foss-2023a
Disable the R library in your home directory:
export R_LIBS_USER=''
Enter into a clean R environment (not loading previous workspace):
R --no-restore
Note
You can check the paths where R will look for the requested packages (i.e. after a call to library()) with the function .libPaths(). The paths at the beginning of the list have precedence over the rest.
4.4. Installing additional Perl packages
4.5. Installing additional Julia packages
The Julia installations loaded by the Julia software modules in our HPC clusters provide a base installation of Julia that allows installing your own packages on top of it.
Recommended Software installations of Julia packages can be heavy. Installing
many packages or a single package that pulls many dependencies can quickly fill
the storage quota of your home directory.
We recommend moving your personal depot to your $VSC_SCRATCH and linking ~/.julia to it.
$ mkdir -p ~/.julia
$ mv ~/.julia $VSC_SCRATCH/julia
$ ln -s $VSC_SCRATCH/julia ~/.julia
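The three commands above are meant to be run only once. If you prefer something that is safe to run repeatedly, for example from a setup script, the same steps can be wrapped in a small helper such as the sketch below. The function name relocate_dir is hypothetical.

```shell
# Move directory $1 to $2 and leave a symlink behind at $1.
# Does nothing if $1 is already a symlink (it was moved before) and
# refuses to overwrite an existing destination.
relocate_dir() {
    local src="$1" dest="$2"
    if [ -L "$src" ]; then
        echo "$src is already a symlink, nothing to do"
        return 0
    fi
    if [ -e "$dest" ]; then
        echo "$dest already exists, refusing to overwrite it" >&2
        return 1
    fi
    # Create the source if it does not exist yet, and the parent
    # directory of the destination
    mkdir -p "$src" "$(dirname "$dest")"
    mv "$src" "$dest"
    ln -s "$dest" "$src"
}

# Usage:
#   relocate_dir ~/.julia "$VSC_SCRATCH/julia"
```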
Loading a Julia software module will enable the julia command to launch the Julia shell or execute commands in Julia. Your personal depot located in your home directory under ~/.julia or any custom project environments in your account will continue to be usable, as long as they were created with the same major and minor version of Julia as the loaded module. For instance, if you load Julia/1.9.2-linux-x86_64, all project environments for 1.9 will be usable.
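To make the version matching concrete, the sketch below derives the major.minor series from the name of a Julia module and prints the shared environment directory that belongs to it inside the depot. Both function names are hypothetical, the depot is assumed to be at ~/.julia unless a second argument is given, and the parsing only handles module names of the form Julia/version-suffix.

```shell
# Print the major.minor series of a Julia module name,
# e.g. Julia/1.9.2-linux-x86_64 -> 1.9
julia_series() {
    local version="${1#Julia/}"   # 1.9.2-linux-x86_64
    version="${version%%-*}"      # 1.9.2
    echo "${version%.*}"          # 1.9
}

# Print the shared environment directory for that series.
# $2 is the depot location (assumed default: ~/.julia)
julia_shared_env() {
    echo "${2:-$HOME/.julia}/environments/v$(julia_series "$1")"
}

julia_shared_env Julia/1.9.2-linux-x86_64
```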
You can use the Pkg.add() command as usual to install additional packages in Julia. New packages will be installed in the active project environment, which by default is the shared environment in your personal depot (~/.julia).
$ module load Julia/1.9.2-linux-x86_64
$ julia -e 'using Pkg; Pkg.add("CSV")'
Installing known registries into `~/.julia`
Updating registry at `~/.julia/registries/General.toml`
Resolving package versions...
Installed DataValueInterfaces ───────── v1.0.0
Installed Parsers ───────────────────── v2.8.1
[...]
Installed CSV ───────────────────────── v0.10.13
Updating /rhea/scratch/brussel/101/vsc10122/julia/environments/v1.9/Project.toml
[336ed68f] + CSV v0.10.13
Updating /rhea/scratch/brussel/101/vsc10122/julia/environments/v1.9/Manifest.toml
[336ed68f] + CSV v0.10.13
[...]
Precompiling project...
22 dependencies successfully precompiled in 50 seconds. 1 already precompiled.
$ julia -e 'using Pkg; Pkg.status()'
Status /rhea/scratch/brussel/101/vsc10122/julia/environments/v1.9/Project.toml
[336ed68f] CSV v0.10.13
You will also find software modules in the HPC that provide additional Julia packages. Those modules can bundle one or more Julia packages and will automatically load any dependencies needed for their correct function, including a Julia base module. Once you load a software module with Julia packages, they will become usable in Julia with the using keyword, as usual. Moreover, your own Julia packages installed in your depot will continue to be usable with using as well.
$ module load Circuitscape/5.12.3-Julia-1.9.2
$ julia -e 'using Circuitscape'
$ julia -e 'using CSV'
Warning Julia packages loaded through software modules like Circuitscape/5.12.3-Julia-1.9.2 will be usable in Julia, but they will not be part of your project environment. This means that you are free to install your own versions of any of the Julia packages provided by loaded software modules.
For instance, one of the Julia packages in Circuitscape/5.12.3-Julia-1.9.2 is AlgebraicMultigrid v0.5.1. You can upgrade that specific package to a newer version in your own project environment:
$ module help Circuitscape/5.12.3-Julia-1.9.2
[...]
Included extensions
===================
AbstractFFTs-1.2.1, Adapt-3.4.0, AlgebraicMultigrid-0.5.1, ArchGDAL-0.8.5,
ArnoldiMethod-0.2.0, Arrow_jll-10.0.0+1, boost_jll-1.76.0+1,
[...]
$ module load Circuitscape/5.12.3-Julia-1.9.2
$ julia -e 'using Pkg; Pkg.add("AlgebraicMultigrid")'
[...]
Updating `/rhea/scratch/brussel/101/vsc10122/julia/environments/v1.9/Project.toml`
[2169fc97] + AlgebraicMultigrid v0.6.0
Updating `/rhea/scratch/brussel/101/vsc10122/julia/environments/v1.9/Manifest.toml`
[...]
65 dependencies successfully precompiled in 110 seconds. 28 already precompiled
$ julia -e 'using Circuitscape'
On the other hand, you can also use all the packages provided by some software module as the base for a new custom project environment. In such a case, you have to manually create the new project environment by copying the environment of the loaded software module.
$ module load Circuitscape/5.12.3-Julia-1.9.2
$ base_project=$(julia -E 'Base.load_path()[end]')
$ cp -r "$(dirname ${base_project:1:-1})" myNewEnv
$ julia -e 'using Pkg; Pkg.activate("myNewEnv"); Pkg.status()'
Activating project at /vscmnt/brussel_pixiu_data/_data_brussel/vo/000/bvo00005/vsc10122/tests/julia/myNewEnv
Status /vscmnt/brussel_pixiu_data/_data_brussel/vo/000/bvo00005/vsc10122/tests/julia/myNewEnv/Project.toml
[...]
[2b7a1792] Circuitscape v5.12.3 /apps/brussel/RL8/skylake-ib/software/Circuitscape/5.12.3-Julia-1.9.2/packages/Circuitscape
[...]
Once this new environment is active, installations of new Julia packages will take into account the packages already provided by the loaded modules. For instance, the following example installs CSV on top of the packages provided by Circuitscape/5.12.3-Julia-1.9.2, which results in only 7 new packages installed, compared to the 22 installed on an empty environment.
$ module load Circuitscape/5.12.3-Julia-1.9.2
$ julia -e 'using Pkg; Pkg.activate("myNewEnv"); Pkg.add("CSV")'
Activating project at /vscmnt/brussel_pixiu_data/_data_brussel/vo/000/bvo00005/vsc10122/tests/julia/myNewEnv
Resolving package versions...
Updating /vscmnt/brussel_pixiu_data/_data_brussel/vo/000/bvo00005/vsc10122/tests/julia/myNewEnv/Project.toml
[336ed68f] + CSV v0.10.13
Updating /vscmnt/brussel_pixiu_data/_data_brussel/vo/000/bvo00005/vsc10122/tests/julia/myNewEnv/Manifest.toml
[336ed68f] + CSV v0.10.13
[...]
Precompiling project...
7 dependencies successfully precompiled in 53 seconds. 100 already precompiled.
4.6. Installing additional packages with conda
Conda is a very popular tool to install software packages that is not restricted to a single language environment, such as Python, R or Julia. Moreover, the library of available software is very extensive and you will very likely find all needed packages ready to be installed, mostly thanks to the community-driven conda-forge repository.
However, the convenience of conda comes with serious drawbacks:
In 2024, Anaconda changed their licensing model to require any organization with 200 or more employees or contractors to purchase a paid license to use Anaconda’s software, including government entities and non-profit organizations. This affects the use of their package repositories in the so-called defaults channel (pkgs/main, pkgs/r and pkgs/msys2). However, conda itself (the package manager) is not covered and continues to be open-source. Other repositories not managed by Anaconda, like the community-driven conda-forge, are not covered by this license either.
Software is not optimized. In most cases, the provided programs are generic binaries compiled to be compatible across multiple systems. This means that those programs will not take advantage of the specific hardware features available in our clusters and will run slower than the software we provide in modules.
Tip
Some packages might provide variants with better optimizations. Look for the following suffixes in the package name: AVX (better vectorization), MKL (Intel numeric libraries), CUDA (Nvidia GPU support).
Software installations with conda are isolated from the rest of the system. This is a good feature in general to avoid conflicts, but it also implies that conda will download all dependencies needed to run the requested application, which can potentially result in hundreds of thousands of files installed in your account. Depending on the characteristics of the storage, this can push your quota to the limit or have a performance impact.
Therefore, our recommendation is to use conda only as a last resort option for software that cannot be installed in modules (e.g. software needing old libraries) or that is not tied to a single high-level language (i.e. Python, R or Julia). All other options will provide equal or better performance.
Once you are settled on using conda, we recommend an alternative implementation such as Mamba, which is open source and not tied to Anaconda’s license restrictions; or Miniforge, which does not enable by default the license-encumbered defaults channel from Anaconda.
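As a crude check of whether your own configuration still refers to Anaconda's repositories, you can search your conda configuration file for the defaults channel. The sketch below assumes the configuration lives in ~/.condarc and uses a hypothetical function name. It only inspects that one file, so the command conda config --show channels remains the authoritative way to see the channels that are actually in effect.

```shell
# Succeed if the given conda configuration file lists the "defaults"
# channel or one of the pkgs/main, pkgs/r, pkgs/msys2 repositories.
# Assumption: the user configuration is in ~/.condarc
condarc_uses_defaults() {
    local rc="${1:-$HOME/.condarc}"
    [ -f "$rc" ] || return 1
    grep -Eq '^[[:space:]]*-[[:space:]]*(defaults|.*pkgs/(main|r|msys2))/?[[:space:]]*$' "$rc"
}

if condarc_uses_defaults; then
    echo "Your .condarc refers to the defaults channel from Anaconda"
fi
```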
Alert Your $VSC_HOME has a very small quota on purpose and is not meant to hold software installations. You should place your ~/.conda folder on $VSC_DATA to avoid filling up your home directory.
You can also use a conda installation provided centrally (e.g. with modules), so that you do not need to install conda, miniconda or mamba on your own. This will save you downloading around 25k files and 500 MB into your account.
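To get an idea of how many files an existing installation or environment occupies in your account, you can count them with find. The helper below is a minimal sketch. The function name is hypothetical, and the -H option is used so that a directory that has been replaced by a symlink is still followed.

```shell
# Print the number of regular files below directory $1.
# -H follows $1 itself if it is a symlink to the real directory.
count_files() {
    local n
    n=$(find -H "$1" -type f | wc -l)
    echo "$((n))"
}

# Usage:
#   count_files ~/.conda
```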
Recommended Follow these steps to avoid the aforementioned pitfalls of working with conda environments:
(Only needed once) Place ~/.conda on $VSC_DATA
$ mkdir -p ~/.conda
$ mv ~/.conda $VSC_DATA/
$ ln -s $VSC_DATA/.conda ~/.conda
Load a central installation of conda based on an open-source alternative not managed by Anaconda
$ module load Mamba
Install your software as usual with the conda command in its own environment
$ conda create -c conda-forge -n environment_name <...>
$ conda install -c conda-forge package_name <...>
Note
Follow installation instructions from the developers if available.