Notebooks#
Computational notebooks are an alternative to the traditional terminal interface for accessing and using the HPC. We provide a notebook platform integrated with our Tier-2 HPC cluster (Hydra) at notebooks.hpc.vub.be. This platform is based on the popular Jupyter project and allows you to manage and launch your notebooks directly on the HPC from a JupyterLab environment.
Warning
The notebook platform is currently in a pilot phase. Some of its features are still in progress and there might be changes without prior notice. If you find anything that is not working as expected, please contact VUB-HPC Support.
Access to the notebook platform#
All VSC users can use the notebook platform of VUB-HPC. Requesting a new VSC account to use the notebook platform is subject to the same access policies as the regular terminal interface.
Access to the notebook platform does not require the upload of any SSH key to your VSC account. This means that if you will only use the HPC through the notebook interface, the process of creating your VSC account is much simpler and you can skip all steps related to the creation and upload of the SSH key. This can be especially useful for teaching, as students carrying out exercises on the HPC can now create their VSC accounts and access the cluster entirely from the web browser.
Once you log in to notebooks.hpc.vub.be, the following screen will request read access to your VSC account:
Click on Authorize and you will be automatically redirected to the notebook platform.
Computational resources#
The notebook platform allows you to launch JupyterLab environments directly on the Tier-2 HPC cluster of VUB (Hydra). After a successful login, you will be presented with a panel to select the computational resources dedicated to your notebooks.
Notebooks can be launched on almost all cluster partitions of Hydra. You can start your JupyterLab session on a generic compute node (Intel Broadwell or Skylake), on GPUs (Nvidia Pascal) or even on nodes with InfiniBand interconnect (advanced). The only exceptions are the Nvidia Ampere GPUs, which are left out of the pool of resources for notebooks as they are already in very high demand by regular computational jobs.
The maximum amount of resources available to notebooks is smaller than for regular jobs due to the interactive nature of this interface. Each user is only allowed to start a single JupyterLab session at a time, with a maximum of 10 dedicated CPU cores (e.g. to run 10 notebooks simultaneously) and a maximum execution time of 12 hours on generic nodes or 6 hours on GPUs. We consider that these limits fit well within one day of work running multiple notebooks. Longer or bigger simulations should continue to use regular computational jobs.
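To check from a notebook how many CPU cores your session can actually use, a short sketch along the following lines may help. It assumes that the job scheduler restricts the session to its allocated cores through CPU affinity, which is common on HPC clusters but is not confirmed by this documentation. The function name available_cores is only illustrative.

```python
import os


def available_cores():
    """Return the number of CPU cores this process is allowed to use."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: honours the set of CPUs assigned to the current process
        return len(os.sched_getaffinity(0))
    # Platforms without CPU affinity support: total cores of the machine
    return os.cpu_count() or 1


n_cores = available_cores()
print(f"Cores available to this session: {n_cores}")
```

The result can be used to size worker pools, for instance as the max_workers argument of concurrent.futures.ProcessPoolExecutor, so that a notebook does not start more processes than the cores it has been given.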
The resources available to the notebook platform are subject to change. If the current options are not sufficient for your workflow, please contact us at VUB-HPC Support.
Jupyter environment#
The main work environment provided by the notebook platform is JupyterLab. If you are not familiar with it, please check its official documentation at jupyterlab.readthedocs.io.
You will find several options in the menu Jupyter environment of the resource selection panel. All the options will launch a JupyterLab interface running on the HPC, integrated with the software module system and with all notebook kernels available. The differences between these lab environments concern:
- version of Python and JupyterLab used in the environment
- pre-installed lab extensions
- available software modules (software toolchain)
Users are not allowed to install JupyterLab extensions on their own; extensions are managed by VUB-HPC. Therefore, you will typically find a default lab environment with just the software module extension, plus some other environments with extra extensions.
Environments are grouped in the menu Jupyter environment by the software toolchain and its underlying Python version running JupyterLab. Some examples:
- <year> Default: minimal with all modules available
This is a default JupyterLab environment without any extensions beyond the integration with the module system of the HPC. It uses modules in the corresponding <year> toolchains and the indicated version of Python for the kernel of its Python notebooks. All other notebook kernels can be loaded on-demand through the module system.
- <year> DataScience: SciPy-bundle + matplotlib + dask
Default JupyterLab environment with pre-loaded data science Python packages such as numpy, scipy, pandas and matplotlib; plus the capability to display in your notebooks interactive graphs with matplotlib and an integrated Dask dashboard to manage and monitor your workflows with Dask.
- <year> Molecules: DataScience + nglview + 3Dmol
DataScience JupyterLab environment plus the nglview and 3Dmol lab extensions to visualize molecular structures in 3D.
- <year> RStudio with R
Default JupyterLab environment plus a pre-loaded R kernel and a lab extension to launch RStudio from within the lab interface.
- <year> MATLAB
Default JupyterLab environment plus a pre-loaded MATLAB kernel and a lab extension to launch MATLAB Desktop from within the lab interface.
File browsing#
JupyterLab and notebooks will be launched from your VSC_DATA storage by default. You can change this starting location by setting ServerApp.root_dir in the configuration file of jupyter-server in your home directory, as shown in the example below:
## The directory to use for notebooks and kernels.
# Default: ''
import os
c.ServerApp.root_dir = os.environ['VSC_SCRATCH']
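If the chosen environment variable is not set, a direct lookup in os.environ raises a KeyError while the configuration file is being read. A more defensive variant could look like the sketch below. The helper pick_root_dir and its fallback order are assumptions made for illustration, they are not part of jupyter-server or of the platform.

```python
import os


def pick_root_dir(env=os.environ, candidates=("VSC_SCRATCH", "VSC_DATA", "VSC_HOME")):
    """Return the first candidate variable that points to an existing directory."""
    for name in candidates:
        path = env.get(name)
        if path and os.path.isdir(path):
            return path
    # No usable storage variable found: fall back to the home directory
    return os.path.expanduser("~")


# In the configuration file of jupyter-server this would be used as follows
# (the object `c` only exists while jupyter-server loads that file):
# c.ServerApp.root_dir = pick_root_dir()
print(pick_root_dir())
```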
Regardless of your starting directory in the lab, you can access all your files and folders in the HPC from your notebooks. We recommend navigating your personal storage partitions and those of your Virtual Organization (VO) by relying on the environment variables $VSC_HOME, $VSC_DATA, $VSC_SCRATCH and their variants for VOs. See below for an example:
import os
vsc_scratch = os.environ['VSC_SCRATCH']
# change current working directory
os.chdir(vsc_scratch)
# open file by absolute path
filename = os.path.join(vsc_scratch, 'some_folder', 'some_data.txt')
with open(filename) as f:
    content = f.readlines()
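The same idea can be wrapped in a small helper based on pathlib that fails with a clear message when a storage variable is missing. This is a minimal sketch: the function storage_path is hypothetical and the path in the example is made up for illustration.

```python
import os
from pathlib import Path


def storage_path(variable, *parts, env=os.environ):
    """Build an absolute path below the storage named by an environment variable."""
    try:
        base = Path(env[variable])
    except KeyError:
        raise RuntimeError(f"${variable} is not set; are you running on the HPC?") from None
    return base.joinpath(*parts)


# Example with a stand-in environment (this path is invented for the example)
fake_env = {"VSC_SCRATCH": "/scratch/example_user"}
print(storage_path("VSC_SCRATCH", "some_folder", "some_data.txt", env=fake_env))
```

On the cluster you would call it without the env argument, e.g. storage_path('VSC_DATA', 'project', 'results.csv'), so that the real environment of your session is used.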
The file browser in JupyterLab can be accessed through the tab on the left panel (see screenshot on the right). This file browser can be used to navigate your starting location (VSC_DATA by default) and all its sub-folders, but it does not allow you to jump to other storage partitions.
You can open notebooks in any storage partition in Hydra from the menu File > Open from Path…. A pop-up will open where you can write the absolute path to the notebook file.
Alternatively, you can use symbolic links to quickly access any other storage from the file browser of the lab. Symbolic links are a feature of the underlying Linux system that allows you to link to any existing file or folder from any location. You can create new symbolic links from a Linux shell on the HPC with the ln -s command. The screenshot on the right shows the home and scratch symbolic links in VSC_DATA that result from the commands below:
ln -s $VSC_SCRATCH $VSC_DATA/scratch
ln -s $VSC_HOME $VSC_DATA/home
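The links can also be created from a notebook with the Python standard library. The sketch below mirrors ln -s but can be run repeatedly without failing when the link already exists. The function link_storage is hypothetical, and the demonstration runs in a temporary folder that stands in for the real storage partitions.

```python
import os
import tempfile
from pathlib import Path


def link_storage(target, link):
    """Create the symbolic link `link` pointing to `target`, like `ln -s target link`.

    Returns True if the link was created and False if it was already in place.
    """
    target, link = Path(target), Path(link)
    if link.is_symlink():
        current = Path(os.readlink(link))
        if current == target:
            return False
        raise FileExistsError(f"{link} already links to {current}")
    if link.exists():
        raise FileExistsError(f"{link} exists and is not a symbolic link")
    link.symlink_to(target)
    return True


# Demonstration in a temporary folder standing in for the real storage
with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp, "scratch_storage")
    scratch.mkdir()
    data = Path(tmp, "data_storage")
    data.mkdir()
    created = link_storage(scratch, data / "scratch")
    again = link_storage(scratch, data / "scratch")
    print(created, again)
```

On the cluster, the equivalent of the first command above would be link_storage(os.environ['VSC_SCRATCH'], Path(os.environ['VSC_DATA']) / 'scratch').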
Software modules#
The JupyterLab environment launched by the notebook platform is integrated with the software module system in the HPC. This means that you can load and use in your notebooks the same software packages used in your computational jobs.
You can load software modules from the tab with a blue hexagon icon on the left panel of JupyterLab. This tab opens a list of loaded modules followed by a list of available modules.
Upon launch, the list of loaded modules will already show some modules loaded by JupyterLab itself. These modules are necessary for the correct functioning of the lab and notebooks and should not be unloaded. For instance, you will always see a Python module loaded, which determines the Python version of the kernel used by your Python notebooks in this session. In the screenshot on the right, the Python module is Python/3.9.5-GCCcore-10.3.0 and hence the Python kernel is v3.9.5.
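You can confirm this relation from within a notebook by comparing the version encoded in the module name with the version of the running kernel. The following is only a sketch: the parsing assumes module names of the form Python/<version>-<toolchain> as in the example above, and the function python_module_version is ours.

```python
import sys


def python_module_version(module_name):
    """Extract the version tuple from a name such as 'Python/3.9.5-GCCcore-10.3.0'."""
    name, _, version_suffix = module_name.partition("/")
    if name != "Python" or not version_suffix:
        raise ValueError(f"not a Python module name: {module_name!r}")
    # Keep what precedes the toolchain suffix, e.g. '3.9.5'
    version = version_suffix.split("-", 1)[0]
    return tuple(int(part) for part in version.split("."))


expected = python_module_version("Python/3.9.5-GCCcore-10.3.0")
running = tuple(sys.version_info[:3])
print("module:", expected, "kernel:", running, "match:", expected == running)
```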
Below loaded modules, you will find the list of available modules that can be loaded on-demand. Point your cursor to the right of the module name and a Load button will appear (see screenshot on the right). All modules shown in the list are compatible with each other, so you can load any combination of modules.
All available JupyterLab environments use a single module toolchain. You can select the toolchain of your JupyterLab session on launch from the resource selection panel. The menu Jupyter environment lists the available environments indicating the Python version and the generation of its toolchain.
Warning
Any change to the list of loaded modules requires rebooting the kernel of any open notebook. After loading/unloading modules, click the button on the notebook toolbar (see screenshot below) to restart the active kernel. If a kernel restart is not sufficient to import recently loaded modules (might happen in some lab environments, such as 2022a), then you can force a kernel reboot by shutting it down and reloading it. Click on the top-right button of the notebook toolbar, labelled Python 3 (ipykernel) in the screenshot below, and re-select your notebook kernel from the menu.
Notebook kernels#
The following table shows the notebook kernels available in all JupyterLab environments of this platform and the corresponding modules that have to be loaded to enable them:
Kernel | 2021a Module | 2022a Module |
---|---|---|
Python | (loaded by default) | (loaded by default) |
R | IRkernel | IRkernel |
Julia | IJulia | IJulia |
MATLAB | MATLAB-Kernel | jupyter-matlab-proxy + MATLAB |
The default lab environment only loads the Python kernel on launch. You can activate any other kernel by loading its corresponding module. Upon loading, a launcher will automatically appear to start a notebook with that kernel. Some specific Jupyter environments have extra kernels already loaded by default. For instance, 2021a: Python v3.9.5 + RStudio also loads IRkernel, which makes R notebooks readily available in it.
RStudio environment#
You can launch RStudio from the notebook platform. This environment is specific to the R language. If you are not familiar with it, please check its documentation site at education.rstudio.com.
RStudio is available through any Jupyter environment with RStudio in its name. Launching any of these environments will start a JupyterLab with the R kernel readily available for your notebooks and a launcher for RStudio.
MATLAB environment#
The Desktop interface of MATLAB is available on the notebook platform as well. This graphical interface is analogous to MATLAB Desktop but it runs in the web browser. If you are not familiar with this environment, please check its documentation site at mathworks.com/help/matlab.
MATLAB Desktop is available through any Jupyter environment with MATLAB in its name. Launching these environments will start a JupyterLab with the MATLAB kernel readily available for your notebooks and a launcher for MATLAB Desktop.
Warning
MATLAB v2021a is very slow to start. Launching a MATLAB notebook or MATLAB Desktop with this version will take several minutes (~ 5 minutes in our tests). We recommend using at least version 2022a.
Custom Python environments#
You can use Python virtual environments to generate custom kernels for your notebooks. Virtual environments provide a layer of isolation allowing users to install additional Python packages on top of the software modules without conflicts. Each of your virtual environments can be added as a new kernel for your notebooks and launched from the lab interface.
The main step in adding a new kernel to your JupyterLab environment from one of your virtual environments is to create the virtual environment itself.
1. Start a new session in notebooks.hpc.vub.be in the cluster partition and with the Jupyter environment of your choice.
Note
Software installed in virtual environments will only work in the cluster partition and Jupyter environment used for its creation.
2. Open the Terminal from your lab interface.
3. Follow the instructions in Python virtual environments to create a new virtual environment and install any Python packages in it. Keep in mind that loading the Python module is not necessary, as that is already done by the JupyterLab session. This new virtual environment can be placed anywhere you like in the storage of the cluster.
$ virtualenv --system-site-packages myenv
$ source myenv/bin/activate
(myenv) $ python -m pip install --upgrade pip
(myenv) $ python -m pip install <insert_cool_package>
4. Add your new virtual environment as a new Jupyter kernel (from the same terminal shell).
$ python -m ipykernel install --user --name=myenv
5. A new launcher will appear in the lab interface to start notebooks using this new virtual environment.
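If the launcher does not show up, you can check whether the kernel was registered by looking for its kernel.json file. The sketch below assumes the default per-user location used by Jupyter on Linux when JUPYTER_DATA_DIR is not set (~/.local/share/jupyter/kernels). The platform may be configured differently, so treat the path and the function list_user_kernels as assumptions.

```python
import json
from pathlib import Path

# Assumption: default per-user kernel location on Linux
USER_KERNELS_DIR = Path.home() / ".local" / "share" / "jupyter" / "kernels"


def list_user_kernels(kernels_dir=USER_KERNELS_DIR):
    """Map kernel names to display names for the kernel specs found in kernels_dir."""
    kernels = {}
    for spec_file in sorted(Path(kernels_dir).glob("*/kernel.json")):
        with open(spec_file) as f:
            spec = json.load(f)
        name = spec_file.parent.name
        kernels[name] = spec.get("display_name", name)
    return kernels


# A kernel installed with --name=myenv is expected to appear under the key 'myenv'
print(list_user_kernels())
```

The command jupyter kernelspec list, run from the terminal of the lab, reports the same information together with kernels provided by the system.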
Note
Whenever you want to reuse an existing virtual environment in the lab, remember to first load any software modules that were used in its creation.
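A quick way to notice a forgotten module is to test, in the first cell of a notebook, whether the packages you rely on can be found by the kernel. The following sketch uses only the standard library. The function missing_packages is hypothetical, it expects top-level package names, and the names in the example are placeholders.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that the current kernel cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'json' ships with Python; the second name is a placeholder that should not exist
required = ["json", "surely_not_a_real_package_xyz"]
print("missing:", missing_packages(required))
```

If a package provided by a software module shows up as missing, load that module from the module tab and restart the kernel before continuing.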