Notebooks

Notebook Platform of VUB-HPC

notebooks.hpc.vub.be

Computational notebooks are an alternative to the traditional terminal interface for accessing and using the HPC. We provide a notebook platform integrated with our Tier-2 HPC cluster (Hydra) at notebooks.hpc.vub.be. This platform is based on the popular Jupyter project and allows you to manage and launch your notebooks directly on the HPC from a JupyterLab environment.

Warning

The notebook platform is currently in a pilot phase. Some of its features are still in progress and may change without prior notice. If you find anything that is not working as expected, please contact VUB-HPC Support.

Access to the notebook platform

All VSC users can use the notebook platform of VUB-HPC. Requesting a new VSC account to use the notebook platform follows the same access policies as for the regular terminal interface.

Access to the notebook platform does not require uploading any SSH key to your VSC account. This means that if you only use the HPC through the notebook interface, creating your VSC account is much simpler and you can skip all steps related to creating and uploading an SSH key. This can be especially useful for teaching, as students carrying out exercises on the HPC can create their VSC accounts and access the cluster entirely from the web browser.

Once you log in to notebooks.hpc.vub.be, the following screen will request read access to your VSC account:

../../_images/jupyterhub-oauth-request.png

Request to get read access to your VSC account.

Click on Authorize and you will be automatically redirected to the notebook platform.

Computational resources

The notebook platform allows you to launch JupyterLab environments directly on the Tier-2 HPC cluster of VUB (Hydra). After a successful login, you will be presented with a panel to select the computational resources dedicated to your notebooks.

../../_images/jupyterhub-moss-simple.png

Panel with simple selection of computational resources for JupyterLab session.

Notebooks can be launched on almost all cluster partitions of Hydra. You can start your JupyterLab session on a generic compute node (Intel Broadwell or Skylake), on GPUs (Nvidia Pascal) or even on nodes with InfiniBand interconnect (advanced). The only exception is the Nvidia Ampere GPUs, which are excluded from the pool of resources for notebooks because they are already in very high demand by regular computational jobs.

The maximum amount of resources available to notebooks is smaller than that of regular jobs, due to the interactive nature of this interface. Each user can only run a single JupyterLab session at a time, with a maximum of 10 dedicated CPU cores (e.g. to run 10 notebooks simultaneously) and an execution time of up to 12 hours on generic nodes or 6 hours on GPUs. We consider that these limits fit well within one working day of running multiple notebooks. Longer or bigger simulations should continue to use regular computational jobs.
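To check from inside a running notebook how many CPU cores your JupyterLab session can actually use (for example, to size a pool of worker processes), a minimal sketch like the one below may help. It assumes a Linux compute node, where `os.sched_getaffinity` typically reflects the cores allocated to your session rather than all cores of the node; the helper name `usable_cores` is hypothetical.

```python
import os


def usable_cores() -> int:
    """Return the number of CPU cores this process may run on (hypothetical helper)."""
    try:
        # On Linux, the affinity mask usually reflects the cores allocated to
        # the session, which can be fewer than the total cores of the node.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Fallback for platforms without sched_getaffinity: all cores of the machine.
        return os.cpu_count() or 1


print(f"This session can use {usable_cores()} CPU core(s)")
```

Using the affinity mask instead of `os.cpu_count()` avoids oversubscribing a node that is shared with other jobs.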

The resources available to the notebook platform are subject to change. If the current options are not sufficient for your workflow, please contact us at VUB-HPC Support.

Jupyter environment

The main work environment provided by the notebook platform is JupyterLab. If you are not familiar with it, please check its official documentation at jupyterlab.readthedocs.io.

You will find several options in the menu Jupyter environment of the resource selection panel. All the options will launch a JupyterLab interface running on the HPC, integrated with the software module system and with all notebook kernels available. The differences between these lab environments concern:

  • version of Python and JupyterLab used in the environment

  • pre-installed lab extensions

  • available software modules (software toolchain)

Users are not allowed to install JupyterLab extensions on their own; extensions are managed by VUB-HPC. Therefore, you will typically find a default lab environment with just the software module extension, plus some other environments with extra extensions.

Environments are grouped in the menu Jupyter environment by the software toolchain and its underlying Python version running JupyterLab. Some examples:

<year> Default: minimal with all modules available

This is a default JupyterLab environment without any extensions beyond the integration with the module system of the HPC. It uses modules in the corresponding <year> toolchains and the indicated version of Python for the kernel of its Python notebooks. All other notebook kernels can be loaded on-demand through the module system.

<year> DataScience: SciPy-bundle + matplotlib + dask

Default JupyterLab environment with pre-loaded data science Python packages such as numpy, scipy, pandas and matplotlib. It also lets you display interactive matplotlib graphs in your notebooks and provides an integrated Dask dashboard to manage and monitor your Dask workflows.

<year> Molecules: DataScience + nglview + 3Dmol

DataScience JupyterLab environment plus the nglview and 3Dmol lab extensions to visualize molecular structures in 3D.

<year> RStudio with R

Default JupyterLab environment plus a pre-loaded R kernel and a lab extension to launch RStudio from within the lab interface.

<year> MATLAB

Default JupyterLab environment plus a pre-loaded MATLAB kernel and a lab extension to launch MATLAB Desktop from within the lab interface.

File browsing

JupyterLab and notebooks will be launched from your VSC_DATA storage by default. You can change this starting location by setting ServerApp.root_dir in the configuration file of jupyter-server in your home directory, as shown in the example below:

Example ~/.jupyter/jupyter_server_config.py to change the starting location of JupyterLab to VSC_SCRATCH
# The directory to use for notebooks and kernels.
#  Default: ''
import os

c = get_config()  # noqa: F821 -- get_config() is provided by Jupyter when it loads this file
c.ServerApp.root_dir = os.environ['VSC_SCRATCH']

Regardless of the starting directory of the lab, you can access all your files and folders on the HPC from your notebooks. We recommend navigating your personal storage partitions and those of your Virtual Organization (VO) by relying on the environment variables $VSC_HOME, $VSC_DATA, $VSC_SCRATCH and their variants for VOs. See below for an example:

import os

vsc_scratch = os.environ['VSC_SCRATCH']

# change current working directory
os.chdir(vsc_scratch)

# open file by absolute path
filename = os.path.join(vsc_scratch, 'some_folder', 'some_data.txt')
with open(filename) as f:
    content = f.readlines()
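If you work with several storage partitions, it can be handy to check which of these variables are defined in your session before building paths on them. The sketch below assumes that the VO variants are named `VSC_DATA_VO`, `VSC_DATA_VO_USER`, `VSC_SCRATCH_VO` and `VSC_SCRATCH_VO_USER`; those names and the helper `storage_paths` are assumptions to adapt to your account.

```python
import os

# Personal storage variables mentioned above, plus VO variants.
# The VO variable names are assumptions and are only set for VO members.
STORAGE_VARS = (
    "VSC_HOME", "VSC_DATA", "VSC_SCRATCH",
    "VSC_DATA_VO", "VSC_DATA_VO_USER",
    "VSC_SCRATCH_VO", "VSC_SCRATCH_VO_USER",
)


def storage_paths(names=STORAGE_VARS):
    """Map each variable name to its path, or None if it is not set (hypothetical helper)."""
    return {name: os.environ.get(name) for name in names}


for name, path in storage_paths().items():
    print(f"{name:22} {path or '(not set)'}")
```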
../../_images/jupyterhub-file-browser.png

File browser in JupyterLab.

The file browser in JupyterLab can be accessed through its tab on the left panel (see screenshot on the right). This file browser can be used to navigate your starting location (VSC_DATA by default) and all its sub-folders, but it does not allow you to jump to other storage partitions.

You can open notebooks in any storage partition in Hydra from the menu File > Open from Path…. A pop-up will open where you can write the absolute path to the notebook file.

Alternatively, you can use symbolic links to quickly access any other storage from the file browser of the lab. Symbolic links are a feature of the underlying Linux system that allows you to link any existing file or folder from another location. You can create new symbolic links from a Linux shell on the HPC with the ln -s command. The screenshot on the right shows the home and scratch symbolic links in VSC_DATA resulting from the commands below:

Create symbolic links in VSC_DATA to your scratch and home
ln -s $VSC_SCRATCH $VSC_DATA/scratch
ln -s $VSC_HOME $VSC_DATA/home
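If you prefer to stay inside a notebook, the same links can be created from Python with `os.symlink`. This is a minimal sketch rather than an official tool: the helper `link_storage` is hypothetical, and it skips any link name that already exists instead of overwriting it.

```python
import os


def link_storage(link_dir, targets):
    """Create symbolic links in link_dir; targets maps link name -> target path."""
    created = []
    for name, target in targets.items():
        link_path = os.path.join(link_dir, name)
        # Do not clobber existing files, folders or links with the same name
        if os.path.lexists(link_path):
            continue
        os.symlink(target, link_path)  # equivalent to: ln -s target link_path
        created.append(link_path)
    return created


# Only run on a system where the VSC storage variables are defined
if all(v in os.environ for v in ("VSC_DATA", "VSC_SCRATCH", "VSC_HOME")):
    link_storage(os.environ["VSC_DATA"], {
        "scratch": os.environ["VSC_SCRATCH"],
        "home": os.environ["VSC_HOME"],
    })
```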

Software modules

The JupyterLab environment launched by the notebook platform is integrated with the software module system in the HPC. This means that your notebooks can load and use the same software packages as your computational jobs.

../../_images/jupyterhub-lmod-tab.png

Module tab in JupyterLab.

You can load software modules from the tab with a blue hexagon icon on the left panel of JupyterLab. This tab opens a list of loaded modules followed by a list of available modules.

Upon launch, the list of loaded modules will already show some modules loaded by JupyterLab itself. These modules are necessary for the lab and notebooks to work correctly and should not be unloaded. For instance, you will always see a Python module loaded, which determines the Python version of the kernel used by your Python notebooks in this session. In the screenshot on the right, the Python module is Python/3.9.5-GCCcore-10.3.0 and hence the Python kernel is v3.9.5.
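To confirm from a notebook which Python module your kernel comes from, you can compare the kernel's Python version with the loaded modules. The sketch below assumes that the module system exports the colon-separated `LOADEDMODULES` environment variable (as Lmod does) and that Python modules follow the `Python/<version>-<toolchain>` naming shown above; the helper name `python_module_version` is hypothetical.

```python
import os
import platform


def python_module_version(loaded_modules: str):
    """Return the version of the Python module in a LOADEDMODULES string, or None."""
    for module in loaded_modules.split(":"):
        name, _, version = module.partition("/")
        if name == "Python" and version:
            # e.g. "3.9.5-GCCcore-10.3.0" -> "3.9.5"
            return version.split("-", 1)[0]
    return None


# LOADEDMODULES is an assumption: it is set by Lmod, not by Jupyter
print("Python module:", python_module_version(os.environ.get("LOADEDMODULES", "")))
print("Kernel Python:", platform.python_version())
```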

../../_images/jupyterhub-lmod-load.png

Loading a module from the module tab in JupyterLab.

Below the loaded modules, you will find the list of available modules that can be loaded on demand. Hover your cursor to the right of a module name and a Load button will appear (see screenshot on the right). All modules shown in the list are compatible with each other, so you can load any combination of them.

Each JupyterLab environment uses a single module toolchain. You select the toolchain of your JupyterLab session at launch from the resource selection panel: the menu Jupyter environment lists the available environments, indicating the Python version and the generation of their toolchain.

Warning

Any change to the list of loaded modules requires restarting the kernel of every open notebook. After loading or unloading modules, click the restart-kernel button on the notebook toolbar (see screenshot below) to restart the active kernel. If a kernel restart is not sufficient to import recently loaded modules (this might happen in some lab environments, such as 2022a), you can force a full reload by shutting the kernel down and starting it again: click the top-right button of the notebook toolbar, labelled Python 3 (ipykernel) in the screenshot below, and re-select your notebook kernel from the menu.

../../_images/jupyterhub-kernel-reload.png

Notebook toolbar.
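After restarting the kernel, a quick way to verify that the packages provided by newly loaded modules are visible is to ask Python whether it can locate them, without importing them. This sketch uses `importlib.util.find_spec` from the standard library; the helper name and the package names in the usage line are only examples.

```python
import importlib.util


def missing_packages(*names):
    """Return the top-level packages that the current kernel cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example: check for packages you expect from the modules you just loaded
print(missing_packages("numpy", "pandas") or "all packages found")
```

If a package still shows up as missing after a restart, try the full shutdown and re-selection of the kernel described in the warning above.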

Notebook kernels

The following table shows the notebook kernels available in all JupyterLab environments of this platform and the corresponding modules that have to be loaded to enable them:

Notebook kernels and the software modules needed to use them

Kernel    2021a module           2022a module
Python    (loaded by default)    (loaded by default)
R         IRkernel               IRkernel
Julia     IJulia                 IJulia
MATLAB    MATLAB-Kernel          jupyter-matlab-proxy + MATLAB

The default lab environment only loads the Python kernel on launch. You can activate any other kernel by loading its corresponding module; once the module is loaded, a launcher to start a notebook with that kernel will automatically appear. Some Jupyter environments load extra kernels by default. For instance, 2021a: Python v3.9.5 + RStudio also loads IRkernel, which makes R notebooks readily available in it.

../../_images/jupyterhub-all-kernels.png

Launchers for notebooks for Python, Julia, MATLAB and R.

RStudio environment

You can launch RStudio from the notebook platform. This environment is specific to the R language. If you are not familiar with it, please check its documentation site at education.rstudio.com.

RStudio is available through any Jupyter environment with RStudio in its name. Launching any of these environments will start JupyterLab with the R kernel readily available for your notebooks, as well as a launcher for RStudio.

../../_images/jupyterhub-rstudio-launcher.png

Launchers of Python notebook, R notebook and RStudio.

MATLAB environment

The Desktop interface of MATLAB is also available on the notebook platform. This graphical interface is analogous to the regular MATLAB Desktop, but it runs in the web browser. If you are not familiar with this environment, please check its documentation site at mathworks.com/help/matlab.

MATLAB Desktop is available through any Jupyter environment with MATLAB in its name. Launching any of these environments will start JupyterLab with the MATLAB kernel readily available for your notebooks, as well as a launcher for MATLAB Desktop.

../../_images/jupyterhub-matlab-launcher.png

Launchers of Python notebook, MATLAB notebook and MATLAB Desktop.

Warning

MATLAB v2021a is very slow to start: launching a MATLAB notebook or MATLAB Desktop with this version takes several minutes (about 5 minutes in our tests). We recommend using version 2022a or later.

Custom Python environments

You can use Python virtual environments to generate custom kernels for your notebooks. Virtual environments provide a layer of isolation allowing users to install additional Python packages on top of the software modules without conflicts. Each of your virtual environments can be added as a new kernel for your notebooks and launched from the lab interface.

The main step in adding a new kernel to your JupyterLab environment from one of your virtual environments is to create the virtual environment itself.

  1. Start a new session in notebooks.hpc.vub.be in the cluster partition and with the Jupyter environment of your choice

    Note

    Software installed in virtual environments will only work in the cluster partition and Jupyter environment used for its creation.

  2. Open the Terminal from your lab interface

  3. Follow the instructions in Python virtual environments to create a new virtual environment and install any Python packages in it. Keep in mind that loading the Python module is not necessary as that is already done by the JupyterLab session. This new virtual environment can be placed anywhere you like in the storage of the cluster.

    Example sequence of commands to create a new virtual environment in the directory myenv
    $ virtualenv --system-site-packages myenv
    $ source myenv/bin/activate
    (myenv) $
    (myenv) $ python -m pip install --upgrade pip
    (myenv) $ python -m pip install <insert_cool_package>
    
  4. Add your new virtual environment as a new Jupyter kernel (from the same terminal shell)

    $ python -m ipykernel install --user --name=myenv
    
  5. A new launcher will appear in the lab interface to start notebooks using this new virtual environment

    ../../_images/jupyterhub-custom-launcher.png

    Launchers of standard Python notebook and custom Python kernel from virtual environment
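For reference, the environment creation in step 3 can also be sketched with the `venv` module of the Python standard library instead of the `virtualenv` command. Here `system_site_packages=True` plays the same role as `--system-site-packages`, so packages from the loaded software modules remain visible inside the environment. This is an alternative sketch, not the procedure documented above; the helper `create_env` is hypothetical and the returned path assumes the Linux layout of virtual environments.

```python
import venv
from pathlib import Path


def create_env(path, with_pip=True):
    """Create a virtual environment that can see the system (module) site-packages."""
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=with_pip)
    builder.create(path)
    # Interpreter of the new environment (Linux layout: <env>/bin/python)
    return Path(path) / "bin" / "python"
```

For example, `create_env("myenv")` creates the environment in the directory myenv; you can then install packages with `myenv/bin/python -m pip install ...` and register it as a kernel as in step 4.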

Note

Whenever you reuse an existing virtual environment in the lab, remember to first load any software modules that were loaded when it was created.
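Before reusing an environment, it can help to see which custom kernels are registered and which interpreter each one launches, so that you know which virtual environment (and therefore which modules) it depends on. The sketch below assumes the default Linux location used by `ipykernel install --user`, `~/.local/share/jupyter/kernels/<name>/kernel.json`, which the `JUPYTER_DATA_DIR` variable can override; the helper `user_kernels` is hypothetical. From the terminal, `jupyter kernelspec list` gives similar information.

```python
import json
import os
from pathlib import Path


def user_kernels(data_dir=None):
    """Map each user kernel name to (display name, interpreter it launches)."""
    if data_dir is None:
        # Assumed default user data directory on Linux, unless JUPYTER_DATA_DIR is set
        data_dir = os.environ.get("JUPYTER_DATA_DIR",
                                  Path.home() / ".local" / "share" / "jupyter")
    kernels_dir = Path(data_dir, "kernels")
    if not kernels_dir.is_dir():
        return {}
    kernels = {}
    for spec in sorted(kernels_dir.glob("*/kernel.json")):
        info = json.loads(spec.read_text())
        argv = info.get("argv", [])
        # argv[0] is the Python interpreter the kernel runs with
        kernels[spec.parent.name] = (info.get("display_name", ""),
                                     argv[0] if argv else None)
    return kernels


for name, (display, interpreter) in user_kernels().items():
    print(f"{name}: {display} -> {interpreter}")
```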