1. Basic use of the HPC
1.1. How can I connect to Hydra?
Connecting to Hydra requires an active VSC account. If you are eligible, you can already create your account.
Once you have your VSC account, you need client software on your computer to connect to Hydra. We provide instructions for Windows, macOS, and Linux to set up your connection.
1.2. How can I simplify my login to the HPC?
On Windows, once your connection to any VSC cluster is properly configured in the SSH client software of your choice (e.g. PuTTY or MobaXterm), logging in takes a single click on the corresponding connection icon. The SSH client will automatically use your VSC ID and pick your SSH key.
On Linux and macOS, it is possible to create a shortcut in SSH for your connections to the VSC clusters as well. You can configure all the details of your connection in the file ~/.ssh/config as follows:
Create the file ~/.ssh/config if it does not exist.

    Host login.hpc.vub.be login.hpc.uantwerpen.be login.hpc.ugent.be login.hpc.kuleuven.be login1-tier1.hpc.kuleuven.be login2-tier1.hpc.kuleuven.be
        User vsc10xxx
        IdentityFile ~/.ssh/your_vsc_key_file
    Host vubhpc
        Hostname login.hpc.vub.be
    Host ugenthpc
        Hostname login.hpc.ugent.be
    Host breniac
        Hostname login1-tier1.hpc.kuleuven.be

- Host login.hpc…: this first block defines the connection settings for all existing VSC login nodes.
- User: replace vsc10xxx with your VSC ID.
- IdentityFile: replace ~/.ssh/your_vsc_key_file with the path to your SSH key.
- Host vubhpc: this block creates a shortcut that is specific to login.hpc.vub.be.
Once these settings are added to ~/.ssh/config, you will be able to connect to Hydra with the command

    ssh vubhpc

Or to the Tier-1 Breniac cluster in KU Leuven with

    ssh breniac

More information and advanced options are available in the VSC documentation: SSH config.
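The first step above, creating ~/.ssh/config if it does not exist, can be sketched as a small shell helper. The function name ensure_ssh_config and its optional directory argument are illustrative assumptions, not part of the VSC documentation. The restrictive permissions follow common OpenSSH practice, since ssh may refuse a config file that is writable by other users.

```shell
# Hypothetical helper: create the SSH config file if it is missing.
# The optional argument (default: ~/.ssh) is an assumption for illustration.
ensure_ssh_config() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir"                # create the directory if needed
    chmod 700 "$ssh_dir"               # only the owner may enter it
    if [ ! -e "$ssh_dir/config" ]; then
        touch "$ssh_dir/config"        # create an empty file, never overwrite
    fi
    chmod 600 "$ssh_dir/config"        # only the owner may read or write it
}
```

Calling ensure_ssh_config without arguments operates on ~/.ssh; an existing config file keeps its content.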
1.3. What can I do in the login node?
The login node is your main interface with the compute nodes. This is where you can access your data, write the scripts and input files for your calculations and submit them to the job scheduler of Hydra. It is also possible to run small scripts on the login node, for instance to process the resulting data from your calculations or to test your scripts before submission to the scheduler. However, the login node is not the place to run your calculations; hence, the following restrictions apply:
- Any single user can use a maximum of 12 GB of memory in the login node.
- The amount of CPU time that can be used is always fairly divided over all users. A single user cannot occupy all CPU cores.
- The allowed network connections to the outside are limited to SSH, FTP, HTTP and HTTPS.
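To see how close you are to the memory limit above, one option is to add up the resident memory of all your processes on the login node. The sketch below assumes a Linux ps that supports -u and -o rss= (values reported in KiB). The function name sum_rss_gib is made up for this example, and resident memory is only an approximation of what the system counts against the limit.

```shell
# Hypothetical helper: sum RSS values (KiB, one per line on stdin)
# and print the total in GiB with two decimals.
sum_rss_gib() {
    LC_ALL=C awk '{ total += $1 } END { printf "%.2f\n", total / 1048576 }'
}

# Usage on the login node (not run here):
#   ps -u "$USER" -o rss= | sum_rss_gib
```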
Lightweight graphical applications can be launched through X11 forwarding. For more complex programs and visualization tools, a graphical desktop environment is available through VNC. More information is available in the Software section: Graphical applications.
Jobs submitted to the scheduler are preprocessed before placement in the queues to ensure that their resource requirements are correct. For instance, the system automatically assigns a memory limit to your job if you did not specify one manually. Detailed information can be found in the section Job Submission.
Users compiling their own software should check Installing additional software.
1.4. What software is available?
Software in Hydra is provided with modules that can be dynamically loaded by the users. Please read our documentation on the Module System.
1.5. How can I check my disk quota?
Your VSC account page shows up-to-date information (updated every 15 min) about data usage and the quota of your $VSC_SCRATCH, as well as that of your Virtual Organization ($VSC_SCRATCH_VO). More recent information about the scratch storage (updated every 5 min) is available through a dedicated command on the cluster.
You will receive a warning notification by email whenever you reach 90% of your quota in any of the partitions in Hydra.
To prevent your account from becoming unusable, you should regularly check your disk quota and clean up any files that are no longer necessary for your active projects.
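When cleaning up, it can help to list files that have not been modified for a while. This is a minimal sketch using the standard find utility. The function name list_stale_files and the age thresholds are illustrative assumptions, not a site policy.

```shell
# Hypothetical helper: list regular files under a directory that were last
# modified more than N days ago (default: 90 days).
list_stale_files() {
    dir="$1"
    days="${2:-90}"
    find "$dir" -type f -mtime +"$days" -print
}

# Usage on the cluster (not run here):
#   list_stale_files "$VSC_SCRATCH" 180
#   du -sh "$VSC_SCRATCH"/*      # size of each top-level entry
```

Review the list before deleting anything: an old modification time does not by itself mean a file is no longer needed.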
1.6. How can I check my resource usage?
Making efficient use of the HPC cluster is important for you and your colleagues. Requesting too many resources is detrimental in several ways:
- Your jobs will stay longer in the queue: the larger the pool of requested resources, the more difficult it is for the resource manager to free them.
- The bigger the job, the larger its impact on the queue; big enough jobs can slow down the queue for everyone.
- Any computational resources left unused are a waste of energy, which directly translates to carbon emissions.
slurm_jobinfo <jobID> shows the resource usage of a given job, where <jobID> has to be replaced by the 8-digit number that identifies your job.

    Name                : my-job01
    User                : vscXXXXX
    Partition           : skylake
    Nodes               : node300
    Cores               : 1
    State               : COMPLETED
    Submit              : 2023-09-26T11:01:12
    Start               : 2023-09-26T11:01:14
    End                 : 2023-09-26T11:07:45
    Reserved walltime   : 00:10:00
    Used walltime       : 00:06:31
    Used CPU time       : 00:05:29
    % User (Computation): 83.81%
    % System (I/O)      : 16.19%
    Mem reserved        : 4500M
    Max Mem used        : 171.21M (node300)
    Max Disk Write      : 75.88M (node300)
    Max Disk Read       : 555.45M (node300)
    Working directory   : /theia/scratch/brussel/XXX/vscXXXXX

The 3 main resources to keep an eye on are:
- time limit (Reserved walltime, Used walltime)
  Maximum is 5 days. Always set a time limit that is close to the duration of your job (with some margin). Longer time limits cause longer wait times in the queue, as it is more difficult for the scheduler to find a window in the schedule for your job.
- memory (Max Mem used, Mem reserved)
  Jobs get ~4 GB per core by default. Always request slightly more memory (about 10%) than your job needs. Requesting excessive memory will not make your job any faster, but it will make it wait longer in the queue.
- core activity (Used CPU time)
  Amount of time spent by the job using its CPU cores. Very efficient jobs should have a Used CPU time close to Used walltime multiplied by the number of cores requested by the job. Keep in mind that not all software can exploit an arbitrary number of cores. For instance, Python and R are limited to 1 core by default and any additional cores will be ignored (see How to run Python in parallel? and How can I run R in parallel? for more information).
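Using the values from the example output above, the core activity and the memory margin can be checked with a little arithmetic. The sketch below is not a tool provided on Hydra: the function names are invented for this example, times must be in HH:MM:SS form (no day field), and memory values are plain numbers in MB.

```shell
# Convert HH:MM:SS to seconds (the D-HH:MM:SS form is not handled).
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# CPU efficiency in percent: used CPU time / (used walltime * cores).
cpu_efficiency() {
    cpu=$(to_seconds "$1")
    wall=$(to_seconds "$2")
    LC_ALL=C awk -v c="$cpu" -v w="$wall" -v n="$3" \
        'BEGIN { if (w * n == 0) exit 1; printf "%.1f\n", 100 * c / (w * n) }'
}

# Memory request with a 10% margin on top of the observed maximum,
# rounded up to a whole number of MB.
mem_with_margin() {
    LC_ALL=C awk -v m="$1" \
        'BEGIN { v = m * 1.1; r = int(v); if (r < v) r++; print r }'
}

# Values from the example job: 00:05:29 CPU time, 00:06:31 walltime, 1 core
cpu_efficiency 00:05:29 00:06:31 1    # prints 84.1
mem_with_margin 171.21                # prints 189
```

For a multi-core job, an efficiency well below 100% suggests that the job could not keep all requested cores busy.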
mysacct also shows the resource usage of recently finished jobs, including individual job steps (advanced usage).
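Putting the three recommendations together, a job script would state its time limit, memory, and cores explicitly. The following is only a sketch: the #SBATCH options are standard Slurm options, but the job name and all values are placeholders to adapt to what slurm_jobinfo reported for a similar job.

```shell
#!/bin/bash
#SBATCH --job-name=my-job01        # placeholder name
#SBATCH --time=00:10:00            # expected duration plus some margin
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1          # only request cores your software can use
#SBATCH --mem-per-cpu=200M         # placeholder: about 10% above the measured maximum

# Slurm sets SLURM_CPUS_PER_TASK inside a job when --cpus-per-task is given;
# fall back to 1 when the script is run elsewhere.
cores="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with $cores core(s)"
```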
1.7. How can I get more storage?
The storage provided in individual partitions such as $VSC_SCRATCH is relatively limited on purpose. Users needing more storage space are expected to be part of a Virtual Organization (VO) and use its shared storage.
1.8. How can I use GPUs in my jobs?
The available GPUs in Hydra are listed in the VSC documentation: Hydra hardware.
To use GPUs with Slurm, have a look at Jobs for GPUs.
1.9. Where can I find public datasets and databases?
We provide storage for datasets and databases that are public and free to use in the shared directory /databases. The data in there is accessible to all users of the HPC cluster. Users who need public data to run their calculations should always check first if it is already available in /databases.
Helpdesk: we can add new databases to the HPC cluster upon request.
The PDB database can be found in /databases/bio/PDB and is automatically updated on a weekly basis.
1.10. Can I run containers on Hydra?
We support Singularity containers in Hydra. Singularity allows any user to run a container without root privileges. You can use any of the containers already installed in Hydra, which are located in /apps/brussel/singularity/, use your own container, or request the installation of a container from VUB-HPC Support.
Users can create their own Singularity image on their personal computer (with root privileges), either manually or by importing an existing Docker image. The resulting container can then be transferred to Hydra and run with your VSC user account. We recommend Singularity containers for software with very specific and complex requirements (dependencies, installation, …) and for quickly testing any software.
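As a sketch of that workflow: an image is built on your own computer, copied to Hydra, and then run there without root privileges. The image name, the Docker source, and the wrapper function run_in_container are hypothetical. singularity build and singularity exec are standard Singularity subcommands, but the options supported may differ between versions.

```shell
# On your personal computer (not run here; image name and source are examples):
#   sudo singularity build my_image.sif docker://ubuntu:22.04
#   scp my_image.sif vubhpc:      # assumes an SSH shortcut named vubhpc

# Hypothetical wrapper for use on Hydra: run a command inside an image,
# failing early with a clear message if the image file is missing.
run_in_container() {
    image="$1"
    shift
    if [ ! -f "$image" ]; then
        echo "container image not found: $image" >&2
        return 1
    fi
    singularity exec "$image" "$@"
}
```

For example, run_in_container my_image.sif cat /etc/os-release would show which operating system the image provides.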
More information can be found in the documentation by Sylabs, the developers of Singularity.
The Module System is still the preferred method to run software in Hydra. Singularity containers are usually not optimized for the different CPU architectures present in the cluster and put more network pressure on the system.