Data Storage#

Hydra provides multiple storage systems for user data. Each storage system is accessible from its own directory in the cluster. These entry points are regular folders, such as the path stored in the $VSC_HOME environment variable, which points to your home directory.

There are 3 main types of storage in our HPC clusters: Home, Data, and Scratch, described in the sections below.

Each of your data partitions in the cluster can be easily accessed through its corresponding $VSC_ environment variable. Please avoid using full directory paths whenever possible, as those might not work on all compute nodes or might stop working in the future.
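
For instance, a minimal sketch of the recommended approach (the my-project folder and the spelled-out absolute path are hypothetical):

# Recommended: use the environment variable, portable across nodes and robust to future changes
cd "$VSC_DATA/my-project"

# Avoid: hard-coding the full path behind the variable (hypothetical example)
# cd /data/brussel/101/vsc10123/my-project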

The disk quotas on Home, Data, and Scratch are fixed. If you need more disk space, we highly recommend joining the Virtual Organization (VO) of your group. If your group does not have a VO yet, please ask your group leader to create one.
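
Standard tools such as du can help you keep track of how much of your quota is in use; a minimal example using your own storage variables:

# Total size of your personal Data storage
du -sh "$VSC_DATA"

# Usage per top-level folder, sorted by size
du -sh "$VSC_DATA"/* | sort -h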

Warning

Long-term archiving of data in our HPC clusters is forbidden. The storage provided in the cluster is meant for data needed by or generated by computational jobs in active research projects.

The HPC team is not responsible for any data loss, although we will do everything we can to avoid it. Users should regularly back up important data outside of the HPC cluster and regularly clean up their folders in the HPC.
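
For instance, the standard find command can list candidates for clean-up (a minimal sketch; the 90-day threshold is arbitrary):

# List files in your scratch that have not been modified in the last 90 days
find "$VSC_SCRATCH" -type f -mtime +90

# After reviewing the list, the same command with -delete removes them (use with care)
# find "$VSC_SCRATCH" -type f -mtime +90 -delete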

User storage#

Home

Location: $VSC_HOME, $HOME, or ~

Purpose: Storage of configuration files and user preferences.

Availability: Very high. Accessible from login nodes, compute nodes and from all clusters in VSC.

Capacity: Low. 5.7 GB (soft limit), 6 GB (hard limit, 7 days grace time).

Performance: Low. Jobs must never use files in your home directory.

Reliability: Very high. Data is stored in a redundant file system, with data replication off-site.

Back-ups: 7 days (daily data snapshots are kept for 7 days).

Data

Location: $VSC_DATA

Purpose: Storage of datasets or resulting data that must be stored in the cluster to carry out further computational jobs.

Availability: Very high. Accessible from login nodes, compute nodes and from all clusters in VSC.

Capacity: Medium. 47.50 GB (soft limit), 50 GB (hard limit, 7 days grace time).

Performance: Low. Jobs must always copy any data needed from $VSC_DATA to the scratch before the run and save any results from scratch into $VSC_DATA after the run.

Reliability: Very high. Data is stored in a redundant file system, with data replication off-site.

Back-ups: 7 days (daily data snapshots are kept for 7 days).

Scratch

Location: $VSC_SCRATCH, $VSC_SCRATCH_SITE

Purpose: Storage of temporary or transient data.

Availability: High. Accessible from login nodes and compute nodes in the local cluster. Not accessible from other VSC clusters.

Capacity: High. 95 GB (soft limit), 100 GB (hard limit, 7 days grace time).

Performance: High. Preferred location for all data files read or written during the execution of a job. Suitable for all workload types.

Reliability: Medium. Data is stored in a redundant file system, but without off-site replication.

Back-ups: None.

Node local scratch

Location: $VSC_SCRATCH_NODE, $TMPDIR

Purpose: Storage of temporary or transient data.

Availability: Low. Only accessible from the compute node running the job.

Capacity: Variable. Maximum data usage depends on the local disk space of the node (see VSC Docs: local disk space). Note that the available disk space is shared among all jobs running on the node.

Performance: High. Might be beneficial for special workloads that require lots of random I/O. Users should always confirm the need for node local scratch through benchmarking.

Reliability: Low. Data is stored in a non-redundant file system without replication.

Back-ups: None. All data is automatically deleted from the compute nodes once your job has ended.
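
The sketch below shows one possible way to use the node local scratch in a job script for such an I/O-intensive run; the my-project/inputs folder, the file names and <your-command> are placeholders, and the benefit should be confirmed by benchmarking as noted above:

#!/bin/bash
#SBATCH --job-name="local-scratch-example"

# Copy the input data from VSC_DATA to the local disk of the compute node
cp -a "${VSC_DATA}/my-project/inputs" "${TMPDIR}/"
cd "${TMPDIR}/inputs"

# Run the I/O-intensive step on the node local scratch
<your-command> input.dat > output.dat

# Copy the results back before the job ends, since $TMPDIR is wiped automatically afterwards
mkdir -p "${VSC_SCRATCH}/my-project"
cp -a output.dat "${VSC_SCRATCH}/my-project/"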

More technical details are available at VSC Docs: VUB storage.

Virtual Organization storage#

VO Data

Location: $VSC_DATA_VO, $VSC_DATA_VO_USER

Purpose: Storage of datasets or resulting data that must be stored in the cluster to carry out further computational jobs, and that can be shared with co-workers.

Availability: Very high. Accessible from login nodes, compute nodes and from all clusters in VSC.

Capacity: High and expandable. By default, 112.5 GB (soft limit), 125.0 GB (hard limit, 7 days grace time). Can be expanded upon request.

Performance: Low. Jobs must always copy any data needed from $VSC_DATA_VO to the scratch before the run and save any results from scratch into $VSC_DATA_VO after the run.

Reliability: Very high. Data is stored in a redundant file system, with data replication off-site.

Back-ups: 7 days (daily data snapshots are kept for 7 days).

VO Scratch

Location: $VSC_SCRATCH_VO, $VSC_SCRATCH_VO_USER

Purpose: Storage of temporary or transient data that can be shared with co-workers.

Availability: High. Accessible from login nodes and compute nodes in the local cluster. Not accessible from other VSC clusters.

Capacity: High and expandable. 225 GB (soft limit), 250 GB (hard limit, 7 days grace time). Can be expanded upon request.

Performance: High. Preferred location for all data files read or written during the execution of a job. Suitable for all workload types.

Reliability: Medium. Data is stored in a redundant file system, but without off-site replication.
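
As an illustration, a minimal sketch of placing results where co-workers in the VO can reach them; the shared-datasets folder and the file name are hypothetical, and appropriate group read permissions are assumed:

# Your personal folder inside the VO data storage
cd "$VSC_DATA_VO_USER"

# Hypothetical shared folder at the top level of the VO data storage
mkdir -p "$VSC_DATA_VO/shared-datasets"
cp -a results.tar.gz "$VSC_DATA_VO/shared-datasets/"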

Data management in jobs#

As described in the previous sections, all job scripts must read/write data from/to scratch to run with optimal performance. This can be either your personal VSC_SCRATCH or the scratch of your VO, VSC_SCRATCH_VO. However, important datasets and results should not be left in the scratch; instead, transfer them to the more reliable storage of VSC_DATA or VSC_DATA_VO, which is backed up.

This management of data can be cumbersome for users running many jobs regularly. The solution is to automate as much as possible in your job scripts. The shell scripting language used in job scripts is quite powerful and can be used to automatically create directories, copy data and change files before executing the code running your simulations.

The job script below is a simple example that creates a transient working directory on the fly in VSC_SCRATCH and uses it to read/write all data during execution. The job performs the following steps:

  1. Create a new unique working directory in VSC_SCRATCH

  2. Copy all input files from a folder in VSC_DATA into the working directory

  3. Execute the simulation using the files from the working directory

  4. Copy the output file back to the directory from which the job was submitted

  5. Delete the working directory

Job script to automatically manage data between VSC_DATA and VSC_SCRATCH#
#!/bin/bash

#SBATCH --job-name="your-job"
#SBATCH --output="%x-%j.out"

module load AwesomeSoftware/1.1.1

# Input data directory in VSC_DATA
DATADIR="${VSC_DATA}/my-project/dataset01"

# Working directory for this job in VSC_SCRATCH
WORKDIR="${VSC_SCRATCH:-/tmp}/${SLURM_JOB_NAME:-$USER}.${SLURM_JOB_ID:-0}"

# Populate working directory
echo "== Populating new working directory: $WORKDIR"
mkdir -p "$WORKDIR"
rsync -av "$DATADIR/" "$WORKDIR/"
cd "$WORKDIR"

# Start simulation
# note: adapt to your case, input/output files might be handled differently
<your-command> data.inp > results.out

# Save output and clean the scratch
# (these steps are optional, you can also perform these manually once the job finishes)
cp -a results.out "$SLURM_SUBMIT_DIR/"
rm -r "$WORKDIR"
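
Assuming the script above is saved as my-job.sh (the file name is arbitrary), it can be submitted and monitored with the standard Slurm commands:

# Submit the job script to the Slurm scheduler
sbatch my-job.sh

# Check the status of your queued and running jobs
squeue -u "$USER"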