HPC Data Storage#

Hydra provides multiple storage systems for user data. Each storage system is accessible from its own directory in the cluster. These entry points are regular folders, such as the path stored in the $VSC_HOME environment variable, which points to your home directory.

There are 3 main types of storage in our HPC clusters: Home, Data, and Scratch. Each one is described in detail below.

Each storage partition in the cluster can be easily accessed through its corresponding $VSC_ environment variable. Please avoid hard-coding full directory paths whenever possible, as those might not work on all compute nodes or might stop working in the future.
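As a minimal shell sketch of how these variables are typically used (the file name input.dat is just a placeholder):

    # Print the entry points of the main storage partitions of your account
    echo "Home:    $VSC_HOME"
    echo "Data:    $VSC_DATA"
    echo "Scratch: $VSC_SCRATCH"

    # Refer to files through the variables instead of hard-coded absolute paths
    cp "$VSC_DATA/input.dat" "$VSC_SCRATCH/"
    cd "$VSC_SCRATCH"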

The disk quotas on Home, Data, and Scratch are fixed. If you need more disk space, we highly recommend joining the Virtual Organization (VO) of your group. If your group does not have a VO yet, please ask your group leader to create one.
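To get an idea of how much space your files occupy in each partition, standard tools such as df and du can be used. This is only a generic sketch and does not replace any dedicated quota-reporting tool the cluster may provide:

    # Show the overall usage of the shared file systems behind each partition
    df -h "$VSC_HOME" "$VSC_DATA" "$VSC_SCRATCH"

    # Summarize how much space your own files take up in a partition
    du -sh "$VSC_DATA"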

Our storage has protection mechanisms to recover lost files. We take snapshots of the data in all shared storage partitions at regular intervals and keep them for at least 7 days. Some partitions, such as Home and Data, keep historical snapshots going back several months. The blocks below give the exact details on the available snapshots for each storage system, and the FAQ “I have accidentally deleted some data, how can I restore it?” explains how to recover lost files.

Warning

Long-term archival of data in our HPC clusters is forbidden. The storage provided in the cluster is meant to be used for data needed by or generated by computational jobs in active research projects.

The HPC team is not responsible for any data loss, although we will do everything we can to avoid it. Users should regularly back up important data outside of the HPC cluster and regularly clean up their folders on the cluster.
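One common way to keep an external copy is to pull data from the cluster to your own machine with rsync. The sketch below is only an illustration: the account name (vscXXXXX), the login host, and the folder names are assumptions to be replaced with your own details.

    # Run this on your own computer, not on the cluster.
    # The remote path is single-quoted so that $VSC_DATA is expanded on the
    # cluster side rather than locally.
    mkdir -p ~/backups
    rsync -av 'vscXXXXX@login.hpc.vub.be:$VSC_DATA/my_project/' ~/backups/my_project/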

User storage#

Home

Location: $VSC_HOME, $HOME, or ~

Purpose: Storage of configuration files and user preferences.

Availability: Very high. Accessible from login nodes, compute nodes, and all clusters in the VSC.

Capacity: Low. 5.7 GB (soft limit), 6 GB (hard limit, 7 days grace time).

Performance: Low. Jobs must never use files in your home directory.

Reliability: Very high. Data is stored on a redundant file system with off-site replication.

Back-ups: Short and mid-term. Regular snapshots allow recovering data at multiple points in time: daily snapshots are kept for the past 7 days and weekly snapshots for the past 3 months.

Data

Location: $VSC_DATA

Purpose: Storage of datasets or resulting data that must be kept in the cluster to carry out further computational jobs.

Availability: Very high. Accessible from login nodes, compute nodes, and all clusters in the VSC.

Capacity: Medium. 47.50 GB (soft limit), 50 GB (hard limit, 7 days grace time).

Performance: Low. Jobs must always copy any data they need from $VSC_DATA to the scratch before the run and save any results from the scratch back into $VSC_DATA after the run (see the job script sketch at the end of this section).

Reliability: Very high. Data is stored on a redundant file system with off-site replication.

Back-ups: Short and mid-term. Regular snapshots allow recovering data at multiple points in time: daily snapshots are kept for the past 7 days and weekly snapshots for the past 3 months.

Scratch

Location: $VSC_SCRATCH, $VSC_SCRATCH_SITE

Purpose: Storage of temporary or transient data.

Availability: High. Accessible from login nodes and compute nodes in the local cluster. Not accessible from other VSC clusters.

Capacity: High. 95 GB (soft limit), 100 GB (hard limit, 7 days grace time).

Performance: High. Preferred location for all data files read or written during the execution of a job. Suitable for all workload types.

Reliability: Medium. Data is stored on a redundant file system, but without off-site replication.

Back-ups: Short-term. Regular snapshots allow recovering recently lost data: daily snapshots are kept for the past 7 days.

Node local scratch

Location: $VSC_SCRATCH_NODE, $TMPDIR

Purpose: Storage of temporary or transient data.

Availability: Low. Only accessible from the compute node running the job.

Capacity: Variable. Maximum data usage depends on the local disk space of the node, which is shared among all jobs running on that node.

Performance: High. Might be beneficial for special workloads that require lots of random I/O. Users should always confirm the need for node local scratch through benchmarking.

Reliability: Low. Data is stored on a non-redundant file system without replication.

Back-ups: None. All data is automatically deleted from the compute node once your job has ended.

More technical details are available in the VSC documentation on VUB storage.
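As a concrete illustration of the staging workflow described above, here is a minimal sketch of a job script (assuming the cluster uses Slurm) that copies input data from Data storage to the scratch, runs a computation there, and saves the results back. The project folder and the program name are placeholders, not real paths on the cluster.

    #!/bin/bash
    #SBATCH --job-name=staging-example
    #SBATCH --time=01:00:00

    # Create a per-job working directory on the fast scratch storage
    workdir="$VSC_SCRATCH/$SLURM_JOB_ID"
    mkdir -p "$workdir"

    # Stage input data from Data storage to scratch before the run
    cp -r "$VSC_DATA/my_project/input" "$workdir/"

    # Run the computation from scratch (placeholder command)
    cd "$workdir"
    mkdir -p results
    my_program --input input --output results

    # Save results back to Data storage after the run and clean up scratch
    cp -r "$workdir/results" "$VSC_DATA/my_project/"
    rm -rf "$workdir"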

Virtual Organization storage#

VO Data

Location: $VSC_DATA_VO, $VSC_DATA_VO_USER

Purpose: Storage of datasets or resulting data that must be kept in the cluster to carry out further computational jobs and that can be shared with co-workers.

Availability: Very high. Accessible from login nodes, compute nodes, and all clusters in the VSC.

Capacity: High and expandable. By default, 112.5 GB (soft limit), 125.0 GB (hard limit, 7 days grace time). Can be expanded upon request.

Performance: Low. Jobs must always copy any data they need from $VSC_DATA_VO to the scratch before the run and save any results from the scratch back into $VSC_DATA_VO after the run.

Reliability: Very high. Data is stored on a redundant file system with off-site replication.

Back-ups: Short and mid-term. Regular snapshots allow recovering data at multiple points in time: daily snapshots are kept for the past 7 days and weekly snapshots for the past 3 months.

VO Scratch

Location: $VSC_SCRATCH_VO, $VSC_SCRATCH_VO_USER

Purpose: Storage of temporary or transient data that can be shared with co-workers.

Availability: High. Accessible from login nodes and compute nodes in the local cluster. Not accessible from other VSC clusters.

Capacity: High and expandable. 225 GB (soft limit), 250 GB (hard limit, 7 days grace time). Can be expanded upon request.

Performance: High. Preferred location for all data files read or written during the execution of a job. Suitable for all workload types.

Reliability: Medium. Data is stored on a redundant file system, but without off-site replication.

Back-ups: Short-term. Regular snapshots allow recovering recently lost data: daily snapshots are kept for the past 7 days.
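The VO locations come in pairs: the base directory of the VO and, following the usual VSC convention, a personal subdirectory for each member. Below is a short shell sketch of how these variables can be used to share data with co-workers; the folder names are only examples.

    # Inspect the VO storage locations available to you
    echo "VO data:              $VSC_DATA_VO"
    echo "My VO data folder:    $VSC_DATA_VO_USER"
    echo "VO scratch:           $VSC_SCRATCH_VO"
    echo "My VO scratch folder: $VSC_SCRATCH_VO_USER"

    # Place a dataset where all VO members can access it and keep personal
    # work under your own subdirectory (folder names are illustrative)
    mkdir -p "$VSC_DATA_VO/shared_datasets"
    cp -r "$VSC_DATA/my_project/dataset" "$VSC_DATA_VO/shared_datasets/"
    mkdir -p "$VSC_DATA_VO_USER/experiments"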