HPC Data Storage#
Hydra provides multiple storage systems for user data. Each storage system is
accessible from its own directory in the cluster. These entry points are just
regular folders, and their paths are stored in environment variables such as
$VSC_HOME, which points to your home directory.
There are 3 main types of storage in our HPC clusters:
- Home: the user’s home directory
- Data: storage of datasets for users and virtual organizations
- Scratch: fast storage to run jobs of users and virtual organizations
Each of your storage partitions in the cluster can be easily accessed through
its corresponding $VSC_ environment variable. Please avoid using full
directory paths whenever possible, as those might not work on all compute nodes
or might stop working in the future.
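For instance, scripts can reference the partitions through these variables instead of hard-coded paths. The sketch below illustrates this; the project folder name is hypothetical, and the fallback values exist only so the snippet also runs outside the cluster:

```shell
#!/bin/bash
# Refer to storage through the $VSC_ variables, never hard-coded paths.
# The fallbacks below are only so this sketch runs outside the cluster.
: "${VSC_DATA:=$HOME/data}"
: "${VSC_SCRATCH:=/tmp/scratch-$USER}"

# Create a hypothetical project folder on the scratch partition
mkdir -p "$VSC_SCRATCH/myproject"

echo "Data directory:    $VSC_DATA"
echo "Scratch directory: $VSC_SCRATCH"
```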
The disk quotas on Home, Data, and Scratch are fixed. If you need more disk space, we highly recommend joining the Virtual Organization (VO) of your group. If your group does not have a VO yet, please ask your group leader to create one.
Our storage has protection mechanisms to recover lost files. We take snapshots of the data in all shared storage partitions at regular time intervals and keep them for at least 7 days. Some partitions, such as Home and Data, have longer snapshot histories going back several months. The blocks below give the exact details of the available snapshots for each storage system, and the FAQ “I have accidentally deleted some data, how can I restore it?” explains how to recover lost files.
Warning
Long-term archival of data in our HPC clusters is forbidden. The storage provided in the cluster is meant to be used for data needed by or generated by computational jobs in active research projects.
The HPC team is not responsible for any data loss, although we will do everything we can to avoid it. Users should regularly back up important data outside of the HPC cluster and regularly clean up their folders on the cluster.
User storage#
Home
- Location
$VSC_HOME, $HOME, or ~
- Purpose
Storage of configuration files and user preferences.
- Availability: Very high
Accessible from login nodes, compute nodes and from all clusters in VSC.
- Capacity: Low
5.7 GB (soft limit), 6 GB (hard limit, 7 days grace time)
- Performance: Low
Jobs must never use files in your home directory.
- Reliability: Very High
Data is stored in a redundant file system, with data replication off-site.
- Back-ups: short and mid-term
Regular data snapshots allow recovering data at multiple points in time. Daily snapshots are kept for the past 7 days and weekly snapshots for the past 3 months.
Data
- Location
$VSC_DATA
- Purpose
Storage of datasets or resulting data that must be stored in the cluster to carry out further computational jobs.
- Availability: Very high
Accessible from login nodes, compute nodes and from all clusters in VSC.
- Capacity: Medium
47.50 GB (soft limit), 50 GB (hard limit, 7 days grace time)
- Performance: Low
Jobs must always copy any data needed from $VSC_DATA to the scratch before the run and save any results from scratch back into $VSC_DATA after the run.
- Reliability: Very High
Data is stored in a redundant file system, with data replication off-site.
- Back-ups: short and mid-term
Regular data snapshots allow recovering data at multiple points in time. Daily snapshots are kept for the past 7 days and weekly snapshots for the past 3 months.
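The stage-in/stage-out pattern described for the Data partition can be sketched as a batch job. This is only a sketch assuming a Slurm batch system; the job name, input file, and program are hypothetical:

```shell
#!/bin/bash
#SBATCH --job-name=myjob        # hypothetical job name
#SBATCH --time=01:00:00

# Stage input from $VSC_DATA to scratch before the run
WORKDIR="$VSC_SCRATCH/$SLURM_JOB_ID"
mkdir -p "$WORKDIR"
cp "$VSC_DATA/input.dat" "$WORKDIR/"    # hypothetical input file

# Run the job entirely on scratch
cd "$WORKDIR"
./my_program input.dat > output.dat     # hypothetical program

# Save results from scratch back into $VSC_DATA after the run
cp output.dat "$VSC_DATA/"
```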
Scratch
- Location
$VSC_SCRATCH, $VSC_SCRATCH_SITE
- Purpose
Storage of temporary or transient data.
- Availability: High
Accessible from login nodes and compute nodes in the local cluster. Not accessible from other VSC clusters.
- Capacity: High
95 GB (soft limit), 100 GB (hard limit, 7 days grace time)
- Performance: High
Preferred location for all data files read or written during the execution of a job. Suitable for all workload types.
- Reliability: Medium
Data is stored in a redundant filesystem, but without off-site replication.
- Back-ups: short-term
Regular data snapshots allow recovering recently lost data. Daily snapshots are kept for the past 7 days.
Node local scratch
- Location
$VSC_SCRATCH_NODE, $TMPDIR
- Purpose
Storage of temporary or transient data.
- Availability: Low
Only accessible from the compute node running the job.
- Capacity: Variable
Maximum data usage depends on the local disk space of the node (see the VSC documentation). Note that the available disk space is shared among all jobs running on the node.
- Performance: High
Might be beneficial for special workloads that require lots of random I/O. Users should always confirm the need for a node local scratch through benchmarking.
- Reliability: Low
Data is stored in a non-redundant filesystem without replication.
- Back-ups: None
All data is automatically deleted from the compute nodes once your job has ended.
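A job that benefits from node-local scratch might use $TMPDIR as follows. This is a sketch: the workload and results folder are hypothetical, and the /tmp fallback exists only so the snippet runs outside a job:

```shell
#!/bin/bash
# Sketch: use node-local scratch for workloads with heavy random I/O.
# $TMPDIR is set on the compute node; the /tmp fallback is only for
# running this sketch outside a job.
TMPWORK="${TMPDIR:-/tmp}/random-io-work"
mkdir -p "$TMPWORK/results"

# ... hypothetical I/O-heavy work writing into "$TMPWORK" ...

# Node-local data is deleted once the job ends, so copy anything worth
# keeping to persistent storage before exiting.
cp -r "$TMPWORK/results" "$VSC_SCRATCH/"
rm -rf "$TMPWORK"
```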
More technical details are available in the VSC documentation on VUB storage.
Virtual Organization storage#
VO Data
- Location
$VSC_DATA_VO, $VSC_DATA_VO_USER
- Purpose
Storage of datasets or resulting data that must be stored in the cluster to carry out further computational jobs, and that can be shared with co-workers.
- Availability: Very high
Accessible from login nodes, compute nodes and from all clusters in VSC.
- Capacity: High and expandable
By default, 112.5 GB (soft limit), 125.0 GB (hard limit, 7 days grace time). Can be expanded upon request.
- Performance: Low
Jobs must always copy any data needed from $VSC_DATA_VO to the scratch before the run and save any results from scratch back into $VSC_DATA_VO after the run.
- Reliability: Very High
Data is stored in a redundant file system with data replication off-site.
- Back-ups: short and mid-term
Regular data snapshots allow recovering data at multiple points in time. Daily snapshots are kept for the past 7 days and weekly snapshots for the past 3 months.
VO Scratch
- Location
$VSC_SCRATCH_VO, $VSC_SCRATCH_VO_USER
- Purpose
Storage of temporary or transient data that can be shared with co-workers.
- Availability: High
Accessible from login nodes and compute nodes in the local cluster. Not accessible from other VSC clusters.
- Capacity: High and expandable
225 GB (soft limit), 250 GB (hard limit, 7 days grace time). Can be expanded upon request.
- Performance: High
Preferred location for all data files read or written during the execution of a job. Suitable for all workload types.
- Reliability: Medium
Data is stored in a redundant file system, but without off-site replication.
- Back-ups: short-term
Regular data snapshots allow recovering recently lost data. Daily snapshots are kept for the past 7 days.