HPC Data Recovery
We keep regular data snapshots for all our shared storage systems. This
includes the storage of VSC_HOME, VSC_DATA and VSC_SCRATCH in your
account as well as VSC_DATA_VO and VSC_SCRATCH_VO in your Virtual Organization.
See also
The documentation on HPC Data Storage has a detailed description of each storage system in our clusters.
Storage snapshots record the state of files and directories at fixed moments in time. We take snapshots at intervals that depend on their age: the older the snapshots, the less frequent they are. This means that if you accidentally delete or modify files in the cluster, the sooner you act, the more likely you are to recover the data in the state it was in just before the loss.
| Snapshot age | Snapshot frequency | Maximum risk of data loss |
|---|---|---|
| Less than 3 days | Hourly | 1 hour |
| Between 3 and 7 days | Daily | 1 day |
| Between 1 and 4 weeks | Weekly | 1 week |
| More than 4 weeks | Monthly | 1 month |
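To make the table easier to apply, the following minimal sketch maps the age of a lost change (in whole days) to the snapshot frequency you can expect. The function name `snapshot_frequency` is hypothetical and is not part of any cluster tool. The exact boundaries are an assumption: the table does not say which row an age of exactly 7 days belongs to, so the sketch counts it as weekly.

```shell
# Hypothetical helper: print the snapshot frequency for data of a given age.
# Assumption: ages below 3 days are hourly, 3 to 6 days daily,
# 7 to 28 days weekly, and anything older monthly.
snapshot_frequency() {
    age_days=$1
    if [ "$age_days" -lt 3 ]; then
        echo "hourly"
    elif [ "$age_days" -lt 7 ]; then
        echo "daily"
    elif [ "$age_days" -le 28 ]; then
        echo "weekly"
    else
        echo "monthly"
    fi
}
```

For example, `snapshot_frequency 5` prints `daily`, meaning that a file changed 5 days ago can be recovered to within one day of its lost state at best.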
Restoring old files/folders
Restoring a file or directory from one of the snapshots is relatively easy with the command-line tool mysnapshots.
First, change directory (cd) to the directory that
contained (or still contains) the file or directory to be restored. If that parent
directory no longer exists, move up to the closest parent directory that still
exists. In the worst case, where a lot of data is lost, this location might be
the top directory of $VSC_DATA or $VSC_SCRATCH, for instance.
$ cd $VSC_DATA/labo
$ ls -l
total 8
-rw-rw-r-- 1 vsc10930 vsc10930 2261 5 nov 2024 job.sh
-rw-rw-r-- 1 vsc10930 vsc10930 907 5 nov 2024 results_dir
The command mysnapshots always works on the snapshots of the current working directory. You can use it without arguments to list the available snapshots for that directory ordered by date/time.
$ mysnapshots
Snapshots of $VSC_DATA/labo:
2025-05-08T17:02:44 2025-05-08T22:02:08 2025-05-09T08:01:56 2025-05-09T13:02:02
The output of the previous command shows the timestamps of the four snapshots
with data for the working directory. Now if we want to know which
files/directories are available in a specific snapshot, we can execute
mysnapshots with the timestamp of that snapshot in the given
YYYY-MM-DDTHH:mm:ss format as argument:
$ mysnapshots 2025-05-09T13:02:02
Inventory of $VSC_DATA/labo at 2025-05-09T13:02:02:
job.sh results_dir old_data
In this case mysnapshots shows that for the snapshot 2025-05-09T13:02:02 there are three entries available. The entries job.sh and results_dir are already present in the current directory, while the entry old_data is not.
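Comparing an inventory with the current directory by eye becomes tedious for long listings. The sketch below prints the names that are absent from the working directory. The function `missing_entries` is hypothetical and is not part of mysnapshots; it assumes you pass the entry names from the inventory yourself, since the exact output format of mysnapshots should not be relied on for parsing.

```shell
# Hypothetical helper: print every name given as an argument that does
# not exist in the current working directory.
missing_entries() {
    for entry in "$@"; do
        # -e is true for existing files and directories;
        # -L additionally catches dangling symbolic links.
        if [ ! -e "$entry" ] && [ ! -L "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
}
```

Run in the example directory, `missing_entries job.sh results_dir old_data` prints `old_data`, the only entry of the snapshot that is no longer present.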
You can restore any of these entries to their previous state at 2025-05-09T13:02:02 by calling mysnapshots with the timestamp of the snapshot and the path to the item as arguments. For instance, we can recover the missing entry old_data from this snapshot:
$ mysnapshots 2025-05-09T13:02:02 old_data
Restoring file '$VSC_DATA/labo/old_data' from snapshot at 2025-05-09T13:02:02
Restoration complete
$ ls -l
total 12
-rw-rw-r-- 1 vsc10930 vsc10930 2261 5 nov 2024 job.sh
-rw-rw-r-- 1 vsc10930 vsc10930 907 5 nov 2024 results_dir
-rw-rw-r-- 1 vsc10930 vsc10930 97 9 mei 2025 old_data
Warning
By default, mysnapshots will refuse to overwrite existing
files/directories in your working directory. Use the -f option to force
the restoration of existing items. Beware that restoring an already existing
directory with the -f option will also restore all files and folders
within it as they were at that point in time. More recent files will be
removed and all updates since then will be lost.
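Because a forced restoration discards newer files, it can be worth copying the existing item aside first. The following is a minimal sketch of that precaution, not a feature of mysnapshots: the function name `backup_before_restore` and the `.before-restore-<time>` suffix are arbitrary choices made for illustration.

```shell
# Hypothetical helper: copy an existing file or directory aside before it
# is overwritten by a forced restoration. Prints the path of the copy.
backup_before_restore() {
    item=$1
    if [ ! -e "$item" ]; then
        echo "nothing to back up: $item does not exist" >&2
        return 1
    fi
    # Strip a trailing slash and append a time-stamped suffix (arbitrary naming).
    backup="${item%/}.before-restore-$(date +%Y%m%dT%H%M%S)"
    if [ -e "$backup" ]; then
        echo "backup target already exists: $backup" >&2
        return 1
    fi
    # -R copies directories recursively, -p preserves timestamps and permissions.
    cp -Rp "$item" "$backup" && printf '%s\n' "$backup"
}
```

Once `backup_before_restore results_dir` has succeeded, the directory can be restored with the -f option, and any newer files remain available in the copy. Remove the copy after you have checked the restored data.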