Posts tagged hydra
On Wednesday 23 November 2022 around 18:00 we’ll reboot the Hydra login nodes.
We are pleased to announce the immediate availability of data snapshots for the shared scratch storage of Hydra. Data snapshots save past states of your files and folders, allowing you to recover lost data from your account.
On Monday October 24th 2022 we reinstalled two key components in the foss/2021b and foss/2022a toolchains to fix two important bugs.
We have a new Globus endpoint to access the storage of Hydra: VSC VUB Hydra. The old endpoint has been removed.
13:00 The issues with the storage have been solved. Login to Hydra and all other VSC clusters is back to normal. Please check all jobs that were running when the storage issues started for errors, and resubmit if necessary.
We are happy to announce the immediate availability of 2 more GPU nodes in Hydra, adding 4 new Nvidia A100 GPUs to the cluster. These new nodes are part of the ampere_gpu partition and share the same CPU, memory and GPU configuration as the other nodes in this partition.
23:00 The issues with the network have been solved and the connection of Hydra to external systems is back to normal.
15:00 All maintenance tasks have been completed ahead of time and the cluster is back online. The job queue is resumed and all users can now log in and submit new jobs.
Since the move of Hydra to the Pleinlaan campus, the synchronisation of quota usage to the VSC account page had not been working. We have now been able to fix it.
10:00 The issues with the network have been solved and the connection of Hydra to external systems is back to full speed.
14:30 The firewall has been repaired, and the connections between Hydra and the other VSC clusters are again working.
16:00 We have pinned this issue down to a specific configuration of the login nodes. Today, April 21st, at 21:00 (CEST) we will apply an urgent fix to the login nodes that will restore the filesystem performance of VSC_DATA to normal levels. The login nodes will be rebooted during the process and will be unavailable for 10 minutes.
We are happy to announce that the move of Hydra to the new VUB data center on the Etterbeek campus is complete. The cluster is now open again to VSC users.
The start of the operation to move the Hydra cluster to VUB Etterbeek campus is upon us. Hydra will be closed to access from Monday, April 11th at 00:00 (CEST). The move is planned to last 1 week. We will continue to be reachable at email@example.com and we will communicate the progress of the move on this post.
We have published the results of our yearly user survey, conducted in February 2022. Feedback from users’ experiences in Hydra adds valuable insight into their needs, what works well, and what needs further improvement.
We’ve managed to get Hydra back to normal without data loss. The scheduler is running again and the login nodes are open. We’ve requeued all jobs that were running at the time of the power cut.
Hydra will very soon move from the ULB Solbosch campus to the recently upgraded VUB data center on the Etterbeek campus. This change will bring improved power supply, cooling and security measures to our HPC cluster. The move is scheduled for the week of April 11-15.
The HPC team has been hard at work in the past month updating and testing many of the major software packages in Hydra. We invite all users to check these updates, which include minor revisions of existing versions with bug fixes and major new releases:
Please take some time (~10 minutes) to fill out the short user survey at https://hpc.vub.be/survey2022
The most recent release of GAMESS-US with version 2021-R2-patch1 is now installed in Hydra.
We’re happy to announce that 6 new GPU nodes have been added to Hydra. They are equipped with the latest generation of NVIDIA GPUs: the Ampere A100.
We are pleased to announce the availability of AlphaFold v2.1.1 in Hydra. This recent release of AlphaFold has been installed for our top-of-the-line Nvidia A100 GPUs with 40 GB of memory.
Warning: We are carrying out a routine update of the operating system in Hydra. The nodes will be progressively updated without impacting running jobs, but queue times during the rollout might be longer.
The HPC team is proud to announce one of the biggest improvements for Hydra in recent years and certainly one of the most impactful for its users. Hydra is replacing its job scheduler with the Slurm Workload Manager.
The job scheduler in Hydra will be paused on Thursday, 16th September from 20:00 to 22:00. We will perform minor updates to the MPI stack in the intel toolchains. Running jobs will not be affected and users will be able to continue submitting jobs to the queue during the pause.
Login nodes updated successfully. The system update for the rest of the nodes continues as planned.
The latest stable version of TensorFlow is now installed in Hydra. You can load TensorFlow v2.5.0 with the following modules:
The recent major release of ORCA version 5.0.0 is now installed in Hydra. You can load it with the module
Warning: On Tuesday June 15 at 20:00 the login nodes will be unavailable for 30 minutes.
We have published the results of our yearly user survey, conducted in February 2021. Feedback from users’ experiences in Hydra adds valuable insight into their needs, what works well, and what needs further improvement.
We have improved the recommended usage of the OneDrive client in Hydra. Check the FAQ in the link below.
The maintenance on the storage is done and jobs are again running as normal. We are still busy with (automatically) updating all worker nodes. Until that is done, the queuing time will be slightly longer than normal.
Due to maintenance work, it will not be possible to submit new jobs or view the status of submitted jobs on Saturday 29/8 between 20:00 and 23:00. Running jobs will not be impacted.
Warning: Until next Friday (14th of August) IvyBridge nodes will be offline due to the current heatwave. All other nodes are online, including GPU nodes. Since the scope of this measure is rather limited, the waiting time of jobs in the queue should not be affected.
From 2 to 3 June 2020, work will be carried out on an electrical board of the SISC cooling system. The operations will require stopping the cooling system for short periods of time. The computing power of the HPC clusters Hydra and Vega will be reduced to a minimal level to avoid heat problems in the data center. Therefore, waiting time for jobs in the queue will be longer than normal.
Due to the recent wave of cyberattacks on HPC centers throughout Europe, the SISC HPC team will take security hardening measures on Hydra, in consultation with the VSC.
There are a couple of hardware issues with the Hydra storage which we cannot safely fix while the cluster is running. We are going to shut down Hydra on Monday 18 May, starting at 09:00 to perform the needed repairs. We expect that we can resume normal operations by the late afternoon.
The network issues are fixed. Everything is back to normal.
The login1 server has to be rebooted to apply critical updates. This will be done on Saturday at 20:00. You can use login2.hpc.vub.be as an alternative login server during the downtime.
The queue is back to normal operation.