Posts tagged hydra
Hydra OS upgrade
- 25/06/2024
On Wednesday, June 26, at 06:00, we will upgrade the login nodes to Rocky Linux 8.10. During this upgrade, Hydra will be unavailable for approximately 30 minutes, but running jobs will not be impacted. At 08:00, we will also upgrade the Hydra gateway. During this time, internet access in Hydra will be unavailable; however, the login nodes will remain accessible and running jobs will not be impacted unless they require internet access.
Removal of broadwell nodes from Hydra
- 12/06/2024
The broadwell partition will be removed from Hydra on Sunday, 16 June, at midnight. Only jobs that can finish before that date will be able to run on the broadwell partition.
Hydra Overview 2023
- 31/05/2024
For the SDC team, 2023 was marked by the award of the new Flemish Tier-1 supercomputer, which will be managed by us and hosted by VUB in the Green Energy Park. Our commitment to the rest of our services remains unchanged, and it is now time to look at the state of the Hydra cluster, VUB’s Tier-2 HPC infrastructure. 2023 was an eventful year for Hydra and we have collected some statistics on the usage of the cluster by our users.
Hydra system upgrade with brand new nodes
- 23/04/2024
We are very pleased to announce the immediate availability of 20 new nodes in Hydra with latest-generation CPUs and more memory and cache per core than usual. Alongside the addition of these new nodes, we are also upgrading the operating system of the rest of the cluster, which will require a planned reboot of the login nodes on Friday 26/4 at 06:00 (CEST), rendering the cluster inaccessible for a maximum of 15 minutes. Queued and running jobs will not be disrupted.
Planned internet interruptions on 6 April 2024
- 21/03/2024
As announced on WeAreVUB Staff, on Saturday April 6th, the VUB network team will perform network updates on the VUB campus, which will impact any internet connection between VUB and the outside world.
Softening environment of Julia modules
- 18/03/2024
We improved how we handle installations of Julia and Julia packages in Hydra. From now on, software modules in the cluster providing a base installation of Julia or any Julia package will not lock the working environment in any way upon load. This translates to the following features:
Hydra scratch flash storage upgrade part 2 on 10 January 2024
- 09/01/2024
On 10 January 2024 between 09:00 and 15:00, we will finalize the flash storage extension of Hydra’s scratch filesystem. During the upgrade, we will stop the scheduler to avoid putting load on the storage, and the performance of the scratch will be decreased. Running jobs will continue to run, and jobs in the queue will remain queued.
Organization of Python, Perl and R packages in bundles
- 22/12/2023
The organization of Python, Perl and R packages in Hydra is changing with the introduction of the 2023a toolchains. The base software module for each of these languages now provides a basic installation with the core features of the language plus any tools needed to build and install your own software on top of it. Extra packages are delivered in bundle modules or with their own specific modules.
Hydra scratch flash storage upgrade on 19 December 2023
- 18/12/2023
The scheduler is again running, jobs are starting again. Unfortunately, we’ve not been able to complete the flash storage upgrade as some necessary hardware components are missing. We’ll announce another upgrade moment once all hardware is available.
Hydra upgrade 27-28 November finished
- 28/11/2023
Hydra has gained a major system upgrade, a brand new scratch storage, extra compute nodes, and a new GPU cluster for interactive workflows.
Big upgrade for Hydra on 27-28 November 2023
- 27/10/2023
We are happy to announce a big upgrade to our beloved Tier-2 HPC cluster at VUB. On Monday, November 27th 2023 at 03:00 (CET), Hydra will be shut down for 48 hours to apply a major system upgrade, renew the scratch storage and add some extra compute nodes.
Job scheduler outage in Hydra
- 11/10/2023
On Tuesday, October 10th at around 17:30 (CEST), there was an outage of the Slurm scheduler in Hydra. Please check the output of your jobs carefully if they were in the queue or running between October 10th at 17:00 CEST and October 11th at 09:00 CEST.
License server upgrade
- 21/08/2023
On Wednesday 23 August 2023 at 09:00 the license server for licensed software in Hydra will be down for upgrades. The operation is expected to take ~30 minutes. Affected software: MATLAB, Gurobi, Mathematica, Q-Chem, QuantumATK.
Reboot of Hydra login nodes
- 26/07/2023
The login nodes of VUB’s Tier-2 cluster Hydra will be rebooted today (July 26th, 2023) at 22:00 CEST.
Network outage at VUB main campus
- 14/07/2023
16:30 The network outage on the VUB main campus is resolved. Access to both the Hydra HPC cluster and the Pixiu object storage has been reestablished.
Legacy software change
- 27/03/2023
On 27 March 2023, we’ve made a change in Hydra impacting only very old software modules.
Extra Ampere GPUs in Hydra
- 17/03/2023
We are happy to announce the immediate availability of 2 more GPU nodes in Hydra adding 4 new Nvidia A100 GPUs to the cluster. These new nodes are joining the ampere_gpu partition and share the same CPU, memory and GPU configuration with the other nodes in this partition.
New command prompt in Hydra
- 23/01/2023
We have updated the default command prompt in Hydra with a more informative and better looking one.
Hydra login nodes reboot
- 21/11/2022
On Wednesday 23 November 2022 around 18:00 we’ll reboot the Hydra login nodes.
Data snapshots of the scratch in Hydra
- 02/11/2022
We are pleased to announce the immediate availability of data snapshots for the shared scratch storage of Hydra. Data snapshots save the past state of your files and folders, allowing you to recover lost data from your account.
Bug fixes for GCC and OpenBLAS
- 24/10/2022
On Monday October 24th 2022 we’ve reinstalled two key components in the foss/2021b and foss/2022a toolchains to fix two important bugs.
New Globus endpoint for Hydra
- 22/08/2022
We have a new Globus endpoint to access the storage of Hydra: VSC VUB Hydra. The old endpoint has been removed.
Hydra storage down
- 11/08/2022
13:00 The issues with the storage have been solved. Login to Hydra and all other VSC clusters is back to normal. Please check all jobs that were running when the storage issues started for errors, and resubmit if necessary.
More Ampere GPUs in Hydra
- 05/08/2022
We are happy to announce the immediate availability of 2 more GPU nodes in Hydra adding 4 new Nvidia A100 GPUs to the cluster. These new nodes are joining the ampere_gpu partition and share the same CPU, memory and GPU configuration with the other nodes in this partition.
External network connection cut
- 30/06/2022
23:00 The issues with the network have been solved and the connection of Hydra to external systems is back to normal.
Scheduled maintenance for Hydra in July
- 28/06/2022
15:00 All maintenance tasks have been completed ahead of time and the cluster is back online. The job queue is resumed and all users can now log in and submit new jobs.
Quota data synchronisation is now working again
- 03/06/2022
Since the move of Hydra to the Pleinlaan campus, the synchronisation of quota usage to the VSC accountpage was not working. We have now been able to fix it.
Slow network connection to external systems
- 25/05/2022
10:00 The issues with the network have been solved and the connection of Hydra to external systems is back to full speed.
Hydra-VSC network failure
- 05/05/2022
14:30 The firewall has been repaired, and the connections between Hydra and the other VSC clusters are again working.
Irregular low performance of Hydra’s new storage
- 21/04/2022
16:00 We have pinned this issue down to a specific configuration of the login nodes. Today, April 21st, at 21:00 (CEST) we will apply an urgent fix to the login nodes that will bring the filesystem performance of VSC_HOME and VSC_DATA back to normal levels. The login nodes will be rebooted during the process and they will be unavailable for 10 min.
Hydra move to VUB Etterbeek campus completed
- 19/04/2022
We are happy to announce that the move of Hydra to the new VUB data center on the Etterbeek campus is complete. The cluster is now open again to VSC users.
Kickoff of operation to move Hydra into campus
- 09/04/2022
The start of the operation to move the Hydra cluster to VUB Etterbeek campus is upon us. Hydra will be closed to access from Monday, April 11th at 00:00 (CEST). The move is planned to last 1 week. We will continue to be reachable at hpc@vub.be and we will communicate the progress of the move on this post.
User survey 2022
- 28/03/2022
We have published the results of our yearly user survey, conducted in February 2022. Feedback from users’ experiences in Hydra adds valuable insight into their needs, what works well, and what needs further improvement.
Hydra power cut
- 20/03/2022
We’ve managed to get Hydra back to normal without data loss. The scheduler is running again and the login nodes are open. We’ve requeued all jobs that were running at the time of the power cut.
Hydra moves to VUB Etterbeek campus
- 17/03/2022
Hydra will very soon move from ULB Solbosch campus to the recently upgraded VUB data center on the Etterbeek campus. This change will bring improved power supply, cooling and security measures to our HPC cluster. The move is scheduled in the week of April 11-15.
Multiple software packages updated in Hydra
- 03/03/2022
The HPC team has been hard at work in the past month updating and testing many of the major software packages in Hydra. We invite all users to check these updates, which include minor revisions of existing versions with bug fixes and major new releases:
VUB-HPC user survey 2022
- 01/02/2022
Please take some time (~10 minutes) to fill out the short user survey at https://hpc.vub.be/survey2022
GAMESS-US v2021-R2 available in Hydra
- 27/01/2022
The most recent release of GAMESS-US with version 2021-R2-patch1 is now installed in Hydra.
New GPU nodes in Hydra
- 16/12/2021
We’re happy to announce that 6 new GPU nodes have been added to Hydra. They are equipped with the latest generation of NVIDIA GPUs: the Ampere A100.
AlphaFold 2.1.1 available in Hydra
- 15/12/2021
We are pleased to announce the availability of AlphaFold v2.1.1 in Hydra. This recent release of AlphaFold has been installed for our top of the line Nvidia A100 GPUs with 40 GB of memory.
System update in Hydra
- 10/12/2021
Warning We are carrying out a routine update of the operating system in Hydra. The nodes will be progressively updated without impacting running jobs, but queue times during the roll out might be longer.
Migration of Hydra to Slurm
- 04/10/2021
The HPC team is proud to announce one of the biggest improvements for Hydra in recent years and certainly one of the most impactful for its users. Hydra is replacing its job scheduler with the Slurm Workload Manager.
Update of the MPI stack in Hydra
- 15/09/2021
The job scheduler in Hydra will be paused on Thursday, 16th September from 20:00 to 22:00. We will perform minor updates to the MPI stack in the foss and intel toolchains. Running jobs will not be affected and users will be able to continue submitting jobs to the queue during the pause.
Urgent system update in Hydra
- 22/07/2021
Login nodes updated successfully. The system update for the rest of the nodes continues as planned.
TensorFlow 2.5.0 is available in Hydra
- 19/07/2021
The latest stable version of TensorFlow is now installed in Hydra. You can load TensorFlow v2.5.0 with the following modules:
ORCA 5.0.0 is available in Hydra
- 07/07/2021
The recent major release of ORCA version 5.0.0 is now installed in Hydra. You can load it with the module ORCA/5.0.0-gompi-2021a.
Rolling system updates
- 14/06/2021
Warning On Tuesday June 15 at 20:00 the login nodes will be unavailable for 30 minutes.
User survey 2021
- 28/04/2021
We have published the results of our yearly user survey, conducted in February 2021. Feedback from users’ experiences in Hydra adds valuable insight into their needs, what works well, and what needs further improvement.
OneDrive Client in Hydra
- 26/04/2021
We have improved the recommended usage of the OneDrive client in Hydra. Check the FAQ in the link below.
Hydra storage maintenance scheduled on 26/11
- 25/11/2020
The maintenance on the storage is done and jobs are again running as normal. We are still busy with (automatically) updating all worker nodes. Until that is done, the queuing time will be slightly longer than normal.
Hydra maintenance scheduled on 29/08
- 27/08/2020
Due to maintenance work, it will not be possible to submit new jobs or view the status of submitted jobs on Saturday 29/8 between 20:00 and 23:00. Running jobs will not be impacted.
Preventive measures for second heatwave
- 03/08/2020
Warning Until next Friday (14th of August) IvyBridge nodes will be offline due to the current heatwave. All other nodes are online, including GPU nodes. Since the scope of this measure is rather limited, the waiting time of jobs in the queue should not be affected.
Electricity works on the cooling system
- 02/06/2020
From 2 to 3 June 2020, work will be carried out on an electrical board of the SISC cooling system. The operations will require stopping the cooling system for short periods of time. The computing power of the HPC clusters Hydra and Vega will be reduced to a minimal level to avoid heat problems in the data center. Therefore, the waiting time for jobs in the queue will be longer than normal.
Hydra security hardening
- 27/05/2020
Due to the recent wave of cyberattacks on HPC centers throughout Europe, the SISC HPC team will take security hardening measures on Hydra, in consultation with the VSC.
Hydra storage maintenance scheduled on 18/05
- 11/05/2020
There are a couple of hardware issues with the Hydra storage which we cannot safely fix while the cluster is running. We are going to shut down Hydra on Monday 18 May, starting at 09:00 to perform the needed repairs. We expect that we can resume normal operations by the late afternoon.
Hydra network maintenance scheduled on 20/04
- 16/04/2020
The network issues are fixed. Everything is back to normal.
Urgent reboot of Hydra login server
- 02/04/2020
The login1 server has to be rebooted to apply critical updates. This will be done on Saturday at 20:00. You can use login2.hpc.vub.be as an alternative login server during the downtime.
Fixed node crashes caused by MPI jobs
- 17/02/2020
We fixed an issue where an MPI job could crash an entire node.