Hydra system upgrade with brand new nodes

We are very pleased to announce the immediate availability of 20 new nodes in Hydra with latest-generation CPUs and more memory and cache per core than usual. Alongside the addition of these new nodes, we are also upgrading the operating system of the rest of the cluster. This requires a planned reboot of the login nodes on Friday 26/4 at 06:00 (CEST), which will make the cluster inaccessible for a maximum of 15 minutes. Queued and running jobs will not be disrupted.

New AMD Genoa-X nodes

Each of the 20 new nodes in Hydra has 2x AMD Genoa-X (zen4) CPUs, for a total of 64 cores per node that can reach 3.9GHz, plus a very large L3 cache of 1.5GB and a total system memory of 384GB (~6GB/core), all combined with a 25 Gbps network connection. Our tests show that these nodes are considerably faster than any other non-GPU node in Hydra (easily reaching speedups of 2x to 5x) and arguably the fastest non-GPU nodes in the entire VSC at the time of writing. VSC users can use these new nodes right now by submitting jobs to the zen4 partition of Hydra (Slurm option --partition zen4). Starting next Friday (April 26th), all jobs with compatible resources will run on them by default.
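
As an illustration of the Slurm option mentioned above, here is a minimal sketch of a job script that requests the zen4 partition. The job name, core count, memory and time limit are illustrative assumptions rather than recommended values, and the fallback strings are hypothetical placeholders used only when the script runs outside Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=zen4-test      # illustrative name (assumption)
#SBATCH --partition=zen4          # partition named in this announcement
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4         # illustrative value (assumption)
#SBATCH --mem-per-cpu=4G          # illustrative value (assumption)
#SBATCH --time=00:10:00           # illustrative value (assumption)

# Slurm sets these variables inside a job. The fallbacks are only used
# when the script is executed outside Slurm, for example as a dry run.
partition="${SLURM_JOB_PARTITION:-not-in-a-job}"
cpus="${SLURM_CPUS_PER_TASK:-1}"
node="${SLURMD_NODENAME:-unknown-host}"

echo "Partition: ${partition}, CPUs per task: ${cpus}, node: ${node}"
```

Saved under a name such as zen4-test.sh (a hypothetical file name), the script would be submitted with `sbatch zen4-test.sh`. The job output should then report zen4 as the partition.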

Cluster system upgrade

The system upgrade of the cluster is a regular update to maintain its high performance and security standards. This operation will have only a minor impact on our users: the worker nodes will be upgraded gradually without downtime, and the reboot of the login nodes is expected to last a maximum of 15 minutes. Users logged in through a terminal or the notebook platform will be disconnected from the system prior to the reboot, and it will not be possible to connect to Hydra in any way during this short operation.

The upgrade of the worker nodes will be carried out on a rolling basis: we will take them offline in small batches, upgrade them and bring them back online gradually. There will be no impact on queued or running jobs, but the available capacity of the partitions in the cluster will be temporarily reduced, which may lead to longer queueing times. This operation will also start on Friday morning (April 26th).

Legacy software in jobs

Finally, the automatic loading in jobs of the legacy software stack with pre-2022a modules is now disabled in Hydra. The vast majority of users have already moved to modern software modules since the major system upgrade 6 months ago. Moreover, the legacy software is not present on the new zen4 nodes.

Nonetheless, the legacy software will continue to be available (although not supported), and users needing old software can manually load the legacy-software module in their jobs. For more information, see the section "How can I find legacy software?". If you cannot find a recent software module that provides the applications needed in your jobs, please contact us at VUB-HPC Support and we will install them for you.
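
Loading the legacy stack manually could look like the sketch below. It assumes the `module` command is available inside the job, as is usual on clusters with an environment-modules system. The job name, time limit and the `legacy_status` variable are hypothetical and only serve the illustration. Because the legacy software is not present on the zen4 nodes, such a job presumably also has to target another partition. That part is left out here because this announcement does not name one.

```shell
#!/bin/bash
#SBATCH --job-name=legacy-job     # illustrative name (assumption)
#SBATCH --time=00:10:00           # illustrative value (assumption)

# The legacy stack is no longer loaded automatically in jobs, so it is
# requested explicitly. The guard reports a clear status instead of
# failing when no module system is available.
if ! command -v module >/dev/null 2>&1; then
    legacy_status="module-command-missing"
elif module load legacy-software; then
    legacy_status="loaded"
else
    legacy_status="load-failed"
fi

echo "legacy-software: ${legacy_status}"
```

Once the status is "loaded", the older modules can be loaded in the same script in the usual way.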

Feel free to contact us at VUB-HPC Support with any comments or questions about this announcement.