Job scheduler outage in Hydra

On Tuesday, October 10th, at around 17:30 CEST, the Slurm scheduler in Hydra suffered an outage. If any of your jobs were queued or running between October 10th at 17:00 CEST and October 11th at 09:00 CEST, please check their output carefully.

We’re working on updating the cluster to a newer OS version. While preparing these updates, we accidentally pushed a configuration change to Hydra that caused the Slurm job scheduler to lose track of all running jobs.

The error has now been corrected. However, because the job scheduler no longer knew which jobs were still running and which had already ended, some of your jobs may have been requeued and may now be queued or running again.
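One possible way to find the jobs worth checking is Slurm's accounting tool `sacct`. The sketch below is only a suggestion, not an official procedure: the function name `jobs_in_outage_window` is made up for illustration, and it assumes that job accounting is enabled on Hydra and that timestamps are interpreted in the cluster's local time (CEST).

```shell
# Hypothetical helper (not an official VUB-HPC tool): list your own jobs
# that fall in the affected window, using Slurm's accounting command sacct.
# Assumption: times are interpreted in the cluster's local time zone (CEST).
jobs_in_outage_window() {
    # --allocations : one line per job instead of one line per job step
    # --duplicates  : also show earlier records of a job ID, which is how
    #                 a requeued job typically appears in the accounting data
    sacct --user="${USER}" \
          --starttime=2023-10-10T17:00:00 \
          --endtime=2023-10-11T09:00:00 \
          --allocations \
          --duplicates \
          --format=JobID,JobName%30,State,Submit,Start,End,ExitCode
}
```

Running `jobs_in_outage_window` on a login node should print one line per job record. A job ID that appears more than once, or a job whose state is still `PENDING` or `RUNNING` although you expected it to have finished, is a good candidate for a closer look at its output files.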

We apologize for the inconvenience. Please contact VUB-HPC Support if you have any other questions.

11/10/2023
