Migration of Hydra to Slurm

The HPC team is proud to announce one of the biggest improvements for Hydra in recent years and certainly one of the most impactful for its users. Hydra is replacing its job scheduler with the Slurm Workload Manager.

The management of computational resources and job scheduling will be completely overhauled with this change. Slurm provides a modern environment to manage your jobs in the cluster and is becoming the de facto standard in HPC systems around the globe. It is more stable, performs better and offers more features.

Users in our clusters will notice that the queue is more responsive. Submitting hundreds or thousands of jobs is handled without delay and those jobs that have available resources to start will do so immediately.

Implications for the users

The command line tools provided by Slurm are different from those currently used in Hydra (e.g. qsub, qstat, etc.). Therefore, we encourage all users to adapt their workflows and job scripts to this new toolbox, which has the sbatch and srun commands at its core. We have added a specific page to our documentation to help you migrate from Torque/Moab to Slurm, which includes an extensive table with direct translations from the old commands and options to the new ones in Slurm.
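To give a feel for what such a translation involves, here is a minimal Python sketch that converts a few common `#PBS` directives into `#SBATCH` lines. It is only an illustration: it is neither the translation table from our documentation nor the compatibility layer, the names `SIMPLE_FLAGS` and `translate_directive` are hypothetical, and it covers just a handful of options (job name, output and error files, nodes, cores per node and walltime). Anything else is left marked as untranslated.

```python
import re

# Hypothetical, deliberately partial mapping of Torque/PBS flags to Slurm options.
SIMPLE_FLAGS = {
    "-N": "--job-name",
    "-o": "--output",
    "-e": "--error",
}


def translate_directive(line):
    """Translate one '#PBS ...' line into a list of '#SBATCH ...' lines."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "#PBS":
        return [line]  # not a PBS directive with a value: leave it untouched
    flag, value = parts[1], parts[2]
    if flag in SIMPLE_FLAGS:
        return [f"#SBATCH {SIMPLE_FLAGS[flag]}={value}"]
    if flag == "-l":
        out = []
        # A single -l option may carry several comma-separated resources.
        for resource in value.split(","):
            match = re.fullmatch(r"nodes=(\d+):ppn=(\d+)", resource)
            if match:
                out.append(f"#SBATCH --nodes={match.group(1)}")
                out.append(f"#SBATCH --ntasks-per-node={match.group(2)}")
            elif resource.startswith("walltime="):
                out.append(f"#SBATCH --time={resource.split('=', 1)[1]}")
            else:
                out.append(f"# UNTRANSLATED: #PBS -l {resource}")
        return out
    return [f"# UNTRANSLATED: {line}"]


if __name__ == "__main__":
    old_script = [
        "#PBS -N myjob",
        "#PBS -l nodes=2:ppn=4,walltime=01:30:00",
        "echo hello",
    ]
    for old_line in old_script:
        for new_line in translate_directive(old_line):
            print(new_line)
```

For real job scripts, please rely on the translation table in the documentation rather than on this sketch, since many options need more care than a one-to-one replacement.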

Users who are not yet ready to jump to the Slurm world have the option to use a compatibility layer that will automatically translate their current Torque/Moab job scripts and submit them to Slurm. This compatibility layer is only meant for simpler job scripts, as it does not cover all possible combinations of options, but it should work in most cases.

Timeline of the migration

You can try out Slurm in Hydra right now! The new system is already in place and running with a subset of nodes in the cluster. The existing queue with the classic Torque/Moab will be open until October 24th, 2021. After that date, new jobs will only be accepted on Slurm. Jobs already submitted in Torque/Moab will continue to work until November 8th, 2021.

  • October 4th, 2021: new Slurm scheduler starts accepting jobs

  • October 25th, 2021: old Torque/Moab scheduler stops accepting new jobs

  • November 8th, 2021: old Torque/Moab scheduler shuts down

During this month, we plan to dynamically move nodes from the current scheduler to Slurm, following the demand from users and how fast they switch from one to the other. On October 25th, 2021 we’ll stop accepting new jobs under Torque, and on November 8th, 2021 all jobs in Hydra will be running under Slurm.

User support

We have prepared new documentation for Slurm that will help you to quickly get up and running in the new environment. It introduces the main new tools and settings and provides a handy translation table to convert your current workflow to Slurm. Furthermore, you can find example job scripts for the main job types in Slurm, which are good starting points to create your new jobs.

We will also organize remote Q&A sessions through MS Teams every Friday during this month of October. This will allow you to get more direct support from the HPC team. Please check the event page for our Slurm Q&A sessions for more information.

If you have any questions or comments that need immediate attention, please contact us at VUB-HPC Support.