2. Job Queues

2.1. Main job queue

The Hydra cluster has a single job queue. Jobs are automatically assigned to the partitions of the cluster that have the computational resources needed to fulfil their requirements. For instance, jobs requesting a GPU are automatically queued on compute nodes with a GPU, and jobs requesting more than one node are automatically queued on a partition with the fastest network interconnect.
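
As an illustration, the job script below requests a single GPU, which is what steers the job to a GPU partition. This is a minimal sketch using standard Slurm options: the job name, resource values and the executable (my_gpu_program) are placeholders, not site recommendations.

#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --time=24:00:00       # must stay within the 120-hour limit
#SBATCH --gpus=1              # the GPU request steers the job to a GPU partition
#SBATCH --cpus-per-task=4     # illustrative values
#SBATCH --mem=16G

srun ./my_gpu_program         # placeholder executable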

Important

Jobs in Hydra can run for a maximum of 120 hours (5 days)

Jobs submitted from the login nodes of Hydra are sent to the job queue of Hydra by default.
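
For example, submitting the sketch script above (saved as the hypothetical file gpu-test.job) from a login node needs no extra options to reach the Hydra queue:

$ sbatch gpu-test.job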

2.2. Test/Debug queue

Jobs for testing or debugging purposes can be quickly carried out on the Anansi cluster. This smaller sister cluster of Hydra is specifically designed for such short tasks and avoids long wait times in the queue.

Compared to the much bigger Hydra, the main characteristic of Anansi is that its resources can be shared between multiple jobs. This approach is especially useful for low-intensity jobs, such as testing/debugging jobs, which can involve frequent idle periods. Hence, even though the resources of Anansi are relatively limited, users should be able to easily find an available slot for their short jobs. Moreover, the maximum run time of jobs in Anansi is limited to 12 hours, which further increases the availability of its resources.

Important

Jobs in Anansi can run for a maximum of 12 hours

Jobs can be submitted, managed and monitored on Anansi from the login nodes of Hydra. The target cluster of any Slurm command can be changed to Anansi by adding the option -M anansi to the command arguments or as an #SBATCH option in the header of your job script.
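
For example, assuming a job script named test.job, the following sends the job to Anansi from a Hydra login node (-M is standard Slurm syntax, short for --clusters):

$ sbatch -M anansi test.job

Alternatively, set the target cluster in the header of the job script:

#SBATCH -M anansi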

Command to check the job queue in Anansi
$ mysqueue -M anansi
CLUSTER: anansi
     JOBID PARTITION   NAME          USER     STATE  TIME TIME_LIMIT NODES CPUS MIN_MEMORY NODELIST(REASON)
  50000540 pascal_gpu  test.job  vsc10122   RUNNING  0:05      10:00     1    4      3900M node500
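
The same option works with the other Slurm commands. For instance, to cancel the job with ID 50000540 shown in the listing above:

$ scancel -M anansi 50000540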