Job submission

Hydra

Running calculations on Hydra requires submitting a job to the job queue. Hydra uses Moab for scheduling jobs, and TORQUE for managing resources.

See the VSC docs on running jobs for more info.

On Hydra, before a job enters the queue, a submit filter checks the job script for missing options and errors. The following options are added if not specified by the user:

  • send email when job aborts

  • request 4 GB per core by default

  • assign jobs requesting 1 node and more than 245 GB to the high-memory queue

  • assign jobs requesting 1 or more GPUs to the GPU queue

The job submission will be aborted if the requested resources fulfill any of the following conditions:

  • requested RAM memory per core is less than 1 GB

  • requested RAM memory per node is more than total RAM memory

  • requested number of cores per node is higher than total number of cores of the node

  • requested number of cores is less than requested number of GPUs (must request at least 1 core per GPU)

  • requested job queue does not match requested resources

  • features do not exist, do not match requested resources, or are mutually incompatible
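As an illustration, a resource request that passes these checks could look as follows (a minimal sketch; the values are arbitrary examples and must fit the node type you target):

#PBS -l nodes=1:ppn=4
#PBS -l pmem=2gb
#PBS -l walltime=12:00:00

Here 4 cores on a single node are requested with 2 GB per core, which stays above the 1 GB per core minimum and within the total memory of a standard node.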

Vega

Running calculations on Vega and the rest of the CECI clusters requires submitting a job to the job queue. Vega, like the rest of the CECI clusters, uses Slurm for scheduling and managing jobs.

See the CECI docs Slurm Quick Start Tutorial for more info.

PBS to Slurm cheatsheet

The goal of this section is to help users who already have experience with a PBS-based resource manager (such as TORQUE) get up and running with Slurm. For a more detailed introduction on how to run your jobs with either resource manager, please follow the corresponding links above to the VSC or CECI documentation.

The tables below provide a quick reference translating typical PBS commands and resource request options to Slurm, which can help you with the migration. After the tables there are some extra remarks to consider when adapting your jobs.

Submitting and monitoring jobs

PBS                  Slurm                          Comments
qsub job.sh          sbatch job.sh                  Submit a job with the batch script job.sh
qsub -I              salloc <resource options>      Start an interactive job
qdel job_id          scancel job_id                 Delete your job
qstat                squeue                         Show the status of the job queue
qstat -f job_id      scontrol show job job_id       Show details about your scheduled job
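For instance, a typical Slurm session combining these commands could look like the following sketch (the script name and job ID are hypothetical placeholders):

sbatch job.sh                # prints the ID of the submitted job, e.g. 123456
squeue -u $USER              # list your jobs and their state in the queue
scontrol show job 123456     # show the details of the scheduled job
scancel 123456               # cancel the job if it is no longer needed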

Requesting resources

PBS                     Slurm                                       Comments
-N job_name             --job-name=job_name                         Set the name of your job
-l walltime=HH:MM:SS    --time=DD-HH:MM:SS                          Requested maximum time for your job to run
-l nodes=1:ppn=1        --ntasks=1                                  Request a single CPU core
-l nodes=X:ppn=Y        --ntasks=X --cpus-per-task=Y                See the explanation in the section below
-l pmem=N               --mem-per-cpu=N                             The amount of memory per CPU core in megabytes
-M email@example.com    --mail-user=email@example.com               Email address to send job alerts to
-m <a|b|e>              --mail-type=<BEGIN|END|FAIL|REQUEUE|ALL>    Condition for email alerts; in Slurm choose one value or a comma-separated list
-o out_file             --output out_file                           File for stdout; in Slurm, if no --error is given, stdout and stderr are combined
-e err_file             --error err_file                            File for stderr; in Slurm, if no --output is given, stdout and stderr are combined
-j oe                   (not needed)                                In Slurm, joining stdout and stderr is achieved by providing only one of the two options above
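To illustrate how some of these options combine, the following sketch shows roughly equivalent request headers in both systems (the job name, email address and file name are placeholders):

#PBS -N myjob
#PBS -M email@example.com
#PBS -m be
#PBS -o myjob.out
#PBS -j oe

#SBATCH --job-name=myjob
#SBATCH --mail-user=email@example.com
#SBATCH --mail-type=BEGIN,END
#SBATCH --output=myjob.out

In the Slurm version, giving only --output already sends both stdout and stderr to myjob.out, so no equivalent of -j oe is needed.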

Variables defined by the resource managers

PBS               Slurm                     Comments
$PBS_JOBID        $SLURM_JOB_ID             The Job ID value
$PBS_O_WORKDIR    $SLURM_SUBMIT_DIR         Directory where the job was submitted from
$PBS_NODEFILE     $SLURM_JOB_NODELIST       List of nodes assigned to job
$PBS_JOBNAME      $SLURM_JOB_NAME           The job name
$PBS_ARRAYID      $SLURM_ARRAY_TASK_ID      Job array ID (index) number
$PBS_NUM_PPN      $SLURM_CPUS_PER_TASK      Number of cores per task
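These variables can be used directly inside a job script, for instance to run from the directory the job was submitted from and to label the output (a minimal Slurm sketch):

cd $SLURM_SUBMIT_DIR
echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) running on: $SLURM_JOB_NODELIST"

The PBS counterparts are used the same way; note that $PBS_NODEFILE is a file, so its contents would be printed with cat $PBS_NODEFILE.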

CPU cores allocation

Requesting CPU cores in a PBS scheduler is done with the option -l nodes=X:ppn=Y, where it is mandatory to specify the number of nodes even for single core jobs (-l nodes=1:ppn=1). Note, however, that the concept behind the keyword nodes differs between PBS and Slurm: while PBS nodes do not necessarily represent a single physical server of the cluster, the option --nodes in Slurm refers directly to the physical servers of the cluster, as explained below.

In Slurm the only mandatory request for CPU resources is the number of tasks of the job, which is set with the option --ntasks=X (1 by default). Each task is allocated one CPU core by default. If you don't specify anything else, these tasks can be distributed among any number of different nodes in the cluster.

Applications that are only capable of using multiple processors within a single server or physical computer, usually called shared memory applications (e.g. parallelized using OpenMP, Pthreads, Python multiprocessing, etc.), require additional settings to ensure that all the allocated processors reside on the same node. A practical option is to request a single task with --ntasks=1 and then ask for X cores on the same physical server to be assigned to that task with --cpus-per-task=X.
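For example, a shared memory job using 8 cores on a single node could be requested as in the following sketch (the program name is a placeholder):

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
my_openmp_code

Setting OMP_NUM_THREADS from $SLURM_CPUS_PER_TASK ensures that an OpenMP program starts exactly as many threads as cores were allocated.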

Parallel applications based on a distributed memory paradigm, such as those using MPI, can be executed by just specifying the option --ntasks=X, where X is the total number of cores you need. The CPU cores will then be allocated in any fashion among the nodes of the cluster.
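For instance, an MPI job using 64 cores spread over any nodes of the cluster could be requested as follows (a sketch; the program name is a placeholder, and whether to launch with srun or mpirun depends on the cluster setup):

#SBATCH --ntasks=64

srun my_mpi_code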

If you want to keep some extra control over how the tasks are distributed in the cluster, it is possible to limit the number of nodes with the option --nodes. For instance, minimizing the number of nodes assigned to the job can lead to better performance if the interconnect between nodes is not very fast. Imagine a cluster composed of nodes with 24 cores each, where you want to submit a job using 72 cores on precisely 3 full nodes. You can do so by asking for --ntasks=72 and adding the extra option --nodes=3.

If you want to give the scheduler some flexibility in allocating resources to your job, it is also possible to provide a range of values to --nodes, for instance --nodes=3-5. In that case, the cores will be allocated on any number of nodes within the range. The job could still end up on just 3 full nodes, but also on other possible combinations, e.g. two full nodes with their 24 cores plus three other nodes with 8 cores each.
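The two cases above translate into the following request headers (a sketch based on the 24-core node example):

#SBATCH --ntasks=72
#SBATCH --nodes=3

#SBATCH --ntasks=72
#SBATCH --nodes=3-5

The first variant forces exactly 3 full nodes, while the second leaves the final distribution over 3 to 5 nodes up to the scheduler.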

Memory allocation

We highly recommend specifying the memory allocation of your job with the Slurm option --mem-per-cpu=X, which sets the memory per core. It is also possible to request the total amount of memory per node of your job with the option --mem=X. However, requesting a proper amount of memory with --mem is not trivial for multi-node jobs in which you want to leave some freedom for node allocation. In any case, these two options are mutually exclusive, so you should only use one of them. If you do not define any specific memory request, your job will get a default assignment, which is typically 1 GB per core.

Notice that, by default, a plain integer value given to these options is interpreted as memory in megabytes, but you can specify different units using one of the following one-letter suffixes: K, M, G or T. For instance, to request 2 gigabytes per core you can use --mem-per-cpu=2000 or --mem-per-cpu=2G.
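As an example of how the two forms relate, a single task using 4 cores with 2 GB per core (8 GB in total on one node) could be requested in either of the following equivalent ways, but not with both options at once (a sketch):

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G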

Batch scripts

PBS job scripts define the resource manager directives in their header using the #PBS keyword. In Slurm the equivalent is the #SBATCH keyword. To illustrate their usage with some of the resource request options discussed above, we provide below a basic job script for both systems requesting a single core, 7 hours of maximum walltime and 3 gigabytes of memory:

Basic single core PBS batch script

#!/bin/sh
#PBS -N myjob
#PBS -l walltime=07:00:00
#PBS -l nodes=1:ppn=1
#PBS -l pmem=3gb

module load somemodule/1.1.1

my_code
Basic single core Slurm batch script

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=07:00:00
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=3000

module load somemodule/1.1.1

my_code