SLURM is a somewhat less common queue scheduler, and its commands are slightly less straightforward than those of a PBS or SGE scheduler. If this quick guide doesn’t provide enough detail, there is more information available on UNT’s HPC website or the Slurm website.
CPU Run Scripts
Run scripts, or jobfiles, contain all the necessary information to run a job.
An example is the run-R.job script.
#!/bin/bash
#SBATCH -p public # partition
#SBATCH --qos general # quality of service (priority)
module load R/R-devel
R CMD BATCH clogit.R OUT1.R
The very first line specifies that it is a bash script (feel free to read more
about that on WikiBooks).
The next two lines specify where the job should be run on the cluster.
The module load line tells the cluster to load the program from its shared, cluster-wide installation.
In this case, instead of a local installation of R in an individual’s home directory, the entire cluster can use the centrally installed R.
Finally, the last line is the command that actually runs the calculation.
In this case, it runs R in batch mode on the pre-created R script, clogit.R, and writes the output to the file OUT1.R.
The list of available modules can be checked through the module avail command.
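For example, you can list everything that is installed, or narrow the listing down to just the R builds (the exact module names vary from cluster to cluster, so treat these as illustrative):
$ module avail
$ module avail R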
The next script is a Gaussian run script for Talon3. It follows the same idea as the one for PBS, but with some extra SLURM commands.
#!/bin/bash
#SBATCH -J My_Gauss_Job # name in queue
#SBATCH -o Gauss_job.o%j # output
#SBATCH -e Gauss_job.e%j # error
#SBATCH -C c6320 # constraint -- specific nodes
#SBATCH -p public # partition
#SBATCH -N 1 # Nodes
#SBATCH -n 16 # Total tasks (16 on the single node)
#SBATCH --mem-per-cpu=150MB # memory allocation
#SBATCH -t 12:00:00 # Wallclock time (hh:mm:ss)
## Loading Gaussian module
module load gaussian/g16-RevA.03-ax2
## input is the name of your job input without the file extension
## and ext is the file extension
## so this job is for test.gjf
## Old versions would resemble test.com
input=test
ext=gjf
## Define scratch directory
export GAUSS_SCRDIR=/storage/scratch2/$USER/$SLURM_JOB_ID
mkdir -p $GAUSS_SCRDIR
## Copy your input file to the scratch directory
cp $SLURM_SUBMIT_DIR/$input.$ext $GAUSS_SCRDIR
## Go to the scratch directory to run the calculation
cd $GAUSS_SCRDIR
## Run the program
time g16 < $input.$ext > $input.log
## Bring log file, checkpoint, wavefunction and info files
## back to the place you submitted the job from
cp -r $GAUSS_SCRDIR/$input.log $SLURM_SUBMIT_DIR
cp -r $GAUSS_SCRDIR/$input.chk $SLURM_SUBMIT_DIR
#cp -r $GAUSS_SCRDIR/$input.wfn $SLURM_SUBMIT_DIR
#echo "Job finished at"
#date
exit 0
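Assuming the script above is saved under a name like run-gaussian.job (the filename here is just a placeholder), it is submitted like any other jobfile, and the log and checkpoint files are copied back to the directory you submitted from once the calculation finishes:
$ sbatch run-gaussian.job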
GPU Run Scripts
The following is an example script to run GPU AMBER jobs on Talon3.
#!/bin/bash
#SBATCH -J WT_protein # name of job in queue
#SBATCH -o WT_protein.o%j # output file (%j appends the job ID)
#SBATCH -p public # partition
#SBATCH --qos general # quality of service
#SBATCH --ntasks=1 # Number of tasks
#SBATCH --gres=gpu:2 # 2 GPUs
#SBATCH -t 12:00:00 # Wallclock time (hh:mm:ss)
### Loading modules
module load amber/18-cuda-mpi
e=0
f=1
## Run 100 sequential MD segments; each segment restarts from the previous .rst file
while [ $f -lt 101 ]; do
    $AMBERHOME/bin/pmemd.cuda -O -i mdin.4 \
        -o WT_protein_system_wat_md$f.out \
        -p WT_protein_system_wat.prmtop \
        -c WT_protein_system_wat_md$e.rst \
        -r WT_protein_system_wat_md$f.rst \
        -x WT_protein_system_wat_md$f.mdcrd \
        -ref WT_protein_system_wat_md$e.rst
    e=$((e + 1))
    f=$((f + 1))
done
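As an optional sanity check, a line like the one below can be added after the module load in the script above to record which GPUs were assigned; this assumes the cluster sets CUDA_VISIBLE_DEVICES for --gres=gpu jobs, which is typical but worth confirming on Talon3.
## Optional: record which GPUs SLURM assigned to this job
echo "Allocated GPUs: $CUDA_VISIBLE_DEVICES"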
sbatch
Submitting jobs is done with the sbatch command.
To submit a jobfile named jobfile, the command would simply be:
$ sbatch jobfile
The jobfile contains all the information needed to run the job, and must include the #!/bin/bash line at the start.
Dependent submissions (i.e., job B needs the output from job A, but job A isn’t finished yet and you want to submit job B right now before going to sleep) can be accomplished through something like:
$ sbatch --dependency=afterok:12345 jobfile
where 12345 is the job ID of job A and jobfile is the jobfile of job B.
The job ID is given when job A is submitted, but it can also be checked in the queue with squeue.
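As a sketch (jobA.job and jobB.job are placeholder names), the job ID of job A can also be captured automatically at submission time using sbatch’s --parsable flag, which prints just the job ID, and then passed straight into the dependency:
$ JOBA=$(sbatch --parsable jobA.job)
$ sbatch --dependency=afterok:$JOBA jobB.job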
squeue
If you want to see what jobs you have running or waiting to run (queued jobs), then use squeue.
Using squeue alone will show the job status for every single user within the SLURM manager.
To check the queue for a specific job, you would need to do something like
$ squeue -j 1323523
where 1323523 is the job number that was given when the job was submitted.
Alternatively, to show just what you, a specific user, are running, use squeue -u followed by your username.
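For example, assuming your username is available in the usual $USER environment variable:
$ squeue -u $USER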
scancel
Sometimes you scream out in horror when you realize that you shouldn’t have
submitted a job yet, or it’s taking too long and you’d rather just kill it.
On SLURM systems, this can be done with scancel.
Again, the job number will need to be added, so in practice the command looks like:
$ scancel 1323523
where 1323523 is the job number that was given when the job was submitted.
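scancel can also take a username instead of a job number, which is handy if you want to clear out everything you have queued in one go:
$ scancel -u $USER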
scontrol
The scontrol command lets you modify some attributes of a submission without outright cancelling it.
This can be really helpful if you want to modify the name of the job that
appears in the queue after submitting it.
For example, if you copied a script but wanted to change the replicate number
for the queue, you could use:
$ scontrol update JobId=1323523 JobName=WT.Prot.R2
where 1323523 is the job number that was given when the job was submitted.
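scontrol has a few other useful subcommands as well: show prints a job’s full attributes, while hold keeps a pending job from starting and release lets it start again (1323523 again stands in for a real job number):
$ scontrol show job 1323523
$ scontrol hold 1323523
$ scontrol release 1323523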