Using PBS Queues

This tutorial helps users and developers submit jobs, both serial and parallel, to a PBS queuing system.

Submitting a job

To submit a job, first place the commands that set up and launch your run in a script, then submit that script as the job.

Both the script and the submission command differ depending on whether the job runs serially or in parallel on multiple nodes.

Serial

Script

Create a script containing your shell commands:

#!/bin/bash
# change to the directory where the run should take place
cd $HOME/workspace/prog/exec/
myprog [options]

Command

Then submit the script with the command:

$> qsub -q [queue] [script]
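The queue can also be recorded inside the script itself, since qsub reads lines beginning with #PBS as if they were command-line options. The following is a minimal sketch, not a site-specific recipe: the job name serial-run, the queue name long (borrowed from the VKI example) and the walltime value are placeholders, and the fallback to the current directory is only there so that the script can be tried outside PBS.

```shell
#!/bin/bash
# Directives read by qsub; all values are placeholders to adapt.
#PBS -N serial-run
#PBS -q long
#PBS -l walltime=01:00:00
#PBS -j oe

# PBS sets PBS_O_WORKDIR to the directory from which qsub was called.
# Fall back to the current directory when the script is run by hand.
RUN_DIR="${PBS_O_WORKDIR:-$PWD}"
cd "$RUN_DIR" || exit 1

echo "Job ${PBS_JOBID:-(not under PBS)} running in $RUN_DIR"
# myprog [options]   # the actual program call goes here
```

With the queue set in the script, the submission reduces to qsub [script]. Options given on the command line still take precedence over the directives.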

Example: Running COOLFluiD on the VKI cluster

A skeleton of a script for running the COOLFluiD solver in serial can be found here: source:Documents/Scripts/serial-job.sh

#!/bin/bash
cd $HOME/workspace/COOLFluiD/optim/app/core/
lamwipe 
lamboot
./run-testcase.pl -- --scase testcases/test.CFcase
lamhalt
lamwipe

Then submit it with:

$> qsub -q long coolsolver-serial.sh

Parallel

Note that the examples here use MPI for parallelization. Other environments may require a different setup.

Script

Create a script containing your shell commands:

#!/bin/bash

cd $HOME/workspace/run/dir

export MDATE=$(date +%H%M%S%d%m%Y)
# number of processes = number of lines in the PBS node file
export NPROC=$(wc -l < "$PBS_NODEFILE" | tr -d '[:space:]')
export MACHINEFILE="machine-file-$MDATE"

# setup a machines file for mpi environment
cat  $PBS_NODEFILE > $MACHINEFILE
echo localhost >> $MACHINEFILE

lamwipe $MACHINEFILE
lamboot $MACHINEFILE
mpirun -np $NPROC myprog [options]
lamhalt
lamwipe $MACHINEFILE
rm $MACHINEFILE

In this case, you would need to adapt the $HOME/workspace/run/dir directory, the myprog program name, and its options.
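The process count passed to mpirun is the number of lines in the node file, which typically holds one line per allocated processor slot. A small sketch of that count, together with the number of distinct hosts, is given here. The function names count_slots and count_hosts are illustrative and not part of PBS, and the sample node file is made up so that the snippet can be tried outside a job.

```shell
#!/bin/bash
# Number of lines in a node file, i.e. the number of MPI processes to start.
count_slots() {
    wc -l < "$1" | tr -d '[:space:]'
}

# Number of distinct host names in a node file.
count_hosts() {
    sort -u "$1" | wc -l | tr -d '[:space:]'
}

# Outside a PBS job, fall back to a made-up node file for demonstration.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf '%s\n' node01 node01 node02 node02 > "$PBS_NODEFILE"
fi

NPROC=$(count_slots "$PBS_NODEFILE")
NHOSTS=$(count_hosts "$PBS_NODEFILE")
echo "processes: $NPROC, hosts: $NHOSTS"
```

Stripping whitespace with tr keeps the count usable on systems where wc pads its output with leading blanks.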

Command

Then submit the script with the command:

$> qsub -q [queue] -lnodes=8:[properties] [script]

Here 8 is the number of CPUs to use and [properties] selects which nodes you want to use. Valid values come from the properties field in the list of node characteristics, which can be displayed with the pbsnodes -a command. See below for more details.
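To see which property names are available for the -lnodes request, the properties lines can be pulled out of the pbsnodes -a listing. The sketch below assumes a TORQUE-style layout, in which each node record contains a line of the form properties = name1,name2; other PBS variants may format this differently. The function name list_properties and the sample records (including the bigmem property) are illustrative only.

```shell
#!/bin/bash
# Print each distinct node property found in pbsnodes-style text on stdin.
list_properties() {
    sed -n 's/^[[:space:]]*properties[[:space:]]*=[[:space:]]*//p' |
        tr ',' '\n' |
        sed -e 's/[[:space:]]//g' -e '/^$/d' |
        sort -u
}

if command -v pbsnodes >/dev/null 2>&1; then
    pbsnodes -a | list_properties
else
    # Made-up sample records, used only when pbsnodes is not installed.
    list_properties <<'EOF'
node01
     state = free
     np = 2
     properties = par_long,bigmem
node02
     state = free
     np = 2
     properties = par_long
EOF
fi
```

Any name printed by this helper can then be used in place of [properties] in the qsub command.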

Example: Running COOLFluiD parallel on the VKI cluster

A skeleton of a script for running the COOLFluiD solver in parallel can be found here: source:Documents/Scripts/parallel-job.sh

$> qsub -q par_long -lnodes=4:par_long coolsolver-par.sh

Getting information

qstat

The qstat command gives you a lot of information about the current state of the queuing system.

  • list all the submitted jobs
    $> qstat
    
  • list all the submitted jobs of a given user
    $> qstat -u username
    
  • list all the queues
    $> qstat -q
    
  • which jobs are waiting?
    $> qstat -i [queue]
    
  • which jobs are running?
    $> qstat -r [queue]
    
  • which jobs are currently submitted in a queue?
    $> qstat -a [queue]
    
  • which nodes are being used by each job?
    $> qstat -n [queue]
    $> qstat -n [job]
    
  • what are the details of a job?
    $> qstat -f [job]
    
  • what are the details of each queue?
    $> qstat -Qf [queue]
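
These queries are often chained with the submission itself, because qsub prints the identifier of the new job on standard output. The following sketch captures that identifier and then asks qstat about the job. The helper name submit_and_show is hypothetical, and the block only defines the helper; nothing is submitted until it is called on a machine where qsub is available.

```shell
#!/bin/bash
# Submit a script to a queue, then show where the job stands.
# Usage: submit_and_show [queue] [script]
submit_and_show() {
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; is PBS installed on this machine?" >&2
        return 1
    fi

    # qsub prints the job identifier on standard output.
    jobid=$(qsub -q "$1" "$2") || return 1
    echo "submitted job $jobid"

    qstat -n "$jobid"   # nodes assigned to the job
    qstat -f "$jobid"   # full details of the job
}
```

For the serial example above, the call would be submit_and_show long coolsolver-serial.sh.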
    

pbsnodes

The pbsnodes command gives you information about the nodes that are used for the computations.

  • list details of all nodes
    $> pbsnodes -a
    
  • list nodes which are down
    $> pbsnodes -l
    

qdel

The command qdel can be used to delete a job that is either running or waiting to be run.

$> qdel [job]
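
To remove several jobs at once, the job identifiers can be generated by qselect, which prints the identifiers of the jobs matching a filter, and passed on to qdel. The sketch below is deliberately a dry run by default: the function name delete_jobs and the DRY_RUN variable are hypothetical conveniences, not PBS features.

```shell
#!/bin/bash
# Read job identifiers from standard input, one per line, and delete them.
# With DRY_RUN=1 (the default here) the qdel commands are only printed.
delete_jobs() {
    while read -r jobid; do
        [ -n "$jobid" ] || continue
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: qdel $jobid"
        else
            qdel "$jobid"
        fi
    done
}

# qselect -u lists the jobs owned by the given user, here the current one.
if command -v qselect >/dev/null 2>&1; then
    qselect -u "${USER:-$(id -un)}" | delete_jobs
fi
```

Setting DRY_RUN=0 before running the script performs the deletion for real, so it is worth checking the printed list first.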