Using PBS Queues
This tutorial helps users and developers submit jobs, both serial and parallel, to a PBS queuing system.
Submitting a job
To submit a job, first place the commands that prepare and start your run in a script, then submit this script as the job.
Both the script and the submission command differ depending on whether you run in serial or in parallel on multiple nodes.
Serial
Script
Create a script containing your shell commands:
#!/bin/bash
# change to the directory where the run should take place
cd $HOME/workspace/prog/exec/
myprog [options]
Command
Then submit the script with the command:
$> qsub -q [queue] [script]
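Before submitting, it can help to check that the job script at least parses, since a syntax error otherwise only shows up once the job leaves the queue. The following is a minimal sketch of that idea, not part of the original workflow: the file name in JOB_SCRIPT and the queue name long are placeholders, and the qsub command is only printed, not executed.

```shell
#!/bin/bash
# Sketch only: JOB_SCRIPT and the queue name "long" are placeholders.
JOB_SCRIPT="${TMPDIR:-/tmp}/serial-job-example.sh"

# The quoted 'EOF' keeps $HOME unexpanded here, so it is resolved when the job runs.
cat > "$JOB_SCRIPT" <<'EOF'
#!/bin/bash
# change to the directory where the run should take place
cd $HOME/workspace/prog/exec/
myprog
EOF

# bash -n only parses the script; nothing is executed.
if bash -n "$JOB_SCRIPT"; then
    SCRIPT_OK=yes
    echo "syntax OK, submit with: qsub -q long $JOB_SCRIPT"
else
    SCRIPT_OK=no
    echo "syntax error in $JOB_SCRIPT, not submitting" >&2
fi
```

The check catches only shell syntax errors; a wrong path or a missing program is still detected only at run time.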
Example: Running COOLFluiD on the VKI cluster
A skeleton of a script for running the COOLFluiD solver in serial can be found here: source:Documents/Scripts/serial-job.sh
#!/bin/bash
cd $HOME/workspace/COOLFluiD/optim/app/core/
lamwipe
lamboot
./run-testcase.pl -- --scase testcases/test.CFcase
lamhalt
lamwipe
$> qsub -q long coolsolver-serial.sh
Parallel
Note that the examples here use MPI for the parallelization. Other environments might differ in their setup.
Script
Create a script containing your shell commands:
#!/bin/bash
cd $HOME/workspace/run/dir
export MDATE=$(date +%H%M%S%d%m%Y)
# number of processors allocated by PBS: one line per processor in the node file
# (tr strips the padding some wc versions add; a fixed byte offset breaks for other widths)
export NPROC=$(wc -l < $PBS_NODEFILE | tr -d ' ')
export MACHINEFILE="machine-file-$MDATE"
# setup a machines file for mpi environment
cat $PBS_NODEFILE > $MACHINEFILE
echo localhost >> $MACHINEFILE
lamwipe $MACHINEFILE
lamboot $MACHINEFILE
mpirun -np $NPROC myprog [options]
lamhalt
lamwipe $MACHINEFILE
rm $MACHINEFILE
In this case, you need to adapt the $HOME/workspace/run/dir directory, the myprog program name and its options to your own setup.
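The script above relies on $PBS_NODEFILE, which PBS sets only inside a running job. The sketch below illustrates what that file typically contains and how the process count follows from it. It creates a fake node file when run outside PBS; the host names node01 and node02 and the NHOSTS variable are hypothetical and only serve the illustration.

```shell
#!/bin/bash
# Outside a PBS job $PBS_NODEFILE is unset, so fake one for illustration.
# The host names node01/node02 are hypothetical.
if [ -z "$PBS_NODEFILE" ]; then
    PBS_NODEFILE=$(mktemp)
    printf 'node01\nnode01\nnode02\nnode02\n' > "$PBS_NODEFILE"
fi

# The node file typically holds one line per allocated processor slot,
# so the line count is the number of MPI processes to start.
NPROC=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')

# Number of distinct hosts in the allocation.
NHOSTS=$(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ')

echo "processes: $NPROC, hosts: $NHOSTS"
```

Running this by hand before submitting is a cheap way to test the node-file handling of a job script without waiting in the queue.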
Command
Then submit the script with the command:
$> qsub -q [queue] -lnodes=8:[properties] [script]
Here 8 is the number of CPUs to use and properties selects which nodes you want to use. Choose it from the properties field in the list of node characteristics, which you can display with the pbsnodes -a command. See below for more details.
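When the same job is submitted with varying node counts, assembling the command line in a small helper avoids typing errors in the resource string. This is a minimal sketch: the function name build_qsub_cmd is hypothetical, and it only prints the command instead of running it. It writes the resource list as -l nodes=..., which qsub treats the same as the -lnodes=... form used above.

```shell
#!/bin/bash
# build_qsub_cmd QUEUE NODES PROPERTY SCRIPT
# Hypothetical helper: prints the qsub command line, does not run it.
build_qsub_cmd() {
    local queue=$1 nodes=$2 property=$3 script=$4
    local resources="nodes=${nodes}"
    # Append the node property only when one is given.
    if [ -n "$property" ]; then
        resources="${resources}:${property}"
    fi
    echo "qsub -q ${queue} -l ${resources} ${script}"
}

CMD=$(build_qsub_cmd par_long 4 par_long coolsolver-par.sh)
echo "$CMD"
```

To actually submit, the printed line can be pasted at the prompt or, once it looks right, executed from the script.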
Example: Running COOLFluiD parallel on the VKI cluster
A skeleton of a script for running the COOLFluiD solver in parallel can be found here: source:Documents/Scripts/parallel-job.sh
$> qsub -q par_long -lnodes=4:par_long coolsolver-par.sh
Getting information
qstat
The command qstat gives you a lot of information about the current state of the queuing system.
- list all the submitted jobs
$> qstat
- list all the submitted jobs of a given user
$> qstat -u username
- list all the queues
$> qstat -q
- which jobs are waiting?
$> qstat -i [queue]
- which jobs are running?
$> qstat -r [queue]
- which jobs are currently submitted in a queue?
$> qstat -a [queue]
- which nodes are being taken by each run?
$> qstat -n [queue]
$> qstat -n [job]
- what are the details of the job?
$> qstat -f [job]
- what are the details of each queue?
$> qstat -Qf [queue]
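The detailed listing from qstat -f is convenient to read but long, and scripts usually need just one field from it. The sketch below pulls out the job state, assuming the "name = value" layout that qstat -f commonly prints. The sample text is an abbreviated, made-up listing (the job id 1234.server is hypothetical), and the exact fields may differ between PBS versions.

```shell
#!/bin/bash
# job_state: reads "qstat -f" style text on stdin and prints the job_state value.
# Hypothetical helper; real use would be: qstat -f [job] | job_state
job_state() {
    awk -F' = ' '$1 ~ /job_state/ {print $2}'
}

# Abbreviated, made-up sample in the usual "name = value" layout.
SAMPLE='Job Id: 1234.server
    Job_Name = coolsolver-par.sh
    job_state = R
    queue = par_long'

STATE=$(printf '%s\n' "$SAMPLE" | job_state)
echo "$STATE"
```

A state of R means the job is running and Q means it is still queued.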
pbsnodes
The command pbsnodes can give you information about the nodes which are used for the computations.
- list details of all nodes
$> pbsnodes -a
- list nodes which are down
$> pbsnodes -l
qdel
The command qdel can be used to delete a job that is either running or waiting to be run.
$> qdel [job]
