A gentle introduction to SGE and qsub

From CLSP Wiki

Qsub isn't as scary as it sounds!

qsub is a tool for submitting a batch job to Sun Grid Engine, i.e. our grid at CLSP. This allows you to run several jobs in the background with specific resources allocated. Here, we provide a brief template for how to create, monitor, and delete jobs that you can submit to the grid.

Introduction

To run any compute-intensive (CPU- or GPU-intensive) task, including parallel jobs, use qsub, which is part of Sun Grid Engine (SGE).

In this and the following sections we briefly demonstrate usage of qsub and other SGE tools on the grid. For more information, see tutorials online, for example:

  1. A general tutorial
  2. the man page

The main difference between our grid and others is that if you want to specify memory resources, you should do so with something like:

 qsub -l mem_free=10G,ram_free=10G my_job.sh

If your jobs use 1G of memory or less, you don't need to do this. We do not use special queues, only the queue "all.q", which is the default, so you don't need to specify the queue. GridEngine as we have configured it will ignore the hashbang (e.g. #!/bin/csh) at the top of your script. The default shell is /bin/bash; if you want to change this, you need the -S option, e.g. -S /bin/csh. The normal procedure, though, is simply to have a bash script that calls whatever interpreter you really need (e.g. python or matlab).

The job you submit needs to be a shell script, which can't take command-line arguments; if you want to call Python or Matlab or something, do it from the shell script.

If you want to submit a job with multiple threads, you should use the -pe smp N option, e.g.

qsub -l 'arch=*64*' -pe smp 5 foo.sh

for a job that requires 5 threads (or 5 parallel single-threaded processes; what is being allocated here is the number of processors). Note that the ram_free option is now configured to be per-job, so if a 5-thread job takes 5G in total, you would specify -l mem_free=5G,ram_free=5G.
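
As a rough illustration of how the memory and thread options above combine, the following sketch assembles a qsub command line from a memory size and a thread count. The helper name build_qsub_cmd is hypothetical (it is not part of SGE), and it only prints the command so that you can inspect it before running it yourself.

```shell
#!/bin/bash
# Hypothetical helper (not part of SGE): print a qsub command line that
# requests MEM_GB gigabytes for the whole job and, if THREADS > 1, a
# matching number of slots in the smp parallel environment.
build_qsub_cmd() {
    local mem_gb="$1" threads="$2" script="$3"
    local cmd="qsub -l mem_free=${mem_gb}G,ram_free=${mem_gb}G"
    if [ "$threads" -gt 1 ]; then
        cmd="$cmd -pe smp $threads"
    fi
    echo "$cmd $script"
}

# A 5-thread job that needs 5G in total (memory is requested per job here).
build_qsub_cmd 5 5 foo.sh
```

This prints qsub -l mem_free=5G,ram_free=5G -pe smp 5 foo.sh, which you can then copy or pipe to a shell.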

You can set up dependencies between your jobs using the -hold_jid option to qsub, for example,

qsub -hold_jid 19841,19842 foo.sh

won't run foo.sh until jobs 19841 and 19842 are done. For an array job (see the -t flag below), you can limit how many of its tasks run at one time to, say, 10, by giving the option "-tc 10". This can be useful for jobs that run for a very long time or use a lot of I/O.
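
To chain jobs without copying IDs by hand, one option is to pull the job ID out of the message qsub prints on submission. The sketch below assumes the usual SGE wording (Your job 19841 ("foo.sh") has been submitted); the helper names extract_job_id and join_ids, and the scripts step1.sh and step2.sh, are hypothetical.

```shell
#!/bin/bash
# Print the numeric job ID found in qsub's confirmation message on stdin,
# e.g. 'Your job 19841 ("foo.sh") has been submitted'.
# Assumes the usual SGE wording; array jobs say "Your job-array ID.RANGE".
extract_job_id() {
    sed -n 's/^Your job\(-array\)\{0,1\} \([0-9][0-9]*\).*/\2/p'
}

# Join the arguments with commas, the format that -hold_jid expects.
join_ids() {
    local IFS=,
    echo "$*"
}

# Intended use on the grid (commented out so the sketch runs anywhere):
#   id1=$(qsub step1.sh | extract_job_id)
#   id2=$(qsub step2.sh | extract_job_id)
#   qsub -hold_jid "$(join_ids "$id1" "$id2")" foo.sh
echo 'Your job 19841 ("foo.sh") has been submitted' | extract_job_id
join_ids 19841 19842
```

The two demonstration lines at the end print 19841 and 19841,19842. qsub also has a -terse option that prints only the job ID, which may make the parsing step unnecessary for non-array jobs.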

For jobs that do a lot of I/O, it can be helpful to run them on the machine where the data is located, e.g. if your data is on a11, it might be helpful to use the option:

  -l "hostname=a11*"

To tell qsub to only use b01 through b09 or b11 through b13:

  -l "hostname=b0[123456789]*|b1[123]*"

To exclude the h and y machines (e.g. h01 or y01):

  -l 'hostname=![hy]*'

To use any b machine except b01:

  -l 'hostname=!b01*&b*'

(Note single quotes are needed if you use the exclamation mark.)

When things are slow, it is usually because a disk you're using is slow, typically because someone is running jobs that cause excessive I/O either to that disk or to the home-directory server. Logging into the NFS server in question (e.g. a12 for /export/a12), running sudo iftop, and correlating the machines that show a lot of network traffic with the output of qstat can sometimes reveal the culprit.

If something you need is not installed, you can ask clsphelp@clsp.jhu.edu. Debian packages that present no obvious security threat will typically be installed right away (the same day), for example if you need nano and the package is not installed.

Example usage

There are two ways to submit jobs through qsub. You can either create a bash file with the qsub parameters, or input those parameters directly into the command line.

Say we have a Python file, hello.py, that we want to submit through qsub with parameters.

Useful flags

There are various flags you can add to qsub to control how your job is run. Some of these flags are optional, but you should probably specify at least an output log file, an error log file, and memory requirements.

  • -o PATH-TO-LOG: path to file where standard output is logged
  • -e PATH-TO-ERROR-LOG: path to file where standard error is logged
  • -M EMAIL-ADDRESS: address to send email to; combine with -m to choose the events that trigger mail (e.g. -m eas for end, abort, and suspend)
  • -l ARGS specifies additional arguments.
    • mem_free=#G Memory being requested
    • ram_free=#G RAM being requested
    • gpu=# Number of GPUs reserved; only necessary if you want to use GPUs (if so, should probably be 1).
      This page has a lot more information on GPU jobs on the grid; be sure to read it before submitting GPU jobs!
    • h=PATTERN Hosts specified, e.g. h=b* requests only the b hosts. These don't always follow standard regex format, but there are some examples above, and further information can be found by searching for specifics online, e.g. here and here.
  • -pe smp 4: reserves 4 slots (cores) for the job, e.g. for a job that runs 4 threads


Some more:

  • -cwd: runs from the directory where you ran your qsub job
  • -V: exports your environment variables (except LD_LIBRARY_PATH)
  • -S /bin/bash: ensures you're using bash (you can change this to be python if you want to write your script in python)
  • -j y: combine stdout and stderr into one log file (otherwise two log files per job)
  • -t 1-1000: run array job with 1000 tasks labeled 1 through 1000 (considered one job, useful for getting around job limit; the limit is 5000 tasks, so if you need more than that, then you need to run another array job)
    • -tc 10: run at most 10 tasks from array job at once
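
As an illustration of the -t and -tc flags, here is a minimal sketch of an array job script in which each task picks its own input. SGE sets the environment variable SGE_TASK_ID to the task number inside an array job. The file inputs.txt (one input path per line), the helper input_for_task, and the idea that hello.py takes an input path as its argument are all assumptions made for the example.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -t 1-1000
#$ -tc 10

# Print line number TASK_ID of LIST_FILE (one input path per line).
input_for_task() {
    local list_file="$1" task_id="$2"
    sed -n "${task_id}p" "$list_file"
}

# SGE sets SGE_TASK_ID to the task number inside an array job; outside one
# it is unset or "undefined", in which case this sketch does nothing.
if [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
    # inputs.txt and the argument to hello.py are hypothetical.
    input="$(input_for_task inputs.txt "$SGE_TASK_ID")"
    python hello.py "$input"
fi
```

Submitted once with qsub, this would run up to 1000 tasks, at most 10 at a time, with task N processing line N of inputs.txt.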

Submitting a job via bash file

Add the flags on lines starting with #$. The rest of the script (everything that is not a comment) is the command being run.

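#!/bin/bash
# Run from the directory where qsub was invoked.
#$ -cwd
# Write output to the log file; with -j y, stderr is merged into it,
# so the -e path below is only used if you remove -j y.
#$ -j y -o <PATH-TO-LOG>
#$ -e <PATH-TO-ERROR-LOG>
# Email <EMAIL> when the job ends (e), is aborted (a), or is suspended (s).
#$ -m eas
#$ -M <EMAIL>
#$ -l ARG_1=VAL_1,ARG_2=VAL_2, ...
#$ -pe smp 4
#$ -V

python hello.py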

Submitting a job via command line

To submit a bash script via command line, you can add each of the above flags as command line options.

To duplicate the above bash file as a command line script, first make a super simple bash script:

# This is a script named hello.sh
python /path/to/hello.py

And then run it with qsub:

qsub -cwd -j y -o <PATH-TO-LOG> -e <PATH-TO-ERROR-LOG> -m eas -M <EMAIL> -l ARG_1=VAL_1,ARG_2=VAL_2, ... -pe smp 4 -V hello.sh

Simple submission script

For a simple Python script that will submit your jobs to qsub via the command line, see this GitHub Gist.

Monitoring a job

To see the jobs you're running, run

qstat -u $USER

and you should see output that looks something like this

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 997223 0.51102 baseline_s pxia         r     08/31/2017 17:33:05 all.q@b12.clsp.jhu.edu             1

The columns are:

  1. job-ID: This is the ID of the job.
  2. prior: The priority. You don't have much control over this, so don't worry about it.
  3. name: The name of the job. To see the full name of the job, add the -r flag, e.g. run qstat -u USERNAME -r. This displays a lot of information.
  4. user: The username of the person running the job.
  5. state: The important ones are listed in this table, while a more complete table is here:

     State  Meaning  Action
     r      Running  Okay
     qw     Waiting  Waiting to enter queue. Either the disks are full or you reached your quota.
     Eqw    Error    Flags might be wrong (e.g. path for log files might not exist)

  6. submit/start at: Time the job began in its current state. Sometimes the clock is not in sync with real time.
  7. queue: Which host (machine) it is running on, or which queue (gpu/all) it is waiting in.
  8. slots: The number of cores you are using, as reserved by the flag -pe smp <number of cores>.
  9. ja-task-ID: The task IDs in an array job.
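
One cause of the Eqw state mentioned above is a log path whose directory does not exist. A minimal sketch of guarding against that is to create the directory before submitting. The helper name ensure_log_dir and the path logs/hello.log are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: make sure the directory that will hold LOG_PATH
# exists, then print the path so it can be passed straight to -o or -e.
ensure_log_dir() {
    local log_path="$1"
    mkdir -p "$(dirname "$log_path")" || return 1
    echo "$log_path"
}

# Intended use on the grid (commented out so the sketch runs anywhere):
#   qsub -cwd -j y -o "$(ensure_log_dir logs/hello.log)" hello.sh
# If a job still lands in Eqw, `qstat -j JOB_ID` reports the error reason.
```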

TODO: explain how to use qacct to learn about dead jobs
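
As a possible starting point for the TODO above: qacct -j JOB_ID prints the accounting record of a finished job, one field and its value per line. The sketch below keeps a handful of fields that are often the first thing to check for a dead job. The helper name summarize_qacct is hypothetical, and the choice of fields is an assumption about what is most useful, not an established convention on this grid.

```shell
#!/bin/bash
# Hypothetical helper: read `qacct -j JOB_ID` output on stdin and keep only
# a few fields that help diagnose a dead job. Only the first word of each
# value is kept, so a longer "failed" description is shortened to its code.
summarize_qacct() {
    awk '$1 == "jobname" || $1 == "hostname" || $1 == "failed" ||
         $1 == "exit_status" || $1 == "maxvmem" { print $1 "=" $2 }'
}

# Intended use on the grid (commented out so the sketch runs anywhere):
#   qacct -j 997223 | summarize_qacct
```

A nonzero exit_status or failed value, or a maxvmem above what you requested with mem_free, is usually the first clue as to why a job died.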

There is also a simple web interface at http://cs.jhu.edu/~winston/qstat.php

Deleting a job

Sometimes you want to delete a job. This is done with the qdel command, e.g. qdel 997223 would delete the job in this example.

If you want to delete all your jobs, you can type qdel -u $USER. Caution: this deletes ALL your jobs.
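
If you only want to delete some of your jobs, one approach is to filter the qstat output by job name and pass the matching IDs to qdel. The sketch below assumes the qstat column layout shown in the monitoring section (two header lines, job-ID in the first column, name in the third); the helper name job_ids_by_name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `qstat -u $USER` output on stdin and print the
# IDs of jobs whose name matches the regular expression PATTERN.
# Note that qstat truncates names (e.g. "baseline_s"), so match on a prefix.
job_ids_by_name() {
    awk -v pat="$1" 'NR > 2 && $3 ~ pat { print $1 }' | sort -u
}

# Intended use on the grid (commented out so the sketch runs anywhere;
# xargs -r is a GNU extension that skips qdel when nothing matches):
#   qstat -u "$USER" | job_ids_by_name '^baseline' | xargs -r qdel
```

It is worth running the pipeline without the final xargs step first, to check which job IDs would be deleted.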

Finding the output of your job

Standard output from your job will be placed in <PATH-TO-LOG>, while standard error will be at <PATH-TO-ERROR-LOG>. These logs are updated in a buffered manner, so even if it looks like nothing is being printed, the program is probably still running. If you do not specify -o and -e, then the output and error files will be located in your home directory in something like run.sh.e328419.1.

Your program can also read, write, create, or delete additional files. You should be mindful of where your qsub script is being run from. If you use -cwd, the paths will be relative to the current directory. Otherwise, it is recommended to use absolute paths in your program.

Submitting interactive jobs

If you want to run resource-intensive commands interactively on a grid node, use qrsh or qlogin instead of qsub. (If you are just compiling and debugging code, you can ssh directly to a grid node; you only need qrsh and qlogin for heavier workloads.) The following two commands submit interactive jobs reserving two CPU cores and four gigabytes of memory for one hour. The difference is that the shell created by qrsh is more like the shell used to run non-interactive jobs, while the shell created by qlogin is more like the shell you see on the test nodes (when running commands on them interactively).

qrsh -l num_proc=2,mem_free=4G,ram_free=4G,h_rt=1:00:00
qlogin -l num_proc=2,mem_free=4G,ram_free=4G,h_rt=1:00:00

These commands will return if they can't immediately reserve the requested resources. If you want them to wait until the resources can be reserved, add -now n to the arguments:

qrsh -now n -l num_proc=2,mem_free=4G,ram_free=4G,h_rt=1:00:00

Another example

This walks through another example.

Utilities

  • clspmenu - provides htop, iftop, iotop, wall, email to clsphelp, qstat grepping, log viewing...
  • q.who - displays users and their number of jobs (q.who vv breaks down grid nodes and statuses)
  • sgestat - displays users and their number of jobs with a total