Submitting Batch Jobs

NOTE: Parts of this document have changed for Grid 2.5; the changes are noted in context below. Please contact RCS with any questions.

The main way to run jobs on the Grid is to submit a script with the bsub command. Submitting a job can be as simple as:

bsub < runscript.sh

OR

bsub -q normal -W 6:00 -R "rusage[mem=4000]" -M 4000 < runscript.sh

Grid 2.5 changes:

bsub -q short -W 6:00 -R "rusage[mem=4000]" -M 4000 < runscript.sh

OR

bsub -q normal -W 6:00 -R "rusage[mem=4000]" -M 4000 cp myfile /to/some/path

Grid 2.5 changes:

bsub -q short -W 6:00 -R "rusage[mem=4000]" -M 4000 cp myfile /to/some/path

In the first example, the commands in runscript.sh run on the first available compute node that fits the resources requested in the script. In the second example, the commands in runscript.sh run on the first available compute node that fits the resources requested on the command line; these command-line options supersede the matching settings inside the script file. In the third example, no script is used: the cp command itself runs on the first available compute node that fits the resources requested on the command line.
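If you prefer to keep the job script and its submission together, the sketch below writes a minimal runscript.sh with a heredoc and then submits it using the first form above. It assumes the Grid 2.5 short queue and simply skips the submission when bsub is not on your PATH (for example, on a laptop); the file name and resource values are only illustrations.

```shell
#!/bin/bash
# Sketch: write a minimal job script, then submit it with "bsub < script".
# The file name and resource values are illustrative assumptions.
set -euo pipefail

script="runscript.sh"

# The quoted 'EOF' keeps the shell from expanding anything inside the job script now.
cat > "$script" <<'EOF'
#!/bin/bash
#BSUB -n 1
#BSUB -W 05
#BSUB -q short
#BSUB -R "rusage[mem=4000]"
#BSUB -M 4000
#BSUB -o hostname_%J.out
#BSUB -e hostname_%J.err

hostname
EOF

if command -v bsub >/dev/null 2>&1; then
    # Resources come from the #BSUB lines; options added here would override them.
    bsub < "$script"
else
    echo "bsub not found; wrote $script without submitting it" >&2
fi
```

To use the second form instead, add the options to the bsub line, e.g. bsub -q short -W 6:00 < "$script".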

A typical submission script, here one that runs the hostname command to print the compute node's name, looks like this:

#!/bin/bash
#
#BSUB -n 1                    # Number of cores
#BSUB -W 05                   # Runtime in [[D:]HH:]MM
#BSUB -q normal               # Queue to submit to (Grid 2.5: use -q short)
#BSUB -R "rusage[mem=4000]"   # Memory (MB) reserved for all cores
#BSUB -M 4000                 # Memory limit (MB) for all cores
#BSUB -o hostname_%J.out      # File to which STDOUT will be written 
#BSUB -e hostname_%J.err      # File to which STDERR will be written 
#BSUB -B -N                   # Send email when job begins & ends/fails 
#BSUB -u myemail@what.com     # Email address (NOTE! required for guest users)

hostname

In general, the script has three parts:

  • The #!/bin/bash line, which makes the script run as a bash script.
  • The #BSUB lines, which are technically bash comments but set various parameters for the LSF scheduler.
  • The commands to run (here, hostname).

The #BSUB lines shown above set key parameters, each described below. (Note: It is important to keep all #BSUB lines together at the top of the script; do not put any bash code or variable settings before the last #BSUB line.) LSF copies many environment variables, including PATH, from your current session to the compute host where the script runs, and it runs the job in your current working directory. As a result, you can specify files relative to your current location (e.g. ./project/myfiles/myfile.txt).
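As we understand it, LSF stops reading #BSUB lines once it reaches the first line of real bash code, so a misplaced directive can be silently ignored. As a hedged sketch, the hypothetical helper check_bsub_header below (not an LSF or RCS command) uses awk to flag any #BSUB line that appears after code; you could run it on a script before submitting.

```shell
#!/bin/bash
# check_bsub_header is a hypothetical helper, not part of LSF.
# It prints every #BSUB line that follows real bash code and returns 1 if any exist.
check_bsub_header() {
    awk '
        NR == 1 && /^#!/ { next }          # skip the #!/bin/bash line
        /^[ \t]*$/       { next }          # blank lines are fine
        /^#BSUB/ {
            if (code_seen) {
                print FILENAME ":" FNR ": #BSUB after code: " $0
                bad = 1
            }
            next
        }
        /^[ \t]*#/       { next }          # ordinary comments are fine
                         { code_seen = 1 } # anything else counts as bash code
        END { exit bad ? 1 : 0 }
    ' "$1"
}

# Example: check_bsub_header runscript.sh && bsub < runscript.sh
```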

#BSUB -n 1

This line sets the number of cores that you're requesting. Make sure that your tool can use multiple cores before requesting more than one. If this parameter is omitted, LSF assumes -n 1.
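When you request more than one core, your tool usually also needs to be told how many to use. The sketch below assumes that LSF exports LSB_DJOB_NUMPROC (the number of slots granted to the job) inside the running job, and falls back to 1 when it is unset; please check this against your LSF version. OMP_NUM_THREADS only affects tools built with OpenMP, and my_tool is a hypothetical placeholder for your own program.

```shell
#!/bin/bash
#BSUB -n 4                    # Number of cores
#BSUB -W 60                   # Runtime in minutes
#BSUB -q short                # Grid 2.5 queue
#BSUB -R "rusage[mem=4000]"
#BSUB -M 4000

# Assumption: LSF sets LSB_DJOB_NUMPROC to the number of slots granted.
# Default to 1 so the script also works when run outside LSF.
ncores="${LSB_DJOB_NUMPROC:-1}"

# OpenMP-based tools read this variable to decide how many threads to start.
export OMP_NUM_THREADS="$ncores"
echo "Running with $ncores core(s)"

# my_tool is a hypothetical placeholder; pass the core count to your own program.
# my_tool --threads "$ncores" input.dat
```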

#BSUB -W 05

This line specifies the job's run time in minutes; you can also use the more convenient format [[D:]HH:]MM. If your job runs longer than the value you specify here, it will be cancelled. At this time, jobs have no maximum run time, though this will likely change in the future. (Important NOTE: This is changing in Grid 2.5!) NOTE! If this parameter is omitted, on any queue, your job is given the default of 10 minutes. Thus it is in your best interest to specify the time as a routine habit; there is no penalty for over-requesting time.
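Because -W accepts hours and minutes (as in -W 6:00 above), it can help to compute the value from a number of days. The hypothetical helper to_runlimit below is a small sketch that converts days, hours and minutes into that H:MM form; for example, 3 days becomes 72:00.

```shell
#!/bin/bash
# to_runlimit is a hypothetical helper: to_runlimit DAYS HOURS MINUTES prints H:MM for -W.
to_runlimit() {
    # 10# forces base 10 so values like "08" are not read as octal.
    local total=$(( 10#$1 * 1440 + 10#$2 * 60 + 10#$3 ))
    printf '%d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

# Example: bsub -q short -W "$(to_runlimit 3 0 0)" < runscript.sh
```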

Grid 2.5 changes: To ensure fair use of the Grid, Grid 2.5 enforces a limit on run time: 3 days for short batch jobs (bsub -q short) and 7 days for long batch jobs (bsub -q long).

#BSUB -q normal

This line specifies the LSF queue in which the script or command will run. The normal queue is good for routine, non-SAS jobs that can take advantage of all parts of the Grid. See the queues description below for more information.

Grid 2.5 changes: use -q short for jobs that finish within 3 days, and -q long for jobs that need more than 3 days but no more than 7 days.
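If you submit from a wrapper script, the queue choice can follow directly from the expected run time. The hypothetical pick_queue helper below sketches that rule using the Grid 2.5 limits stated above (3 days = 4320 minutes for short, 7 days = 10080 minutes for long); it is an illustration, not an RCS tool.

```shell
#!/bin/bash
# pick_queue is a hypothetical helper: pick_queue MINUTES prints the Grid 2.5 queue.
pick_queue() {
    local minutes="$1"
    if (( minutes <= 4320 )); then          # up to 3 days
        echo short
    elif (( minutes <= 10080 )); then       # up to 7 days
        echo long
    else
        echo "pick_queue: $minutes minutes exceeds the 7-day limit" >&2
        return 1
    fi
}

# Example: bsub -q "$(pick_queue 360)" -W 6:00 < runscript.sh
```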

#BSUB -R "rusage[mem=4000]"
#BSUB -M 4000

These lines specify the amount of memory (in MB) that your job will use. The HBS LSF cluster does not require them, but if they are omitted, the smallest amount is allocated, usually 100 MB, and your job will likely be killed when it goes over that amount. Accurate memory requests also allow jobs to run with maximum efficiency on the system.
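As we understand it, rusage[mem=...] is what the scheduler reserves when placing the job, while -M is the limit applied to the running job, so the two normally carry the same number. One way to keep them from drifting apart is to set the value once, as in this sketch; MEM_MB is only an illustrative variable name, and the command is printed rather than run when bsub is unavailable.

```shell
#!/bin/bash
# Set the memory (in MB) once and derive both options from it.
# MEM_MB is an illustrative name, not an LSF variable.
MEM_MB=4000
mem_opts=(-R "rusage[mem=${MEM_MB}]" -M "$MEM_MB")

if command -v bsub >/dev/null 2>&1; then
    bsub -q short -W 6:00 "${mem_opts[@]}" < runscript.sh
else
    # Show the command that would be run, quoted the way the shell would need it.
    printf 'bsub -q short -W 6:00'
    printf ' %q' "${mem_opts[@]}"
    printf ' < runscript.sh\n'
fi
```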

#BSUB -o hostname_%J.out

This line specifies the file to which standard output will be appended. A relative file name is interpreted relative to your current working directory. The %J in the filename is replaced by the job ID at runtime. If this parameter is omitted, any output is included in the email sent when the job finishes.
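To find a finished job's output, substitute its job ID into the same pattern. The hypothetical output_file_for helper below is a minimal sketch of that substitution (LSF itself does it at runtime); inside a running job, the job ID is also available in the LSB_JOBID environment variable.

```shell
#!/bin/bash
# output_file_for is a hypothetical helper: output_file_for PATTERN JOBID
# replaces every %J in PATTERN with JOBID, mirroring what LSF does for -o and -e.
output_file_for() {
    local pattern="$1" jobid="$2"
    printf '%s\n' "$pattern" | sed "s/%J/${jobid}/g"
}

# Example, for a job whose ID is 12345:
#   cat "$(output_file_for 'hostname_%J.out' 12345)"
```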

#BSUB -e hostname_%J.err

This line specifies the file to which standard error will be appended. LSF submission and processing errors will also appear in this file. The %J in the filename is replaced by the job ID at runtime. If this parameter is omitted, error output goes to the same place as standard output: the -o file if one is set, otherwise the email.
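Because error messages land in the -e file, a quick check after a job finishes can save time. The hypothetical check_err helper below sketches this for the hostname_%J.err naming used above, run from the submission directory: it returns 0 for an empty file, 1 if the file has content, and 2 if the file does not exist (yet).

```shell
#!/bin/bash
# check_err is a hypothetical helper: check_err JOBID
# Assumes the job used "#BSUB -e hostname_%J.err" and you are in the submission directory.
check_err() {
    local errfile="hostname_${1}.err"
    if [[ ! -e "$errfile" ]]; then
        echo "no $errfile (job not finished, or no -e file was requested)"
        return 2
    elif [[ -s "$errfile" ]]; then
        echo "$errfile has content:"
        cat "$errfile"
        return 1
    else
        echo "$errfile is empty"
        return 0
    fi
}

# Example: check_err 12345
```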

#BSUB -B -N

Because jobs are processed in the "background" and can take some time to run, it is useful to receive an email when the job starts (-B) and when it finishes or fails (-N). Please NOTE that guest users must also use the -u option to specify an email address.

Last Updated 10/5/2018