LSF Queues & Scheduling

(Update 7/16/2020) Please see the notes below about Fairshare and Scheduling Considerations for more information on scheduling priority, resource reservations, and backfill scheduling.

The HBS grid has no per-user resource limits, but unlimited run times for interactive sessions and batch jobs are no longer permitted. In general, the fewer cores requested per job, the longer the job can run. This table summarizes the run time and core limits:

Queue            Type         Max Run Time  Max Cores/Job
long_int         interactive  3 days        4
short_int        interactive  1 day         12
sas_interactive  interactive  no limit      4
long             batch        7 days        12
short            batch        3 days        16
sas_normal       batch        no limit      4
unlimited        batch        no limit      4 (for now)
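
As a rough illustration of how to read the table, the sketch below encodes its limits in Python and lists the queues whose limits a given request would fit. The names QUEUES and candidate_queues are hypothetical helpers for this example only; they are not part of LSF or of any HBS grid tooling, and the sketch ignores the fact that the SAS queues are meant for SAS work only.

```python
# Limits copied from the table above: (type, max run time in days, max cores per job).
# None means "no limit". The SAS queues are intended for SAS work only.
QUEUES = {
    "long_int": ("interactive", 3, 4),
    "short_int": ("interactive", 1, 12),
    "sas_interactive": ("interactive", None, 4),
    "long": ("batch", 7, 12),
    "short": ("batch", 3, 16),
    "sas_normal": ("batch", None, 4),
    "unlimited": ("batch", None, 4),
}


def candidate_queues(kind, cores, days):
    """Return the queues of the given type whose limits fit the request."""
    fits = []
    for name, (queue_type, max_days, max_cores) in QUEUES.items():
        if queue_type != kind or cores > max_cores:
            continue
        if max_days is not None and days > max_days:
            continue
        fits.append(name)
    return fits


print(candidate_queues("batch", 16, 2))       # ['short']
print(candidate_queues("interactive", 4, 2))  # ['long_int', 'sas_interactive']
```

A 16-core batch job only fits in short, while a 4-core, 2-day interactive session fits in long_int (or in sas_interactive for SAS work).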

Details on the queues are as follows:

Interactive queues: Interactive queues are divided into long and short run lengths, based on the number of cores requested per job. Additionally, since we wish to ensure that all persons should be able to get at least one interactive session, there is a maximum of 24 cores allowed over a max of 3 sessions; more than 12 cores for a given job are not permitted ("interactive sessions limit").
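
The interactive sessions limit can be expressed as a small check. This is only a sketch of the rule as stated above (at most 3 sessions, at most 24 cores in total, at most 12 cores per job); the function name can_start_session is hypothetical, and the scheduler's own enforcement may differ in detail.

```python
MAX_SESSIONS = 3        # maximum concurrent interactive sessions per person
MAX_TOTAL_CORES = 24    # maximum cores summed over all of a person's sessions
MAX_CORES_PER_JOB = 12  # maximum cores for any single interactive job


def can_start_session(running_cores, requested_cores):
    """running_cores: core counts of the interactive sessions already open."""
    if requested_cores > MAX_CORES_PER_JOB:
        return False
    if len(running_cores) + 1 > MAX_SESSIONS:
        return False
    return sum(running_cores) + requested_cores <= MAX_TOTAL_CORES


print(can_start_session([12, 8], 4))    # True: 3 sessions, 24 cores in total
print(can_start_session([12, 8], 5))    # False: would total 25 cores
print(can_start_session([4, 4, 4], 1))  # False: would be a 4th session
```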

long_int

This queue is dedicated to interactive (foreground / live) work: testing code interactively before submitting it in batch or scaling up, or exploratory work. Serial and parallel jobs using 1 to 4 cores are permitted in this queue, can run a maximum of 3 days, and are subject to the interactive sessions limit.

short_int

This queue is also dedicated to interactive (foreground / live) work: testing code interactively before submitting it in batch or scaling up, or exploratory work. Serial and parallel jobs using 1 to 12 cores are permitted in this queue, can run a maximum of 1 day, and are subject to the interactive sessions limit.

sas_interactive

This queue is dedicated for interactive and code testing work for SAS, before submitting in batch or scaling. Serial and parallel jobs using 1 to 4 cores with small resource requirements (RAM/cores) are permitted on this queue, and can run for an unlimited length of time.

Batch queues: Batch queues, like the interactive queues, are also divided into long and short run lengths, based on the number of cores requested per job. Jobs are limited only by the available resources on the batch compute nodes, and the scheduler may limit dispatching jobs to run based on your Fairshare score -- a priority score that might limit your work the more you compute, in order to allow others to run theirs as well. Jobs cannot exceed a maximum of 16 cores.

long

This queue is a general purpose queue for all background, batch work and scaled jobs. Serial and parallel jobs using 1 to 12 cores are permitted in this queue, can run a maximum of 7 days, and are only limited by the available resources on the system and your Fairshare priority score.

short

This queue is a general purpose queue for all background, batch work and scaled jobs. Serial and parallel jobs using 1 to 16 cores are permitted in this queue, can run a maximum of 3 days, and are only limited by the available resources on the system and your Fairshare priority score.

sas_normal

This queue is a general purpose queue for all SAS background, batch work and scaled jobs. Jobs submitted here can use 1 to 4 cores, are only limited by the available resources on the dedicated SAS nodes, and can run for an unlimited length of time.

unlimited

This queue is for serial or parallel work up to 4 cores per job with no run time limit (Note: the max cores/job may increase later during the beta-testing period). Since there are very few compute nodes for this type of work, your job will be scheduled only when room is available and the previous jobs have finished. We highly recommend that you avoid this queue if possible.

Fairshare

Without additional intervention, schedulers usually dispatch jobs by FIFO - first in, first out. This can lead to persons dumping hundreds or thousands of jobs on the cluster, monopolizing resources and preventing fair access, particularly in the batch queues. To prevent this, the batch queues use Fairshare: each person has a dynamic priority score that decreases as they use more compute. This allows the scheduler to move those with higher priority to the front of the queue. Since this score is recalculated on a periodic cycle and after jobs complete (compute is used), one's priority shifts back and forth relative to other busy users, allowing jobs to be scheduled for all users.
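
The back-and-forth behavior can be seen in a toy model. The sketch below is not LSF's actual Fairshare formula: toy_priority, run_cycle, the decay factor, and the per-job cost are all assumptions chosen only to show how a score that drops with recent usage lets two busy users alternate.

```python
def toy_priority(shares, recent_usage):
    """Toy dynamic priority: the more recent usage, the lower the score."""
    return shares / (1.0 + recent_usage)


def run_cycle(usage, ran, cost=10.0, decay=0.5):
    """Decay everyone's past usage, then charge the user whose job just ran."""
    new_usage = {user: used * decay for user, used in usage.items()}
    new_usage[ran] += cost
    return new_usage


usage = {"alice": 0.0, "bob": 0.0}  # hypothetical users with equal shares
history = []
for _ in range(4):
    # Dispatch the pending job of whoever currently has the highest priority.
    next_user = max(usage, key=lambda user: toy_priority(1.0, usage[user]))
    history.append(next_user)
    usage = run_cycle(usage, next_user)

print(history)  # ['alice', 'bob', 'alice', 'bob']
```

Each time a user's job runs, that user's score drops below the other's, so the next dispatch goes to the other user.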

Scheduling Considerations - Resource Reservation and Backfill Scheduling

Between them, FIFO and Fairshare usually ensure smooth job scheduling and high turnaround, so each person gets results back as quickly as possible. Sometimes, though, hiccups do occur and we need to make adjustments.

Large jobs -- those bigger in cores/RAM than what is currently running -- can be delayed for long periods when the cluster is very busy and many jobs are waiting to run. And as we don't require run time limits for submitted jobs (interactive applications and batch jobs), how is the scheduler to know how to plan -- to schedule?! Its approach is to run the smaller jobs that fit into whatever resources are free, even if the scheduling priority of a larger job is high (its owner's Fairshare score is higher than that of other people whose jobs are pending), inconveniencing and frustrating those with 'bigger' jobs.

To remedy this, we have made two changes to our configuration:

  • We have enabled resource reservation for cores and RAM, which holds cores/RAM free for a configurable period of time. This creates an opportunity for the bigger jobs to run, although resources might sit idle (see next). We will take the time to periodically tune this to ensure jobs aren't waiting (... and waiting ...).
  • We have also enabled backfill scheduling, a feature that will schedule small jobs on reserved/idle resources, provided those jobs are predicted to finish before the anticipated start of the larger jobs. This prediction depends on jobs being submitted with the -W or -We options. This should promote efficient job throughput.
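
The backfill rule described in the second bullet can be sketched as a simple test. This is a simplified model for illustration, not LSF's implementation; can_backfill and its parameters are hypothetical names, and the real scheduler considers more than cores and run time.

```python
def can_backfill(run_limit_min, minutes_until_reservation, free_cores, cores_needed):
    """Decide whether a small job may use reserved-but-idle cores.

    run_limit_min is the run time given with -W or -We, in minutes,
    or None if the job was submitted without either option.
    """
    if run_limit_min is None:
        # Without a run time the finish cannot be predicted, so no backfill.
        return False
    if cores_needed > free_cores:
        return False
    # The job must be predicted to finish before the larger job is due to start.
    return run_limit_min <= minutes_until_reservation


print(can_backfill(60, 90, 4, 2))    # True: finishes 30 minutes early
print(can_backfill(120, 90, 4, 2))   # False: would overrun the reservation
print(can_backfill(None, 90, 4, 2))  # False: no run time was given
```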

LSF's scheduling algorithm prioritizes cluster utilization and efficiency, not necessarily fair scheduling for our users. The resource reservation with (configurable) grace periods should remedy this. And backfill scheduling should increase job throughput to levels we experience now.

Those users who submit jobs with the -W or -We options will likely see higher throughput, as these exact or estimated run times tell the scheduler how long the jobs will run, and these jobs can be scheduled via backfill scheduling on the reserved resources ahead of other pending jobs.
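
As an illustration, a submission that carries a run time might be assembled like this. The sketch only builds the bsub argument list; lsf_duration, bsub_command, and the script name ./run_analysis.sh are hypothetical. The run times use LSF's [hour:]minute format for -W (run limit) and -We (estimated run time).

```python
def lsf_duration(hours, minutes=0):
    """Format a run time as hour:minute, as accepted by -W and -We."""
    return f"{hours}:{minutes:02d}"


def bsub_command(queue, cores, command, run_limit=None, estimate=None):
    """Build a bsub argument list; this does not submit anything."""
    args = ["bsub", "-q", queue, "-n", str(cores)]
    if run_limit is not None:
        args += ["-W", run_limit]   # run limit for the job
    if estimate is not None:
        args += ["-We", estimate]   # estimated run time
    args.append(command)
    return args


cmd = bsub_command("short", 4, "./run_analysis.sh", run_limit=lsf_duration(2, 30))
print(" ".join(cmd))  # bsub -q short -n 4 -W 2:30 ./run_analysis.sh
```

The resulting list could be passed to subprocess.run on a login node, or the printed line typed directly at the shell.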

We hope this helps you utilize the HBSGrid cluster resources more effectively. As always, please contact RCS if you should have any questions.

Updated 7/16/2020