Load Sharing Facility (LSF) on the Grid

The HBS research Grid's collection of compute nodes and servers is coordinated by the Load Sharing Facility software system, otherwise known as LSF. This overlay on a compute cluster is also known as the scheduler. Other compute clusters may use LSF or other common scheduler software such as SLURM, Sun Grid Engine (SGE), or PBS/Torque.

Mostly transparent to the user, this system of networked software listens for requests to launch programs, whether they come from the application scripts in NoMachine, from commands in the terminal on the login nodes, or from batch submission commands. The software then matches the work that you need to perform with a compute node and makes those resources exclusively available to you for the duration of that work. Through this process, your interactive GUI program session or background batch program becomes a job that is scheduled on the system alongside the jobs of all other users.
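As a rough illustration of what a batch submission request looks like, the Python sketch below assembles an LSF `bsub` command line without running it. The `bsub` options shown (`-q`, `-n`, `-J`, `-o`) are standard LSF options, but the helper `build_bsub_command`, the queue name `short`, the script `./run_analysis.sh`, and the job and log names are hypothetical placeholders, not values taken from the HBS Grid configuration.

```python
import shlex


def build_bsub_command(script, queue, cores=1,
                       job_name="example_job", output="output_%J.log"):
    """Return the argument list for an LSF batch submission.

    The command is only assembled here, never executed.
    All default values are illustrative assumptions.
    """
    return [
        "bsub",
        "-q", queue,       # queue to submit to (names are site-specific)
        "-n", str(cores),  # number of job slots (cores) requested
        "-J", job_name,    # human-readable job name
        "-o", output,      # output log; LSF replaces %J with the job ID
        script,            # the program or script to run as the job
    ]


# Hypothetical request: 4 cores on a queue assumed to be called "short".
cmd = build_bsub_command("./run_analysis.sh", queue="short", cores=4)
print(shlex.join(cmd))
```

On a login node, the printed line could be pasted into the terminal once the queue name and script are replaced with real ones. LSF would then hold the job until a compute node with four free slots is available.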

Because this is a shared resource with finite capacity, it is important to understand how it works, what its limitations are, and how to work appropriately within them. This will enable you to work most effectively and efficiently, while at the same time ensuring that the resources are available for others to use when you don't need them.

LSF 10.1 Reference: https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_welcome/lsf_welcome.html

Updated on 1/15/19