HBSGrid Environment Changes for 7/6/2020

July 7, 2020

The following changes are being made to the HBSGrid cluster and storage, effective 7/6/2020:

Software Updates

We have installed license updates for:

  • SAS
  • StatTransfer
  • Mathematica
  • AMPL/Knitro

We have installed version updates for:

  • MATLAB: updated to r2020a (from r2019a)
  • StatTransfer: updated to v15 (from v14.1)
  • Mathematica: updated to 12 (from 11.2)

The new versions are available initially through Lmod software modules (see below) and will be rolled out to all GUI and wrapper-script users within several weeks.

We have replaced LibreOffice/Calc with OpenOffice/Calc because of several problems with LibreOffice. OpenOffice/Calc should now be accessible from the NoMachine/Gnome Applications menu.

Other, newer software versions will be made available within the next 2 to 4 weeks. This includes newer versions of R, RStudio, and Python.

Flexible Software Usage with Lmod Software Modules

Description: Our current cluster software environment is not flexible enough to accommodate new software versions for both command-line and GUI users. For example, when software titles are updated, GUI users and those using command-line 'wrapper scripts' are forced to use the new versions, with few good options for staying on an older version if their research project requires it.

Remedy: We have installed and are gradually releasing Lmod software modules, a software control system that provides great flexibility in which software titles and versions are available for one's use. It also means that a script, once written, needs little or no change when the software is updated or upgraded. Full documentation for using software modules is provided on our Software Modules pages (and soon in the Running Jobs section).

16-group Limit

Description: Persons using the HBSGrid cluster and storage could not be a member of more than 16 groups (in practice, this meant no more than 12 to 14 project spaces). Users who belonged to more than 16 groups would receive "Access Denied" errors on project folders in a random and unpredictable fashion.

Remedy: This restriction has now been eliminated.

LSF Batch Job Dispatch and Scheduling Adjustments

Description: Large jobs (those requesting more cores/RAM than the jobs currently running) can be delayed for long periods when the cluster is very busy and many jobs are waiting to run. The scheduler runs smaller jobs instead, even if your scheduling priority is high (that is, your Fairshare score is higher than that of other people whose jobs are pending).

Remedy: We have implemented two changes:

  • We have enabled resource reservation for cores and RAM, which holds cores/RAM free for a configurable period of time. This creates an opportunity for the bigger jobs to run, although resources might sit idle (see next).
  • We have also enabled backfill scheduling, a feature that will schedule small jobs on reserved/idle resources, provided the jobs will finish before the anticipated start of the larger jobs. This should promote efficient job throughput.
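
The backfill rule described above can be illustrated with a small sketch. This is a toy model only, not LSF's actual algorithm: the names PendingJob and can_backfill are hypothetical, only cores are considered (not RAM), and the model assumes that a job submitted without a run time cannot be backfilled because there is no way to tell whether it would finish in time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingJob:
    """A pending job in this toy model (hypothetical structure, not an LSF type)."""
    name: str
    cores: int
    run_minutes: Optional[int]  # None means no run time was supplied


def can_backfill(job: PendingJob, idle_cores: int, minutes_until_large_job: int) -> bool:
    """Return True if the job fits on the idle cores and ends before the large job starts."""
    if job.run_minutes is None:
        # Assumption of this sketch: without a run time, the job is never backfilled.
        return False
    fits = job.cores <= idle_cores
    finishes_in_time = job.run_minutes <= minutes_until_large_job
    return fits and finishes_in_time


# Eight cores are reserved and idle; the large job is expected to start in 60 minutes.
jobs = [
    PendingJob("short", cores=2, run_minutes=30),
    PendingJob("long", cores=2, run_minutes=240),
    PendingJob("no_limit", cores=1, run_minutes=None),
    PendingJob("wide", cores=16, run_minutes=10),
]
eligible = [j.name for j in jobs if can_backfill(j, idle_cores=8, minutes_until_large_job=60)]
print(eligible)  # ['short']
```

In this model only the job that is both small enough and short enough runs on the reserved cores; the others wait, so the large job's anticipated start is not pushed back.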

LSF's scheduling algorithm prioritizes cluster utilization and efficiency, not necessarily fair scheduling for our users. Resource reservation with (configurable) grace periods should remedy this, and backfill should keep job throughput at the levels we experience now. Users who submit jobs with exact or approximate, estimated run times (the bsub -W or -We option, respectively) will likely see higher throughput, because their jobs can be scheduled via backfill on the reserved resources ahead of other pending jobs.
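
For those who submit jobs from a script, the following is a minimal sketch of how a run time might be attached to a submission. The helper bsub_args and the script name analysis.R are hypothetical; only the -W and -We options come from the text above, and the values are given in minutes. The sketch builds the command without running it, so it can be tried off the cluster.

```python
from typing import List, Optional


def bsub_args(command: List[str],
              run_limit_min: Optional[int] = None,
              estimated_min: Optional[int] = None) -> List[str]:
    """Build a bsub argument list with an optional run limit (-W) or estimate (-We)."""
    args = ["bsub"]
    if run_limit_min is not None:
        args += ["-W", str(run_limit_min)]    # exact run time, in minutes
    if estimated_min is not None:
        args += ["-We", str(estimated_min)]   # approximate, estimated run time, in minutes
    return args + command


# Hypothetical job: an R script expected to finish within 90 minutes.
cmd = bsub_args(["Rscript", "analysis.R"], run_limit_min=90)
print(" ".join(cmd))  # bsub -W 90 Rscript analysis.R
```

On the cluster, the resulting list could be passed to subprocess.run(cmd) to submit the job.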

Updated 7/7/2020