HBSGrid Environment Changes for 3/8/2021: The Don't Blink Release

March 4, 2021

This is a reminder that our next HBSGrid maintenance is scheduled for Monday, March 8th, 8 am–9 am. Please note that this does not fall on our usual first-Monday-of-the-month maintenance schedule, and that the window is short (don't blink!).

How will this affect you?

  • You will not be able to log in to the HBSGrid via terminal or NoMachine
  • Interactive jobs will be terminated
  • We will disable the generation of core dump files (e.g. core.1234) from crashing programs
  • The local /tmp volume on login nodes will be wiped clean at reboot
  • Files in the scratch space (/export/scratch) will be deleted if they haven’t been modified in over 60 days
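
To see which of your scratch files might be affected before the maintenance, something like the following Python sketch could help. It assumes that age is judged by each file's last-modified time, as the wording above suggests; the actual retention script may apply different rules. The function name `find_stale_files` is hypothetical and is not an HBSGrid tool.

```python
import os
import time

SECONDS_PER_DAY = 86400


def find_stale_files(root, days=60, now=None):
    """Return a sorted list of files under `root` whose last-modified
    time is more than `days` days before `now` (default: current time).

    Assumption: the retention policy looks at modification time only.
    """
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are judged by their own timestamp
                if os.lstat(path).st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                # File vanished or is unreadable; skip it
                continue
    return sorted(stale)
```

Calling `find_stale_files()` on your own directory under /export/scratch returns the paths that have not been modified in over 60 days. Point it at your own subdirectory rather than the whole volume, since walking a large shared filesystem can be slow.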

What work is being performed?

  • All login nodes will be rebooted
  • LSF will be reconfigured so that jobs will have a ulimit -c of 0. This is a hard limit.
  • At restart, login nodes will run a standard /tmp-wiping script
  • We will run the /scratch filesystem retention script, which deletes files and empty directories that haven't been modified in the last 60 days
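
If you would like to confirm the core dump setting from inside a job after the maintenance, the sketch below reads the limit using Python's standard `resource` module. It is only an illustration: the helper names are hypothetical, and it reports whatever limits the current process inherited.

```python
import resource


def core_dump_limits():
    """Return the (soft, hard) core file size limits of this process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe_limit(value):
    """Render a limit value the way `ulimit` would, roughly."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def core_dumps_disabled():
    """True when the soft limit is 0, so no core files are written."""
    soft, _hard = core_dump_limits()
    return soft == 0


soft, hard = core_dump_limits()
print("core soft limit:", describe_limit(soft))
print("core hard limit:", describe_limit(hard))
print("core dumps disabled:", core_dumps_disabled())
```

With a hard limit of 0, an unprivileged process cannot raise the soft limit back above 0, which is why the change above cannot be undone from within a job.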

Upcoming Changes

We want to alert you to the following changes in our environment that may affect how you work:

  1. /tmp volume on compute nodes: Because this volume has been filling up, which effectively cripples the compute node, we will soon enable a routinely run script that removes files in /tmp older than 30 days. We are planning certain exceptions (the SAS node and 'unlimited' compute nodes). As a reminder, please do not rely on /tmp for temporary storage beyond the duration of a job's execution on a given node. We will share more details soon.
  2. /scratch volume: Again, as a better storage management measure, we will switch to daily sweeps of /scratch to remove files older than 60 days, instead of the current once-a-month sweep. Please remember that /scratch is not backed up, and we may be forced to remove files newer than the 60-day threshold if the volume's space is exhausted.
  3. More NoMachine/Gnome applications: You may start to notice additional programs appearing in the Applications menus. We are adding back programs that are normally part of the RHEL v7 distribution, in order to provide more and better options for conducting research.
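
For the /tmp guidance in item 1, one way to keep temporary files scoped to a single job is to create them in a directory that is removed when the work finishes. The Python sketch below uses the standard `tempfile` module; `run_with_temp_dir` and the `work` callback are hypothetical names for illustration, not part of any HBSGrid tooling.

```python
import os
import tempfile


def run_with_temp_dir(work, base_dir=None):
    """Call `work(tmpdir)` with a fresh temporary directory and delete
    that directory, with everything in it, when `work` returns or raises.

    `base_dir=None` lets Python choose the default location (usually
    /tmp, or whatever the TMPDIR environment variable points to).
    """
    with tempfile.TemporaryDirectory(dir=base_dir) as tmpdir:
        return work(tmpdir)


def example_work(tmpdir):
    """Write an intermediate file and return its path and the byte count written."""
    path = os.path.join(tmpdir, "intermediate.dat")
    with open(path, "w") as handle:
        written = handle.write("partial results\n")
    return path, written
```

Anything that needs to outlive the job should be copied to your project or home storage before `work` returns, since the temporary directory no longer exists afterwards.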

We will share more on these topics in the near future. We hope that these planned improvements will reduce friction points with the HBSGrid.

We thank you for your anticipated cooperation as we make improvements to the HBSGrid. Our next maintenance will occur on Monday, April 5th. If you have any questions/comments, please email us at research@hbs.edu.

Updated 3/4/2021