Welcome to the new and improved HBS Compute Grid v2.5!
- Major Changes to Note
- Major Considerations
- Obtaining Support and Feedback
- Problem Resolution and Maintenance
- Other Important Items to Note
Based on both research computing trends and recommendations from the HBS research computing environment assessment, Research Computing Services (RCS) has been working closely with HBS IT to make improvements to our local compute grid. This new compute grid, v2.5, provides the following updates and enhancements:
- Improved compute capacity through more hardware and better use of existing hardware
- Significantly fewer restrictions on compute capacity
- Increased safeguards to prevent CPU spillover and memory problems
- Newer OS and software versions, and improved usability
- More software titles, including GitKraken, a GUI Git version control application, and Spyder, a Python IDE
- Better command-line submission scripts to improve productivity
We have summarized below what we believe are the need-to-know points for you to begin your work in the new environment. We ask that you read through this and the related documents and web pages thoroughly. The new environment brings differences in directions, hostnames, URLs, etc. Although we have tried to document these differences as much as possible, please let us know if there are any omissions. Of course, contact us at any time if you have questions.
We put together some information to summarize and detail the improvements to our computing environment:
- Grid 2.5 Improvements (1-page PDF)
- Web page detailing Major Usability Changes
- Major NoMachine Interface Changes (PDF)
The two major interrelated points to remember as you conduct your work are:
- Interactive sessions are now limited to 1 or 3 days. Most batch sessions are limited to 3 or 7 days.
- The per-user resource limits have been either raised or eliminated.
Finally, PAC is not immediately available in the new environment. We hope to have this running in early 2019.
The queue and scheduling setup is a work in progress! As we add more compute nodes, watch usage patterns, and determine how the scheduler responds, we may need to adjust the scheduling policies and limits. We will communicate any changes clearly and with advance notice.
You will need to use new hostnames to log into the new environment:
For SSH / terminal sessions, log in to hbsgrid.hbs.edu. You might receive a warning about a host key/fingerprint. Please accept/save the changes, as this is expected behavior for a new login server.
For NoMachine GUI sessions, follow the setup instructions on our website, and use researchnx-new.hbs.edu for the hostname.
If you experience any unusual behavior, or if inspiration visits you about new features or enhancements as you work, please contact RCS via email, our preferred communication vehicle. Please try to describe your problem in detail; error messages and screen captures of the problem are immensely helpful! Of course, phone calls to 617-495-6100 or drop-in visits, if you are nearby, are always welcome!
As mentioned, the new environment is a work in progress. We’ve had other researchers and our own RCS team using the new environment since late August 2018. Testing has been going smoothly, and we believe that we have caught most of the problems. But in complex systems, one cannot catch every problem in advance. We hope you will bear with us as we work out any final, unforeseen problems.
As noted above, we will make the best decision to either attempt to fix any problems as they arise or to schedule the fix for a maintenance window. In certain situations, we may need to take measures that jeopardize running sessions, such as killing running jobs or rebooting login or compute nodes, but we will try to avoid this at all costs. For server reboots, we will make every effort to schedule these during a maintenance window, which we will announce as needed and with appropriate lead time to minimize work interruptions.
Please be aware of the following items that may affect your work:
- Compiled software: Since this new environment is an OS upgrade (Red Hat Enterprise Linux 7.5, from 6.5), software that you have compiled on the current grid is not guaranteed to work in the new environment. This includes packages and modules in R and Python. If an add-on is compiled, you are advised to remove the package or module and re-install it. This will force your programming environment to recompile the add-on during installation, which will then be OS-compatible.
- SAS, SAS server, and SAS Connect: These services are now back online and fully operational.
- MariaDB: All services and methods of access are back online and fully operational.
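For the compiled-software item above, the following is a minimal sketch of one way to find the Python packages in your own environment that ship compiled files and are therefore candidates for removal and re-installation. It is an illustration, not an RCS-provided tool: it assumes Python 3.8 or later (for the standard-library importlib.metadata module), and the function names has_compiled_files and compiled_distributions are hypothetical names chosen for this example. It treats any file ending in .so, or containing .so. in its name, as compiled.

```python
import importlib.metadata as md


def has_compiled_files(file_names):
    """Return True if any file name looks like a compiled shared object."""
    for name in file_names:
        text = str(name)
        # Matches "module.so" as well as versioned libraries like "libfoo.so.1".
        if text.endswith(".so") or ".so." in text:
            return True
    return False


def compiled_distributions():
    """Return sorted names of installed distributions that ship compiled files."""
    names = set()
    for dist in md.distributions():
        files = dist.files or []  # files is None when no file list is recorded
        name = dist.metadata.get("Name")
        if name and has_compiled_files(files):
            names.add(name)
    return sorted(names)


if __name__ == "__main__":
    # Print one candidate package per line for review before re-installing.
    for name in compiled_distributions():
        print(name)
```

Run this with the same Python interpreter you use for your work, since it only inspects the environment it runs in. Packages installed without a recorded file list will not be detected, so treat the output as a starting point rather than a complete inventory.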
Updated on 1/15/19