Accessing Software via Modules

Introduction

After choosing resources, picking the software application and environment is the next important step in doing work on the HBSGrid cluster. This is true both for real-time, interactive work in applications and for batch (background) jobs.

Because of the diversity of projects currently in flight on the HBSGrid cluster, and because the cluster is not a single computer on which you install software directly, a variety of applications and libraries are supported on the cluster. Technically, it is impossible to include everything at once in every user’s environment. For this reason, we now offer software modules (Lmod, https://www.tacc.utexas.edu/research-development/tacc-projects/lmod) to selectively expose, or make available, an application and all its supporting binaries, while also ensuring that incompatible programs are not loaded alongside it.

The primer below will help you use software modules when submitting jobs; full documentation is on our Software Modules page.

One-time Opt-in

For the time being, using software modules is opt-in. Run the command touch ~/.lmod-yes in the terminal when connected to the HBSGrid cluster, then log out and log in again, and you should be ready to use software modules.
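
The opt-in step can be sketched as the short shell snippet below. It assumes only what is stated above, namely that the opt-in marker is an empty file named .lmod-yes in your home directory; the echo messages are illustrative and not part of any HBSGrid tooling.

```shell
# Create the opt-in marker file; touch is harmless if the file already exists.
touch "$HOME/.lmod-yes"

# Confirm the marker is in place before logging out and back in.
if [ -f "$HOME/.lmod-yes" ]; then
    echo "Opt-in marker found: log out and log in again to enable modules."
else
    echo "Opt-in marker missing: check that your home directory is writable." >&2
fi
```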

NoMachine GUI Application menus

If you are currently using the Application menus in NoMachine, software modules do not apply to you at this time. At some future point, we will incorporate software modules into these menus, giving you the flexibility to choose an application and which version to run.

Wrapper Scripts and Custom LSF Job Submissions in the terminal

If you work or submit jobs primarily via the command line, especially for batch jobs, software modules will now play a role in how you access software in your job submissions.

There are two ways to use software modules for your job submissions:

  1. Load software modules before submitting jobs, as the software environment is inherited by the job
  2. Load software modules in your job scripts

#1 is great when using interactive shells for development, one-offs, and exploratory work. It is also the only method that works with command-line wrapper scripts, as wrappers handle job submissions for you with pre-set defaults and commands. Note: module integration with wrapper scripts will not be fully functional until September at the earliest. It may work for some wrappers (e.g. MATLAB), but not others (e.g. R, Python). Test carefully before using.

#2 is preferred for batch jobs and for work that uses submission scripts: including the module load command inside the job script documents the software title and version, and is a good research data management practice. But it can be rather cumbersome to always write job submission scripts for single-command jobs (e.g. bsub... Rstudio or bsub... python3 myscript.py).

Note: we highly discourage adding module load commands to your ~/.bash_profile and ~/.bashrc login scripts. This not only skews our module usage metrics for software management, but this also might introduce problems to your software environment that may be hard to troubleshoot, esp. as software environments evolve over time (and in rare cases may even complicate your ability to log in).
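
To check whether your login scripts already contain module load commands, something like the following sketch may help. The function name find_module_loads is hypothetical, and the pattern only catches lines that begin with module load (after optional whitespace), so treat it as a rough check rather than a complete audit.

```shell
# find_module_loads FILE...: print any "module load" lines found in the given
# files, prefixed with file name and line number. Missing files are skipped.
# Returns 0 if at least one match is found, 1 otherwise.
find_module_loads() {
    local found=1 f
    for f in "$@"; do
        [ -f "$f" ] || continue
        if grep -nHE '^[[:space:]]*module[[:space:]]+load' "$f"; then
            found=0
        fi
    done
    return "$found"
}

if find_module_loads "$HOME/.bash_profile" "$HOME/.bashrc"; then
    echo "Consider moving the lines above into your job scripts."
else
    echo "No module load commands found in your login scripts."
fi
```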

No matter the approach, some basics will help demystify the process:

The module avail command shows you what applications and versions are available (D indicates default versions):

[jharvard@rhrcscli2:~]$ module avail

----------------------- /usr/local/app/lmod/modulefiles ------------------------
   AMPL/20200501           mathematica/11.3        spyder/3.6.5    (D)
   R/3.5.1                 mathematica/12.1 (D)    stata-mp4/15
   Rscript/3.5.1           matlab/R2017b           stata-mp4/16    (D)
   Rstudio/3.5.1           matlab/R2018a           stata-mp8/15
   SAS/9.1                 matlab/R2019a           stata-mp8/16    (D)
   anaconda/2_5.1          matlab/R2020a    (D)    stata-se/15
   anaconda/3_5.1   (D)    openoffice/4.1.6        stata-se/16     (D)
   conda-R/5.1             python/2.7.14           stattransfer/14
   gitkraken/v1.8.4        python/3.6.5     (D)    stattransfer/15 (D)
   gurobi/7.5.2            spyder/2.7.14

  Where:
   D:  Default Module

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching
any of the "keys".

The module load command enables a particular application in the environment, by adding the application to your PATH variable, changing other environment variables, and/or pulling in dependencies. For example, to enable the R2019a version of MATLAB:

module load matlab/R2019a

or to use the default version (which here is R2020a):

module load matlab

Note: We highly recommend that you use the full title/version notation, as defaults will change over time.
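
As a small illustration of the title/version notation, the sketch below distinguishes a pinned module name from one that relies on the default. The helper is_pinned is hypothetical and only inspects the text of the name; it does not query Lmod or confirm that the module exists.

```shell
# is_pinned SPEC: succeed only if SPEC has the form title/version.
is_pinned() {
    case "$1" in
        ?*/?*) return 0 ;;
        *)     return 1 ;;
    esac
}

for spec in matlab/R2020a matlab; do
    if is_pinned "$spec"; then
        echo "$spec: pinned to a specific version"
    else
        echo "$spec: relies on the default version, which may change"
    fi
done
```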

Once a module is loaded in your session or inside your script, it is available just as though it had always been there:

[jharvard@rhrcscli2:~]$ module load matlab
[jharvard@rhrcscli2:~]$ which matlab
/usr/local/app/matlab/matlab_2020a/bin/matlab

Full details on module avail, module load, and other module commands are on our Software Modules page.

#1 Loading software modules before submitting jobs

As in the examples above, if you use the module commands (module purge, module unload, module load, etc.) in your terminal shell, this changes the environment for that shell and for any jobs submitted from it. For example, for an interactive job:

module load matlab/R2020a
bsub -q short_int -Is ... matlab -nodisplay -nojvm -nosplash

or a background job:

module load matlab/R2020a
bsub -q short ... matlab -nodisplay -nojvm -nosplash -r my_matlab_script

In both cases, your job will inherit the environment (in particular, the execution PATH), and your preferred version of MATLAB will run.

This can be good while developing scripts or code, for running interactive application sessions (both GUI and/or terminal) from the terminal, or general putzing around. If you close that terminal window/session, these settings are lost, and you will have to perform the module load again to set up the environment.
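
The inheritance described above can be illustrated locally without the scheduler. The sketch below is only an analogy: it uses a child bash process in place of a submitted job, and the directory /tmp/hypothetical_app/bin is a made-up stand-in for whatever a module load would prepend to PATH.

```shell
# Hypothetical directory standing in for what a module load would add to PATH.
demo_dir="/tmp/hypothetical_app/bin"

# Change the environment in the current shell, as module load does.
export PATH="$demo_dir:$PATH"

# A child process started from this shell sees the same PATH.
child_first_entry=$(bash -c 'printf "%s" "${PATH%%:*}"')
echo "First PATH entry seen by child process: $child_first_entry"
```

A new terminal session would not have this change, which is why the module load has to be repeated after you close the window.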

Please see our Command-line Wrapper Scripts or Submitting Jobs pages for more information.

#2 Loading software modules in your job scripts

Loading modules in a terminal/shell sets up the environment solely for the life of that session -- until the shell exits, you close the window, or the like. A better method for loading software is to use submission scripts to run jobs and to include the module load command(s) in the script. This both guarantees that the correct program and version are loaded whenever the script is used, now or in the future, and also serves as documentation for your work.

As a brief example, we create the file my_submit_job.sh to run a job to execute a MATLAB script:

------ my_submit_job.sh -------
#!/bin/bash

module load matlab/R2020a
matlab -nosplash -nodesktop -nojvm -r my_matlab_script

---------------------------

Now we submit this script to the scheduler:

bsub -q short bash my_submit_job.sh

Once the job starts (with the automatic defaults of 1 core and 5 GB RAM), the instructions in my_submit_job.sh are run, and the module load happens as the first command of the job.

For this last example, more experienced users might note:

  • The bash my_submit_job.sh can be changed to ./my_submit_job.sh if one makes the script executable with chmod ug+x my_submit_job.sh.
  • Additionally, any bsub command line options (e.g. RAM options, -J job name to hide script contents, -We estimated run time for backfill scheduling) can be included at the top of the submission script as #BSUB directives, and these will be parsed when submitting the script with the bsub < scriptname format.
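
The #BSUB directive approach from the last bullet might look like the sketch below, which writes a version of my_submit_job.sh with the scheduler options embedded. The job name, the 30-minute run time estimate, and the output file name are illustrative placeholders rather than recommended settings; RAM options are omitted because their form depends on the cluster configuration.

```shell
# Write a submission script whose #BSUB lines carry the scheduler options.
# All option values below are illustrative placeholders.
cat > my_submit_job.sh <<'EOF'
#!/bin/bash
#BSUB -q short
#BSUB -J my_matlab_job
#BSUB -We 30
#BSUB -o my_matlab_job.%J.out

module load matlab/R2020a
matlab -nosplash -nodesktop -nojvm -r my_matlab_script
EOF

# Submit with input redirection so that bsub parses the #BSUB lines:
#   bsub < my_submit_job.sh
echo "Wrote my_submit_job.sh; submit it with: bsub < my_submit_job.sh"
```

Keeping the options in the script means the queue, job name, and software version are all recorded in one file alongside the commands they apply to.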

More details can be found on the Submitting Jobs page, including basic instructions and also more complete examples on writing job submission scripts.

Updated 7/15/2020