Parallel Processing

Parallel processing, also commonly called parallel computing or multicore processing, uses multiple cores (CPUs) to analyze data and can be an efficient way to get more work done in less time. There are two basic ways to use multiple cores: implicit parallelism built into your application or library, and explicit parallelism that you program and manage yourself. Explicit parallelism can be achieved using application- or language-native tools, or using LSF job arrays (as one example).

Requesting multiple CPUs on the Grid

When using parallel processing on shared compute systems, you need to tell the scheduler (the system software that manages workloads, e.g., LSF) that you wish to use multiple cores. On your personal desktop or laptop, this isn't necessary, as you control all the resources on that machine. On a compute cluster, however, you control only the resources that the scheduler has given you, and it gives you only the resources that you've requested, whether explicitly via a custom job submission script, or implicitly via default values or the default submission (wrapper) scripts available on the HBSGrid compute cluster. This is because jobs (work sessions) from multiple, and possibly different, people often run side by side on a given compute node of the cluster.

To use multiple CPUs on the HBSGrid, start your application using a wrapper script or a custom submission script and specify the number of CPUs you will use. For example, starting R via 'Rstudio -n 5' from the command line will start R with 5 CPUs reserved. Note that the NoMachine drop-down menus currently offer parallel processing only for Stata: the menu entries indicate whether one core (Stata-SE) or four cores (Stata-MP4) will be used. If you wish to use multiple cores in any other environment, you must use the terminal and command line tools to request and reserve these resources from the scheduler.
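As a rough illustration of requesting cores with a custom submission rather than a wrapper script, the sketch below assembles an LSF 'bsub' command line. The '-n' option (number of cores) and the 'span[hosts=1]' resource string (keep all cores on one node) are standard LSF options; the helper name build_bsub_command and the script name analysis.py are hypothetical, and your site may require further options (queue, memory) that are not shown here.

```python
import shlex


def build_bsub_command(n_cores, script, single_node=True):
    """Return an LSF bsub command (as a list) that reserves n_cores cores.

    `script` is a hypothetical Python script to run inside the job.
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    cmd = ["bsub", "-n", str(n_cores)]
    if single_node:
        # span[hosts=1] asks LSF to place all requested cores on one node,
        # which single-node tools (threads, process pools) require.
        cmd += ["-R", "span[hosts=1]"]
    cmd += ["python", script]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_bsub_command(4, "analysis.py")))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; on the cluster you would paste the printed line into a terminal or pass the list to subprocess.run().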

Implicit parallelism

Implicit parallelism is the easiest to use but is limited to the features offered by your application or programming language. Most of the applications commonly used for data analysis on the cluster provide some degree of implicit parallelization. The system-wide installation of RStudio / Microsoft R Open uses the Intel Math Kernel Library (MKL) for fast multi-threaded computations. The system-wide installation of Spyder / Python also uses MKL to speed up some computations. Similarly, many Stata commands have been parallelized, as have some MATLAB algorithms. Note that for all these applications only some computations (e.g., built-in functions) use implicit parallelization, and many computations will use only a single CPU. To speed up other computations you may be able to use explicit parallelization.

Programs on the HBSGrid that use implicit parallelism include Stata, MATLAB, and Mathematica. Most system-wide applications started on the HBSGrid via command-line wrapper scripts will use the number of CPUs you specify when starting your application. For example, starting MATLAB via 'matlab -n 4' from the command line will start MATLAB with a default 5 GB of RAM and a reservation of 4 cores. More information about using implicit parallelization can be found in the application-specific parallelization pages.
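When an application is started outside the wrapper scripts, a multi-threaded library may not know how many cores were reserved. A minimal sketch, assuming a Python session backed by MKL or another OpenMP-based library: the OMP_NUM_THREADS and MKL_NUM_THREADS environment variables are read by those libraries when they load, so they must be set before the numerical library is imported. The helper name limit_library_threads is hypothetical, and whether this step is needed on the HBSGrid is an assumption rather than something the wrapper documentation states.

```python
import os


def limit_library_threads(n_threads):
    """Cap the number of threads MKL/OpenMP-backed libraries may start."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(n_threads)


# Match the number of cores reserved for the job (4 here, as in the
# 'matlab -n 4' example). Call this BEFORE importing NumPy or similar
# libraries, because they read these variables only when they load.
limit_library_threads(4)
```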

Explicit parallelism

Explicit parallelism can be achieved using application or library features to use multiple CPUs on a single compute node, or using LSF job arrays to use multiple CPUs across multiple compute nodes (a best practice). For this to work, your script or code must be parallelizable, meaning that it can be broken into parts that can execute side by side, independently of one another. This is often the case with for loops or functions whose iterations or calls do not depend on each other. Good examples are the apply functions in R (apply(), lapply(), etc.).
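To make "broken into parts that can execute side by side" concrete, here is a minimal sketch of how one element of an LSF job array might pick its share of the work. LSF sets the LSB_JOBINDEX environment variable to the element's 1-based index (and to 0 for jobs that are not part of an array); the function name tasks_for_element, the task list, and the array size of 3 are hypothetical.

```python
import os


def tasks_for_element(tasks, index, n_elements):
    """Return the tasks handled by the 1-based job array element `index`."""
    if not 1 <= index <= n_elements:
        raise ValueError("index must be between 1 and n_elements")
    # Element i takes tasks i-1, i-1+n, i-1+2n, ... so the shares never
    # overlap and together cover every task exactly once.
    return tasks[index - 1::n_elements]


if __name__ == "__main__":
    all_tasks = list(range(10))   # stand-in for files, parameters, etc.
    n_elements = 3                # assumed size of the job array
    # Outside an array job LSB_JOBINDEX is unset or 0; treat that as element 1.
    index = int(os.environ.get("LSB_JOBINDEX", "0")) or 1
    for task in tasks_for_element(all_tasks, index, n_elements):
        print(f"element {index} processing task {task}")
```

Because each element works only on its own share and never reads another element's results, the elements can run on different compute nodes at the same time.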

As when using implicit parallelism, you must request the number of CPUs you will use when starting a job on the Grid. We also recommend that you do not hard-code the number of cores in your code. Instead, we urge you to write code that sets this value dynamically, based on the job's run-time environment at execution time. This is explained in detail in each of the sections enumerated below.
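One way to avoid hard-coding the core count, sketched here in Python: read the number of cores LSF allocated to the job from the LSB_DJOB_NUMPROC environment variable, and size a worker pool from it. The helper names reserved_cores and square are hypothetical, and the application-specific pages may recommend a different mechanism for your environment.

```python
import os
from multiprocessing import Pool


def reserved_cores(default=1):
    """Return the cores LSF allocated to this job, or `default` outside LSF."""
    value = os.environ.get("LSB_DJOB_NUMPROC", "")
    try:
        n = int(value)
    except ValueError:
        # Variable missing or malformed: we are probably not inside an LSF job.
        return default
    return n if n > 0 else default


def square(x):
    """Stand-in for an independent unit of work."""
    return x * x


if __name__ == "__main__":
    n_workers = reserved_cores()
    # pool.map plays the role of R's lapply(): each call is independent.
    with Pool(processes=n_workers) as pool:
        results = pool.map(square, range(10))
    print(f"{n_workers} worker(s):", results)
```

The fallback is deliberately 1 rather than os.cpu_count(): on a shared compute node, cpu_count() reports every core on the machine, not the cores the scheduler reserved for your job.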

Explicit parallelism uses application-specific libraries and features, and is described below for each of the most commonly used programs on the cluster:

Please see our documentation for parallel processing under the following software and analysis environments:

For other environments, or if you have any questions, please contact RCS.

Updated on 6/2/2020