Parallel Processing

Parallel processing, also commonly called parallel computing or multicore processing, means using multiple cores (CPUs) to analyze data. It is an efficient way to get more work done in less time. But only under certain circumstances!

First, your program must have been designed from the start to make use of multiple cores. Just because the computer has them doesn't mean a given program can use them. This is the case for both R and Python, which by default are single-threaded (they use one CPU) and can only use multiple CPUs with custom programming that includes helper modules or packages. In contrast, Stata has been parallelized out of the box, with most functions achieving up to 75% efficiency but with diminishing returns (75% efficiency at 2 cores, 60% efficiency at 4 cores, etc.).
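To make the efficiency figures concrete, here is a small worked sketch. It assumes that "efficiency" means speedup divided by the number of cores, which is the usual definition but is not confirmed by the text above. The function names and the 60-minute job are hypothetical.

```python
def speedup(efficiency, cores):
    """Speedup over a single core, assuming efficiency = speedup / cores."""
    return efficiency * cores


def parallel_runtime(serial_minutes, efficiency, cores):
    """Estimated wall-clock time for a job that takes serial_minutes on one core."""
    return serial_minutes / speedup(efficiency, cores)


if __name__ == "__main__":
    # Hypothetical job that takes 60 minutes on one core.
    for cores, eff in [(2, 0.75), (4, 0.60)]:
        print(cores, "cores:",
              round(speedup(eff, cores), 2), "x faster,",
              round(parallel_runtime(60, eff, cores), 1), "minutes")
```

Under that assumption, doubling the cores from 2 to 4 does not halve the run time, which is what "diminishing returns" means in practice.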

Second, your script or code must be parallelizable, meaning that it can be broken into parts that can execute side-by-side independently. This is often the case with for loops or functions whose iterations or calls do not depend on one another's results. Good examples are the apply functions in R (apply(), lapply(), etc.).
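As an illustration of what "parallelizable" looks like, the sketch below shows a loop whose iterations are independent, first run serially and then spread across worker processes with Python's standard concurrent.futures module. The function sum_of_squares and the input values are made up for this example; it is one possible approach, not the only one.

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    """An independent unit of work: the result depends only on n."""
    return sum(i * i for i in range(n))


def run_serial(inputs):
    """Ordinary for-loop style: one input after another on one core."""
    return [sum_of_squares(n) for n in inputs]


def run_parallel(inputs, workers=2):
    """Same work, but each input may be handled by a separate process."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(sum_of_squares, inputs))


if __name__ == "__main__":
    inputs = [10, 100, 1000]
    # Both approaches give identical answers because no iteration
    # depends on another iteration's result.
    assert run_serial(inputs) == run_parallel(inputs)
    print(run_parallel(inputs))
```

A loop in which each step needs the previous step's result (a running total, for instance) cannot be split up this way without restructuring.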

Finally, on shared compute systems, you need to indicate to the scheduler, the system software that manages workloads, that you wish to use multiple cores. On your personal desktop or laptop, this isn't necessary, as you control all the resources on that machine. However, on a compute cluster, you only control the resources that the scheduler has given you, and it has given you only the resources that you've requested. That request is made either explicitly, via a custom job submission script, or implicitly, via the default values or default submission scripts available on the HBS compute grid. This is necessary because jobs (work sessions) from multiple, and possibly different, people are often running side-by-side on a given compute node on the compute cluster.

So, how does one go about doing this? 

  1. When you submit your job or start your work, request from the scheduler the resources (the CPU cores) you need.
  2. Employ the appropriate code or frameworks to take advantage of the multiple cores that have been reserved for you by the scheduler.

For #2, we'll outline in the following sections how to use parallel processing in each of the documented programming and analysis environments. 

For #1, the NoMachine drop-down menus do not offer parallel processing except for Stata, where the menu entries indicate whether one core (Stata-SE) or four cores (Stata-MP4) will be used. If you wish to use multiple cores in any other environment, you must use the terminal and command line tools to request and reserve these resources from the scheduler.

We also recommend that you do not statically indicate, or hard code, the number of cores that you'll be using in your code. Instead, we urge you to write code that sets this value dynamically from the job's runtime environment at execution time. This is explained in detail in each of the sections enumerated below.
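A minimal sketch of this idea in Python follows. It assumes the scheduler exposes the number of reserved cores through an environment variable; LSB_DJOB_NUMPROC (set by IBM Spectrum LSF) and SLURM_CPUS_PER_TASK (set by Slurm when CPUs per task are requested) are used as examples, and which one applies on a given cluster is an assumption to check against that cluster's documentation. The function name cores_available is hypothetical.

```python
import os

# Environment variables that common schedulers set for a running job.
# Which of these exists on your cluster is an assumption; check its documentation.
SCHEDULER_CORE_VARS = ("LSB_DJOB_NUMPROC", "SLURM_CPUS_PER_TASK")


def cores_available(env=None):
    """Return the number of cores reserved for this job, or 1 if unknown."""
    env = os.environ if env is None else env
    for name in SCHEDULER_CORE_VARS:
        value = env.get(name, "")
        if value.isdigit() and int(value) > 0:
            return int(value)
    # Conservative fallback: never assume more than one core on a shared node.
    return 1


if __name__ == "__main__":
    workers = cores_available()
    print("Using", workers, "worker(s)")
```

The returned value can then be passed wherever a worker count is needed, so the same script runs correctly whether the job was given one core or many. Falling back to 1 rather than to the node's total core count avoids oversubscribing a node that is shared with other people's jobs.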

Please see our documentation for parallel processing under the following software and analysis environments:

For other environments, or if you have any questions, please contact RCS.

Updated on 1/15/19