Stata offers a 293-page report on its parallelization efforts:
Stata/MP is the version of Stata that is programmed to take full advantage of multicore and multiprocessor computers. It is exactly like Stata/SE in all ways except that it distributes many of Stata’s most computationally demanding tasks across all the cores in your computer and thereby runs faster—much faster.
The results are pretty impressive. However:
With multiple cores, one might expect to achieve the theoretical upper bound of doubling the speed by doubling the number of cores—2 cores run twice as fast as 1, 4 run twice as fast as 2, and so on. However, there are three reasons why such perfect scalability cannot be expected: 1) some calculations have parts that cannot be partitioned into parallel processes; 2) even when there are parts that can be partitioned, determining how to partition them takes computer time; and 3) multicore/multiprocessor systems only duplicate processors and cores, not all the other system resources.
Stata/MP achieved 75% parallelization efficiency overall and 85% efficiency among estimation commands... Speed is more important for problems that are quantified as large in terms of the size of the dataset or some other aspect of the problem, such as the number of covariates. On large problems, Stata/MP with 2 cores runs half of Stata’s commands at least 1.7 times faster than on a single core. With 4 cores, the same commands run at least 2.4 times faster than on a single core.
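To get a feel for why doubling the cores does not double the speed, the first reason in the quote (parts that cannot be parallelized) can be sketched with Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n is the number of cores. The snippet below is only an illustration. Treating Stata's 75% efficiency figure as p is our assumption, not necessarily how Stata defines efficiency, and the function name amdahl_speedup is made up for this example, so the output should not be expected to reproduce Stata's measured numbers.

```shell
#!/bin/sh
# amdahl_speedup P N
# Ideal speedup on N cores when a fraction P (0..1) of the work can run in parallel.
# Illustrative helper only; the name and the interpretation of P are assumptions.
amdahl_speedup() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}

# Assumed parallel fraction of 0.75 for 1, 2, 4, and 8 cores
for cores in 1 2 4 8; do
    echo "$cores cores: $(amdahl_speedup 0.75 "$cores")x"
done
```

Under this assumption, each doubling of cores buys less than the previous one, which is why requesting 8 cores is rarely twice as useful as requesting 4.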
How to Utilize This?
This parallelization benefit is mostly realized when running code in batch mode. When used interactively, Stata spends most of its time waiting for user input, so the parallelization gains diminish rapidly. If you intend to do intense, focused work for short periods of time (up to a few days) and then exit the software, choosing multiple cores is fine. But if you plan to keep an interactive session open over the course of a day or two, please select Stata-SE: the cores you request are reserved for you alone and will sit idle during this time, reducing the resources available to other people.
No additional work is needed for you to utilize the multiple CPU cores in your code. Stata will handle this transparently for you. But you do need to ensure that you ask the compute grid to reserve the cores for your use:
Using NoMachine (interactive only):
From the Applications menu, select the Stata-SE menus for single-core Stata or the Stata-MP4 menus for 4-core Stata. Under each, select the appropriate memory footprint for your work (see Choosing Resources). An example screenshot can be seen here. The wrapper scripts that drive these menu items include all the necessary commands to start Stata with the designated number of CPU cores within your session.
Using PAC (batch only):
Note that PAC is not yet available for Grid 2.5. Please contact RCS if you have any questions.
Stata-SE (single-core), -MP4 (4-core), and -MP8 (8-core) options are available as application profiles using the Platform Application Center when starting a new job. Selecting one of these profiles will ask the scheduler to reserve the cores for your job. Please see our PAC documentation for more details.
Using the command line (interactive or batch):
Both interactive and batch jobs can be started from the command line, and both default wrapper scripts and custom submission scripts can be employed. (Again, wrapper scripts are great for out-of-the-box configs; custom scripts are needed for atypical RAM or CPU requirements.) For example:
# interactive (GUI) Stata-MP4 with 5 GB footprint via default wrapper
xstata-mp4-5g

# interactive (GUI) Stata-MP4, 35 GB, for 12 hours via custom submission script
bsub -q long_int -n 4 -Ip -W 12:00 -R "rusage[mem=35000]" -M 35000 -hl xstata-mp4

# batch Stata-MP4 with 5 GB footprint via default wrapper
stata-mp4-5g -b do myfile.do

# batch Stata-MP4, 35 GB, for 12 hours via custom submission script
bsub -q long -n 4 -W 12:00 -R "rusage[mem=35000]" -M 35000 -hl stata-mp4 -b do myfile.do
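The custom batch submission above follows a regular pattern: the core count appears in both -n and the stata-mpN executable name, and the memory figure appears in both the rusage request and the -M limit. A small helper can keep those values in sync. This is a sketch only: stata_bsub_cmd is a hypothetical function, not one of the grid's wrapper scripts; it assumes memory is given in MB at 1000 per GB, as in the examples; and it assumes an executable named stata-mp8 exists for 8-core jobs, which this page does not confirm. It prints the command rather than submitting it, so you can inspect the result first.

```shell
#!/bin/sh
# stata_bsub_cmd CORES MEM_GB HOURS DOFILE
# Hypothetical helper: prints (does not run) a batch bsub command for Stata/MP,
# mirroring the custom batch submission example on this page.
stata_bsub_cmd() {
    cores="$1"
    mem_gb="$2"
    hours="$3"
    dofile="$4"
    # Assumption: memory values are in MB, with 1 GB counted as 1000 MB
    mem_mb=$((mem_gb * 1000))
    printf 'bsub -q long -n %s -W %s:00 -R "rusage[mem=%s]" -M %s -hl stata-mp%s -b do %s\n' \
        "$cores" "$hours" "$mem_mb" "$mem_mb" "$cores" "$dofile"
}

# Reproduces the 4-core, 35 GB, 12-hour batch example
stata_bsub_cmd 4 35 12 myfile.do
```

Once the printed command looks right, it can be pasted into the shell to submit the job.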
Updated on 2/19/19