Parallel Python

Note: Some parts of this document have changed for Grid 2.5. Please see the notes below and in context.
Please contact RCS with any questions.

Introduction

This page is intended to help you run parallel Python code on the HBS compute grid or on your local multicore machine. It highlights the 'multiprocessing' package. This page does NOT cover distributed computing, which spreads the workload over multiple machines.

Maximum Workers: Each compute node on the Grid has 32 physical cores, so in theory you could request up to 32 cores. However, due to current user resource limits, you should request no more than 12 cores. A job that requests more than 12 cores will not run; it will remain in the PEND state.

(Grid 2.5: For short queue jobs, you may request the use of up to 16 cores, while the limit remains at 12 cores for long queue jobs.)
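
One way to stay within these limits is to derive the worker count from the job's allocation rather than from the node's core count. The following is a minimal sketch, not Grid policy: the helper choose_num_workers and the constant GRID_CORE_LIMIT are hypothetical names, and the sketch assumes that LSF exposes the number of allocated slots in the LSB_DJOB_NUMPROC environment variable. Outside of an LSF job it falls back to multiprocessing.cpu_count().

```python
import multiprocessing
import os

# Assumption: the 12-core limit quoted above (16 for short queue jobs on Grid 2.5).
GRID_CORE_LIMIT = 12


def choose_num_workers(limit=GRID_CORE_LIMIT, environ=None):
    """Return a worker count that never exceeds the given core limit."""
    environ = os.environ if environ is None else environ
    # Assumption: LSF sets LSB_DJOB_NUMPROC to the number of slots allocated to the job.
    allocated = environ.get("LSB_DJOB_NUMPROC")
    if allocated is not None and allocated.isdigit() and int(allocated) > 0:
        requested = int(allocated)
    else:
        # Not inside an LSF job: leave one core free for the parent process.
        requested = max(multiprocessing.cpu_count() - 1, 1)
    return max(1, min(requested, limit))


if __name__ == "__main__":
    print(choose_num_workers())
```

The returned value can be passed to multiprocessing.Pool in place of a hard-coded number, so the same script behaves sensibly both on the Grid and on a local machine.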

Example: Parallel Processing Basics

This sample code will provide a basic introduction to parallel processing. You will be shown how to set up your parallel pool with the appropriate number of workers, how to define which function is to be run in parallel, and how to gather the results.

For this example, we will calculate the square of a list of numbers in parallel. 

____________________________________________________________

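import os
import multiprocessing


def f(x):
    # Print the process ID next to the result so you can see
    # which process handled each input.
    pid = os.getpid()
    print("{}:{}".format(pid, x * x))
    return x * x


if __name__ == "__main__":
    numList = range(1, 100)
    # Create one process per input value.
    procs = [multiprocessing.Process(target=f, args=(x,)) for x in numList]

    # Start every process before joining any of them. Calling join()
    # right after start() would run the processes one at a time.
    for p in procs:
        p.start()
    for p in procs:
        p.join()

Outputs (the order of the lines may vary between runs):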

3728:1
10236:4
11508:9
8348:16
3012:25
13244:36
2528:49
8440:64
5184:81
11168:100
6848:121
13292:144
........
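
Note that a value returned by the target function of a multiprocessing.Process is discarded, so the example above only prints its results. One possible way to gather them in the parent process is a multiprocessing.Queue. The following is a minimal sketch under that assumption; the names square_to_queue and squares_via_processes are hypothetical and not part of the original example.

```python
import multiprocessing


def square_to_queue(x, queue):
    # Send the input together with its square so the parent can match them up.
    queue.put((x, x * x))


def squares_via_processes(numbers):
    numbers = list(numbers)
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=square_to_queue, args=(x, queue))
             for x in numbers]
    for p in procs:
        p.start()
    # Read one item per process BEFORE joining, so that no child blocks
    # while waiting for its queued data to be consumed.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    # Items arrive in completion order; return them in input order.
    return [results[x] for x in numbers]


if __name__ == "__main__":
    print(squares_via_processes([1, 2, 3, 4]))  # prints [1, 4, 9, 16]
```

For larger inputs, a pool of workers is usually more convenient, because it collects the return values for you.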

Example: Parallel Processing with Pools

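if __name__ == '__main__':
    # This example reuses f() and the imports from the example above.
    # Leave one core free for the parent process, but keep at least one worker.
    num_workers = max(1, multiprocessing.cpu_count() - 1)
    numList = range(1, 100)

    p = multiprocessing.Pool(num_workers)
    # map() splits numList among the workers and returns the results
    # as a list in input order.
    result = p.map(f, numList)
    p.close()
    p.join()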

Outputs:

14452:1
14452:4
14452:9
14452:16
14452:25
14452:36
......
14452:2304
14452:2401
2940:2809
2940:2916
14452:2500
2940:3025
2940:3136
14452:2601
6452:3249
2940:3721
2940:3844

 

____________________________________________________________
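
To make the gathering step explicit, the following minimal sketch returns the list produced by Pool.map and prints it. It assumes Python 3.3 or later, where a Pool can be used as a context manager; the names square and parallel_squares are hypothetical, and the default of 2 workers is chosen only for illustration.

```python
import multiprocessing


def square(x):
    return x * x


def parallel_squares(numbers, num_workers=2):
    # The with-block shuts the pool down once map() has returned.
    with multiprocessing.Pool(num_workers) as pool:
        # map() blocks until every input is processed and returns the
        # results in input order, whichever worker handled each one.
        return pool.map(square, numbers)


if __name__ == "__main__":
    print(parallel_squares(range(1, 6)))  # prints [1, 4, 9, 16, 25]
```

Although the lines printed by the workers can interleave, as in the output above, the list returned by map() always follows the order of the inputs.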

 

 

Code with Job Submission Script

To run the above code (named test.py) using 5 CPU cores with the Grid's default wrapper scripts, in the terminal use the following command:

python -n 5 test.py

Grid 2.5: The normal queue has been split, so when you submit a job with bsub directly you need to specify the "short" or "long" queue instead of "normal." For example, either of the following commands runs test.py with 5 cores on the long queue:

bsub -q long -N -n 5 -W 10 -R "rusage[mem=100]" -M 100 python test.py

bsub -q long -N -n 5 -W 10 -R "rusage[mem=100]" -M 100 python \< test.py

If you wish to use a submission script to run this code and include LSF job option parameters, create a text file named code.sh containing the following:

____________________________________________________________
 
#!/bin/bash
#
#BSUB -q normal
#BSUB -N
#BSUB -W 10
#BSUB -R "rusage[mem=100]"
#BSUB -M 100
python test.py
____________________________________________________________

(Grid 2.5: Because the normal queue has been split, in the above example you will need to use "short" or "long" instead of "normal," depending on the number of cores you will be requesting.)

Once your script is ready, you may run it with 5 cores by entering:

bsub -n 5 < ./code.sh

The < character is used here so that the #BSUB directives in the script file are parsed by LSF.

Please see Submitting Batch Jobs for more information.

 

 

 

Updated on 8/9/18