There are two ways to monitor your running job:
- Run the Unix command top on the compute node where your job is executing, or
- Query LSF via bjobs for your currently running jobs
Each has advantages and disadvantages, discussed below. Either method gives you feedback on how your code is behaving while the job is still running.
Running top on the compute node:
1) Use bjobs to find out which execution hosts (EXEC_HOST) your job is running on
2) For each host, get an interactive session on that machine (replace EXEC_HOST with the actual machine name, e.g. rhrcsnod05):
bsub -q interactive -Is -W 24:00 -R "rusage[mem=1000]" -m EXEC_HOST /bin/bash
(This command gets a bash shell on the named machine with 1 core for 24 hrs with 1000 MB of RAM)
3) Run top and watch for your processes. Press q (or Ctrl-C) to exit top.
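Step 1 can be partly scripted when a job spans several machines. The following is a minimal sketch, not a supported tool: unique_exec_hosts is a hypothetical helper name, and the bjobs call shown in the comment assumes your LSF version supports customized output (-o) and -noheader. It turns EXEC_HOST values such as 4*rhrcsnod05 or hostA:hostB into a list of distinct machine names.

```shell
#!/bin/sh
# Hypothetical helper: read EXEC_HOST values on stdin, one per line, and
# print each distinct machine name once.
# Handles the "N*host" form (N slots on one host) and "hostA:hostB" lists;
# lines that are empty or just "-" (no host assigned yet) are dropped.
unique_exec_hosts() {
    tr ':' '\n' \
        | sed 's/[[:space:]]//g; s/^[0-9]*\*//' \
        | grep -v '^-*$' \
        | sort -u
}

# Intended use on a login node (assumes bjobs supports -o and -noheader):
#   bjobs -noheader -o "exec_host" | unique_exec_hosts
```

Each name printed can then be substituted for EXEC_HOST in the bsub command from step 2.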
Monitoring execution via bjobs output:
This method is less precise because LSF (the scheduler) must collect runtime data from the execution hosts, so the information lags behind what is actually happening. Also, information may not be available for the first 1 to 5 minutes of your job.
1) Use bjobs to find the job IDs of your jobs
2) For each job, use bjobs -l jobID to get the job details
3) Look at the IDLE_FACTOR statistic. For a 1 core job at 100% efficiency, this will be 1; for a 2 core job, it will be 2, and so on. A value well above the number of cores you requested means the job is threading out, i.e. running more threads than it has cores for. An example would be asking for 2 cores and seeing a value greater than 2.5 (e.g. 4 or 6).
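The check in step 3 can be sketched as a pair of small shell functions. This is an illustration under stated assumptions: idle_factor and exceeds_cores are hypothetical helper names, the parser assumes bjobs -l prints a line of the form IDLE_FACTOR(cputime/runtime): 4.00 (the exact layout may differ between LSF versions), and the default margin of 0.5 is simply taken from the 2 core example above rather than from any LSF rule.

```shell
#!/bin/sh
# Hypothetical helper: read "bjobs -l" output on stdin and print the first
# IDLE_FACTOR value found. Prints nothing if the line is absent, which can
# happen during the first few minutes of a job.
idle_factor() {
    sed -n 's/.*IDLE_FACTOR[^:]*:[[:space:]]*\([0-9.][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed (exit status 0) when the idle factor is more
# than "margin" above the number of cores requested.
#   $1 = idle factor, $2 = cores requested, $3 = margin (assumed default 0.5)
exceeds_cores() {
    awk -v f="$1" -v c="$2" -v m="${3:-0.5}" 'BEGIN { exit !(f > c + m) }'
}

# Intended use (jobID and the core count 2 are placeholders):
#   f=$(bjobs -l jobID | idle_factor)
#   [ -n "$f" ] && exceeds_cores "$f" 2 && echo "job may be threading out"
```

Because of the reporting lag described above, it is safer to repeat the check a few times over several minutes than to rely on a single reading.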