Monitoring Progress and Controlling Jobs

Whether you use a GUI or the command line, monitoring the progress of your jobs is essential, and understanding how to control them is even more important. Please see the appropriate section for your task at hand.

Both methods report the job state. This value will typically be one of PENDING, RUNNING, COMPLETED, CANCELLED, or FAILED:

  • PENDING: Job is waiting for a slot that satisfies the requested resources, or you have exceeded your resource usage limit. Jobs with high resource demands may spend significant time PENDING if the compute grid is busy.
  • RUNNING: Job is running.
  • COMPLETED: Job has finished and the command(s) have returned successfully (i.e. exit code 0).
  • CANCELLED: Job has been terminated by the user or an administrator using bkill.
  • FAILED: Job finished with an exit code other than 0.
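
The state definitions above can be summarized as a small decision rule. The following Python sketch is illustrative only: the names JobState, TERMINAL_STATES, and classify_job are hypothetical and are not part of the scheduler or of any tool described here. It assumes you already know whether the job has started, whether it was cancelled, and its exit code (if it has finished).

```python
from enum import Enum
from typing import Optional


class JobState(Enum):
    """Job states as described in the list above (hypothetical helper type)."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    CANCELLED = "CANCELLED"
    FAILED = "FAILED"


# States after which the job will not change again.
TERMINAL_STATES = {JobState.COMPLETED, JobState.CANCELLED, JobState.FAILED}


def classify_job(started: bool,
                 exit_code: Optional[int],
                 cancelled: bool = False) -> JobState:
    """Map basic job facts to a state, following the definitions above.

    Assumption: a cancelled job is reported as CANCELLED regardless of
    its exit code, and exit_code is None while the job has not finished.
    """
    if cancelled:
        return JobState.CANCELLED
    if not started:
        return JobState.PENDING
    if exit_code is None:
        return JobState.RUNNING
    # Exit code 0 means the command(s) returned successfully.
    return JobState.COMPLETED if exit_code == 0 else JobState.FAILED
```

For example, classify_job(started=True, exit_code=1) yields JobState.FAILED, while a job that has started but has no exit code yet is RUNNING. A monitoring script could stop polling once the state is in TERMINAL_STATES.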

Please also see our technote on Monitoring CPU Usage for your Jobs to understand the runtime behavior of your code.