Troubleshooting Jobs

A variety of problems can arise when running jobs and applications on the Grid. Many are related to resource misallocation, but there are other common problems as well. Check for email messages from the scheduler, which may explain the problem, and check the log files from your application.

Error

Likely Cause

JOB <jobid> CANCELLED AT <time> DUE TO TIME LIMIT

Your job ran longer than the time limit requested in your submission script. Increase the limit with the -t option, which takes a value in minutes or in D-HH:MM form (for example, 0-12:30 for 12.5 hours).

Job <jobid> exceeded <mem> memory limit, being killed

Your job is attempting to use more memory than you requested for it. Either increase the amount of memory requested or, if possible, reduce the amount your application tries to use. For example, many Java programs set their maximum heap size with the -Xmx JVM option; lowering that value may keep the job within its limit.

Exited with exit code N

Your job failed because your application exited with an error. Please look at the job or application logs to determine why your program exited abnormally.
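The three fixes above can be combined in one submission script. The sketch below is illustrative rather than a site template: it assumes Slurm-style #SBATCH directives, and the --mem value, the 1 GB of heap headroom, the run_app function, and myapp.jar are all hypothetical placeholders. The application command is printed instead of executed, and run_app returns 3 only to simulate a failure.

```shell
#!/bin/bash
# Assumption: the scheduler reads Slurm-style #SBATCH directives.
# Time limit of 12.5 hours in D-HH:MM form (750 requests the same in minutes).
#SBATCH -t 0-12:30
# Memory request; 8G is an illustrative value, not a site default.
#SBATCH --mem=8G

# Keep the Java heap below the memory request so the JVM's non-heap
# memory still fits. The 1 GB of headroom is a guess to tune per application.
requested_mb=8192
headroom_mb=1024
heap_mb=$(( requested_mb - headroom_mb ))

# Hypothetical stand-in for the real application. Replace the body with
# your own command, e.g.: java -Xmx${heap_mb}m -jar myapp.jar
run_app() {
    echo "would run: java -Xmx${heap_mb}m -jar myapp.jar"
    return 3   # simulated failure, for demonstration only
}

# Capture the exit code so the job output records why the run stopped.
status=0
run_app || status=$?
if [ "$status" -ne 0 ]; then
    echo "Application exited with code ${status}; check its log for the cause" >&2
fi
```

Writing the exit code to the job output makes an "Exited with exit code N" failure easier to match against the application's own log.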

If you are unable to determine why your jobs are not running correctly, please contact RCS for assistance.