NOTE: This document has been updated for Grid 2.5, and the changes are noted in context below. Please contact Research Computing Services (RCS) with any questions.
Why use the Grid? Here are some examples of things you can do with it that will enable or accelerate your research:
- You wish to run software interactively that requires as much computing power (RAM or CPUs) as your desktop or laptop has, or more.
- You need to run an analysis that may take many hours, but want to keep your desktop or laptop free for other work.
- You wish to analyze or transform large datasets stored on the research storage spaces, using computing resources local to that data.
- You wish to work with data in the research MariaDB database.
- You wish to run a piece of code hundreds of times for a parameter sweep, optimization, or model fitting.
Since this is a shared system, we must ensure that everyone has fair access to its resources. The following per-user limits are currently in place:
- Each user can use a maximum of 12 CPUs (cores) across all jobs. This can be 12 single-core jobs or, when running multi-threaded applications, fewer jobs that total 12 cores.
- Each user can use a maximum of 80 GB of memory concurrently, across all jobs.
Grid 2.5 changes: We have relaxed the limits on both interactive and batch work:
- For interactive sessions, each user can use up to 24 CPUs (cores) across all jobs, with up to 3 jobs at a time.
- For batch jobs, there is no hard limit on the number of jobs that can run at once. Instead, the scheduler dynamically limits the number of jobs dispatched to run based on your past usage. This Fairshare scheduling ensures that everyone is able to run jobs. There is an upper limit of 16 cores per job.
- There are no longer any limits on RAM (memory).
Your jobs will queue, or PEND, if you exceed either of these allowances, and will run once the resources you are using are freed. For example, if you launch 3 Stata-MP4 sessions on the NoMachineNX server (which together consume 12 cores) and then try to launch any other program, that program will not start.
Because you have exhausted your resource allotment, the new job is queued (PEND). If you quit one of the Stata sessions (freeing 4 cores), your program will then launch.
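The queueing rule above can be sketched as a simple admission check. This is a minimal, hypothetical model of the original (pre-2.5) per-user limits only, not the scheduler's actual logic: the `Job` class, the `would_run` function, and the 8 GB memory figure for each Stata session are illustrative assumptions. The real scheduler also starts a pending job automatically once resources free up, whereas this sketch only answers whether a job fits right now.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical model of the original (pre-Grid 2.5) per-user limits.
# This is an illustration, not the scheduler's real implementation.
MAX_CORES = 12    # cores per user, summed across all jobs
MAX_MEM_GB = 80   # GB of memory per user, summed across all jobs


@dataclass
class Job:
    name: str
    cores: int
    mem_gb: float  # assumed per-job memory use (illustrative)


def would_run(running: list[Job], new: Job) -> bool:
    """Return True if `new` fits in the user's remaining allowance.

    False means the job would be queued (PEND) until running jobs
    free enough cores and memory.
    """
    used_cores = sum(job.cores for job in running)
    used_mem = sum(job.mem_gb for job in running)
    return (used_cores + new.cores <= MAX_CORES
            and used_mem + new.mem_gb <= MAX_MEM_GB)


if __name__ == "__main__":
    # Three Stata-MP4 sessions at 4 cores each; 8 GB apiece is an assumed figure.
    stata = [Job(f"stata-mp4-{i}", cores=4, mem_gb=8) for i in range(3)]
    other = Job("other-program", cores=1, mem_gb=2)

    print(would_run(stata, other))      # False: 12 + 1 cores exceeds the 12-core limit
    print(would_run(stata[:2], other))  # True: quitting one session frees 4 cores
```

Checking cores and memory together mirrors the text above: exceeding either allowance is enough for a job to pend.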
Grid 2.5 changes:
In the example above, launching another program after your 3 Stata-MP4 sessions (which together consume 12 cores) on the NoMachineNX server will not pend, as long as it does not use more than the remaining 12 cores.
If your work requires resources beyond these limits, please contact Research Computing Services (RCS) so that we can arrange temporary exemptions.
Last Updated 8/1/18