"Take what you need. Need what you take."
Since the compute grid is a shared computing resource, being a good community member is important to ensure that everyone has fair access to do their work. For that purpose, it is important to accurately request the amount of RAM and number of CPU/cores that you wish to use.
Why is this important? The Grid is a complex, shared system that must have an accurate idea of the resources your program(s) will use so that it can effectively schedule jobs, that is, find space (RAM + CPUs) on a computer to do your work. If insufficient memory is allocated, your program may crash, often in an unintelligible way; if too much memory is allocated, resources that could be used for other researchers' sessions (interactive or batch) will be wasted. Additionally, your "fairshare", a number used in calculating the priority of getting your work scheduled and running, can be adversely affected by over-requesting. It is therefore important to be as accurate as possible when requesting cores and memory, whether through the default application submission scripts or through custom ones.
Many scientific computing tools can take advantage of multiple processing cores, but many cannot. A typical MATLAB, R, or Python script, for example, will not use multiple cores. On the other hand, Stata, a graphical console for statistics, is improved substantially by using multiple cores, though not for every Stata function. Please read the program documentation to understand its multicore capabilities, or check with the RCS staff, before requesting multiple cores for a given application.
Finally, when you start a session interactively or via batch, the RAM and CPU that you've requested are reserved only for your use, even when your session is idle. This means that, until your code finishes or you exit the interactive program, those resources cannot be used by anyone else, and are wasted if sitting idle.
Figuring out the appropriate amount of RAM and number of CPUs/cores takes a little bit of sleuthing, but becomes very easy in time. Here are some general guidelines:
If you are using data you've worked with before, or repeating work you've done before:
- More is not really better, since this is a shared resource
- Use fewer cores (1!) for interactive work, especially if you plan on having a session open over several days. It is considered bad form to leave sessions open for more than 7 days, as no one can use the resources that are reserved exclusively for your use.
- Choosing multiple cores for interactive work is OK if you will be finishing your work in hours to a day or two. Please do not let these sessions sit idle.
- Check your MAX MEM usage (see below) from past job history, and select the best-fit memory footprint.
- A little more difficult: write custom LSF job submission commands that closely match the memory you need. You'll need to do this if you require more than 30 GB of RAM, as the default wrapper scripts allow RAM allocations of at most 30 GB.
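As a rough illustration of the last point, the sketch below assembles a custom LSF submission command in Python. The bsub options used (-n for cores, -M for the memory limit, -R "rusage[mem=...]" for the memory reservation, -Is for an interactive session) are standard LSF options, but the memory units are set per site; this sketch assumes both values are read as MB. The helper name build_bsub_command and the script name analysis.R are hypothetical. Check with RCS staff for the exact form used on this grid.

```python
import shlex


def build_bsub_command(program, cores=1, mem_gb=4, interactive=False):
    """Return a bsub command line requesting `cores` cores and `mem_gb` GB of RAM.

    ASSUMPTION: the cluster interprets -M and rusage[mem=...] in MB.
    LSF lets each site configure these units, so verify before use.
    """
    mem_mb = int(mem_gb * 1024)
    args = ["bsub"]
    if interactive:
        args.append("-Is")  # interactive session with a pseudo-terminal
    args += [
        "-n", str(cores),               # number of cores (job slots)
        "-M", str(mem_mb),              # memory limit for the job
        "-R", f"rusage[mem={mem_mb}]",  # memory to reserve on the host
    ]
    args += shlex.split(program)        # the program to run and its arguments
    # Quote each piece so the result can be pasted into a shell as-is.
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical example: a single-core R batch job that needs 40 GB.
print(build_bsub_command("Rscript analysis.R", cores=1, mem_gb=40))
```

The function only builds the command string; it does not submit anything, so the result can be reviewed before it is run.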
If you really have no idea where to start, try one of the following approaches for approximating RAM and/or CPU usage:
- Remember that MATLAB, R, and Python can only use 1 CPU unless you've programmed it to do otherwise.
- Stata can use multiple CPUs, but be conservative. Again, more is not necessarily better.
- Each language has commands that will give you the memory usage of your data while it is loaded in memory.
- Or, if you are not creating new data structures after reading in a data file, try a RAM footprint that is 10x the data file size. If you are creating new ones, try 20x to 30x.
- Or, try a large memory size (e.g. 20G), finish your work, then check the MAX MEM usage and select a best-fit, smaller memory footprint next time.
- Give yourself about 20% wiggle room
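The rules of thumb above can be turned into a small calculation. The following is a minimal sketch in Python, not an official sizing tool: it applies the 10x multiplier (or the upper 30x figure when new data structures are created), adds the 20% wiggle room on top, and rounds up to whole gigabytes. The function name and the choice to stack the 20% on top of the multiplier are assumptions made for illustration.

```python
import math


def estimate_ram_gb(file_size_bytes, creates_new_structures=False, headroom=0.20):
    """Estimate a RAM request in whole GB from the size of the data file.

    Uses 10x the file size when the data is only read in, and 30x
    (the upper end of the 20x-30x guideline) when new data structures
    are created, then adds `headroom` as wiggle room.
    """
    multiplier = 30 if creates_new_structures else 10
    needed_bytes = file_size_bytes * multiplier * (1 + headroom)
    # Round up to whole GB and never suggest less than 1 GB.
    return max(1, math.ceil(needed_bytes / 1024**3))


size = 500 * 1024**2  # a 500 MB data file
print(estimate_ram_gb(size))                               # 6
print(estimate_ram_gb(size, creates_new_structures=True))  # 18
```

For a real file, os.path.getsize(path) returns the size in bytes. Treat the result as a first guess and refine it from the MAX MEM value once the job has run.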
LSF, the scheduling software, makes it easy to figure out how much RAM you've used for a currently running or past job. Use the bjobs command for currently running jobs, or the bhist command for finished jobs, and search the output for the text MAX MEM with grep to find the maximum amount of RAM your jobs used.
For example, using the -l flag (long format) to display info for currently running jobs:

    [jharvard@rhrcscli01:~]$ bjobs -l | grep -E "Application|IDLE|MAX"
    Job <144795>, User <rfreeman>, Project <XSTATA>, Application <stata-mp4-30g>, S
    IDLE_FACTOR(cputime/runtime): 0.01
    MAX MEM: 56 Mbytes; AVG MEM: 49 Mbytes
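If you prefer to pull these fields out programmatically, the following Python sketch parses the filtered bjobs output shown above. It assumes the output has exactly the layout of this example; the function name summarize_job is hypothetical, and the regular expressions may need adjusting for other LSF versions.

```python
import re

# Filtered output copied from the bjobs example above.
SAMPLE = """Job <144795>, User <rfreeman>, Project <XSTATA>, Application <stata-mp4-30g>, S
IDLE_FACTOR(cputime/runtime): 0.01
MAX MEM: 56 Mbytes; AVG MEM: 49 Mbytes
"""


def summarize_job(text):
    """Extract job ID, application, idle factor, and peak memory from one job record."""
    job = re.search(r"Job <(\d+)>", text)
    app = re.search(r"Application <([^>]+)>", text)
    idle = re.search(r"IDLE_FACTOR\(cputime/runtime\):\s*([\d.]+)", text)
    peak = re.search(r"MAX MEM:\s*([\d.]+)\s*(\w+)", text)
    return {
        "job_id": job.group(1) if job else None,
        "application": app.group(1) if app else None,
        "idle_factor": float(idle.group(1)) if idle else None,
        # Peak memory as (amount, unit), with the unit exactly as LSF printed it.
        "max_mem": (float(peak.group(1)), peak.group(2)) if peak else None,
    }


print(summarize_job(SAMPLE))
```

In this record the idle factor is 0.01 and peak memory is 56 Mbytes, which suggests a session that is sitting mostly idle and using little of whatever it reserved.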
The next example can be used to display information for jobs that ran since a particular date. Here we will use the flags -a (all jobs), -l (long format), and -S (submitted date; comma indicates range up to today):

    [jharvard@rhrcscli01:~]$ bhist -a -l -S 2017/09/1, | grep -E "Application|IDLE|MAX"
    Job <158502>, User <jharvard>, Project <STATA-SE>, Application <stata-se-5g>, I
    MAX MEM: 12 Gbytes; AVG MEM: 12 Mbytes
    Job <158547>, Job Name <MATLAB>, User <rfreeman>, Project <MATLAB>, Application
    MAX MEM: 607 Mbytes; AVG MEM: 511 Mbytes
The MAX MEM values can now inform how much RAM you ask for the next time you do similar work or work with the same data.
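One way to turn past MAX MEM values into a next request is sketched below in Python. It reads the MAX MEM figures from bhist-style text, converts them to MB, and adds roughly 20% wiggle room before rounding up to whole GB. Only the Mbytes and Gbytes units seen in the examples are handled; other units, and the function names, are assumptions left for you to adapt.

```python
import math
import re

# Units that appear in the examples above; extend if your output shows others.
UNIT_TO_MB = {"Mbytes": 1, "Gbytes": 1024}

# Filtered bhist output copied from the example above.
HISTORY = """Job <158502>, User <jharvard>, Project <STATA-SE>, Application <stata-se-5g>, I
MAX MEM: 12 Gbytes; AVG MEM: 12 Mbytes
Job <158547>, Job Name <MATLAB>, User <rfreeman>, Project <MATLAB>, Application
MAX MEM: 607 Mbytes; AVG MEM: 511 Mbytes
"""


def max_mem_values_mb(text):
    """Return every MAX MEM figure in the text, converted to MB."""
    found = re.findall(r"MAX MEM:\s*([\d.]+)\s*(\w+)", text)
    return [float(amount) * UNIT_TO_MB[unit] for amount, unit in found]


def suggested_request_gb(peak_mb, headroom=0.20):
    """Peak usage plus headroom, rounded up to whole GB (minimum 1 GB)."""
    return max(1, math.ceil(peak_mb * (1 + headroom) / 1024))


for peak in max_mem_values_mb(HISTORY):
    print(f"peak {peak:.0f} MB -> request about {suggested_request_gb(peak)} GB")
```

With the sample history, a 12 GB peak leads to a request of about 15 GB, and a 607 MB peak to about 1 GB.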