Tip of the Month

New Tip from the RCS Data Team

October 25, 2018

The EXPLAIN statement is a useful tool in SQL databases for understanding how a query is executed and where to tune it. For example, the output of EXPLAIN can help you decide where to add indexes and diagnose slow queries by showing the join type, the indexes that could be used versus the index actually chosen, the estimated number of rows to be examined, and so on.

How do you use EXPLAIN?

Simply put the keyword EXPLAIN in front of the query to be analyzed. EXPLAIN can be used in front of a query beginning with...
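As a rough illustration of the idea, here is a minimal Python sketch that uses the built-in sqlite3 module as a stand-in database. This is an assumption made for convenience: SQLite's high-level form is EXPLAIN QUERY PLAN, and its output looks different from the join-type and index columns described above, which other databases report. The table name orders, the index name idx_orders_customer, and the helper query_plan are all hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each row holds the plan description.
    return [row[3] for row in rows]


# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("c%d" % (i % 50), float(i)) for i in range(1000)],
)

sql = "SELECT total FROM orders WHERE customer = ?"

# Plan before any index exists on the filtered column.
before = query_plan(conn, sql, ("c7",))

# Add an index on the filtered column, then ask for the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = query_plan(conn, sql, ("c7",))

print("before:", before)
print("after: ", after)
```

The exact wording of the plan depends on the SQLite version, but the first plan should describe a scan of the whole table and the second a search that uses the new index.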


New Tip from the RCS Data Team

August 11, 2018

Have you ever had to determine gender for a list of names but not been sure where to start? We've come across some easy-to-use tools to help with this task. You can assign gender and probability using the gender package in R (https://cran.r-project.org/web/packages/gender/index.html) and genderize.io in Python (https://pypi.org/project/Genderize/). Please contact RCS for sample scripts if you...

Compute Grid Tip of the Month - December 2017

December 5, 2017

Program crashing? What's going on?

Software will occasionally crash on the compute grid for no apparent reason, which can be very frustrating. The typical reaction is to launch the program again and continue working. But why did this happen in the first place? And will it happen again?

The most common cause is an Out of Memory error. Unlike on desktops or laptops, events on the compute grid are carefully logged. The...


Compute Grid Tip of the Month - May 2017

April 26, 2017

Parallel/Multicore Processing

Using multiple cores (CPUs) to analyze data is an efficient way to get more work done in less time. But this is true only under certain circumstances. By default, R, Python, and MATLAB can only use one core even on a multicore (multi-CPU) machine, unless you specifically program them to use more. Stata, on the other hand, has been parallelized, so many of its functions can use more than one core, but only to a maximum of 75% efficiency overall. To get the most efficiency, it's best to run your 'do' files in batch; if using the interactive GUI, Stata spends...
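To make "specifically program them to use more" concrete, here is a minimal Python sketch using the standard library's concurrent.futures module. The function names and the sum-of-squares workload are hypothetical placeholders for real analysis code, and the worker count of 4 is an assumption; on a shared grid it should not exceed the number of cores you actually requested.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(bounds):
    """Do one independent chunk of work: sum i*i for start <= i < stop."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split_range(n, parts):
    """Split range(n) into at most `parts` contiguous (start, stop) chunks."""
    step = -(-n // parts)  # ceiling division
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


def parallel_sum_of_squares(n, workers):
    """Farm the chunks out to separate processes and combine the results."""
    chunks = split_range(n, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    # Assumed worker count; never more than the machine reports.
    workers = min(4, os.cpu_count() or 1)
    print(parallel_sum_of_squares(1_000_000, workers))
```

This pattern only pays off when the chunks are independent of one another and each is large enough to outweigh the cost of starting processes and passing data between them.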


Compute Grid Tip of the Month - April 2017

March 30, 2017

Turn those icons off!

If you are using project spaces and home folders on the research storage (part of the HBS compute grid), it is likely that you will need to access these while not on the HBS campus. If you are mapping drives or mounting shared folders, the default settings in both Windows and Mac OS may be working against you, as the OS will try to present each file with an icon that previews its contents. These take much more time to display in Finder or Explorer windows than simple generic file icons, especially over the VPN....

Compute Grid Tip of the Month - March 2017

March 3, 2017

Choose your resources efficiently

Using the HBS Compute Grid is a great opportunity to scale your research beyond what you can do on a desktop or laptop. But because it is a shared resource, we all have a responsibility to use it appropriately. The biggest impediment we face is Grid users over-requesting RAM and CPUs: once requested and allocated, these are not returned to the general pool until your job has finished or you have quit your program in the interactive session (the programs from the Applications drop-down menu in NoMachine). So, if you ask for more resources than your job or program needs, those “extra” resources are, in effect, wasted and unavailable to other users. Please help other users out by choosing your resources efficiently when running jobs and programs (“Take What You Need, but Need What You Take”)....
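One way to estimate how much RAM to request is to measure the peak memory of a small test run first. The sketch below is a rough illustration in Python, not an official RCS procedure: it uses the standard library's resource module, which is available on Unix-like systems only, and the workload function is a hypothetical stand-in for your own analysis.

```python
import resource
import sys


def peak_memory_mb():
    """Return this process's peak resident memory so far, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def workload(n):
    """Hypothetical stand-in for real analysis: build a list and sum it."""
    data = [float(i) for i in range(n)]
    return sum(data)


if __name__ == "__main__":
    before = peak_memory_mb()
    workload(2_000_000)
    after = peak_memory_mb()
    print(f"peak memory before: {before:.0f} MB, after: {after:.0f} MB")
```

A figure measured this way on a representative test run, plus a modest safety margin, is usually a better basis for a request than a guess.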

Compute Grid Tip of the Month - February 2017

February 1, 2017

Setting up aliases

Shell aliases are shortcuts that can save time. To define them, add alias statements to the file named '.bashrc' in your home directory. Here are three examples. The lines starting with '#' are comments and are optional.

# detailed directory listing, including hidden files
alias ll="ls -la"

# quick route to my project space
alias jh="cd /export/projects/jharvard_res_project"

# start the emacs editor in the terminal (no separate window)
alias emacs="emacs -nw"

Compute Grid Tip of the Month - January 2017

December 21, 2016

Where Am I?

The Unix terminal is a very customizable environment. One easy customization is to change your static, uninformative default bash command prompt "[jharvard@researchgrid]$" to update and show you where you are when you change directories. Please see the HOW-TO on our HBS Research Computing Environment website at http://grid.rcs.hbs.org/change-your-unix-command-prompt-now.

 
