# New Tips from the RCS Stats Team

December 7, 2018

We are often asked how to calculate marginal effects in R, especially by Stata users accustomed to running Stata's margins and marginsplot commands after regression models. Two R packages offer similar functionality, calculating marginal effects after a regression model and graphing them:

ggeffects

...

# New Tip from the RCS Data Team

December 7, 2018

Natural Language Processing (NLP) assists computers with processing and understanding natural human language, such as speeches, tweets, and newspaper articles. NLP can range from counting the number of times a word appears in text to analyses that assess attitudes (e.g., positive, negative). NLP can be conducted on a variety of platforms, including the robust NLTK package in Python and several libraries in R.
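As a rough illustration of the simplest task mentioned above, counting how often each word appears, here is a minimal sketch using only the Python standard library. It deliberately does not use NLTK; the regular-expression tokenizer and the sample sentence are assumptions made for illustration, and a real NLP library handles punctuation, contractions, and other languages far more carefully.

```python
import re
from collections import Counter


def word_counts(text):
    """Return a Counter mapping each lowercase word to its frequency."""
    # Assumption: a "word" is a run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, invented for this example.
sample = "The grid is fast. The grid is shared, so the queue can be slow."
counts = word_counts(sample)

# Show the three most frequent words with their counts.
print(counts.most_common(3))
```

The same Counter could be fed with tokens produced by a dedicated tokenizer once the text becomes messier than this sample.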

For an introduction and hands-on experience using the NLTK in Python, DataCamp provides a free module as part of their NLP fundamentals course:...

# New Tip from the RCS Data Team

October 25, 2018

The EXPLAIN statement is a very useful tool in SQL databases for understanding what a query is doing and where to apply tweaks. For example, the output of EXPLAIN can help you decide where to add indexes and quickly remedy slow queries by reporting the join type, the possible indexes versus the index actually chosen, the estimated number of rows to be examined, and so on.

How do you use EXPLAIN?

Simply put the keyword EXPLAIN in front of the query to be analyzed. EXPLAIN can be used in front of a query beginning with...
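To make the idea concrete, here is a small sketch using Python's built-in sqlite3 module. Note the assumptions: SQLite is used only because it needs no server, its human-readable form is `EXPLAIN QUERY PLAN` rather than plain `EXPLAIN`, and its output differs from the join-type and possible-index columns described above. The `orders` table, its rows, and the index name are all hypothetical.

```python
import sqlite3

# In-memory database with a small hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("ann", 10.0), ("bob", 25.5), ("ann", 7.25)],
)

query = "SELECT total FROM orders WHERE customer = ?"


def plan(connection, sql, params=()):
    """Return the detail text of each step in SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable description.
    return [row[-1] for row in rows]


# Plan before any index exists on the filtered column.
before = plan(conn, query, ("ann",))

# Add an index on the filtered column, then ask for the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(conn, query, ("ann",))

print("before:", before)
print("after: ", after)
```

Comparing the two plans shows the kind of decision EXPLAIN supports: without the index the database scans the whole table, and with it the plan names the index it chose.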

# New Tip from the RCS Stats Team

August 11, 2018

Our Senior Statistician, Xiang Ao, has provided some helpful tips for interpreting interactions in a regression model, with both binary and continuous variables. Please see his blog post for details: http://xiangao.netlify.com/2017/12/07/interpreting-interaction/

# New Tip from the RCS Data Team

August 11, 2018

Have you ever had to determine gender for a list of names, but were not sure where to start? We’ve come across some easy-to-use tools to help you with this task. You can assign gender and probability using the gender package in R (https://cran.r-project.org/web/packages/gender/index.html) and genderize.io in Python (https://pypi.org/project/Genderize/). Please contact RCS for sample scripts if you...

# Compute Grid Tip of the Month - December 2017

December 5, 2017

Program crashing? What's going on?

Software will occasionally crash on the compute grid for no apparent reason, which can be very frustrating. The typical reaction is to launch the program again and continue working. But why did this happen in the first place? And will it happen again?

The most common problem is an Out of Memory error. Unlike on desktops or laptops, events on the compute grid are carefully logged. The...


# Compute Grid Tip of the Month - October 2017

October 2, 2017

Curious which commands you use most often? The following one-liner shows the top ten commands from your command history list:

$ history | awk '{print $2}' | sort | uniq -c | sort -rn | head -n 10
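For readers who want to see what each stage of the one-liner above does, here is a rough Python equivalent. It is only a sketch: it assumes history lines look like `  12  ls -l` (an entry number followed by the command), and the `sample_history` list is invented for illustration rather than read from a real shell.

```python
from collections import Counter


def top_commands(history_lines, n=10):
    """Count the command name (second field) on each history line."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        # Field 0 is the history entry number, field 1 the command name,
        # mirroring awk's $2 in the shell pipeline.
        if len(fields) >= 2:
            counts[fields[1]] += 1
    # Equivalent of: sort | uniq -c | sort -rn | head -n N
    return counts.most_common(n)


# Hypothetical history output, invented for this example.
sample_history = [
    "    1  ls -l",
    "    2  cd project",
    "    3  ls",
    "    4  git status",
    "    5  ls -a",
    "    6  git log",
]

for command, count in top_commands(sample_history, 3):
    print(count, command)
```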

...

# Compute Grid Tip of the Month - May 2017

April 26, 2017

Parallel/Multicore Processing

Using multiple cores (CPUs) to analyze data is an efficient way to get more work done in less time. But this is true only under certain circumstances. By default, R, Python, and MATLAB can only use one core even on a multicore (multi-CPU) machine, unless you specifically program them to use more. Stata, on the other hand, has been parallelized, so many of its functions can use more than one core, but only to a maximum of 75% efficiency overall. To get the most efficiency, it's best to run your 'do' files in batch; if using the interactive GUI, Stata spends...
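As an illustration of what "specifically programming" Python to use more than one core can look like, here is a minimal sketch built on the standard library's `concurrent.futures.ProcessPoolExecutor`. The workload (a sum of squares), the helper names, and the cap of four workers are all assumptions chosen for the example; on a shared grid the worker count should match the cores actually requested for the job.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The unit of work each process performs on its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, workers):
    """Give one chunk to each worker process and add up the partial sums."""
    chunks = split_into_chunks(values, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    data = list(range(100_000))
    # Assumption: use at most four cores, fewer if the machine has fewer.
    workers = min(4, os.cpu_count() or 1)
    print(parallel_sum_of_squares(data, workers))
```

Starting processes and copying data between them has a cost, so a split like this tends to pay off only when each chunk represents a substantial amount of computation.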


# Compute Grid Tip of the Month - April 2017

March 30, 2017

Turn those icons off!

If you are using project spaces and home folders on the research storage (part of the HBS compute grid), it is likely that you will need to access these while not on the HBS campus. If you are mapping drives or mounting shared folders, the default settings in both Windows and Mac OS may be working against you, as the OS will try to present each file with an icon previewing its contents. This takes much more time to display in Finder or Explorer windows than simple generic file icons, especially over the VPN....

March 3, 2017