Tip of the Month

New Tip from the RCS Stats Team

May 17, 2019

This month's tip is by our Senior Statistician, Xiang Ao: 

Stata’s margins command has been a powerful tool for many economists. It can calculate predicted means as well as marginal effects. Sometimes we’d like to compare those marginal effects. People often use margins and marginsplot to generate marginal effects and then decide whether two marginal effects differ based on whether their confidence intervals overlap. However, that can actually be wrong. In this post, I’d like to introduce a way to...
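To see why overlapping confidence intervals are not a reliable test, here is a minimal Python sketch with made-up numbers. The two estimates (1.0 and 2.0) and their standard errors (0.3 each) are hypothetical, and the sketch assumes the two estimates are independent. Marginal effects from the same model are usually correlated, so this illustrates the pitfall and is not the method from the post.

```python
import math

Z95 = 1.959964  # two-sided 95% critical value of the standard normal


def ci95(estimate, se):
    """Normal-approximation 95% confidence interval."""
    return (estimate - Z95 * se, estimate + Z95 * se)


def difference_test(est_a, se_a, est_b, se_b):
    """z statistic and two-sided p-value for est_b - est_a.

    Assumes the two estimates are independent (an assumption of this sketch).
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (est_b - est_a) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical marginal effects and standard errors
ci_a = ci95(1.0, 0.3)
ci_b = ci95(2.0, 0.3)
intervals_overlap = ci_a[1] > ci_b[0]

z, p = difference_test(1.0, 0.3, 2.0, 0.3)

print("CI A:", ci_a, "CI B:", ci_b, "overlap:", intervals_overlap)
print("z:", round(z, 3), "p:", round(p, 4))
```

The two intervals overlap, yet the test of the difference rejects equality at the 5% level. The standard error of a difference is smaller than the sum of the two individual standard errors, which is why the visual overlap check is too conservative.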

Read more about New Tip from the RCS Stats Team

New Tips from the RCS Stats Team

December 7, 2018

We are often asked how to calculate marginal effects in R, especially by Stata users who run Stata's margins and marginsplot commands after regression models. These two R packages offer functionality similar to Stata's margins and marginsplot commands, which calculate marginal effects after a regression model and graph them:

ggeffects

...

Read more about New Tips from the RCS Stats Team

New Tip from the RCS Data Team

December 7, 2018

Natural Language Processing (NLP) assists computers with processing and understanding natural human language, such as speeches, tweets, and newspaper articles. NLP can range from counting the number of times a word appears in text to analyses that assess attitudes (e.g., positive, negative). NLP can be conducted on a variety of platforms, including the robust NLTK package in Python and several libraries in R.
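The simplest end of that range, counting how often each word appears, can be sketched in a few lines. This example deliberately uses only the Python standard library so it runs without any downloads; the sample sentence and the `word_counts` helper are made up for illustration. NLTK provides its own tokenizers and a frequency-distribution class for the same job.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text
sample = "The grid is fast. The grid is shared, so the queue can be slow."
counts = word_counts(sample)

# Show the three most frequent words with their counts
print(counts.most_common(3))
```

The regular expression is a crude tokenizer: it keeps runs of letters and apostrophes and drops everything else, which is enough for a first look but not for serious text analysis.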

For an introduction and hands-on experience using NLTK in Python, DataCamp provides a free module as part of their NLP fundamentals course:...

Read more about New Tip from the RCS Data Team

New Tip from the RCS Data Team

October 25, 2018

The EXPLAIN statement is a very useful tool in SQL databases for understanding what a query is actually doing and where to apply tweaks. For example, the output of EXPLAIN can help you decide where to add indexes and help you quickly remedy slow queries by telling you the join type, the possible indexes to choose vs. the index actually chosen, the estimated number of rows to be examined, etc.

How do you use EXPLAIN?

Simply put the keyword EXPLAIN in front of the query to be analyzed. EXPLAIN can be used in front of a query beginning with...
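As a rough, runnable illustration of the idea, the sketch below uses SQLite, chosen only because it ships with Python. SQLite's variant is `EXPLAIN QUERY PLAN`, and its output format differs from that of other database engines, so treat the output as an example of the technique and not as what your database will print. The `orders` table, the `idx_orders_customer` index, and the `query_plan` helper are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column


# Hypothetical table with some filler data
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"

# Plan without an index on customer_id: expect a scan of the whole table
plan_before = query_plan(conn, sql, (42,))

# Add an index on the filtered column and look at the plan again
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two plans shows the kind of decision EXPLAIN supports: the first plan reads every row, while the second names the new index.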

Read more about New Tip from the RCS Data Team

New Tip from the RCS Data Team

August 11, 2018

Have you ever had to determine gender for a list of names but were not sure where to start? We’ve come across these cool and easy-to-use tools to help you with this task. You can assign gender and probability using the gender package in R (https://cran.r-project.org/web/packages/gender/index.html) and the Genderize client for genderize.io in Python (https://pypi.org/project/Genderize/). Please contact RCS for sample scripts if you...

Read more about New Tip from the RCS Data Team

Compute Grid Tip of the Month - December 2017

December 5, 2017

Program crashing? What's going on?

Software will occasionally crash for no apparent reason while you are working on the compute grid, which can be very frustrating. The typical reaction is to launch the program again and continue one's work. But why did this happen in the first place? And will it happen again?

The most common problem is an Out of Memory error. Unlike on desktops or laptops, events on the compute grid are carefully logged. The...

Read more about Compute Grid Tip of the Month - December 2017

Compute Grid Tip of the Month - May 2017

April 26, 2017

Parallel/Multicore Processing

Using multiple cores (CPUs) to analyze data is an efficient way to get more work done in less time. But this is true only under certain circumstances. By default, R, Python, and MATLAB can only use one core even on a multicore (multi-CPU) machine, unless you specifically program them to use more. Stata, on the other hand, has been parallelized, so many of its functions can use more than one core, but only to a maximum of 75% efficiency overall. To get the most efficiency, it's best to run your 'do' files in batch; if using the interactive GUI, Stata spends...
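As a minimal sketch of what "specifically program them to use more" can look like in Python, the example below spreads a CPU-bound toy task over several worker processes with the standard library's `concurrent.futures`. The `count_primes` task, the helper names, and the worker count of 4 are made up for illustration; on a shared system the worker count should not exceed the number of cores you have actually been allocated.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound toy task: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_serial(limits):
    """Run every task one after another on a single core."""
    return [count_primes(limit) for limit in limits]


def run_parallel(limits, workers):
    """Run the tasks in separate processes, up to `workers` at a time."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    limits = [20_000] * 8  # eight independent tasks of equal size
    print("cores visible to Python:", os.cpu_count())
    # Both approaches must give the same answers; only the wall time differs
    print("results match:", run_parallel(limits, workers=4) == run_serial(limits))
```

This pays off only when the tasks are independent and each one is large enough to outweigh the cost of starting processes and passing data between them, which is one of the "certain circumstances" mentioned above.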

Read more about Compute Grid Tip of the Month - May 2017