Week 2

Date: Sep 7th - Sep 9th

What we will cover

In this class, we will go deeper into statistical adjustment and look at specific issues related to collinearity. We will also start discussing the difference between correlation and causation.


Complete before Sunday, Sep 5th (11:59 pm) if your section is on Tuesday, or before Tuesday, Sep 7th (11:59 pm) if your section is on Thursday. You can find the assignment here.




Here are the two R scripts we will review in class, along with some additional data and questions:

Download (Code Week 1)



  • How do I interpret log transformations of variables in a linear regression?

Answer: We often transform the dependent variable $ y $ to $ \log(y) $ so that its distribution is closer to normal (e.g. income), and sometimes a covariate enters the model in log form as well. How do we interpret the coefficients of a linear regression under these transformations? As we saw in class, you can interpret them (approximately) as percentage changes! Take a look at this article to see exactly how to interpret these coefficients, depending on whether your dependent variable, your independent variable, or both are in log form. Go to article
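
As a rough sketch of where the percentage interpretation comes from, consider the three cases with a single covariate. The symbols $ \beta_0 $, $ \beta_1 $, $ x $ and the value $ 0.05 $ are illustrative and not taken from the class examples; all statements are associations, holding any other covariates fixed.

$$
\begin{aligned}
\text{log-level:}\quad & \log(y) = \beta_0 + \beta_1 x + \varepsilon
  && \Rightarrow\ \text{one more unit of } x \text{ multiplies } y \text{ by } e^{\beta_1},\ \text{a } 100\,(e^{\beta_1}-1)\% \text{ change} \\
\text{level-log:}\quad & y = \beta_0 + \beta_1 \log(x) + \varepsilon
  && \Rightarrow\ \text{a } 1\% \text{ increase in } x \text{ changes } y \text{ by } \beta_1 \log(1.01) \approx \beta_1/100 \text{ units} \\
\text{log-log:}\quad & \log(y) = \beta_0 + \beta_1 \log(x) + \varepsilon
  && \Rightarrow\ \text{a } 1\% \text{ increase in } x \text{ changes } y \text{ by } 100\,(1.01^{\beta_1}-1)\% \approx \beta_1\%
\end{aligned}
$$

For example, in the log-level case with $ \beta_1 = 0.05 $, the exact change is $ 100\,(e^{0.05}-1) \approx 5.13\% $, close to the quick reading of $ 100\,\beta_1 = 5\% $. The approximation $ e^{\beta_1} - 1 \approx \beta_1 $ only works well when $ \beta_1 $ is small, so for larger coefficients it is safer to compute the exact value.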

  • Do we need to standardize binary covariates?

Answer: As we saw in class, the main problem with standardizing binary variables is that the interpretation becomes more complicated (e.g. what does a “1 standard deviation increase” in the Bechdel test variable mean?). One way to address this is to leave indicator variables unstandardized (be careful when comparing effect sizes, though). Another is to standardize as suggested by Andrew Gelman: divide all numeric variables by two standard deviations, so that their coefficients are comparable to those of binary variables. (Note: we won’t be doing that in this class, but the information is here if you are curious.)
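
To make the two-standard-deviation idea concrete, here is a minimal sketch in Python (the class itself uses R). The data, the variable names `budget` and `bechdel`, and the helper `rescale_two_sd` are all hypothetical and made up for illustration. Centering the variable before dividing is an extra choice made in this sketch; the rescaling itself is just the division by two standard deviations.

```python
import statistics


def rescale_two_sd(values):
    """Center a numeric variable and divide it by two sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return [(v - mean) / (2 * sd) for v in values]


# Hypothetical data, invented for this example
budget = [10.0, 25.0, 40.0, 55.0, 70.0, 85.0]  # a numeric covariate
bechdel = [0, 1, 0, 1, 1, 0]                   # a binary indicator, left as 0/1

budget_scaled = rescale_two_sd(budget)

# After rescaling, the numeric variable has a standard deviation of 0.5 ...
print(round(statistics.stdev(budget_scaled), 3))
# ... which matches the (population) SD of a 0/1 variable split 50/50
print(round(statistics.pstdev(bechdel), 3))
```

The reasoning behind the choice of two standard deviations: a binary variable with roughly half of its values equal to 1 has a standard deviation of about 0.5, so going from 0 to 1 is a change of about two standard deviations. After rescaling, a one-unit change in the numeric variable is also a two-standard-deviation change, which puts the two coefficients on a similar footing.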

© Magdalena Bennett - licensed under Creative Commons.