
STA 235H - Multiple Regression: Binary Outcomes

Fall 2023

McCombs School of Business, UT Austin

1 / 18

Binary Outcomes

  • You have probably used binary outcomes in regressions, but do you know the issues that they may bring to the table?

What can we do about them?

2 / 18

How to handle binary outcomes?

Linear Probability Model

Logistic Regression

3 / 18

Linear Probability Models

  • A Linear Probability Model is just a traditional regression with a binary outcome

  • Something interesting about a binary outcome Y is that its expected value is actually a probability!

E[Y \mid X_1, \dots, X_p] = \Pr(Y=0 \mid X_1, \dots, X_p) \cdot 0 + \Pr(Y=1 \mid X_1, \dots, X_p) \cdot 1 = \Pr(Y=1 \mid X_1, \dots, X_p)
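
Since a binary Y's expectation is just Pr(Y = 1), the sample mean of a 0/1 variable estimates that probability. A minimal illustration in R (simulated data, purely for intuition; not from the course):

y <- rbinom(1000, 1, 0.3)  # simulate a binary outcome with Pr(Y = 1) = 0.3
mean(y)                    # the sample mean estimates Pr(Y = 1), so ~0.3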

4 / 18

How to interpret a LPM?

  • \hat{\beta}'s are interpreted as changes in probability

  • Example:

GradeA = \beta_0 + \beta_1 \cdot Study + \varepsilon

  • \hat{\beta}_1 is the average change in the probability of getting an A if I study one more hour.

  • Studying one more hour is associated with an average increase in the probability of getting an A of \hat{\beta}_1 \times 100 percentage points.

\widehat{GradeA} = 0.2 + 0.125 \cdot Study

  • Studying one more hour is associated with an average increase in the probability of getting an A of 12.5 percentage points.
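
A minimal sketch of fitting such an LPM in R (the data below are simulated so the true effect matches the slide's 0.125; the names and numbers are illustrative, not the course's):

set.seed(235)
study <- runif(500, 0, 6)                               # hours studied
grade_a <- rbinom(500, 1, pmin(0.2 + 0.125 * study, 1)) # binary outcome: got an A?
lpm <- lm(grade_a ~ study)                              # an LPM is just OLS on a 0/1 outcome
coef(lpm)  # slope should be close to 0.125: +1 hour -> ~12.5 pp higher Pr(A)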
5 / 18

Side note: Difference between percent change and change in percentage points

  • Imagine that if you study 4hrs your probability of getting an A is, on average, 70% and if you study for 5hrs that probability increases to 75%.

  • Then, we can say that your probability increased by 5 percentage points.

  • Why is this not the same as saying that your probability increased by 5%?

  • Remember percent change?

\frac{y_1 - y_0}{y_0} = \frac{75 - 70}{70} \approx 0.0714

  • This means that, in this case, a 5 percentage point increase is equivalent to roughly a 7% increase in probability.

Be aware of the difference between percentage points and percent!
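
The same arithmetic in R, for concreteness:

p0 <- 0.70; p1 <- 0.75
p1 - p0          # 0.05: an increase of 5 percentage points
(p1 - p0) / p0   # ~0.0714: a percent change of about 7%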

6 / 18

Let's look at an example

  • Home Mortgage Disclosure Act Data (HMDA)
# Load the HMDA data from the course repository
hmda = read.csv("https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week3/2_OLS_Issues/data/hmda.csv", stringsAsFactors = TRUE)
head(hmda)  # note: deny is a factor with levels "no"/"yes"
## deny pirat hirat lvrat chist mhist phist unemp selfemp insurance condomin
## 1 no 0.221 0.221 0.8000000 5 2 no 3.9 no no no
## 2 no 0.265 0.265 0.9218750 2 2 no 3.2 no no no
## 3 no 0.372 0.248 0.9203980 1 2 no 3.2 no no no
## 4 no 0.320 0.250 0.8604651 1 2 no 4.3 no no no
## 5 no 0.360 0.350 0.6000000 1 1 no 3.2 no no no
## 6 no 0.240 0.170 0.5105263 1 1 no 3.9 no no no
## afam single hschool
## 1 no no yes
## 2 no yes yes
## 3 no no yes
## 4 no no yes
## 5 no no yes
## 6 no no yes
7 / 18

Probability of someone getting a mortgage loan denied?

  • Modeling whether a mortgage is denied (deny = 1) based on race, conditional on the payment-to-income ratio (pirat)
library(dplyr)  # for %>% and mutate()
hmda = hmda %>% mutate(deny = as.numeric(deny) - 1)  # recode factor "no"/"yes" to 0/1
summary(lm(deny ~ pirat + afam, data = hmda))
##
## Call:
## lm(formula = deny ~ pirat + afam, data = hmda)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.62526 -0.11772 -0.09293 -0.05488 1.06815
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.09051 0.02079 -4.354 1.39e-05 ***
## pirat 0.55919 0.05987 9.340 < 2e-16 ***
## afamyes 0.17743 0.01837 9.659 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3123 on 2377 degrees of freedom
## Multiple R-squared: 0.076, Adjusted R-squared: 0.07523
## F-statistic: 97.76 on 2 and 2377 DF, p-value: < 2.2e-16




  • Holding the payment-to-income ratio constant, an African American (AA) client has a probability of having their loan denied that is about 18 percentage points higher, on average, than a non-AA client.

  • Being AA is associated with an average increase of 0.177 in the probability of getting a loan denied compared to a non-AA client, holding the payment-to-income ratio constant.
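
To see that 0.177 gap directly, a quick sketch of predicted denial probabilities at a fixed payment-to-income ratio (the value 0.3 is an arbitrary choice for illustration):

model_lpm <- lm(deny ~ pirat + afam, data = hmda)
newdat <- data.frame(pirat = 0.3, afam = factor(c("no", "yes")))
predict(model_lpm, newdata = newdat)  # the two fitted probabilities differ by ~0.177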

8 / 18

How does this LPM look?

9 / 18

Issues with a LPM?

  • Main problems:

    • Non-normality of the error term → affects hypothesis testing

    • Heteroskedasticity (i.e., the variance of the error term is not constant) → affects validity of standard errors

    • Predictions can be outside [0,1] → issues for prediction

    • LPM imposes a linearity assumption → possibly too strict
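
The out-of-range problem is easy to check on the HMDA fit from before (a quick sketch reusing the same model):

fitted_lpm <- fitted(lm(deny ~ pirat + afam, data = hmda))
range(fitted_lpm)                     # fitted "probabilities" need not stay in [0, 1]
sum(fitted_lpm < 0 | fitted_lpm > 1)  # count of out-of-range predictions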

11 / 18

Are there solutions?

Some solutions we will take into account:

  • Don't use small samples: With the CLT, non-normality shouldn't matter much.

  • Use robust standard errors: Package estimatr in R is great!

12 / 18

Run again with robust standard errors

library(estimatr)  # provides lm_robust() for robust standard errors
model1 <- lm(deny ~ pirat + afam, data = hmda)         # (1) classical OLS SEs
model2 <- lm_robust(deny ~ pirat + afam, data = hmda)  # (2) heteroskedasticity-robust SEs
              (1) lm       (2) lm_robust
(Intercept)   −0.091***    −0.091**
              (0.021)      (0.031)
pirat          0.559***     0.559***
              (0.060)      (0.095)
afamyes        0.177***     0.177***
              (0.018)      (0.025)
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
  • Can you interpret these parameters? Do they make sense?
13 / 18

Most issues are solvable, but...

What about prediction?

14 / 18

Logistic Regression

  • Typically used in the context of binary outcomes (probit is another popular option)

  • Uses a nonlinear function to model the conditional probability of a binary outcome:

\Pr(Y=1 \mid X_1, \dots, X_p) = F(\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p)

where in a logistic regression: F(x) = \frac{1}{1 + \exp(-x)}

  • In the LPM, F(x) = x

  • A logistic regression doesn't look pretty:

\Pr(Y=1 \mid X_1, \dots, X_p) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p)}}

A regression with log(Y) is NOT a logistic regression!
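
A minimal sketch of fitting the logistic counterpart of the earlier LPM (same hmda data; glm() with family = binomial is base R):

logit_model <- glm(deny ~ pirat + afam, data = hmda, family = binomial)
summary(logit_model)  # coefficients are on the log-odds scale, not probabilities
range(predict(logit_model, type = "response"))  # predicted probabilities stay inside [0, 1]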

15 / 18

How does this look in a plot?
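
Presumably the slide plots the fitted LPM line against the S-shaped logistic curve; a minimal sketch to draw something similar in base R (the single-predictor specification and plotting choices are assumptions, not the slide's exact code):

lpm_fit <- lm(deny ~ pirat, data = hmda)
logit_fit <- glm(deny ~ pirat, data = hmda, family = binomial)
grid <- data.frame(pirat = seq(0, 1.5, by = 0.01))
plot(hmda$pirat, hmda$deny, pch = 20, col = "grey",
     xlab = "Payment-to-income ratio", ylab = "Pr(deny)")
lines(grid$pirat, predict(lpm_fit, newdata = grid), lwd = 2)          # straight LPM line
lines(grid$pirat, predict(logit_fit, newdata = grid, type = "response"),
      lwd = 2, lty = 2)                                               # S-shaped logistic curve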

16 / 18

When will we use logistic regression?

  • As you discovered in the readings, logit is great for prediction (much better than the LPM).

  • For explanation, however, the LPM simplifies interpretation.

Use LPM for explanation and logit for prediction

(but remember robust SE!)

17 / 18

Takeaway points

  • Always make sure to check your data:

    • What are we analyzing? Does the data behave as I would expect? Should I exclude observations?
  • For LPM, always include robust standard errors!



18 / 18
