Class 10: Regression discontinuity I

# In-person<br>session 10

**March 21, 2024**


---

# Plan for today

.box-2.medium.sp-after-half[Diff-in-diff effect sizes]

.box-5.medium.sp-after-half[Miscellaneous R stuff]

.box-6.medium.sp-after-half[RDD fun times]

---

layout: false
name: ps5
class: center middle section-title section-title-2 animated fadeIn

# Diff-in-diff effect sizes

---

.box-2.large[What the heck is happening at<br>the end of problem set 5?!]

---

layout: false
name: r-stuff
class: center middle section-title section-title-5 animated fadeIn

# Miscellaneous R stuff

---

.box-5.large[Searching past code]

???

Use the search bar on the Quarto course site; use GitHub search

---

.box-5.large[Learning with<br>the example pages]

???

Copy and paste! Work through the example yourself! Don't just read it.

Like what I do in <https://bayesf22-notebook.classes.andrewheiss.com/>

---

.box-5.large[Lines across categories]

???

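```r
library(tidyverse)

# Peek at the built-in fuel economy data
mpg

# Average highway MPG for each vehicle class and cylinder count
# (dropping the few 5-cylinder cars)
avg_hwy_by_class <- mpg %>% 
  filter(cyl != 5) %>% 
  mutate(cyl = factor(cyl)) %>% 
  group_by(class, cyl) %>% 
  summarize(avg = mean(hwy), .groups = "drop")

# x is categorical, so group = cyl tells geom_line() which points to connect
ggplot(avg_hwy_by_class, aes(x = class, y = avg, color = cyl, group = cyl)) +
  geom_point() +
  geom_line()
```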

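```r
# Assumes the `terror` data frame is already loaded
terror_trends <- terror %>% 
  group_by(month, same_block_factor) %>% 
  summarize(avg_car_theft = mean(car_theft), .groups = "drop")

# group = same_block_factor connects each group's points across months
ggplot(terror_trends, aes(x = month, y = avg_car_theft, 
                          color = same_block_factor, group = same_block_factor)) +
  geom_vline(xintercept = "7") +  # vertical line marking month 7
  geom_line(linewidth = 2) +      # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_bw() +
  theme(legend.position = "bottom")
```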

---

layout: false
name: rdd
class: center middle section-title section-title-6 animated fadeIn

# RDD fun times

---

.box-6.medium[Is there a rule of thumb to determine which<br>quasi-experimental method we should use?]

.box-6.medium[How do we know which method applies<br>to which circumstance? Does the data tell us?]

---

.box-6.medium[With RDD we rely on "the rule" to<br>determine treatment and control groups]

.box-6[How do you decide on the rule?<br>You mentioned that it's arbitrary—<br>we can choose whatever rule we want?]

---

.box-6.medium[Can we use RDD to evaluate a program<br>that doesn't have a rule for participation?]

---

.box-6.large[Can we use a binary running variable?]

.box-inv-6.small[e.g. someone is eligible for a program if they complete a course]

---

.box-6.large[Do we have to limit<br>the data to a bandwidth?]

---

.box-6.large[How common are these kinds of rules<br>in the real world?]

???

- Anything income-based or means-tested: sliding-scale community health clinics, school truancy programs
- Anything with a test: SAT/ACT, AIG programs
- Elections: causal effect of candidates
- Grades: 89.49 vs. 89.51
- Poverty, EITC

---

.center[
<figure>
  <img src="img/10-class/goodreads.png" alt="Goodreads" title="Goodreads" width="80%">
</figure>
]

---

.box-6.medium[Where do these eligibility thresholds come from? Do policymakers research them first and reexamine them later?]

---

# Discontinuities everywhere!

.pull-left-wide.small[
<table>
 <thead>
  <tr>
   <th style="text-align:center;"> Household size </th>
   <th style="text-align:center;"> Annual (100%) </th>
   <th style="text-align:center;"> Monthly (100%) </th>
   <th style="text-align:center;"> 138% </th>
   <th style="text-align:center;"> 150% </th>
   <th style="text-align:center;"> 200% </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> $12,760 </td>
   <td style="text-align:center;"> $1,063 </td>
   <td style="text-align:center;"> $17,609 </td>
   <td style="text-align:center;"> $19,140 </td>
   <td style="text-align:center;"> $25,520 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> $17,240 </td>
   <td style="text-align:center;"> $1,437 </td>
   <td style="text-align:center;"> $23,791 </td>
   <td style="text-align:center;"> $25,860 </td>
   <td style="text-align:center;"> $34,480 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> $21,720 </td>
   <td style="text-align:center;"> $1,810 </td>
   <td style="text-align:center;"> $29,974 </td>
   <td style="text-align:center;"> $32,580 </td>
   <td style="text-align:center;"> $43,440 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4 </td>
   <td style="text-align:center;"> $26,200 </td>
   <td style="text-align:center;"> $2,183 </td>
   <td style="text-align:center;"> $36,156 </td>
   <td style="text-align:center;"> $39,300 </td>
   <td style="text-align:center;"> $52,400 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 5 </td>
   <td style="text-align:center;"> $30,680 </td>
   <td style="text-align:center;"> $2,557 </td>
   <td style="text-align:center;"> $42,338 </td>
   <td style="text-align:center;"> $46,020 </td>
   <td style="text-align:center;"> $61,360 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 6 </td>
   <td style="text-align:center;"> $35,160 </td>
   <td style="text-align:center;"> $2,930 </td>
   <td style="text-align:center;"> $48,521 </td>
   <td style="text-align:center;"> $52,740 </td>
   <td style="text-align:center;"> $70,320 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 7 </td>
   <td style="text-align:center;"> $39,640 </td>
   <td style="text-align:center;"> $3,303 </td>
   <td style="text-align:center;"> $54,703 </td>
   <td style="text-align:center;"> $59,460 </td>
   <td style="text-align:center;"> $79,280 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 8 </td>
   <td style="text-align:center;"> $44,120 </td>
   <td style="text-align:center;"> $3,677 </td>
   <td style="text-align:center;"> $60,886 </td>
   <td style="text-align:center;"> $66,180 </td>
   <td style="text-align:center;"> $88,240 </td>
  </tr>
</tbody>
</table>
]

.box-inv-6.smaller[**ACA subsidies**<br>138–400%*]

.box-inv-6.smaller[**CHIP**<br>200%]

.box-inv-6.smaller[**SNAP/Free lunch**<br>130%]

.box-inv-6.smaller[**Reduced lunch**<br>130–185%]
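Every column in the table is a multiple of the row's annual poverty line. A minimal R sketch that rebuilds it, assuming (as the rows imply) a line of 12,760 dollars for one person plus 4,480 dollars for each additional person, rounded to whole dollars:

```r
# Annual poverty line by household size, using the base and the
# per-person increment implied by the table's rows
household_size <- 1:8
annual <- 12760 + 4480 * (household_size - 1)

# Program cutoffs are percentages of that line
poverty_table <- data.frame(
  size    = household_size,
  annual  = annual,
  monthly = round(annual / 12),
  pct_138 = round(annual * 1.38),
  pct_150 = round(annual * 1.50),
  pct_200 = round(annual * 2.00)
)
poverty_table
```

A single running variable (income relative to this line) generates all of the program thresholds, which is why so many discontinuities stack up on the same scale.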

---

# The US's official poverty measure

.pull-left.center[
<figure>
  <img src="img/10-class/orshansky.jpg" alt="Mollie Orshansky" title="Mollie Orshansky" width="70%">
  <figcaption>Mollie Orshansky</figcaption>
</figure>
]

???

- <https://www.census.gov/topics/income-poverty/poverty/about/history-of-the-poverty-measure.html>
- <https://www.ssa.gov/policy/docs/ssb/v68n3/v68n3p79.html>

---

# The US's official poverty measure

.box-6.medium[**1955 annual food budget × 3**]

<br>

---

.center[
<iframe width="800" height="450" src="https://www.youtube.com/embed/q9EehZlw-zk" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
]

---

.center[
<figure>
  <img src="img/10-class/eitc-phaseout.png" alt="EITC phase out" title="EITC phase out" width="75%">
</figure>
]

---

.center[
<figure>
  <img src="img/10-class/ctc-phase-out.jpg" alt="CTC phase out" title="CTC phase out" width="75%">
</figure>
]

---

.box-6.medium[Why does the cutoff need<br>to be unique to the<br>program of interest?]

???

In theory, to take care of confounding. If two programs use the same cutoff, you're evaluating one of them, and enrollment in the other isn't automatic, then people's use of the second program also shifts their outcomes, and that shift gets attributed to the first program.
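A quick simulation of that worry, with made-up numbers: two hypothetical programs share a cutoff of 20 on a running variable, and (as a simplification) everyone below it gets both. The RD estimate for program A then absorbs program B's effect too, landing near 5 + 3 = 8 instead of 5.

```r
set.seed(1234)
n <- 2000
income <- runif(n, 0, 40)      # hypothetical running variable, cutoff at 20
program_a <- income < 20       # the program we want to evaluate
program_b <- income < 20       # a second program using the same cutoff

# Assumed true effects: A adds 5 points, B adds 3 points
outcome <- 50 + 0.5 * income + 5 * program_a + 3 * program_b + rnorm(n, sd = 2)

# Standard parametric RD with the running variable centered at the cutoff
income_centered <- income - 20
rd_model <- lm(outcome ~ income_centered + program_a)
coef(rd_model)["program_aTRUE"]   # mixes A's and B's effects
```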

---

.box-6.medium[What if there are multiple cutoffs?]

---

.pull-left[
<figure>
  <img src="img/10-class/one-running-var.png" alt="One running variable" title="One running variable" width="100%">
</figure>
]

.pull-right[
<figure>
  <img src="img/10-class/multiple-running-vars.png" alt="Multiple running variables" title="Multiple running variables" width="100%">
</figure>
]

---

.box-6.large[Why do we center<br>the running variable?]

---

.box-6.large[Regression is just fancy averages!]
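One way to see this with simulated data: the coefficient on a binary predictor in `lm()` is exactly the difference between the two group means, and the intercept is the mean of the reference group.

```r
set.seed(42)
treated <- rep(c(FALSE, TRUE), each = 50)
outcome <- 10 + 4 * treated + rnorm(100)

# Difference in group means, computed by hand
diff_means <- mean(outcome[treated]) - mean(outcome[!treated])

# The same number shows up as the coefficient on the binary predictor
ols <- lm(outcome ~ treated)
coef(ols)["treatedTRUE"]
diff_means
```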

---

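```r
library(tidyverse)  # filter(), %>%
library(broom)      # tidy()

# Assumes the `tutoring` data frame is already loaded.
# Parametric RD within 10 points of the cutoff of 70
lm(exit_exam ~ entrance_exam + tutoring,
   data = filter(tutoring, entrance_exam <= 80, 
                 entrance_exam >= 60)) %>% 
  tidy()
```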

```
## # A tibble: 3 × 5
##   term          estimate std.error statistic  p.value
##   <chr>            <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)     33.2       8.64       3.84 1.43e- 4
## 2 entrance_exam    0.388     0.114      3.40 7.45e- 4
## 3 tutoringTRUE     9.27      1.31       7.09 6.27e-12
```

---

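```r
# Center the running variable so that 0 is the cutoff of 70
tutoring_centered <- tutoring %>%
  mutate(entrance_centered = entrance_exam - 70)

# Same model and bandwidth; only the intercept's meaning changes
lm(exit_exam ~ entrance_centered + tutoring,
   data = filter(tutoring_centered, entrance_exam <= 80, 
                 entrance_exam >= 60)) %>% 
  tidy()
```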

```
## # A tibble: 3 × 5
##   term              estimate std.error statistic   p.value
##   <chr>                <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)         60.4       0.752     80.3  2.99e-249
## 2 entrance_centered    0.388     0.114      3.40 7.45e-  4
## 3 tutoringTRUE         9.27      1.31       7.09 6.27e- 12
```
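A quick arithmetic check using the rounded estimates printed in the two outputs: centering moves the intercept from the prediction at `entrance_exam = 0` (a score nobody near the cutoff has) to the prediction right at the cutoff of 70. The slope and the tutoring effect don't change.

```r
# Estimates copied (rounded) from the uncentered model output
intercept_raw <- 33.2
slope <- 0.388
cutoff <- 70

# Predicted exit exam score at the cutoff for untutored students,
# which is what the centered model reports as its intercept (60.4)
intercept_raw + slope * cutoff
```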

---

.box-6.large[What's the difference between weighting with kernels and inverse probability weighting?]

???

- <https://evalsp22.classes.andrewheiss.com/slides/07-slides.html#122>
- <https://evalsp22.classes.andrewheiss.com/slides/10-slides.html#87>
- <https://evalsp22.classes.andrewheiss.com/slides/10-slides.html#95>
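A rough sketch of the two kinds of weights, using hypothetical helper functions: a triangular kernel weights each observation by how close its running variable is to the cutoff, while inverse probability weights depend on each unit's estimated probability of treatment (its propensity score).

```r
# Kernel weight: 1 at the cutoff, falling linearly to 0 at the bandwidth h
triangular_weight <- function(x, cutoff, h) {
  pmax(0, 1 - abs(x - cutoff) / h)
}

# IPW weight: 1 / p for treated units, 1 / (1 - p) for untreated units
ipw_weight <- function(treated, propensity) {
  ifelse(treated, 1 / propensity, 1 / (1 - propensity))
}

triangular_weight(c(70, 65, 60, 85), cutoff = 70, h = 10)
ipw_weight(c(TRUE, FALSE), propensity = c(0.8, 0.8))
```

With a bandwidth of 10, a score of 65 gets half the weight of a score at 70, and anything 10 or more points away gets none. A treated unit with a propensity of 0.8 gets weight 1.25, while an untreated unit with the same propensity gets 5, because it is a rare, informative comparison case.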

---

.box-6.medium[There must be some math behind the nonparametric lines. Should we care about that, or should we just trust R?]

???

- <https://evalsp22.classes.andrewheiss.com/slides/10-slides.html#75>
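Much of that math is weighted least squares. A minimal sketch with simulated data and a hypothetical bandwidth of 10: fit a separate line on each side of the cutoff, weight observations with a triangular kernel, and read the jump off the gap between the two lines at the cutoff. `rdrobust()` uses a local linear fit with a triangular kernel by default, but it also chooses the bandwidth and reports bias-corrected inference, so treat this as the idea rather than a replica.

```r
set.seed(2024)
n <- 1000
running <- runif(n, 40, 100)
cutoff <- 70
treated <- running < cutoff   # hypothetical: treatment goes to scores below 70
outcome <- 40 + 0.4 * running + 10 * treated + rnorm(n, sd = 3)   # true jump = 10

h <- 10                                 # hypothetical bandwidth
centered <- running - cutoff
w <- pmax(0, 1 - abs(centered) / h)     # triangular kernel weights
in_band <- w > 0

# centered * treated gives each side its own intercept and slope;
# treatedTRUE is the gap between the two lines at centered = 0
llr <- lm(outcome ~ centered * treated, weights = w, subset = in_band)
coef(llr)["treatedTRUE"]
```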

---

.box-6.medium[Should we control for confounders?]

---

.box-6.medium[How do we decide on the right model?]

Parametric with `$y = x$`?

With `$y = x^2 + x$`?

With `$y = x^\text{whatever} + x^\text{whatever} + x$`?

Nonparametric?

`rdrobust()` or just `lm()`?

Controls or no controls?

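One hedged approach is to fit several specifications and check whether the estimated jump stays put. A sketch with simulated data in which the true relationship is linear and the true jump is 6 (both assumptions), letting each polynomial differ on each side of the cutoff:

```r
set.seed(99)
n <- 1500
x <- runif(n, -20, 20)       # running variable, already centered at the cutoff
treated <- x >= 0
y <- 30 + 0.8 * x + 6 * treated + rnorm(n, sd = 4)

linear    <- lm(y ~ x * treated)
quadratic <- lm(y ~ (x + I(x^2)) * treated)
cubic     <- lm(y ~ (x + I(x^2) + I(x^3)) * treated)

# In each model, treatedTRUE is the jump at x = 0
estimates <- sapply(list(linear = linear, quadratic = quadratic, cubic = cubic),
                    function(m) unname(coef(m)["treatedTRUE"]))
round(estimates, 2)
```

If the estimate swings a lot as the polynomial gets more flexible, that's a warning sign; higher-order terms tend to make the fit noisy right at the cutoff, which is exactly where the estimate comes from.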

---

.box-6.medium[How do you justify a bandwidth?]

.box-6.medium[Does the bandwidth need to be<br>the same on both sides?]
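A sensitivity check is one common way to justify a bandwidth: re-estimate the jump across several windows and see how much it moves. A sketch with simulated data (true jump of 5, an assumption) and a hypothetical `rd_estimate()` helper that also allows a different width on each side:

```r
set.seed(7)
n <- 2000
x <- runif(n, -30, 30)       # centered running variable
treated <- x >= 0
y <- 20 + 0.5 * x + 0.01 * x^2 + 5 * treated + rnorm(n, sd = 3)

# Hypothetical helper: keep h_left points below and h_right points above the cutoff
rd_estimate <- function(h_left, h_right) {
  keep <- x >= -h_left & x <= h_right
  m <- lm(y ~ x * treated, subset = keep)
  unname(coef(m)["treatedTRUE"])
}

bandwidths <- c(5, 10, 20, 30)
symmetric <- sapply(bandwidths, function(h) rd_estimate(h, h))
names(symmetric) <- bandwidths
round(symmetric, 2)

# Asymmetric window: wider on the left than on the right
asymmetric <- rd_estimate(h_left = 20, h_right = 10)
asymmetric
```

Narrow windows trade bias for noise, since fewer observations are left. `rdrobust` can also pick bandwidths for you, including separate ones on each side (e.g., `bwselect = "msetwo"`).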

---

.box-6.less-medium[How should we think about the impact of the program on people who score really high or low on the running variable?]

.box-6.less-medium[If we're throwing most of the data away and only looking at a narrow bandwidth of people, what does this say about generalizability?]

---

.box-6.medium[What do we do about noncompliance<br>and manipulation?]

.box-inv-6.medium[Fuzzy regression discontinuity!]

---

.box-6.medium[Why wait for fuzzy regression discontinuity?]

.box-inv-6.small[It's RD + instrumental variables]
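A minimal sketch of the idea with simulated data: when crossing the cutoff only raises the probability of treatment (here from 0.2 to 0.8, an assumption), divide the jump in the outcome by the jump in treatment take-up. That ratio is the Wald-style IV estimator behind fuzzy RD; with a true effect of 4, it should land near 4.

```r
set.seed(314)
n <- 5000
x <- runif(n, -10, 10)       # centered running variable
above <- x >= 0              # what the rule says

# Noncompliance: crossing the cutoff raises take-up from 20% to 80%
treated <- rbinom(n, 1, ifelse(above, 0.8, 0.2))
y <- 10 + 0.3 * x + 4 * treated + rnorm(n, sd = 2)

# Jump in the outcome at the cutoff (reduced form)
jump_y <- unname(coef(lm(y ~ x * above))["aboveTRUE"])
# Jump in the probability of treatment at the cutoff (first stage)
jump_d <- unname(coef(lm(treated ~ x * above))["aboveTRUE"])

# Fuzzy RD estimate: ratio of the two jumps
fuzzy_estimate <- jump_y / jump_d
fuzzy_estimate
```

A sharp RD estimate here (`jump_y` alone) would understate the effect, because only about 60% of people actually switch treatment status at the cutoff. `rdrobust()` handles this case through its `fuzzy` argument.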

---

.box-6.medium[Can other quasi-experimental<br>methods be combined too?]

.box-inv-6.small[Difference in discontinuity!<br>Diff-in-diff + RD]

.small.center[<https://doi.org/10.1016/j.jebo.2023.12.001>]
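A sketch of the difference-in-discontinuities logic with simulated data (all numbers are assumptions): a jump of 2 exists at the cutoff in both periods, say from some other policy sharing the cutoff, and a new policy adds 3 more in the post period. Subtracting the pre-period jump from the post-period jump should isolate something near 3.

```r
set.seed(8)
n <- 4000
x <- runif(n, -10, 10)       # centered running variable
above <- x >= 0
post <- rep(c(FALSE, TRUE), each = n / 2)

# Constant jump of 2 in both periods, plus 3 more after the new policy
y <- 5 + 0.4 * x + 2 * above + 3 * (above & post) + rnorm(n, sd = 2)

# Sharp RD jump within one period
rd_jump <- function(keep) {
  unname(coef(lm(y ~ x * above, subset = keep))["aboveTRUE"])
}

jump_pre  <- rd_jump(!post)
jump_post <- rd_jump(post)

# Difference in discontinuities: how much the jump changed
diff_in_disc <- jump_post - jump_pre
diff_in_disc
```

A plain RD in the post period would report roughly 5 here, mixing the new policy with whatever was already happening at the cutoff.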

---

.box-6.huge[RD play time!]

---

.pull-left-narrow[
<figure>
  <img src="img/10-class/vigdor.png" alt="Jake Vigdor working paper" title="Jake Vigdor working paper" width="100%">
</figure>
]

.pull-right-wide.small[
> Teachers in North Carolina Public schools earn a bonus of $750 if the students in their school meet a standard called "expected growth." A summary statistic called "average growth" is computed for each school; the expected growth standard is met when this summary measure exceeds zero.

> Does getting a bonus in year `$t$` cause improved student performance in year `$t + 1$`?
]
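Before looking at real data, it can help to map the description onto RD pieces: the running variable is average growth, the cutoff is 0, the treatment is the bonus, and the outcome is next year's performance. A sketch with simulated schools and hypothetical column names (`avg_growth`, `bonus`, `growth_next_year`), with no bonus effect built in:

```r
set.seed(750)
n_schools <- 1000

# Hypothetical simulated data: average growth is the running variable
schools <- data.frame(avg_growth = rnorm(n_schools, mean = 0, sd = 0.1))

# The rule: a bonus when average growth exceeds zero
schools$bonus <- schools$avg_growth > 0

# Hypothetical next-year outcome, with no true bonus effect
schools$growth_next_year <- 0.5 * schools$avg_growth + rnorm(n_schools, sd = 0.05)

# The running variable is already centered, since the cutoff is 0
bw <- 0.05   # hypothetical bandwidth
near_cutoff <- subset(schools, abs(avg_growth) <= bw)
bonus_model <- lm(growth_next_year ~ avg_growth * bonus, data = near_cutoff)
coef(bonus_model)["bonusTRUE"]
```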