# Experiments and Quasi-Experiments {#eaqe}
This chapter discusses statistical tools that are commonly applied in program evaluation, where interest lies in measuring the causal effects of programs, policies or other interventions. An optimal research design for this purpose is what statisticians call an ideal randomized controlled experiment. The basic idea is to randomly assign subjects to two different groups, one that receives the treatment (the treatment group) and one that does not (the control group) and to compare outcomes for both groups in order to get an estimate of the average treatment effect.
Such *experimental* data is fundamentally different from *observational* data. For example, one might use a randomized controlled experiment to measure how much the performance of students in a standardized test differs between two classes where one has a "regular" student-teacher ratio and the other one has fewer students. The data produced by such an experiment would be different from, e.g., the observed cross-section data on the students' performance used throughout Chapters \@ref(lrwor) to \@ref(nrf), where class sizes are not randomly assigned to students but instead are the result of economic decisions where educational objectives and budgetary aspects were balanced.
For economists, randomized controlled experiments are often difficult or even infeasible to implement. For example, due to ethical, moral and legal reasons it is practically impossible for a business owner to estimate the causal effect of psychological stress on workers' productivity using an experiment where workers are randomly assigned either to a treatment group that works under time pressure or to a control group that works under regular conditions, at best without knowledge of being in an experiment (see the box *The Hawthorne Effect* on p. 528 of the book).
However, sometimes external circumstances produce what is called a *quasi-experiment* or *natural experiment*. This "as if" randomness allows for the estimation of causal effects that are of interest to economists using tools which are very similar to those valid for ideal randomized controlled experiments. These tools draw heavily on the theory of multiple regression and also on IV regression (see Chapter \@ref(ivr)). We will review the core aspects of these methods and demonstrate how to apply them in R using the STAR data set (see the [description](https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/10766) of the data set).
The following packages and their dependencies are needed for reproduction of the code chunks presented throughout this chapter:
+ `r ttcode("AER")` [@R-AER],
+ `r ttcode("dplyr")` [@R-dplyr],
+ `r ttcode("MASS")` [@R-MASS],
+ `r ttcode("mvtnorm")` [@R-mvtnorm],
+ `r ttcode("rddtools")` [@R-rddtools],
+ `r ttcode("scales")` [@R-scales],
+ `r ttcode("stargazer")`[@R-stargazer],
+ `r ttcode("tidyr")` [@R-tidyr].
Make sure the following code chunk runs without any errors.
```{r, warning=FALSE, message=FALSE, eval=FALSE}
library(AER)
library(dplyr)
library(MASS)
library(mvtnorm)
library(rddtools)
library(scales)
library(stargazer)
library(tidyr)
```
## Potential Outcomes, Causal Effects and Idealized Experiments {#poceaie}
We now briefly recap the idea of the average causal effect and how it can be estimated using the *differences estimator*. We advise you to work through Chapter 13.1 of the book for a better understanding.
#### Potential Outcomes and the Average Causal Effect {-}
A *potential outcome* is the outcome for an individual under a potential treatment. For this individual, the causal effect of the treatment is the difference between the potential outcome if the individual receives the treatment and the potential outcome if she does not. Since this causal effect may differ across individuals and it is not possible to measure the causal effect for a single individual, one is interested in studying the *average causal effect* of the treatment, which is therefore also called the *average treatment effect*.
In an ideal randomized controlled experiment the following conditions are fulfilled:
1. The subjects are selected at random from the population.
2. The subjects are randomly assigned to treatment and control group.
Condition 1 guarantees that the subjects' potential outcomes are drawn randomly from the same population distribution such that the expected value of the causal effect in the sample is equal to the average causal effect in the population. Condition 2 ensures that the receipt of treatment is independent from the subjects' potential outcomes. If both conditions are fulfilled, the expected causal effect is the expected outcome in the treatment group minus the expected outcome in the control group. Using conditional expectations we have $$\text{Average causal effect} = E(Y_i\vert X_i=1) - E(Y_i\vert X_i=0),$$ where $X_i$ is a binary treatment indicator.
The average causal effect can be estimated using the *differences estimator*, which is nothing but the OLS estimator in the simple regression model
\begin{align}
Y_i = \beta_0 + \beta_1 X_i + u_i \ \ , \ \ i=1,\dots,n, (\#eq:diffest)
\end{align}
where random assignment ensures that $E(u_i\vert X_i) = 0$.
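To see the numerical equivalence, consider a minimal simulated example (all numbers below are illustrative assumptions, not estimates from the book): with random assignment, the difference in group means coincides with the OLS estimate of $\beta_1$.

```{r}
# sketch: with random assignment, the differences estimator (difference in
# group means) equals the OLS slope on the treatment dummy
set.seed(1)
X <- sample(c(0, 1), 200, replace = TRUE)  # randomly assigned treatment
Y <- 10 + 5 * X + rnorm(200)               # assumed treatment effect: 5
mean(Y[X == 1]) - mean(Y[X == 0])          # difference in means
coef(lm(Y ~ X))[2]                         # differences estimator
```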
The OLS estimator in the regression model
\begin{align}
Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i \ \ , \ \ i=1,\dots,n (\#eq:diffestwar)
\end{align}
with additional regressors $W_1,\dots,W_r$ is called the *differences estimator with additional regressors*. It is assumed that treatment assignment $X_i$ is random and independent of the pretreatment characteristics $W_i$. This assumption is called *conditional mean independence* and implies $$E(u_i\vert X_i , W_i) = E(u_i\vert W_i) = 0,$$ that is, the conditional expectation of the error $u_i$ given the treatment indicator $X_i$ and the pretreatment characteristics $W_i$ does not depend on $X_i$. Conditional mean independence replaces the first least squares assumption in Key Concept 6.4 and thus ensures that the differences estimator of $\beta_1$ is unbiased. The *differences estimator with additional regressors* is more efficient than the plain *differences estimator* if the additional regressors explain some of the variation in the $Y_i$.
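Continuing the small sketch from above (again with purely illustrative numbers), we can see the efficiency gain: if a pretreatment characteristic $W$ explains part of the variation in $Y$, the standard error of the estimated treatment effect shrinks.

```{r}
# sketch continued: an additional regressor that explains variation in Y
# yields a more precise estimate of the treatment effect
W <- rnorm(200)
Y2 <- 10 + 5 * X + 3 * W + rnorm(200)
summary(lm(Y2 ~ X))$coefficients[2, 1:2]      # differences estimator
summary(lm(Y2 ~ X + W))$coefficients[2, 1:2]  # smaller standard error
```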
## Threats to Validity of Experiments
The concepts of internal and external validity discussed in Key Concept 9.1 are also applicable for studies based on experimental and quasi-experimental data. Chapter 13.2 of the book provides a thorough explanation of the particular threats to internal and external validity of experiments including examples. We limit ourselves to a short repetition of the threats listed there. Consult the book for a more detailed explanation.
#### Threats to Internal Validity {-}
1. **Failure to Randomize**
If the subjects are not randomly assigned to the treatment group, the outcomes will be contaminated with the effect of the subjects' individual characteristics or preferences, and it is not possible to obtain an unbiased estimate of the treatment effect. One can test for nonrandom assignment using a significance test ($F$-test) on the coefficients in the regression model $$X_i = \beta_0 + \beta_1 W_{1i} + \dots + \beta_r W_{ri} + u_i \ \ , \ \ i=1,\dots,n$$ (a minimal sketch of such a test is given at the end of this list).
2. **Failure to Follow the Treatment Protocol**
If subjects do not follow the treatment protocol, i.e., some subjects in the treatment group manage to avoid receiving the treatment and/or some subjects in the control group manage to receive the treatment (*partial compliance*), there is correlation between $X_i$ and $u_i$ such that the OLS estimator of the average treatment effect will be biased. If there are data on *both* the treatment actually received ($X_i$) and the initial random assignment ($Z_i$), IV regression of the models \@ref(eq:diffest) and \@ref(eq:diffestwar) is a remedy.
3. **Attrition**
Attrition may result in a nonrandomly selected sample. If subjects systematically drop out of the study after being assigned to the control or the treatment group (systematic meaning that the reason for dropping out is related to the treatment), there will be correlation between $X_i$ and $u_i$ and hence bias in the OLS estimator of the treatment effect.
4. **Experimental Effects**
If human subjects in the treatment group and/or the control group know that they are part of an experiment, they might adapt their behavior in a way that prevents unbiased estimation of the treatment effect.
5. **Small Sample Sizes**
As we know from the theory of linear regression, small sample sizes lead to imprecise estimation of the coefficients and thus imply imprecise estimation of the causal effect. Furthermore, confidence intervals and hypothesis tests may produce wrong inference when the sample size is small.
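As announced above, here is a minimal sketch of a test for random assignment using simulated (hypothetical) data: regress the treatment indicator on pretreatment characteristics and check whether the slope coefficients are jointly zero.

```{r}
# sketch: F-test for random assignment using hypothetical data
set.seed(1)
W1 <- rnorm(200)
W2 <- rnorm(200)
D <- rbinom(200, 1, 0.5)   # treatment is randomly assigned by construction
# the overall F-statistic of this regression tests H0: all slopes are zero
summary(lm(D ~ W1 + W2))$fstatistic
```

Since assignment is random by construction here, the $F$-statistic should be small, so we expect not to reject the null hypothesis of random assignment.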
#### Threats to External Validity {-}
1. **Nonrepresentative Sample**
If the population studied and the population of interest are not sufficiently similar, there is no justification in generalizing the results.
2. **Nonrepresentative Program or Policy**
If the program or policy for the population studied differs considerably from the program (to be) applied to population(s) of interest, the results cannot be generalized. For example, a small-scale program with low funding might have different effects than a widely available scaled-up program that is actually implemented. There are other factors like duration and the extent of monitoring that should be considered here.
3. **General Equilibrium Effects**
If market and/or environmental conditions cannot be kept constant when an internally valid program is implemented broadly, external validity may be doubtful.
## Experimental Estimates of the Effect of Class Size Reductions
### Experimental Design and the Data Set {-}
The Project *Student-Teacher Achievement Ratio* (STAR) was a large randomized controlled experiment with the aim of assessing whether a class size reduction is effective in improving education outcomes. It was conducted in 80 Tennessee elementary schools over a period of four years during the 1980s by the State Department of Education.
In the first year, about 6400 students were randomly assigned into one of three interventions: small class (13 to 17 students per teacher), regular class (22 to 25 students per teacher), and regular-with-aide class (22 to 25 students with a full-time teacher's aide). Teachers were also randomly assigned to the classes they taught. The interventions were initiated as the students entered school in kindergarten and continued through to third grade. Control and treatment groups across grades are summarized in Table \@ref(tab:starstructure).
| | K | 1 | 2 | 3 |
|-----------------|----------------------|----------------------|----------------------|----------------------|
| Treatment 1 | Small class | Small class | Small class | Small class |
| Treatment 2 | Regular class + aide | Regular class + aide | Regular class + aide | Regular class + aide |
| Control | Regular class | Regular class | Regular class | Regular class |
Table: (\#tab:starstructure) Control and treatment groups in the STAR experiment
Each year, the students' learning progress was assessed using the sum of the points scored on the math and reading parts of a standardized test (the [Stanford Achievement Test](https://en.wikipedia.org/wiki/Stanford_Achievement_Test_Series)).
The STAR data set is part of the package `r ttcode("AER")`.
```{r, message=FALSE}
# load the package AER and the STAR dataset
library(AER)
data(STAR)
```
`r ttcode("head(STAR)")` shows that there is a variety of factor variables that describe student and teacher characteristics as well as various school indicators, all of which are separately recorded for the four different grades. The data is in *wide format*. That is, each variable has its own column and for each student, the rows contain observations on these variables. Using `r ttcode("dim(STAR)")` we find that there are a total of 11598 observations on 47 variables.
```{r}
# get an overview
head(STAR, 2)
dim(STAR)
```
```{r}
# get variable names
names(STAR)
```
A majority of the variable names contain a suffix (`r ttcode("k")`, `r ttcode("1")`, `r ttcode("2")` or `r ttcode("3")`) indicating the grade to which the respective variable refers. This facilitates regression analysis because it allows us to adjust the `r ttcode("formula")` argument in `r ttcode("lm()")` for each grade by simply changing the variables' suffixes accordingly.
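Since only the suffix changes across grades, the per-grade regression formulas can even be generated programmatically. A small sketch:

```{r}
# sketch: build the per-grade regression formulas from the suffixes
suffixes <- c("k", "1", "2", "3")
paste0("I(read", suffixes, " + math", suffixes, ") ~ star", suffixes)
```

Such strings can be converted with `r ttcode("as.formula()")` and passed to the `r ttcode("formula")` argument of `r ttcode("lm()")`.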
The outcome produced by `r ttcode("head()")` shows that some recorded values are `r ttcode("NA")` and thus, there is no data on this variable for the student under consideration. This lies in the nature of the data: for example, take the first observation `STAR[1,]`.
In the output of `head(STAR, 2)` we find that the student entered the experiment in third grade in a regular class, which is why the class size is recorded in `r ttcode("star3")` and the other class type indicator variables are `r ttcode("NA")`. Her math and reading scores for the third grade are available; however, recordings for other grades are not present for the same reason. Obtaining only her non-missing (non-`r ttcode("NA")`) recordings is straightforward: simply eliminate the `r ttcode("NA")`s using the `r ttcode("!is.na()")` function.
```{r}
# drop NA recordings for the first observation and print to the console
STAR[1, !is.na(STAR[1, ])]
```
`is.na(STAR[1, ])` returns a logical vector with `r ttcode("TRUE")` at positions that correspond to `r ttcode("<NA>")` entries for the first observation. The `r ttcode("!")` operator inverts the result such that we obtain only the non-`r ttcode("<NA>")` entries for the first observation.
In general it is not necessary to remove rows with missing data because `r ttcode("lm()")` does so by default. Missing data may imply a small sample size and thus may lead to imprecise estimation and wrong inference. This is, however, not an issue for the study at hand since, as we will see below, sample sizes exceed 5000 observations for each regression conducted.
### Analysis of the STAR Data {-}
As can be seen from Table \@ref(tab:starstructure), there are two treatment groups in each grade: small classes with only 13 to 17 students, and regular classes with 22 to 25 students and a teaching aide. Thus, two binary variables, each being an indicator for the respective treatment group, are introduced for the differences estimator to capture the treatment effect for each treatment group separately. This yields the population regression model
\begin{align}
Y_i = \beta_0 + \beta_1 SmallClass_i + \beta_2 RegAide_i + u_i, (\#eq:starpopreg)
\end{align}
where $Y_i$ is the test score, $SmallClass_i$ the small class indicator and $RegAide_i$ the indicator for a regular class with aide.
We reproduce the results presented in Table 13.1 of the book by performing the regression \@ref(eq:starpopreg) for each grade separately. For each student, the dependent variable is simply the sum of the points scored in the math and reading parts, constructed using `r ttcode("I()")`.
```{r}
# compute differences estimates for each grade
fmk <- lm(I(readk + mathk) ~ stark, data = STAR)
fm1 <- lm(I(read1 + math1) ~ star1, data = STAR)
fm2 <- lm(I(read2 + math2) ~ star2, data = STAR)
fm3 <- lm(I(read3 + math3) ~ star3, data = STAR)
```
```{r}
# obtain coefficient matrix using robust standard errors
coeftest(fmk, vcov = vcovHC, type = "HC1")
coeftest(fm1, vcov = vcovHC, type = "HC1")
coeftest(fm2, vcov = vcovHC, type = "HC1")
coeftest(fm3, vcov = vcovHC, type = "HC1")
```
We gather the results and present them in a table using `r ttcode("stargazer()")`.
```{r}
# compute robust standard errors for each model and gather them in a list
rob_se_1 <- list(sqrt(diag(vcovHC(fmk, type = "HC1"))),
sqrt(diag(vcovHC(fm1, type = "HC1"))),
sqrt(diag(vcovHC(fm2, type = "HC1"))),
sqrt(diag(vcovHC(fm3, type = "HC1"))))
```
```{r, message=F, warning=F, results='asis', eval=FALSE}
library(stargazer)
stargazer(fmk,fm1,fm2,fm3,
title = "Project STAR: Differences Estimates",
header = FALSE,
type = "latex",
model.numbers = F,
omit.table.layout = "n",
digits = 3,
column.labels = c("K", "1", "2", "3"),
dep.var.caption = "Dependent Variable: Grade",
dep.var.labels.include = FALSE,
se = rob_se_1)
```
<!--html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="html"}
library(stargazer)
stargazer(fmk,fm1,fm2,fm3,
header = FALSE,
type = "html",
model.numbers = F,
omit.table.layout = "n",
digits = 2,
column.labels = c("K", "1", "2", "3"),
dep.var.caption = "Dependent Variable: Grade",
dep.var.labels.include = FALSE,
se = rob_se_1)
stargazer_html_title("Project STAR - Differences Estimates", "psde")
```
<!--/html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="latex"}
library(stargazer)
stargazer(fmk,fm1,fm2,fm3,
title = "\\label{tab:psde} Project STAR - Differences Estimates",
header = FALSE,
digits = 3,
type = "latex",
float.env = "sidewaystable",
column.sep.width = "-3pt",
model.numbers = F,
omit.table.layout = "n",
column.labels = c("K", "1", "2", "3"),
dep.var.caption = "Dependent Variable: Grade",
dep.var.labels.include = FALSE,
se = rob_se_1)
```
The estimates presented in Table \@ref(tab:psde) suggest that the class size reduction improves student performance. Except for grade 1, the estimates of the coefficient on $SmallClass$ are roughly of the same magnitude (the estimates lie between 13.90 and 19.39 points) and they are statistically significant at the $1\%$ level. Furthermore, a teaching aide has little, possibly zero, effect on the performance of the students.
Following the book, we augment the regression model \@ref(eq:starpopreg) by different sets of regressors for two reasons:
1. If the additional regressors explain some of the observed variation in the dependent variable, we obtain more efficient estimates of the coefficients of interest.
2. If the treatment is not received at random due to failures to follow the treatment protocol (see Chapter 13.3 of the book), the estimates obtained using \@ref(eq:starpopreg) may be biased. Adding additional regressors may solve or mitigate this problem.
In particular, we consider the following student and teacher characteristics
- $experience$ --- Teacher's years of experience
- $boy$ --- Student is a boy (dummy)
- $lunch$ --- Free lunch eligibility (dummy)
- $black$ --- Student is African-American (dummy)
- $race$ --- Student's race is other than black or white (dummy)
- $\text{schoolid}$ --- School indicator variables
in the four population regression specifications
\begin{align}
Y_i =& \beta_0 + \beta_1 SmallClass_i + \beta_2 RegAide_i + u_i, (\#eq:augstarpopreg1) \\
Y_i =& \beta_0 + \beta_1 SmallClass_i + \beta_2 RegAide_i + \beta_3 experience_i + u_i, (\#eq:augstarpopreg2) \\
Y_i =& \beta_0 + \beta_1 SmallClass_i + \beta_2 RegAide_i + \beta_3 experience_i + schoolid + u_i, (\#eq:augstarpopreg3)
\end{align}
and
\begin{align}
Y_i =& \beta_0 + \beta_1 SmallClass_i + \beta_2 RegAide_i + \beta_3 experience_i + \beta_4 boy + \beta_5 lunch \\
& + \beta_6 black + \beta_7 race + schoolid + u_i. (\#eq:augstarpopreg4)
\end{align}
Prior to estimation, we perform some subsetting and data wrangling using functions from the packages `r ttcode("dplyr")` and `r ttcode("tidyr")`. These are both part of `r ttcode("tidyverse")`, a collection of `r ttcode("R")` packages designed for data science and handling big datasets (see the [official site](https://www.tidyverse.org/) for more on `r ttcode("tidyverse")` packages). The functions `r ttcode("%>%")`, `r ttcode("transmute()")` and `r ttcode("mutate()")` are sufficient for us here:
+ `r ttcode("%>%")` allows to chain function calls.
+ `r ttcode("transmute()")` allows to subset the data set by naming the variables to be kept.
+ `r ttcode("mutate()")` is convenient for adding new variables based on existing ones while preserving the latter.
The regression models \@ref(eq:augstarpopreg1) to \@ref(eq:augstarpopreg4) require the variables `r ttcode("gender")`, `r ttcode("ethnicity")`, `r ttcode("stark")`, `r ttcode("readk")`, `r ttcode("mathk")`, `r ttcode("lunchk")`, `r ttcode("experiencek")` and `r ttcode("schoolidk")`. After dropping the remaining variables using `r ttcode("transmute()")`, we use `r ttcode("mutate()")` to add three additional binary variables which are derivatives of existing ones: `r ttcode("black")`, `r ttcode("race")` and `r ttcode("boy")`. They are generated using logical statements within the function `r ttcode("ifelse()")`.
```{r, message=FALSE, warning=FALSE}
# load packages 'dplyr' and 'tidyr' for data wrangling functionalities
library(dplyr)
library(tidyr)
# generate subset with kindergarten data
STARK <- STAR %>%
transmute(gender,
ethnicity,
stark,
readk,
mathk,
lunchk,
experiencek,
schoolidk) %>%
mutate(black = ifelse(ethnicity == "afam", 1, 0),
race = ifelse(ethnicity == "afam" | ethnicity == "cauc", 1, 0),
boy = ifelse(gender == "male", 1, 0))
```
```{r, message=FALSE, warning=FALSE}
# estimate the models
gradeK1 <- lm(I(mathk + readk) ~ stark + experiencek,
data = STARK)
gradeK2 <- lm(I(mathk + readk) ~ stark + experiencek + schoolidk,
data = STARK)
gradeK3 <- lm(I(mathk + readk) ~ stark + experiencek + boy + lunchk
+ black + race + schoolidk,
data = STARK)
```
For brevity, we exclude the coefficients on the school indicator dummies from the output of `r ttcode("coeftest()")` by subsetting the returned matrices.
```{r, message=FALSE, warning=FALSE}
# obtain robust inference on the significance of coefficients
coeftest(gradeK1, vcov. = vcovHC, type = "HC1")
coeftest(gradeK2, vcov. = vcovHC, type = "HC1")[1:4, ]
coeftest(gradeK3, vcov. = vcovHC, type = "HC1")[1:7, ]
```
We now use `r ttcode("stargazer()")` to gather all relevant information in a structured table.
```{r}
# compute robust standard errors for each model and gather them in a list
rob_se_2 <- list(sqrt(diag(vcovHC(fmk, type = "HC1"))),
sqrt(diag(vcovHC(gradeK1, type = "HC1"))),
sqrt(diag(vcovHC(gradeK2, type = "HC1"))),
sqrt(diag(vcovHC(gradeK3, type = "HC1"))))
```
```{r, message=F, warning=F, results='asis', eval=FALSE}
stargazer(fmk, gradeK1, gradeK2, gradeK3,
title = "Project STAR - Differences Estimates with
Additional Regressors for Kindergarten",
header = FALSE,
type = "latex",
model.numbers = F,
omit.table.layout = "n",
digits = 3,
column.labels = c("(1)", "(2)", "(3)", "(4)"),
dep.var.caption = "Dependent Variable: Test Score in Kindergarten",
dep.var.labels.include = FALSE,
se = rob_se_2)
```
<!--html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="html"}
stargazer(fmk,gradeK1,gradeK2,gradeK3,
header = FALSE,
type = "html",
model.numbers = F,
omit.table.layout = "n",
digits = 3,
column.labels = c("(1)", "(2)", "(3)", "(4)"),
dep.var.caption = "Dependent Variable: Test Score in Kindergarten",
dep.var.labels.include = FALSE,
se = rob_se_2,
omit = "schoolid",
nobs = TRUE,
add.lines = list(
c("School indicators?", "no", "no", "yes", "yes")
)
)
stargazer_html_title("Project STAR - Differences Estimates with Additional Regressors for Kindergarten", "psdewarfk")
```
<!--/html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="latex"}
stargazer(fmk,gradeK1,gradeK2,gradeK3,
title = "\\label{tab:psdewarfk} Project STAR - Differences Estimates with
Additional Regressors for Kindergarten",
header = FALSE,
digits = 3,
type = "latex",
float.env = "sidewaystable",
column.sep.width = "-5pt",
model.numbers = F,
omit.table.layout = "n",
column.labels = c("(1)", "(2)", "(3)", "(4)"),
dep.var.caption = "Dependent Variable: Test Score in Kindergarten",
dep.var.labels.include = FALSE,
se = rob_se_2,
omit = "schoolid",
nobs = TRUE,
add.lines = list(
c("School indicators?", "no", "no", "yes", "yes")
)
)
```
The results in column (1) of Table \@ref(tab:psdewarfk) are the same as those obtained for \@ref(eq:starpopreg). Columns (2) to (4) reveal that adding student characteristics and school fixed effects does not lead to substantially different estimates of the treatment effects. This result makes it more plausible that the estimates obtained using model \@ref(eq:starpopreg) do not suffer from failure of random assignment. There is some decrease in the standard errors and some increase in $\bar{R}^2$, implying that the estimates are more precise.
Because teachers were randomly assigned to classes, the inclusion of school fixed effects allows us to estimate the causal effect of a teacher's experience on the test scores of students in kindergarten. Regression (3) predicts the average effect of 10 years of experience on test scores to be $10\cdot 0.74=7.4$ points. Be aware that the other estimates on student characteristics in regression (4) *do not* have a causal interpretation due to nonrandom assignment (see Chapter 13.3 of the book for a detailed discussion).
Are the estimated effects presented in Table \@ref(tab:psdewarfk) large or small in a practical sense? Let us translate the predicted changes in test scores to units of standard deviation in order to allow for a comparison (see Section \@ref(etsacs) for a similar argument).
```{r}
# compute the sample standard deviations of test scores
SSD <- c("K" = sd(na.omit(STAR$readk + STAR$mathk)),
"1" = sd(na.omit(STAR$read1 + STAR$math1)),
"2" = sd(na.omit(STAR$read2 + STAR$math2)),
"3" = sd(na.omit(STAR$read3 + STAR$math3)))
# translate the effects of small classes to standard deviations
Small <- c("K" = as.numeric(coef(fmk)[2]/SSD[1]),
"1" = as.numeric(coef(fm1)[2]/SSD[2]),
"2" = as.numeric(coef(fm2)[2]/SSD[3]),
"3" = as.numeric(coef(fm3)[2]/SSD[4]))
# adjust the standard errors
SmallSE <- c("K" = as.numeric(rob_se_1[[1]][2]/SSD[1]),
"1" = as.numeric(rob_se_1[[2]][2]/SSD[2]),
"2" = as.numeric(rob_se_1[[3]][2]/SSD[3]),
"3" = as.numeric(rob_se_1[[4]][2]/SSD[4]))
# translate the effects of regular classes with aide to standard deviations
RegAide<- c("K" = as.numeric(coef(fmk)[3]/SSD[1]),
"1" = as.numeric(coef(fm1)[3]/SSD[2]),
"2" = as.numeric(coef(fm2)[3]/SSD[3]),
"3" = as.numeric(coef(fm3)[3]/SSD[4]))
# adjust the standard errors
RegAideSE <- c("K" = as.numeric(rob_se_1[[1]][3]/SSD[1]),
"1" = as.numeric(rob_se_1[[2]][3]/SSD[2]),
"2" = as.numeric(rob_se_1[[3]][3]/SSD[3]),
"3" = as.numeric(rob_se_1[[4]][3]/SSD[4]))
# gather the results in a data.frame and round
df <- t(round(data.frame(
Small, SmallSE, RegAide, RegAideSE, SSD),
digits = 2))
```
It is fairly easy to turn the `r ttcode("data.frame")` `r ttcode("df")` into a table.
```{r, eval=FALSE}
# generate a simple table using stargazer
stargazer(df,
title = "Estimated Class Size Effects
(in Units of Standard Deviations)",
type = "html",
summary = FALSE,
header = FALSE
)
```
<!--html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="html"}
stargazer(df,
type = "html",
header = FALSE,
summary = FALSE)
stargazer_html_title("Estimated Class Size Effects
(in Units of Standard Deviations)", "ecse")
```
<!--/html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="latex"}
stargazer(df,
title = "\\label{tab:ecse} Estimated Class Size Effects
(in Units of Standard Deviations)",
type = "latex",
header = FALSE,
summary = FALSE)
```
The estimated effect of a small class is largest for grade 1. As pointed out in the book, this is probably because students in the control group for grade 1 did poorly on the test for some unknown reason, or simply due to random variation. The difference between the estimated effect of being in a small class and being in a regular class with an aide is roughly 0.2 standard deviations for all grades. This leads to the conclusion that the effect of being in a regular-sized class with an aide is zero and the effect of being in a small class is roughly the same for all grades.
The remainder of Chapter 13.3 in the book discusses to what extent these experimental estimates are comparable with observational estimates obtained using data on school districts in California and Massachusetts in Chapter \@ref(asbomr). It turns out that the estimates are indeed very similar. Please refer to the aforementioned section in the book for a more detailed discussion.
## Quasi Experiments {#qe}
In quasi-experiments, "as if" randomness is exploited to apply methods similar to those discussed in the previous sections. There are two types of quasi-experiments:^[See Chapter 13.4 of the book for some example studies that are based on quasi-experiments.]
1. Random variations in individual circumstances allow us to view the treatment "as if" it were randomly determined.
2. The treatment is only partially determined by "as if" random variation.
In the former case, the effect can be estimated using either model \@ref(eq:diffestwar), i.e., the *differences estimator with additional regressors*, or, if there is concern that the "as if" randomness does not entirely ensure the absence of systematic differences between control and treatment groups, the *differences-in-differences* (DID) estimator. In the latter case, an IV approach may be applied, with the source of "as if" random variation in treatment assignment serving as the instrument in a model like \@ref(eq:diffestwar).
Some more advanced techniques that are helpful in settings where the treatment assignment is (partially) determined by a threshold in a so-called running variable are *sharp regression discontinuity design* (RDD) and *fuzzy regression discontinuity design* (FRDD).
We briefly review these techniques and, since the book does not provide any empirical examples in this section, we will use our own simulated data in a minimal example to discuss how DID, RDD and FRDD can be applied in `r ttcode("R")`.
### The Differences-in-Differences Estimator {-}
In quasi-experiments the source of "as if" randomness in treatment assignment can often not entirely prevent systematic differences between control and treatment groups. This problem was encountered by @card1994, who used geography as the "as if" random treatment assignment to study the effect of an increase in the state minimum wage in New Jersey in 1992 on employment in fast-food restaurants. Their idea was to exploit the fact that the increase in the minimum wage applied to employees in New Jersey (treatment group) but not to those working in neighboring Pennsylvania (control group).
It is quite conceivable that such a wage hike is not correlated with other determinants of employment. However, there still might be some state-specific differences and thus differences between control and treatment group. This would render the *differences estimator* biased and inconsistent. @card1994 solved this by using a DID estimator: they collected data in February 1992 (before the treatment) and November 1992 (after the treatment) for the same restaurants and estimated the effect of the wage hike by analyzing differences in the differences in employment for New Jersey and Pennsylvania before and after the increase.^[Also see the box *What is the Effect on Employment of the Minimum Wage?* in Chapter 13.4 of the book.] The DID estimator is
\begin{align}
\widehat{\beta}_1^{\text{diffs-in-diffs}} =& \, (\overline{Y}^{\text{treatment,after}} - \overline{Y}^{\text{treatment,before}}) - (\overline{Y}^{\text{control,after}} - \overline{Y}^{\text{control,before}}) \\
=& \Delta \overline{Y}^{\text{treatment}} - \Delta \overline{Y}^{\text{control}}, (\#eq:DID)
\end{align}
with
+ $\overline{Y}^{\text{treatment,before}}$ - the sample average in the treatment group before the treatment
+ $\overline{Y}^{\text{treatment,after}}$ - the sample average in the treatment group after the treatment
+ $\overline{Y}^{\text{control,before}}$ - the sample average in the control group before the treatment
+ $\overline{Y}^{\text{control,after}}$ - the sample average in the control group after the treatment.
We now use `r ttcode("R")` to reproduce Figure 13.1 of the book.
```{r, fig.align='center'}
# initialize plot and add control group
plot(c(0, 1), c(6, 8),
type = "p",
ylim = c(5, 12),
xlim = c(-0.3, 1.3),
main = "The Differences-in-Differences Estimator",
xlab = "Period",
ylab = "Y",
col = "steelblue",
pch = 20,
xaxt = "n",
yaxt = "n")
axis(1, at = c(0, 1), labels = c("before", "after"))
axis(2, at = c(0, 13))
# add treatment group
points(c(0, 1, 1), c(7, 9, 11),
col = "darkred",
pch = 20)
# add line segments
lines(c(0, 1), c(7, 11), col = "darkred")
lines(c(0, 1), c(6, 8), col = "steelblue")
lines(c(0, 1), c(7, 9), col = "darkred", lty = 2)
lines(c(1, 1), c(9, 11), col = "black", lty = 2, lwd = 2)
# add annotations
text(1, 10, expression(hat(beta)[1]^{DID}), cex = 0.8, pos = 4)
text(0, 5.5, "s. mean control", cex = 0.8 , pos = 4)
text(0, 6.8, "s. mean treatment", cex = 0.8 , pos = 4)
text(1, 7.9, "s. mean control", cex = 0.8 , pos = 4)
text(1, 11.1, "s. mean treatment", cex = 0.8 , pos = 4)
```
The DID estimator \@ref(eq:DID) can also be written in regression notation: $\widehat{\beta}_1^{\text{DID}}$ is the OLS estimator of $\beta_1$ in
\begin{align}
\Delta Y_i = \beta_0 + \beta_1 X_i + u_i, (\#eq:did)
\end{align}
where $\Delta Y_i$ denotes the difference in pre- and post-treatment outcomes of individual $i$ and $X_i$ is the treatment indicator.
Adding additional regressors that measure pre-treatment characteristics to \@ref(eq:did) we obtain
\begin{align}
\Delta Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i, (\#eq:didwar)
\end{align}
the *difference-in-differences estimator* with additional regressors. The additional regressors may lead to a more precise estimate of $\beta_1$.
We keep things simple and focus on estimation of the treatment effect using DID in the simplest case, that is, a control and a treatment group observed for two time periods --- one before and one after the treatment. In particular, we will see that there are three different ways to proceed.
First, we simulate pre- and post-treatment data using `r ttcode("R")`.
```{r}
# set sample size
n <- 200
# define treatment effect
TEffect <- 4
# generate treatment dummy
TDummy <- c(rep(0, n/2), rep(1, n/2))
# simulate pre- and post-treatment values of the dependent variable
y_pre <- 7 + rnorm(n)
y_pre[1:(n/2)] <- y_pre[1:(n/2)] - 1
y_post <- 7 + 2 + TEffect * TDummy + rnorm(n)
y_post[1:(n/2)] <- y_post[1:(n/2)] - 1
```
Next we plot the data. The function `r ttcode("jitter()")` is used to add some artificial dispersion in the horizontal component of the points so that there is less overplotting. The function `r ttcode("alpha()")` from the package `r ttcode("scales")` allows us to adjust the opacity of the colors used in plots.
```{r, fig.align='center'}
library(scales)
pre <- rep(0, length(y_pre[TDummy==0]))
post <- rep(1, length(y_pre[TDummy==0]))
# plot control group in t=1
plot(jitter(pre, 0.6),
y_pre[TDummy == 0],
ylim = c(0, 16),
col = alpha("steelblue", 0.3),
pch = 20,
xlim = c(-0.5, 1.5),
ylab = "Y",
xlab = "Period",
xaxt = "n",
main = "Artificial Data for DID Estimation")
axis(1, at = c(0, 1), labels = c("before", "after"))
# add treatment group in t=1
points(jitter(pre, 0.6),
y_pre[TDummy == 1],
col = alpha("darkred", 0.3),
pch = 20)
# add control group in t=2
points(jitter(post, 0.6),
y_post[TDummy == 0],
col = alpha("steelblue", 0.5),
pch = 20)
# add treatment group in t=2
points(jitter(post, 0.6),
y_post[TDummy == 1],
col = alpha("darkred", 0.5),
pch = 20)
```
After the treatment, both the control and treatment groups have a higher mean, but the increase is stronger in the treatment group. Using DID we may estimate how much of that difference is due to the treatment.
It is straightforward to compute the DID estimate in the fashion of \@ref(eq:DID).
```{r}
# compute the DID estimator for the treatment effect 'by hand'
mean(y_post[TDummy == 1]) - mean(y_pre[TDummy == 1]) -
(mean(y_post[TDummy == 0]) - mean(y_pre[TDummy == 0]))
```
Notice that the estimate is close to $4$, the value chosen as the treatment effect `r ttcode("TEffect")` above. Since \@ref(eq:did) is a simple linear model, we may perform OLS estimation of this regression specification using `r ttcode("lm()")`.
```{r}
# compute the DID estimator using a linear model
lm(I(y_post - y_pre) ~ TDummy)
```
We find that the estimates coincide. Furthermore, one can show that the DID estimate obtained by estimating specification \@ref(eq:did) is the same as the OLS estimate of $\beta_{TE}$ in
\begin{align}
Y_i =& \beta_0 + \beta_1 D_i + \beta_2 Period_i + \beta_{TE} (Period_i \times D_i) + \varepsilon_i, (\#eq:DIDint)
\end{align}
where $D_i$ is the binary treatment indicator, $Period_i$ is a binary indicator for the post-treatment period and $Period_i \times D_i$ is the interaction of both.
As for \@ref(eq:did), estimation of \@ref(eq:DIDint) using `r ttcode("R")` is straightforward. See Chapter \@ref(nrf) for a discussion of interaction terms.
```{r}
# prepare data for DID regression using the interaction term
d <- data.frame("Y" = c(y_pre,y_post),
"Treatment" = TDummy,
"Period" = c(rep("1", n), rep("2", n)))
# estimate the model
lm(Y ~ Treatment * Period, data = d)
```
As expected, the estimate of the coefficient on the interaction of the treatment dummy and the time dummy coincides with the estimates obtained using \@ref(eq:DID) and OLS estimation of \@ref(eq:did).
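Estimation of the DID model with additional regressors \@ref(eq:didwar) works analogously. A minimal sketch, assuming a hypothetical pretreatment characteristic (it is not part of the data simulated above and is unrelated to the outcome here, so it merely illustrates the syntax):

```{r}
# sketch: DID estimation with an additional (hypothetical) pretreatment regressor
W1 <- rnorm(n)  # assumption: an observed pretreatment characteristic
lm(I(y_post - y_pre) ~ TDummy + W1)
```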
### Regression Discontinuity Estimators {-}
Consider the model
\begin{align}
Y_i =& \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i (\#eq:SRDDsetting)
\end{align}
and let
\begin{align*}
X_i =&
\begin{cases}
1, & W_i \geq c \\
0, & W_i < c,
\end{cases}
\end{align*}
so that the receipt of treatment, $X_i$, is determined by whether some continuous variable $W_i$, the so-called *running variable*, crosses a threshold $c$. The idea of *regression discontinuity design* is to use observations with $W_i$ close to $c$ for the estimation of $\beta_1$, which is the average treatment effect for individuals with $W_i = c$ and is assumed to be a good approximation to the treatment effect in the population. \@ref(eq:SRDDsetting) is called a *sharp regression discontinuity design* because treatment assignment is deterministic and discontinuous at the cutoff: all observations with $W_i < c$ do not receive treatment and all observations with $W_i \geq c$ are treated.
The subsequent code chunks show how to estimate a linear SRDD using `r ttcode("R")` and how to produce plots in the way of Figure 13.2 of the book.
```{r, message=FALSE}
# generate some sample data
W <- runif(1000, -1, 1)
y <- 3 + 2 * W + 10 * (W>=0) + rnorm(1000)
```
```{r, fig.align='center', message=FALSE}
# load the package 'rddtools'
library(rddtools)
# construct rdd_data
data <- rdd_data(y, W, cutpoint = 0)
# plot the sample data
plot(data,
col = "steelblue",
cex = 0.35,
xlab = "W",
ylab = "Y")
```
The argument `r ttcode("nbins")` sets the number of bins the running variable is divided into for aggregation. The dots represent bin averages of the outcome variable.
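For example, a coarser aggregation may be obtained by setting `r ttcode("nbins")` explicitly; a sketch of the same plot with 20 bins:

```{r, fig.align='center'}
# plot the sample data using 20 bins for the running variable
plot(data,
     nbins = 20,
     col = "steelblue",
     cex = 0.35,
     xlab = "W",
     ylab = "Y")
```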
We may use the function `r ttcode("rdd_reg_lm()")` to estimate the treatment effect using model \@ref(eq:SRDDsetting) for the artificial data generated above. By choosing `r ttcode('slope = "same"')` we restrict the slopes of the estimated regression function to be the same on both sides of the jump at the cutpoint $W=0$.
```{r, message=FALSE, warning=FALSE}
# estimate the sharp RDD model
rdd_mod <- rdd_reg_lm(rdd_object = data,
slope = "same")
summary(rdd_mod)
```
The coefficient estimate of interest is labeled `r ttcode("D")`. The estimate is very close to the treatment effect chosen in the DGP above.
It is easy to visualize the result: simply call `r ttcode("plot()")` on the estimated model object.
```{r fig.align='center'}
# plot the RDD model along with binned observations
plot(rdd_mod,
cex = 0.35,
col = "steelblue",
xlab = "W",
ylab = "Y")
```
As above, the dots represent averages of binned observations.
So far we assumed that crossing of the threshold determines receipt of treatment so that the jump of the population regression functions at the threshold can be regarded as the causal effect of the treatment.
When crossing the threshold $c$ is not the only determinant of receipt of the treatment, treatment is no longer a deterministic function of $W_i$. Instead, it is useful to think of $c$ as a threshold where the *probability* of receiving the treatment jumps.
This jump may be due to unobservable variables that have an impact on the probability of being treated. Thus, $X_i$ in \@ref(eq:SRDDsetting) will be correlated with the error $u_i$ and it becomes more difficult to consistently estimate the treatment effect. In this setting, a *fuzzy regression discontinuity design*, which is based on an IV approach, may be a remedy: take the binary variable $Z_i$ as an indicator for crossing the threshold,
\begin{align*}
Z_i = \begin{cases}
1, & W_i \geq c \\
0, & W_i < c,
\end{cases}
\end{align*}
and assume that $Z_i$ relates to $Y_i$ only through the treatment indicator $X_i$. Then $Z_i$ and $u_i$ are uncorrelated but $Z_i$ influences receipt of treatment so it is correlated with $X_i$. Thus, $Z_i$ is a valid instrument for $X_i$ and \@ref(eq:SRDDsetting) can be estimated using TSLS.
The following code chunk generates sample data where observations with a value of the running variable $W_i$ below the cutoff $c=0$ do not receive treatment and observations with $W_i \geq 0$ do receive treatment with a probability of $80\%$ so that treatment status is only partially determined by the running variable and the cutoff. Treatment leads to an increase in $Y$ by $2$ units. Observations with $W_i \geq 0$ that do not receive treatment are called *no-shows*: think of an individual that was assigned to receive the treatment but somehow manages to avoid it.
```{r, message=FALSE}
library(MASS)
# generate sample data
mu <- c(0, 0)
sigma <- matrix(c(1, 0.7, 0.7, 1), ncol = 2)
set.seed(1234)
d <- as.data.frame(mvrnorm(2000, mu, sigma))
colnames(d) <- c("W", "Y")
# introduce fuzziness
d$treatProb <- ifelse(d$W < 0, 0, 0.8)
fuzz <- sapply(X = d$treatProb, FUN = function(x) rbinom(1, 1, prob = x))
# treatment effect
d$Y <- d$Y + fuzz * 2
```
`r ttcode("sapply()")` applies the function provided to `r ttcode("FUN")` to every element of the argument `r ttcode("X")`. Here, since `r ttcode("d$treatProb")` is a vector, the result is a vector, too.
We plot all observations and use blue color to mark individuals that did not receive the treatment and use red color for those who received the treatment.
```{r, fig.align='center'}
# generate a colored plot of treatment and control group
plot(d$W, d$Y,
col = c("steelblue", "darkred")[factor(fuzz)],
pch = 20,
cex = 0.5,
xlim = c(-3, 3),
ylim = c(-3.5, 5),
xlab = "W",
ylab = "Y")
# add a dashed vertical line at cutoff
abline(v = 0, lty = 2)
# add a legend
legend("topleft",
       pch = 20,
       col = c("steelblue", "darkred"),
       legend = c("Do not receive treatment", "Receive treatment"))
```
Obviously, receipt of treatment is no longer a deterministic function of the running variable $W$: some observations with $W\geq0$ *did not* receive the treatment. We may estimate an FRDD by additionally setting `r ttcode("treatProb")` as the assignment variable `r ttcode("z")` in `r ttcode("rdd_data()")`. Then `r ttcode("rdd_reg_lm()")` applies the following TSLS procedure: in the first-stage regression, treatment is predicted using $W_i$ and the cutoff dummy $Z_i$, the instrumental variable. In the second stage, the outcome $Y$ is regressed on the fitted values from the first stage and the running variable $W$, which yields a consistent estimate of the treatment effect.
```{r}
# estimate the Fuzzy RDD
data <- rdd_data(d$Y, d$W,
cutpoint = 0,
z = d$treatProb)
frdd_mod <- rdd_reg_lm(rdd_object = data,
slope = "same")
frdd_mod
```
The estimate is close to $2$, the population treatment effect. We may call `r ttcode("plot()")` on the model object to obtain a figure consisting of binned data and the estimated regression function.
```{r, fig.align='center'}
# plot estimated FRDD function
plot(frdd_mod,
cex = 0.5,
lwd = 0.4,
xlim = c(-4, 4),
ylim = c(-3.5, 5),
xlab = "W",
ylab = "Y")
```
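The TSLS procedure described above can also be cross-checked "by hand" using `r ttcode("ivreg()")` from the package `r ttcode("AER")`. A minimal sketch, where the variables $X$ (treatment actually received) and $Z$ (the cutoff dummy serving as the instrument) are added to the data for illustration; the coefficient on $X$ should again be close to the population treatment effect of $2$.

```{r}
# cross-check: estimate the fuzzy RDD by TSLS using ivreg() from AER
d$X <- fuzz                   # treatment actually received
d$Z <- as.numeric(d$W >= 0)   # crossing the threshold: the instrument
ivreg(Y ~ X + W | Z + W, data = d)
```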
What if we had used an SRDD instead, thereby ignoring the fact that treatment is not perfectly determined by the cutoff in $W$? We may get an impression of the consequences by estimating an SRDD using the previously simulated data.
```{r}
# estimate SRDD
data <- rdd_data(d$Y,
d$W,
cutpoint = 0)
srdd_mod <- rdd_reg_lm(rdd_object = data,
slope = "same")
srdd_mod
```
The estimate obtained using an SRDD is suggestive of a substantial downward bias. In fact, this procedure is inconsistent for the true causal effect, so increasing the sample size would not alleviate the bias.
The book continues with a discussion of potential problems with quasi-experiments. As for all empirical studies, these potential problems are related to internal and external validity. This part is followed by a technical discussion of treatment effect estimation when the causal effect of treatment is heterogeneous in the population. We encourage you to work on these sections on your own.
#### Summary {-}
This chapter has introduced the concept of causal effects in randomized controlled experiments and quasi-experiments where variations in circumstances or accidents of nature are treated as sources of "as if" random assignment to treatment. We have also discussed methods that allow for consistent estimation of these effects in both settings. These included the *differences estimator*, the *differences-in-differences estimator* as well as *sharp* and *fuzzy regression discontinuity design* estimators. It was shown how to apply these estimation techniques in `r ttcode("R")`.
In an empirical application we have shown how to replicate the results of the analysis of the STAR data presented in Chapter 13.3 of the book using `r ttcode("R")`. This study uses a randomized controlled experiment to assess whether smaller classes improve students' performance on standardized tests. Since the data stem from a randomized controlled experiment, they are fundamentally different from the data used in the cross-section studies in Chapters \@ref(lrwor) to \@ref(nrf). This motivated the use of a *differences estimator*.
Chapter \@ref(attdfc) demonstrated how estimates of treatment effects can be obtained when the design of the study is a quasi-experiment that allows for *differences-in-differences* or *regression discontinuity design* estimators. In particular, we have introduced functions of the package `r ttcode("rddtools")` that are convenient for estimation as well as graphical analysis when estimating a regression discontinuity design.
## Exercises {#exercises-13}
The subsequent exercises guide you in reproducing some of the results presented in one of the most famous DID studies, @card1994. The authors use geography as the “as if” random treatment assignment to study the effect of an increase in the state minimum wage in New Jersey in 1992 on employment in fast-food restaurants; see Chapter \@ref(qe).
The study is based on survey data collected in February 1992 and in November 1992, after New Jersey's minimum wage rose by $\$0.80$ from $\$4.25$ to $\$5.05$ in April 1992.
Estimating the effect of the wage increase simply by computing the change in employment in New Jersey (as you are asked to do in Exercise 3) would fail to control for omitted variables. By using Pennsylvania as a control in a difference-in-differences (DID) model, one can control for variables that have a common influence on New Jersey (treatment group) and Pennsylvania (control group). This greatly reduces the risk of omitted variable bias, and it works even when these variables are unobserved.
For the DID approach to work we must assume that New Jersey and Pennsylvania exhibit parallel trends over time, i.e., we assume that the (unobserved) factors influence employment in Pennsylvania and New Jersey in the same manner. This allows us to interpret an observed change in employment in Pennsylvania as the change New Jersey would have experienced had there been no increase in the minimum wage (and vice versa).
Contrary to what standard economic theory would suggest, the authors did not find evidence that the increased minimum wage reduced employment in New Jersey using the DID approach: quite the contrary, their results suggest that the $\$0.80$ minimum wage increase in New Jersey led to an increase in employment of 2.75 full-time equivalents (FTE).
```{r, echo=F, purl=F, results='asis'}
if (my_output == "html") {
cat('
<div class = "DCexercise">
#### 1. The Data from Card & Krueger (1994) {-}
<tt>fastfood.dat</tt>, the dataset used by Card & Krueger (1994) can be downloaded [here](http://www.stat.ucla.edu/projects/datasets/fastfood.dta). See this [link](http://www.stat.ucla.edu/projects/datasets/fastfood-explanation.html) for a detailed explanation of the variables.
This exercise asks you to import the dataset in <tt>R</tt> and to perform some formatting necessary for the subsequent analysis. This can be tedious using base <tt>R</tt> functions but is easily done using the <tt>dplyr</tt> package introduced in Chapter \\@ref(aattggoe).
The URL to the dataset is saved in <tt>data_URL</tt>.
**Instructions:**
- Attach the packages <tt>dplyr</tt> and <tt>foreign</tt>.
- Read in the dataset <tt>fastfood.dta</tt> using <tt>data_URL</tt> and assign it to a <tt>data.frame</tt> named <tt>dat</tt>.
In their study, Card & Krueger (1994) measure employment in full-time equivalents, which they define as the number of full-time employees (<tt>empft</tt> and <tt>empft2</tt>) plus the number of managers (<tt>nmgrs</tt> and <tt>nmgrs2</tt>) plus 0.5 times the number of part-time employees (<tt>emppt</tt> / <tt>emppt2</tt>).
- Define full-time employment before (<tt>FTE</tt>) and after the wage increase (<tt>FTE2</tt>) and add both variables to <tt>dat</tt>.
<iframe src="DCL/ex13_1.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
**Hints:**
- <tt>read.dta()</tt> from the <tt>foreign</tt> package reads <tt>.dta</tt> files, a format used by the statistical software package *STATA*.
- <tt>mutate()</tt> generates new columns using existing ones.
</div>')}
```
```{r, echo=F, purl=F, results='asis'}
if (my_output == "html") {
cat('
<div class = "DCexercise">
#### 2. State Specific Estimates of Full-Time Employment --- I {-}
This exercise asks you to perform a quick calculation of state specific sample means in order to check whether our data on full-time employment is in alignment with the data used by Card & Krueger (1994).
**Instructions:**
- Generate subsets of <tt>dat</tt> to separate the observations for New Jersey and Pennsylvania. Save them as <tt>dat_NJ</tt> and <tt>dat_PA</tt>.
- Compute the sample means of full-time employment equivalents for New Jersey and Pennsylvania, both before and after the minimum wage increase in New Jersey. It suffices if your code prints the correct values to the console.
<iframe src="DCL/ex13_2.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
**Hints:**
- You may use <tt>group_by()</tt> in conjunction with <tt>summarise()</tt> to compute groupwise means. Both functions come with the <tt>dplyr</tt> package.
</div>')}
```
```{r, echo=F, purl=F, results='asis'}
if (my_output == "html") {
cat('
<div class = "DCexercise">
#### 3. State Specific Estimates of Full-Time Employment --- II {-}
A naive approach to investigate the impact of the minimum wage increase on employment is to use the estimated difference in mean employment before and after the wage increase for New Jersey fast food restaurants.
This exercise asks you to do the aforementioned and further to test if the estimated difference is significantly different from zero using a *robust* $t$-test.
The subsets <tt>dat_NJ</tt> and <tt>dat_PA</tt> from the previous exercise are available in your working environment.
**Instructions:**
- Use <tt>dat_NJ</tt> for a robust test of the hypothesis that there is no difference in full-time employment before and after the wage hike in New Jersey at the level of $5\\%$.
<iframe src="DCL/ex13_3.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
**Hints:**
- The testing problem amounts to a two-sample $t$-test which is conveniently done using <tt>t.test()</tt>.
</div>')}
```
```{r, echo=F, purl=F, results='asis'}
if (my_output == "html") {
cat('
<div class = "DCexercise">
#### 4. Preparing the Data for Regression {-}
The estimations done in Exercise 3 and the difference-in-differences approach we are working towards can be shown to produce the same results as OLS applied to specific regression models, see Chapters \\@ref(poceaie) and \\@ref(aattggoe).
This exercise asks you to construct a dataset which is more convenient for this purpose than the dataset <tt>dat</tt>.
**Instructions:**
Generate the dataset <tt>reg_dat</tt> from <tt>dat</tt> in *long format*, i.e., make sure that for each restaurant (identified by <tt>sheet</tt>) one observation before and one after the minimum wage increase (identified by <tt>D</tt>) are included.
Only consider the following variables:
- <tt>id</tt>: sheet number (unique store id)
- <tt>chain</tt>: chain 1=Burger King; 2=KFC; 3=Roy Rogers; 4=Wendys
- <tt>state</tt>: 1 if New Jersey; 0 if Pennsylvania
- <tt>empl</tt>: measure of full-time employment (<tt>FTE</tt> / <tt>FTE2</tt>)
- <tt>D</tt>: dummy indicating if the observation was made before or after the minimum wage increase in New Jersey.
<iframe src="DCL/ex13_4.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
**Hints:**
- The original dataset <tt>dat</tt> has 410 observations of 48 variables (check this using <tt>dim(dat)</tt>). The dataset <tt>reg_dat</tt> you are asked to generate must consist of 820 observations of the variables listed above.
- It is straightforward to generate a <tt>data.frame</tt> from the columns of another <tt>data.frame</tt> using <tt>data.frame(...)</tt>.
- Use <tt>rbind()</tt> to combine two objects of type <tt>data.frame</tt> by row.
</div>')}
```
```{r, echo=F, purl=F, results='asis'}
if (my_output == "html") {
cat('
<div class = "DCexercise">
#### 5. A Difference Estimate using Data from Card & Krueger (1994) --- II {-}
<tt>reg_dat</tt> from Exercise 4 is a *panel dataset* as it has two observations for each fast food restaurant $i=1,\\dots,410$, at time periods $t=0,1$.
Thus we may write down the simple regression model
$$employment_{i,t} = \\beta_0 + \\beta_1 D_t + \\varepsilon_{i,t},$$
where $D_t$ is a dummy variable which equals $0$ if the observation was made before the minimum wage change ($t=0$) and $1$ after the minimum wage change ($t=1$), i.e.,