# Experiments and Quasi-Experiments {#eaqe}
This chapter discusses statistical tools that are commonly applied in program evaluation, where interest lies in measuring the causal effects of programs, policies or other interventions. An optimal research design for this purpose is what statisticians call an ideal randomized controlled experiment. The basic idea is to randomly assign subjects to two different groups, one that receives the treatment (the treatment group) and one that does not (the control group) and to compare outcomes for both groups in order to get an estimate of the average treatment effect.
Such *experimental* data is fundamentally different from *observational* data. For example, one might use a randomized controlled experiment to measure how much the performance of students in a standardized test differs between two classes where one has a "regular" student-teacher ratio and the other one has fewer students. The data produced by such an experiment would be different from, e.g., the observed cross-section data on the students' performance used throughout Chapters \@ref(lrwor) to \@ref(nrf), where class sizes are not randomly assigned to students but instead are the result of economic decisions where educational objectives and budgetary aspects were balanced.
For economists, randomized controlled experiments are often difficult or even infeasible to implement. For example, due to ethical, moral and legal reasons it is practically impossible for a business owner to estimate the causal effect of psychological stress on workers' productivity using an experiment where workers are randomly assigned either to a treatment group that works under time pressure or to a control group that works under regular conditions, at best without knowledge of being in an experiment (see the box *The Hawthorne Effect* on p. 528 of the book).
However, sometimes external circumstances produce what is called a *quasi-experiment* or *natural experiment*. This "as if" randomness allows for the estimation of causal effects that are of interest to economists using tools which are very similar to those valid for ideal randomized controlled experiments. These tools draw heavily on the theory of multiple regression and also on IV regression (see Chapter \@ref(ivr)). We will review the core aspects of these methods and demonstrate how to apply them in R using the STAR data set (see the [description](https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/10766) of the data set).
The following packages and their dependencies are needed for reproduction of the code chunks presented throughout this chapter:
+ `r ttcode("AER")` [@R-AER],
+ `r ttcode("dplyr")` [@R-dplyr],
+ `r ttcode("MASS")` [@R-MASS],
+ `r ttcode("mvtnorm")` [@R-mvtnorm],
+ `r ttcode("rddtools")` [@R-rddtools],
+ `r ttcode("scales")` [@R-scales],
+ `r ttcode("stargazer")`[@R-stargazer],
+ `r ttcode("tidyr")` [@R-tidyr].
Make sure the following code chunk runs without any errors.
```{r, warning=FALSE, message=FALSE, eval=FALSE}
library(AER)
library(dplyr)
library(MASS)
library(mvtnorm)
library(rddtools)
library(scales)
library(stargazer)
library(tidyr)
```
## Potential Outcomes, Causal Effects and Idealized Experiments {#poceaie}
We now briefly recap the idea of the average causal effect and how it can be estimated using the *differences estimator*. We advise you to work through Chapter 13.1 of the book for a better understanding.
#### Potential Outcomes and the Average Causal Effect {-}
A *potential outcome* is the outcome for an individual under a potential treatment. For this individual, the causal effect of the treatment is the difference between the potential outcome if the individual receives the treatment and the potential outcome if she does not. Since this causal effect may differ across individuals and it is not possible to measure the causal effect for a single individual, one is interested in studying the *average causal effect* of the treatment, which is therefore also called the *average treatment effect*.
In an ideal randomized controlled experiment the following conditions are fulfilled:
1. The subjects are selected at random from the population.
2. The subjects are randomly assigned to treatment and control group.
Condition 1 guarantees that the subjects' potential outcomes are drawn randomly from the same population distribution such that the expected value of the causal effect in the sample is equal to the average causal effect in the population. Condition 2 ensures that the receipt of treatment is independent from the subjects' potential outcomes. If both conditions are fulfilled, the expected causal effect is the expected outcome in the treatment group minus the expected outcome in the control group. Using conditional expectations we have $$\text{Average causal effect} = E(Y_i\vert X_i=1) - E(Y_i\vert X_i=0),$$ where $X_i$ is a binary treatment indicator.
The average causal effect can be estimated using the *differences estimator*, which is nothing but the OLS estimator in the simple regression model
\begin{align}
Y_i = \beta_0 + \beta_1 X_i + u_i \ \ , \ \ i=1,\dots,n, (\#eq:diffest)
\end{align}
where random assignment ensures that $E(u_i\vert X_i) = 0$.
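To see the numerical equivalence, consider a minimal simulated example (all numbers below are illustrative assumptions, not estimates from the book): with random assignment, the difference in group means coincides with the OLS estimate of $\beta_1$.

```{r}
# sketch: with random assignment, the differences estimator (difference in
# group means) equals the OLS slope on the treatment dummy
set.seed(1)
X <- sample(c(0, 1), 200, replace = TRUE)  # randomly assigned treatment
Y <- 10 + 5 * X + rnorm(200)               # assumed treatment effect: 5
mean(Y[X == 1]) - mean(Y[X == 0])          # difference in means
coef(lm(Y ~ X))[2]                         # differences estimator
```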
The OLS estimator in the regression model
\begin{align}
Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i \ \ , \ \ i=1,\dots,n (\#eq:diffestwar)
\end{align}
with additional regressors $W_1,\dots,W_r$ is called the *differences estimator with additional regressors*. It is assumed that treatment assignment $X_i$ is random and independent of the pretreatment characteristics $W_i$. This assumption is called *conditional mean independence* and implies $$E(u_i\vert X_i , W_i) = E(u_i\vert W_i) = 0,$$ that is, the conditional expectation of the error $u_i$ given the treatment indicator $X_i$ and the pretreatment characteristics $W_i$ does not depend on $X_i$. Conditional mean independence replaces the first least squares assumption in Key Concept 6.4 and thus ensures that the differences estimator of $\beta_1$ is unbiased. The *differences estimator with additional regressors* is more efficient than the plain *differences estimator* if the additional regressors explain some of the variation in the $Y_i$.
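Continuing the small sketch from above (again with purely illustrative numbers), we can see the efficiency gain: if a pretreatment characteristic $W$ explains part of the variation in $Y$, the standard error of the estimated treatment effect shrinks.

```{r}
# sketch continued: an additional regressor that explains variation in Y
# yields a more precise estimate of the treatment effect
W <- rnorm(200)
Y2 <- 10 + 5 * X + 3 * W + rnorm(200)
summary(lm(Y2 ~ X))$coefficients[2, 1:2]      # differences estimator
summary(lm(Y2 ~ X + W))$coefficients[2, 1:2]  # smaller standard error
```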
## Threats to Validity of Experiments
The concepts of internal and external validity discussed in Key Concept 9.1 are also applicable for studies based on experimental and quasi-experimental data. Chapter 13.2 of the book provides a thorough explanation of the particular threats to internal and external validity of experiments including examples. We limit ourselves to a short repetition of the threats listed there. Consult the book for a more detailed explanation.
#### Threats to Internal Validity {-}
1. **Failure to Randomize**
If the subjects are not randomly assigned to the treatment group, the outcomes will be contaminated with the effect of the subjects' individual characteristics or preferences, and it is not possible to obtain an unbiased estimate of the treatment effect. One can test for nonrandom assignment using a significance test ($F$-test) on the coefficients in the regression model $$X_i = \beta_0 + \beta_1 W_{1i} + \dots + \beta_r W_{ri} + u_i \ \ , \ \ i=1,\dots,n$$ (a minimal sketch of such a test is given at the end of this list).
2. **Failure to Follow the Treatment Protocol**
If subjects do not follow the treatment protocol, i.e., some subjects in the treatment group manage to avoid receiving the treatment and/or some subjects in the control group manage to receive the treatment (*partial compliance*), there is correlation between $X_i$ and $u_i$ such that the OLS estimator of the average treatment effect will be biased. If there are data on *both* the treatment actually received ($X_i$) and the initial random assignment ($Z_i$), IV regression of the models \@ref(eq:diffest) and \@ref(eq:diffestwar) is a remedy.
3. **Attrition**
Attrition may result in a nonrandomly selected sample. If subjects systematically drop out of the study after being assigned to the control or the treatment group (systematic meaning that the reason for dropping out is related to the treatment), there will be correlation between $X_i$ and $u_i$ and hence bias in the OLS estimator of the treatment effect.
4. **Experimental Effects**
If human subjects in the treatment group and/or the control group know that they are part of an experiment, they might adapt their behavior in a way that prevents unbiased estimation of the treatment effect.
5. **Small Sample Sizes**
As we know from the theory of linear regression, small sample sizes lead to imprecise estimation of the coefficients and thus imply imprecise estimation of the causal effect. Furthermore, confidence intervals and hypothesis tests may produce wrong inference when the sample size is small.
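As announced above, here is a minimal sketch of a test for random assignment using simulated (hypothetical) data: regress the treatment indicator on pretreatment characteristics and check whether the slope coefficients are jointly zero.

```{r}
# sketch: F-test for random assignment using hypothetical data
set.seed(1)
W1 <- rnorm(200)
W2 <- rnorm(200)
D <- rbinom(200, 1, 0.5)   # treatment is randomly assigned by construction
# the overall F-statistic of this regression tests H0: all slopes are zero
summary(lm(D ~ W1 + W2))$fstatistic
```

Since assignment is random by construction here, the $F$-statistic should be small, so we expect not to reject the null hypothesis of random assignment.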
#### Threats to External Validity {-}
1. **Nonrepresentative Sample**
If the population studied and the population of interest are not sufficiently similar, there is no justification in generalizing the results.
2. **Nonrepresentative Program or Policy**
If the program or policy for the population studied differs considerably from the program (to be) applied to population(s) of interest, the results cannot be generalized. For example, a small-scale program with low funding might have different effects than a widely available scaled-up program that is actually implemented. There are other factors like duration and the extent of monitoring that should be considered here.
3. **General Equilibrium Effects**
If market and/or environmental conditions cannot be kept constant when an internally valid program is implemented broadly, external validity may be doubtful.
## Experimental Estimates of the Effect of Class Size Reductions
### Experimental Design and the Data Set {-}
The Project *Student-Teacher Achievement Ratio* (STAR) was a large randomized controlled experiment with the aim of assessing whether a class size reduction is effective in improving education outcomes. It was conducted in 80 Tennessee elementary schools over a period of four years during the 1980s by the State Department of Education.
In the first year, about 6400 students were randomly assigned into one of three interventions: small class (13 to 17 students per teacher), regular class (22 to 25 students per teacher), and regular-with-aide class (22 to 25 students with a full-time teacher's aide). Teachers were also randomly assigned to the classes they taught. The interventions were initiated as the students entered school in kindergarten and continued through to third grade. Control and treatment groups across grades are summarized in Table \@ref(tab:starstructure).
| | K | 1 | 2 | 3 |
|-----------------|----------------------|----------------------|----------------------|----------------------|
| Treatment 1 | Small class | Small class | Small class | Small class |
| Treatment 2 | Regular class + aide | Regular class + aide | Regular class + aide | Regular class + aide |
| Control | Regular class | Regular class | Regular class | Regular class |
Table: (\#tab:starstructure) Control and treatment groups in the STAR experiment
Each year, the students' learning progress was assessed using the sum of the points scored on the math and reading parts of a standardized test (the [Stanford Achievement Test](https://en.wikipedia.org/wiki/Stanford_Achievement_Test_Series)).
The STAR data set is part of the package `r ttcode("AER")`.
```{r, message=FALSE}
# load the package AER and the STAR dataset
library(AER)
data(STAR)
```
`r ttcode("head(STAR)")` shows that there is a variety of factor variables that describe student and teacher characteristics as well as various school indicators, all of which are separately recorded for the four different grades. The data is in *wide format*. That is, each variable has its own column and for each student, the rows contain observations on these variables. Using `r ttcode("dim(STAR)")` we find that there are a total of 11598 observations on 47 variables.
```{r}
# get an overview
head(STAR, 2)
dim(STAR)
```
```{r}
# get variable names
names(STAR)
```
A majority of the variable names contain a suffix (`r ttcode("k")`, `r ttcode("1")`, `r ttcode("2")` or `r ttcode("3")`) indicating the grade to which the respective variable refers. This facilitates regression analysis because it allows us to adjust the `r ttcode("formula")` argument in `r ttcode("lm()")` for each grade by simply changing the variables' suffixes accordingly.
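Since only the suffix changes across grades, the per-grade regression formulas can even be generated programmatically. A small sketch:

```{r}
# sketch: build the per-grade regression formulas from the suffixes
suffixes <- c("k", "1", "2", "3")
paste0("I(read", suffixes, " + math", suffixes, ") ~ star", suffixes)
```

Such strings can be converted with `r ttcode("as.formula()")` and passed to the `r ttcode("formula")` argument of `r ttcode("lm()")`.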
The outcome produced by `r ttcode("head()")` shows that some recorded values are `r ttcode("NA")` and thus, there is no data on this variable for the student under consideration. This lies in the nature of the data: for example, take the first observation `STAR[1,]`.
In the output of `head(STAR, 2)` we find that the student entered the experiment in third grade in a regular class, which is why the class size is recorded in `r ttcode("star3")` and the other class type indicator variables are `r ttcode("NA")`. Her math and reading scores for the third grade are available; however, recordings for other grades are not present for the same reason. Obtaining only her non-missing (non-`r ttcode("NA")`) recordings is straightforward: simply eliminate the `r ttcode("NA")`s using the `r ttcode("!is.na()")` function.
```{r}
# drop NA recordings for the first observation and print to the console
STAR[1, !is.na(STAR[1, ])]
```
`is.na(STAR[1, ])` returns a logical vector with `r ttcode("TRUE")` at positions that correspond to `r ttcode("<NA>")` entries for the first observation. The `r ttcode("!")` operator inverts the result such that we obtain only the non-`r ttcode("<NA>")` entries for the first observation.
In general it is not necessary to remove rows with missing data because `r ttcode("lm()")` does so by default. Missing data may imply a small sample size and thus may lead to imprecise estimation and wrong inference. This is, however, not an issue for the study at hand since, as we will see below, sample sizes exceed 5000 observations for each regression conducted.
### Analysis of the STAR Data {-}
As can be seen from Table \@ref(tab:starstructure), there are two treatment groups in each grade: small classes with only 13 to 17 students, and regular classes with 22 to 25 students and a teaching aide. Thus, two binary variables, each being an indicator for the respective treatment group, are introduced for the differences estimator to capture the treatment effect for each treatment group separately. This yields the population regression model
\begin{align}
Y_i = \beta_0 + \beta_1 SmallClass_i + \beta_2 RegAide_i + u_i, (\#eq:starpopreg)
\end{align}
where $Y_i$ is the test score, $SmallClass_i$ the small class indicator and $RegAide_i$ the indicator for a regular class with aide.
We reproduce the results presented in Table 13.1 of the book by performing the regression \@ref(eq:starpopreg) for each grade separately. For each student, the dependent variable is simply the sum of the points scored in the math and reading parts, constructed using `r ttcode("I()")`.
```{r}
# compute differences estimates for each grade
fmk <- lm(I(readk + mathk) ~ stark, data = STAR)
fm1 <- lm(I(read1 + math1) ~ star1, data = STAR)
fm2 <- lm(I(read2 + math2) ~ star2, data = STAR)
fm3 <- lm(I(read3 + math3) ~ star3, data = STAR)
```
```{r}
# obtain coefficient matrix using robust standard errors
coeftest(fmk, vcov = vcovHC, type = "HC1")
coeftest(fm1, vcov = vcovHC, type = "HC1")
coeftest(fm2, vcov = vcovHC, type = "HC1")
coeftest(fm3, vcov = vcovHC, type = "HC1")
```
We gather the results and present them in a table using `r ttcode("stargazer()")`.
```{r}
# compute robust standard errors for each model and gather them in a list
rob_se_1 <- list(sqrt(diag(vcovHC(fmk, type = "HC1"))),
sqrt(diag(vcovHC(fm1, type = "HC1"))),
sqrt(diag(vcovHC(fm2, type = "HC1"))),
sqrt(diag(vcovHC(fm3, type = "HC1"))))
```
```{r, message=F, warning=F, results='asis', eval=FALSE}
library(stargazer)
stargazer(fmk,fm1,fm2,fm3,
title = "Project STAR: Differences Estimates",
header = FALSE,
type = "latex",
model.numbers = F,
omit.table.layout = "n",
digits = 3,
column.labels = c("K", "1", "2", "3"),
dep.var.caption = "Dependent Variable: Grade",
dep.var.labels.include = FALSE,
se = rob_se_1)
```
<!--html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="html"}
library(stargazer)
stargazer(fmk,fm1,fm2,fm3,
header = FALSE,
type = "html",
model.numbers = F,
omit.table.layout = "n",
digits = 2,
column.labels = c("K", "1", "2", "3"),
dep.var.caption = "Dependent Variable: Grade",
dep.var.labels.include = FALSE,
se = rob_se_1)
stargazer_html_title("Project STAR - Differences Estimates", "psde")
```
<!--/html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="latex"}
library(stargazer)
stargazer(fmk,fm1,fm2,fm3,
title = "\\label{tab:psde} Project STAR - Differences Estimates",
header = FALSE,
digits = 3,
type = "latex",
float.env = "sidewaystable",
column.sep.width = "-3pt",
model.numbers = F,
omit.table.layout = "n",
column.labels = c("K", "1", "2", "3"),
dep.var.caption = "Dependent Variable: Grade",
dep.var.labels.include = FALSE,
se = rob_se_1)
```
The estimates presented in Table \@ref(tab:psde) suggest that the class size reduction improves student performance. Except for grade 1, the estimates of the coefficient on $SmallClass$ are roughly of the same magnitude (the estimates lie between 13.90 and 19.39 points) and they are statistically significant at the $1\%$ level. Furthermore, a teaching aide has little, possibly zero, effect on the performance of the students.
Following the book, we augment the regression model \@ref(eq:starpopreg) by different sets of regressors for two reasons:
1. If the additional regressors explain some of the observed variation in the dependent variable, we obtain more efficient estimates of the coefficients of interest.
2. If the treatment is not received at random due to failures to follow the treatment protocol (see Chapter 13.3 of the book), the estimates obtained using \@ref(eq:starpopreg) may be biased. Adding additional regressors may solve or mitigate this problem.
In particular, we consider the following student and teacher characteristics
- $experience$ --- Teacher's years of experience
- $boy$ --- Student is a boy (dummy)
- $lunch$ --- Free lunch eligibility (dummy)
- $black$ --- Student is African-American (dummy)
- $race$ --- Student's race is other than black or white (dummy)
- $\text{schoolid}$ --- School indicator variables
in the four population regression specifications
\begin{align}
Y_i =& \beta_0 + \beta_1 SmallClass_i + \beta_2 RegAide_i + u_i, (\#eq:augstarpopreg1) \\
Y_i =& \beta_0 + \beta_1 SmallClass_i + \beta_2 RegAide_i + \beta_3 experience_i + u_i, (\#eq:augstarpopreg2) \\
Y_i =& \beta_0 + \beta_1 SmallClass_i + \beta_2 RegAide_i + \beta_3 experience_i + schoolid + u_i, (\#eq:augstarpopreg3)
\end{align}
and
\begin{align}
Y_i =& \beta_0 + \beta_1 SmallClass_i + \beta_2 RegAide_i + \beta_3 experience_i + \beta_4 boy + \beta_5 lunch \\
& + \beta_6 black + \beta_7 race + schoolid + u_i. (\#eq:augstarpopreg4)
\end{align}
Prior to estimation, we perform some subsetting and data wrangling using functions from the packages `r ttcode("dplyr")` and `r ttcode("tidyr")`. These are both part of `r ttcode("tidyverse")`, a collection of `r ttcode("R")` packages designed for data science and handling big datasets (see the [official site](https://www.tidyverse.org/) for more on `r ttcode("tidyverse")` packages). The functions `r ttcode("%>%")`, `r ttcode("transmute()")` and `r ttcode("mutate()")` are sufficient for us here:
+ `r ttcode("%>%")` allows to chain function calls.
+ `r ttcode("transmute()")` allows to subset the data set by naming the variables to be kept.
+ `r ttcode("mutate()")` is convenient for adding new variables based on existing ones while preserving the latter.
The regression models \@ref(eq:augstarpopreg1) to \@ref(eq:augstarpopreg4) require the variables `r ttcode("gender")`, `r ttcode("ethnicity")`, `r ttcode("stark")`, `r ttcode("readk")`, `r ttcode("mathk")`, `r ttcode("lunchk")`, `r ttcode("experiencek")` and `r ttcode("schoolidk")`. After dropping the remaining variables using `r ttcode("transmute()")`, we use `r ttcode("mutate()")` to add three additional binary variables which are derivatives of existing ones: `r ttcode("black")`, `r ttcode("race")` and `r ttcode("boy")`. They are generated using logical statements within the function `r ttcode("ifelse()")`.
```{r, message=FALSE, warning=FALSE}
# load packages 'dplyr' and 'tidyr' for data wrangling functionalities
library(dplyr)
library(tidyr)
# generate subset with kindergarten data
STARK <- STAR %>%
transmute(gender,
ethnicity,
stark,
readk,
mathk,
lunchk,
experiencek,
schoolidk) %>%
mutate(black = ifelse(ethnicity == "afam", 1, 0),
race = ifelse(ethnicity == "afam" | ethnicity == "cauc", 1, 0),
boy = ifelse(gender == "male", 1, 0))
```
```{r, message=FALSE, warning=FALSE}
# estimate the models
gradeK1 <- lm(I(mathk + readk) ~ stark + experiencek,
data = STARK)
gradeK2 <- lm(I(mathk + readk) ~ stark + experiencek + schoolidk,
data = STARK)
gradeK3 <- lm(I(mathk + readk) ~ stark + experiencek + boy + lunchk
+ black + race + schoolidk,
data = STARK)
```
For brevity, we exclude the coefficients on the school indicator dummies from the output of `r ttcode("coeftest()")` by subsetting the returned matrices.
```{r, message=FALSE, warning=FALSE}
# obtain robust inference on the significance of coefficients
coeftest(gradeK1, vcov. = vcovHC, type = "HC1")
coeftest(gradeK2, vcov. = vcovHC, type = "HC1")[1:4, ]
coeftest(gradeK3, vcov. = vcovHC, type = "HC1")[1:7, ]
```
We now use `r ttcode("stargazer()")` to gather all relevant information in a structured table.
```{r}
# compute robust standard errors for each model and gather them in a list
rob_se_2 <- list(sqrt(diag(vcovHC(fmk, type = "HC1"))),
sqrt(diag(vcovHC(gradeK1, type = "HC1"))),
sqrt(diag(vcovHC(gradeK2, type = "HC1"))),
sqrt(diag(vcovHC(gradeK3, type = "HC1"))))
```
```{r, message=F, warning=F, results='asis', eval=FALSE}
stargazer(fmk, gradeK1, gradeK2, gradeK3,
title = "Project STAR - Differences Estimates with
Additional Regressors for Kindergarten",
header = FALSE,
type = "latex",
model.numbers = F,
omit.table.layout = "n",
digits = 3,
column.labels = c("(1)", "(2)", "(3)", "(4)"),
dep.var.caption = "Dependent Variable: Test Score in Kindergarten",
dep.var.labels.include = FALSE,
se = rob_se_2)
```
<!--html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="html"}
stargazer(fmk,gradeK1,gradeK2,gradeK3,
header = FALSE,
type = "html",
model.numbers = F,
omit.table.layout = "n",
digits = 3,
column.labels = c("(1)", "(2)", "(3)", "(4)"),
dep.var.caption = "Dependent Variable: Test Score in Kindergarten",
dep.var.labels.include = FALSE,
se = rob_se_2,
omit = "schoolid",
nobs = TRUE,
add.lines = list(
c("School indicators?", "no", "no", "yes", "yes")
)
)
stargazer_html_title("Project STAR - Differences Estimates with Additional Regressors for Kindergarten", "psdewarfk")
```
<!--/html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="latex"}
stargazer(fmk,gradeK1,gradeK2,gradeK3,
title = "\\label{tab:psdewarfk} Project STAR - Differences Estimates with
Additional Regressors for Kindergarten",
header = FALSE,
digits = 3,
type = "latex",
float.env = "sidewaystable",
column.sep.width = "-5pt",
model.numbers = F,
omit.table.layout = "n",
column.labels = c("(1)", "(2)", "(3)", "(4)"),
dep.var.caption = "Dependent Variable: Test Score in Kindergarten",
dep.var.labels.include = FALSE,
se = rob_se_2,
omit = "schoolid",
nobs = TRUE,
add.lines = list(
c("School indicators?", "no", "no", "yes", "yes")
)
)
```
The results in column (1) of Table \@ref(tab:psdewarfk) are the same as those obtained for \@ref(eq:starpopreg). Columns (2) to (4) reveal that adding student characteristics and school fixed effects does not lead to substantially different estimates of the treatment effects. This result makes it more plausible that the estimates obtained using model \@ref(eq:starpopreg) do not suffer from failure of random assignment. There is some decrease in the standard errors and some increase in $\bar{R}^2$, implying that the estimates are more precise.
Because teachers were randomly assigned to classes, the inclusion of school fixed effects allows us to estimate the causal effect of a teacher's experience on the test scores of students in kindergarten. Regression (3) predicts the average effect of 10 years of experience on test scores to be $10\cdot 0.74=7.4$ points. Be aware that the other estimates on student characteristics in regression (4) *do not* have a causal interpretation due to nonrandom assignment (see Chapter 13.3 of the book for a detailed discussion).
Are the estimated effects presented in Table \@ref(tab:psdewarfk) large or small in a practical sense? Let us translate the predicted changes in test scores to units of standard deviation in order to allow for a comparison (see Section \@ref(etsacs) for a similar argument).
```{r}
# compute the sample standard deviations of test scores
SSD <- c("K" = sd(na.omit(STAR$readk + STAR$mathk)),
"1" = sd(na.omit(STAR$read1 + STAR$math1)),
"2" = sd(na.omit(STAR$read2 + STAR$math2)),
"3" = sd(na.omit(STAR$read3 + STAR$math3)))
# translate the effects of small classes to standard deviations
Small <- c("K" = as.numeric(coef(fmk)[2]/SSD[1]),
"1" = as.numeric(coef(fm1)[2]/SSD[2]),
"2" = as.numeric(coef(fm2)[2]/SSD[3]),
"3" = as.numeric(coef(fm3)[2]/SSD[4]))
# adjust the standard errors
SmallSE <- c("K" = as.numeric(rob_se_1[[1]][2]/SSD[1]),
"1" = as.numeric(rob_se_1[[2]][2]/SSD[2]),
"2" = as.numeric(rob_se_1[[3]][2]/SSD[3]),
"3" = as.numeric(rob_se_1[[4]][2]/SSD[4]))
# translate the effects of regular classes with aide to standard deviations
RegAide<- c("K" = as.numeric(coef(fmk)[3]/SSD[1]),
"1" = as.numeric(coef(fm1)[3]/SSD[2]),
"2" = as.numeric(coef(fm2)[3]/SSD[3]),
"3" = as.numeric(coef(fm3)[3]/SSD[4]))
# adjust the standard errors
RegAideSE <- c("K" = as.numeric(rob_se_1[[1]][3]/SSD[1]),
"1" = as.numeric(rob_se_1[[2]][3]/SSD[2]),
"2" = as.numeric(rob_se_1[[3]][3]/SSD[3]),
"3" = as.numeric(rob_se_1[[4]][3]/SSD[4]))
# gather the results in a data.frame and round
df <- t(round(data.frame(
Small, SmallSE, RegAide, RegAideSE, SSD),
digits = 2))
```
It is fairly easy to turn the `r ttcode("data.frame")` `r ttcode("df")` into a table.
```{r, eval=FALSE}
# generate a simple table using stargazer
stargazer(df,
title = "Estimated Class Size Effects
(in Units of Standard Deviations)",
type = "html",
summary = FALSE,
header = FALSE
)
```
<!--html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="html"}
stargazer(df,
type = "html",
header = FALSE,
summary = FALSE)
stargazer_html_title("Estimated Class Size Effects
(in Units of Standard Deviations)", "ecse")
```
<!--/html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="latex"}
stargazer(df,
title = "\\label{tab:ecse} Estimated Class Size Effects
(in Units of Standard Deviations)",
type = "latex",
header = FALSE,
summary = FALSE)
```
The estimated effect of a small class is largest for grade 1. As pointed out in the book, this is probably because students in the control group for grade 1 did poorly on the test for some unknown reason, or simply due to random variation. The difference between the estimated effect of being in a small class and being in a regular class with an aide is roughly 0.2 standard deviations for all grades. This leads to the conclusion that the effect of being in a regular-sized class with an aide is zero and the effect of being in a small class is roughly the same for all grades.
The remainder of Chapter 13.3 in the book discusses to what extent these experimental estimates are comparable with observational estimates obtained using data on school districts in California and Massachusetts in Chapter \@ref(asbomr). It turns out that the estimates are indeed very similar. Please refer to the aforementioned section in the book for a more detailed discussion.
## Quasi Experiments {#qe}
In quasi-experiments, "as if" randomness is exploited to apply methods similar to those discussed in the previous sections. There are two types of quasi-experiments:^[See Chapter 13.4 of the book for some example studies that are based on quasi-experiments.]
1. Random variations in individual circumstances allow us to view the treatment "as if" it were randomly determined.
2. The treatment is only partially determined by "as if" random variation.
In the former case, the effect can be estimated using either model \@ref(eq:diffestwar), i.e., the *differences estimator with additional regressors*, or, if there is concern that the "as if" randomness does not entirely ensure the absence of systematic differences between control and treatment groups, the *differences-in-differences* (DID) estimator. In the latter case, an IV approach may be applied, with the source of "as if" random variation in treatment assignment serving as the instrument in a model like \@ref(eq:diffestwar).
Some more advanced techniques that are helpful in settings where the treatment assignment is (partially) determined by a threshold in a so-called running variable are *sharp regression discontinuity design* (RDD) and *fuzzy regression discontinuity design* (FRDD).
We briefly review these techniques and, since the book does not provide any empirical examples in this section, we will use our own simulated data in a minimal example to discuss how DID, RDD and FRDD can be applied in `r ttcode("R")`.
### The Differences-in-Differences Estimator {-}
In quasi-experiments the source of "as if" randomness in treatment assignment can often not entirely prevent systematic differences between control and treatment groups. This problem was encountered by @card1994, who used geography as the "as if" random treatment assignment to study the effect of an increase in the state minimum wage in New Jersey in 1992 on employment in fast-food restaurants. Their idea was to exploit the fact that the increase in the minimum wage applied to employees in New Jersey (treatment group) but not to those working in neighboring Pennsylvania (control group).
It is quite conceivable that such a wage hike is not correlated with other determinants of employment. However, there still might be some state-specific differences and thus differences between control and treatment group. This would render the *differences estimator* biased and inconsistent. @card1994 solved this by using a DID estimator: they collected data in February 1992 (before the treatment) and November 1992 (after the treatment) for the same restaurants and estimated the effect of the wage hike by analyzing differences in the differences in employment for New Jersey and Pennsylvania before and after the increase.^[Also see the box *What is the Effect on Employment of the Minimum Wage?* in Chapter 13.4 of the book.] The DID estimator is
\begin{align}
\widehat{\beta}_1^{\text{diffs-in-diffs}} =& \, (\overline{Y}^{\text{treatment,after}} - \overline{Y}^{\text{treatment,before}}) - (\overline{Y}^{\text{control,after}} - \overline{Y}^{\text{control,before}}) \\
=& \Delta \overline{Y}^{\text{treatment}} - \Delta \overline{Y}^{\text{control}}, (\#eq:DID)
\end{align}
with
+ $\overline{Y}^{\text{treatment,before}}$ - the sample average in the treatment group before the treatment
+ $\overline{Y}^{\text{treatment,after}}$ - the sample average in the treatment group after the treatment
+ $\overline{Y}^{\text{control,before}}$ - the sample average in the control group before the treatment
+ $\overline{Y}^{\text{control,after}}$ - the sample average in the control group after the treatment.
We now use `r ttcode("R")` to reproduce Figure 13.1 of the book.
```{r, fig.align='center'}
# initialize plot and add control group
plot(c(0, 1), c(6, 8),
type = "p",
ylim = c(5, 12),
xlim = c(-0.3, 1.3),
main = "The Differences-in-Differences Estimator",
xlab = "Period",
ylab = "Y",
col = "steelblue",
pch = 20,
xaxt = "n",
yaxt = "n")
axis(1, at = c(0, 1), labels = c("before", "after"))
axis(2, at = c(0, 13))
# add treatment group
points(c(0, 1, 1), c(7, 9, 11),
col = "darkred",
pch = 20)
# add line segments
lines(c(0, 1), c(7, 11), col = "darkred")
lines(c(0, 1), c(6, 8), col = "steelblue")
lines(c(0, 1), c(7, 9), col = "darkred", lty = 2)
lines(c(1, 1), c(9, 11), col = "black", lty = 2, lwd = 2)
# add annotations
text(1, 10, expression(hat(beta)[1]^{DID}), cex = 0.8, pos = 4)
text(0, 5.5, "s. mean control", cex = 0.8 , pos = 4)
text(0, 6.8, "s. mean treatment", cex = 0.8 , pos = 4)
text(1, 7.9, "s. mean control", cex = 0.8 , pos = 4)
text(1, 11.1, "s. mean treatment", cex = 0.8 , pos = 4)
```
The DID estimator \@ref(eq:DID) can also be written in regression notation: $\widehat{\beta}_1^{\text{DID}}$ is the OLS estimator of $\beta_1$ in
\begin{align}
\Delta Y_i = \beta_0 + \beta_1 X_i + u_i, (\#eq:did)
\end{align}
where $\Delta Y_i$ denotes the difference in pre- and post-treatment outcomes of individual $i$ and $X_i$ is the treatment indicator.
Adding additional regressors that measure pre-treatment characteristics to \@ref(eq:did) we obtain
\begin{align}
\Delta Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i, (\#eq:didwar)
\end{align}
the *difference-in-differences estimator* with additional regressors. The additional regressors may lead to a more precise estimate of $\beta_1$.
We keep things simple and focus on estimation of the treatment effect using DID in the simplest case, that is, a control and a treatment group observed for two time periods --- one before and one after the treatment. In particular, we will see that there are three different ways to proceed.
First, we simulate pre- and post-treatment data using `r ttcode("R")`.
```{r}
# set sample size
n <- 200
# define treatment effect
TEffect <- 4
# generate treatment dummy
TDummy <- c(rep(0, n/2), rep(1, n/2))
# simulate pre- and post-treatment values of the dependent variable
y_pre <- 7 + rnorm(n)
y_pre[1:(n/2)] <- y_pre[1:(n/2)] - 1
y_post <- 7 + 2 + TEffect * TDummy + rnorm(n)
y_post[1:(n/2)] <- y_post[1:(n/2)] - 1
```
Next we plot the data. The function `r ttcode("jitter()")` is used to add some artificial dispersion in the horizontal component of the points so that there is less overplotting. The function `r ttcode("alpha()")` from the package `r ttcode("scales")` allows us to adjust the opacity of the colors used in plots.
```{r, fig.align='center'}
library(scales)
pre <- rep(0, length(y_pre[TDummy==0]))
post <- rep(1, length(y_pre[TDummy==0]))
# plot control group in t=1
plot(jitter(pre, 0.6),
y_pre[TDummy == 0],
ylim = c(0, 16),
col = alpha("steelblue", 0.3),
pch = 20,
xlim = c(-0.5, 1.5),
ylab = "Y",
xlab = "Period",
xaxt = "n",
main = "Artificial Data for DID Estimation")
axis(1, at = c(0, 1), labels = c("before", "after"))
# add treatment group in t=1
points(jitter(pre, 0.6),
y_pre[TDummy == 1],
col = alpha("darkred", 0.3),
pch = 20)
# add control group in t=2
points(jitter(post, 0.6),
y_post[TDummy == 0],
col = alpha("steelblue", 0.5),
pch = 20)
# add treatment group in t=2
points(jitter(post, 0.6),
y_post[TDummy == 1],
col = alpha("darkred", 0.5),
pch = 20)
```
After the treatment, both the control and treatment groups have a higher mean, but the increase is stronger in the treatment group. Using DID we may estimate how much of that difference is due to the treatment.
It is straightforward to compute the DID estimate in the fashion of \@ref(eq:DID).
```{r}
# compute the DID estimator for the treatment effect 'by hand'
mean(y_post[TDummy == 1]) - mean(y_pre[TDummy == 1]) -
(mean(y_post[TDummy == 0]) - mean(y_pre[TDummy == 0]))
```
Notice that the estimate is close to $4$, the value chosen as the treatment effect `r ttcode("TEffect")` above. Since \@ref(eq:did) is a simple linear model, we may perform OLS estimation of this regression specification using `r ttcode("lm()")`.
```{r}
# compute the DID estimator using a linear model
lm(I(y_post - y_pre) ~ TDummy)
```
We find that the estimates coincide. Furthermore, one can show that the DID estimate obtained by estimating specification \@ref(eq:did) is the same as the OLS estimate of $\beta_{TE}$ in
\begin{align}
Y_i =& \beta_0 + \beta_1 D_i + \beta_2 Period_i + \beta_{TE} (Period_i \times D_i) + \varepsilon_i, (\#eq:DIDint)
\end{align}
where $D_i$ is the binary treatment indicator, $Period_i$ is a binary indicator for the post-treatment period and $Period_i \times D_i$ is the interaction of both.
As for \@ref(eq:did), estimation of \@ref(eq:DIDint) using `r ttcode("R")` is straightforward. See Chapter \@ref(nrf) for a discussion of interaction terms.
```{r}
# prepare data for DID regression using the interaction term
d <- data.frame("Y" = c(y_pre,y_post),
"Treatment" = TDummy,
"Period" = c(rep("1", n), rep("2", n)))
# estimate the model
lm(Y ~ Treatment * Period, data = d)
```
As expected, the estimate of the coefficient on the interaction of the treatment dummy and the time dummy coincides with the estimates obtained using \@ref(eq:DID) and OLS estimation of \@ref(eq:did).
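Estimation of the DID model with additional regressors \@ref(eq:didwar) works analogously. A minimal sketch, assuming a hypothetical pretreatment characteristic (it is not part of the data simulated above and is unrelated to the outcome here, so it merely illustrates the syntax):

```{r}
# sketch: DID estimation with an additional (hypothetical) pretreatment regressor
W1 <- rnorm(n)  # assumption: an observed pretreatment characteristic
lm(I(y_post - y_pre) ~ TDummy + W1)
```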
### Regression Discontinuity Estimators {-}
Consider the model
\begin{align}
Y_i =& \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i (\#eq:SRDDsetting)
\end{align}
and let
\begin{align*}
X_i =&
\begin{cases}
1, & W_i \geq c \\
0, & W_i < c,
\end{cases}
\end{align*}
so that the receipt of treatment, $X_i$, is determined by whether some continuous variable $W_i$, the so-called *running variable*, crosses a threshold $c$. The idea of *regression discontinuity design* is to use observations with $W_i$ close to $c$ for the estimation of $\beta_1$, which is the average treatment effect for individuals with $W_i = c$ and is assumed to be a good approximation to the treatment effect in the population. \@ref(eq:SRDDsetting) is called a *sharp regression discontinuity design* because treatment assignment is deterministic and discontinuous at the cutoff: all observations with $W_i < c$ do not receive treatment and all observations with $W_i \geq c$ are treated.
The subsequent code chunks show how to estimate a linear SRDD using `r ttcode("R")` and how to produce plots in the way of Figure 13.2 of the book.
```{r, message=FALSE}
# generate some sample data
W <- runif(1000, -1, 1)
y <- 3 + 2 * W + 10 * (W>=0) + rnorm(1000)
```
```{r, fig.align='center', message=FALSE}
# load the package 'rddtools'
library(rddtools)
# construct rdd_data
data <- rdd_data(y, W, cutpoint = 0)
# plot the sample data
plot(data,
col = "steelblue",
cex = 0.35,
xlab = "W",
ylab = "Y")
```
The argument `r ttcode("nbins")` sets the number of bins the running variable is divided into for aggregation. The dots represent bin averages of the outcome variable.
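For example, a coarser aggregation may be obtained by setting `r ttcode("nbins")` explicitly; a sketch of the same plot with 20 bins:

```{r, fig.align='center'}
# plot the sample data using 20 bins for the running variable
plot(data,
     nbins = 20,
     col = "steelblue",
     cex = 0.35,
     xlab = "W",
     ylab = "Y")
```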
We may use the function `r ttcode("rdd_reg_lm()")` to estimate the treatment effect using model \@ref(eq:SRDDsetting) for the artificial data generated above. By choosing `r ttcode('slope = "same"')` we restrict the slopes of the estimated regression function to be the same on both sides of the jump at the cutpoint $W=0$.
```{r, message=FALSE, warning=FALSE}
# estimate the sharp RDD model
rdd_mod <- rdd_reg_lm(rdd_object = data,
slope = "same")
summary(rdd_mod)
```
The coefficient estimate of interest is labeled `r ttcode("D")`. The estimate is very close to the treatment effect chosen in the DGP above.
It is easy to visualize the result: simply call `r ttcode("plot()")` on the estimated model object.
```{r fig.align='center'}
# plot the RDD model along with binned observations
plot(rdd_mod,
cex = 0.35,
col = "steelblue",
xlab = "W",
ylab = "Y")
```
As above, the dots represent averages of binned observations.
So far we assumed that crossing of the threshold determines receipt of treatment so that the jump of the population regression functions at the threshold can be regarded as the causal effect of the treatment.
When crossing the threshold $c$ is not the only determinant of receipt of the treatment, treatment is no longer a deterministic function of $W_i$. Instead, it is useful to think of $c$ as a threshold where the *probability* of receiving the treatment jumps.
This jump may be due to unobservable variables that have an impact on the probability of being treated. Thus, $X_i$ in \@ref(eq:SRDDsetting) will be correlated with the error $u_i$ and it becomes more difficult to consistently estimate the treatment effect. In this setting, a *fuzzy regression discontinuity design*, which is based on an IV approach, may be a remedy: take the binary variable $Z_i$ as an indicator for crossing the threshold,
\begin{align*}
Z_i = \begin{cases}
1, & W_i \geq c \\
0, & W_i < c,
\end{cases}
\end{align*}
and assume that $Z_i$ relates to $Y_i$ only through the treatment indicator $X_i$. Then $Z_i$ and $u_i$ are uncorrelated but $Z_i$ influences receipt of treatment so it is correlated with $X_i$. Thus, $Z_i$ is a valid instrument for $X_i$ and \@ref(eq:SRDDsetting) can be estimated using TSLS.
The following code chunk generates sample data where observations with a value of the running variable $W_i$ below the cutoff $c=0$ do not receive treatment and observations with $W_i \geq 0$ do receive treatment with a probability of $80\%$ so that treatment status is only partially determined by the running variable and the cutoff. Treatment leads to an increase in $Y$ by $2$ units. Observations with $W_i \geq 0$ that do not receive treatment are called *no-shows*: think of an individual that was assigned to receive the treatment but somehow manages to avoid it.
```{r, message=FALSE}
library(MASS)
# generate sample data
mu <- c(0, 0)
sigma <- matrix(c(1, 0.7, 0.7, 1), ncol = 2)
set.seed(1234)
d <- as.data.frame(mvrnorm(2000, mu, sigma))
colnames(d) <- c("W", "Y")
# introduce fuzziness
d$treatProb <- ifelse(d$W < 0, 0, 0.8)
fuzz <- sapply(X = d$treatProb, FUN = function(x) rbinom(1, 1, prob = x))
# treatment effect
d$Y <- d$Y + fuzz * 2
```
`r ttcode("sapply()")` applies the function provided to `r ttcode("FUN")` to every element of the argument `r ttcode("X")`. Here, since `r ttcode("d$treatProb")` is a vector, the result is a vector, too.
We plot all observations and use blue color to mark individuals that did not receive the treatment and use red color for those who received the treatment.
```{r, fig.align='center'}
# generate a colored plot of treatment and control group
plot(d$W, d$Y,
col = c("steelblue", "darkred")[factor(fuzz)],
pch = 20,
cex = 0.5,
xlim = c(-3, 3),
ylim = c(-3.5, 5),
xlab = "W",
ylab = "Y")
# add a dashed vertical line at cutoff
abline(v = 0, lty = 2)
# add a legend
legend("topleft",
       pch = 20,
       col = c("steelblue", "darkred"),
       legend = c("Do not receive treatment", "Receive treatment"))
```
Obviously, receipt of treatment is no longer a deterministic function of the running variable $W$: some observations with $W\geq0$ *did not* receive the treatment. We may estimate an FRDD by additionally setting `r ttcode("treatProb")` as the assignment variable `r ttcode("z")` in `r ttcode("rdd_data()")`. Then `r ttcode("rdd_reg_lm()")` applies the following TSLS procedure: in the first-stage regression, treatment is predicted using $W_i$ and the cutoff dummy $Z_i$, the instrumental variable. In the second stage, the outcome $Y$ is regressed on the fitted values from the first stage and the running variable $W$, which yields a consistent estimate of the treatment effect.
```{r}
# estimate the Fuzzy RDD
data <- rdd_data(d$Y, d$W,
cutpoint = 0,
z = d$treatProb)
frdd_mod <- rdd_reg_lm(rdd_object = data,
slope = "same")
frdd_mod
```
The estimate is close to $2$, the population treatment effect. We may call `r ttcode("plot()")` on the model object to obtain a figure consisting of binned data and the estimated regression function.
```{r, fig.align='center'}
# plot estimated FRDD function
plot(frdd_mod,
cex = 0.5,
lwd = 0.4,
xlim = c(-4, 4),
ylim = c(-3.5, 5),
xlab = "W",
ylab = "Y")
```
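The TSLS procedure described above can also be cross-checked "by hand" using `r ttcode("ivreg()")` from the package `r ttcode("AER")`. A minimal sketch, where the variables $X$ (treatment actually received) and $Z$ (the cutoff dummy serving as the instrument) are added to the data for illustration; the coefficient on $X$ should again be close to the population treatment effect of $2$.

```{r}
# cross-check: estimate the fuzzy RDD by TSLS using ivreg() from AER
d$X <- fuzz                   # treatment actually received
d$Z <- as.numeric(d$W >= 0)   # crossing the threshold: the instrument
ivreg(Y ~ X + W | Z + W, data = d)
```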
What if we had used an SRDD instead, thereby ignoring the fact that treatment is not perfectly determined by the cutoff in $W$? We may get an impression of the consequences by estimating an SRDD using the previously simulated data.
```{r}
# estimate SRDD
data <- rdd_data(d$Y,
d$W,
cutpoint = 0)
srdd_mod <- rdd_reg_lm(rdd_object = data,
slope = "same")
srdd_mod
```
The estimate obtained using an SRDD is suggestive of a substantial downward bias. In fact, this procedure is inconsistent for the true causal effect, so increasing the sample size would not alleviate the bias.
The book continues with a discussion of potential problems with quasi-experiments. As for all empirical studies, these potential problems are related to internal and external validity. This part is followed by a technical discussion of treatment effect estimation when the causal effect of treatment is heterogeneous in the population. We encourage you to work on these sections on your own.
#### Summary {-}
This chapter has introduced the concept of causal effects in randomized controlled experiments and quasi-experiments where variations in circumstances or accidents of nature are treated as sources of "as if" random assignment to treatment. We have also discussed methods that allow for consistent estimation of these effects in both settings. These included the *differences estimator*, the *differences-in-differences estimator* as well as *sharp* and *fuzzy regression discontinuity design* estimators. It was shown how to apply these estimation techniques in `r ttcode("R")`.
In an empirical application we have shown how to replicate the results of the analysis of the STAR data presented in Chapter 13.3 of the book using `r ttcode("R")`. This study uses a randomized controlled experiment to assess whether smaller classes improve students' performance on standardized tests. Since the data stem from a randomized controlled experiment, they are fundamentally different from the data used in the cross-section studies in Chapters \@ref(lrwor) to \@ref(nrf). This motivated the use of a *differences estimator*.
Chapter \@ref(attdfc) demonstrated how estimates of treatment effects can be obtained when the design of the study is a quasi-experiment that allows for *differences-in-differences* or *regression discontinuity design* estimators. In particular, we have introduced functions of the package `r ttcode("rddtools")` that are convenient for estimation as well as graphical analysis when estimating a regression discontinuity design.
## Exercises {#exercises-13}
The subsequent exercises guide you in reproducing some of the results presented in one of the most famous DID studies, @card1994. The authors use geography as the “as if” random treatment assignment to study the effect of an increase in the state minimum wage in New Jersey in 1992 on employment in fast-food restaurants; see Chapter \@ref(qe).
The study is based on survey data collected in February 1992 and in November 1992, after New Jersey's minimum wage rose by $\$0.80$ from $\$4.25$ to $\$5.05$ in April 1992.
Estimating the effect of the wage increase simply by computing the change in employment in New Jersey (as you are asked to do in Exercise 3) would fail to control for omitted variables. By using Pennsylvania as a control in a difference-in-differences (DID) model, one can control for variables that have a common influence on New Jersey (treatment group) and Pennsylvania (control group). This greatly reduces the risk of omitted variable bias, and it works even when these variables are unobserved.
For the DID approach to work we must assume that New Jersey and Pennsylvania exhibit parallel trends over time, i.e., we assume that the (unobserved) factors influence employment in Pennsylvania and New Jersey in the same manner. This allows us to interpret an observed change in employment in Pennsylvania as the change New Jersey would have experienced had there been no increase in the minimum wage (and vice versa).
Contrary to what standard economic theory would suggest, the authors did not find evidence that the increased minimum wage reduced employment in New Jersey using the DID approach: quite the contrary, their results suggest that the $\$0.80$ minimum wage increase in New Jersey led to an increase in employment of 2.75 full-time equivalents (FTE).
```{r, echo=F, purl=F, results='asis'}
if (my_output == "html") {
cat('
<div class = "DCexercise">
#### 1. The Data from Card & Krueger (1994) {-}
<tt>fastfood.dat</tt>, the dataset used by Card & Krueger (1994) can be downloaded [here](http://www.stat.ucla.edu/projects/datasets/fastfood.dta). See this [link](http://www.stat.ucla.edu/projects/datasets/fastfood-explanation.html) for a detailed explanation of the variables.
This exercise asks you to import the dataset in <tt>R</tt> and to perform some formatting necessary for the subsequent analysis. This can be tedious using base <tt>R</tt> functions but is easily done using the <tt>dplyr</tt> package introduced in Chapter \\@ref(aattggoe).
The URL to the dataset is saved in <tt>data_URL</tt>.
**Instructions:**
- Attach the packages <tt>dplyr</tt> and <tt>foreign</tt>.
- Read in the dataset <tt>fastfood.dta</tt> using <tt>data_URL</tt> and assign it to a <tt>data.frame</tt> named <tt>dat</tt>.
In their study, Card & Krueger (1994) measure employment in full-time equivalents, which they define as the number of full-time employees (<tt>empft</tt> and <tt>empft2</tt>) plus the number of managers (<tt>nmgrs</tt> and <tt>nmgrs2</tt>) plus 0.5 times the number of part-time employees (<tt>emppt</tt> / <tt>emppt2</tt>).
- Define full-time employment before (<tt>FTE</tt>) and after the wage increase (<tt>FTE2</tt>) and add both variables to <tt>dat</tt>.
<iframe src="DCL/ex13_1.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
**Hints:**
- <tt>read.dta()</tt> from the <tt>foreign</tt> package reads <tt>.dta</tt> files, a format used by the statistical software package *STATA*.
- <tt>mutate()</tt> generates new columns using existing ones.
</div>')}
```
```{r, echo=F, purl=F, results='asis'}
if (my_output == "html") {
cat('
<div class = "DCexercise">
#### 2. State Specific Estimates of Full-Time Employment --- I {-}
This exercise asks you to perform a quick calculation of state specific sample means in order to check whether our data on full-time employment is in alignment with the data used by Card & Krueger (1994).
**Instructions:**
- Generate subsets of <tt>dat</tt> to separate the observations for New Jersey and Pennsylvania. Save them as <tt>dat_NJ</tt> and <tt>dat_PA</tt>.
- Compute the sample means of full-time employment equivalents for New Jersey and Pennsylvania, both before and after the minimum wage increase in New Jersey. It suffices if your code prints the correct values to the console.
<iframe src="DCL/ex13_2.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
**Hints:**
- You may use <tt>group_by()</tt> in conjunction with <tt>summarise()</tt> to compute groupwise means. Both functions come with the <tt>dplyr</tt> package.
</div>')}
```
```{r, echo=F, purl=F, results='asis'}
if (my_output == "html") {
cat('
<div class = "DCexercise">
#### 3. State Specific Estimates of Full-Time Employment --- II {-}
A naive approach to investigate the impact of the minimum wage increase on employment is to use the estimated difference in mean employment before and after the wage increase for New Jersey fast food restaurants.
This exercise asks you to do the aforementioned and further to test if the estimated difference is significantly different from zero using a *robust* $t$-test.
The subsets <tt>dat_NJ</tt> and <tt>dat_PA</tt> from the previous exercise are available in your working environment.
**Instructions:**
- Use <tt>dat_NJ</tt> for a robust test of the hypothesis that there is no difference in full-time employment before and after the wage hike in New Jersey at the level of $5\\%$.
<iframe src="DCL/ex13_3.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
**Hints:**
- The testing problem amounts to a two-sample $t$-test which is conveniently done using <tt>t.test()</tt>.
</div>')}
```
```{r, echo=F, purl=F, results='asis'}
if (my_output == "html") {
cat('
<div class = "DCexercise">
#### 4. Preparing the Data for Regression {-}
The estimations done in Exercise 3 and the difference-in-differences approach we are working towards can be shown to produce the same results as OLS applied to specific regression models, see Chapters \\@ref(poceaie) and \\@ref(aattggoe).
This exercise asks you to construct a dataset which is more convenient for this purpose than the dataset <tt>dat</tt>.
**Instructions:**
Generate the dataset <tt>reg_dat</tt> from <tt>dat</tt> in *long format*, i.e., make sure that for each restaurant (identified by <tt>sheet</tt>) one observation before and one after the minimum wage increase (identified by <tt>D</tt>) are included.
Only consider the following variables:
- <tt>id</tt>: sheet number (unique store id)
- <tt>chain</tt>: chain 1=Burger King; 2=KFC; 3=Roy Rogers; 4=Wendys
- <tt>state</tt>: 1 if New Jersey; 0 if Pennsylvania
- <tt>empl</tt>: measure of full-time employment (<tt>FTE</tt> / <tt>FTE2</tt>)
- <tt>D</tt>: dummy indicating if the observation was made before or after the minimum wage increase in New Jersey.
<iframe src="DCL/ex13_4.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
**Hints:**
- The original dataset <tt>dat</tt> has 410 observations of 48 variables (check this using <tt>dim(dat)</tt>). The dataset <tt>reg_dat</tt> you are asked to generate must consist of 820 observations of the variables listed above.
- It is straightforward to generate a <tt>data.frame</tt> from the columns of another <tt>data.frame</tt> using <tt>data.frame(...)</tt>.
- Use <tt>rbind()</tt> to combine two objects of type <tt>data.frame</tt> by row.
</div>')}
```
```{r, echo=F, purl=F, results='asis'}
if (my_output == "html") {
cat('
<div class = "DCexercise">
#### 5. A Difference Estimate using Data from Card & Krueger (1994) --- II {-}
<tt>reg_dat</tt> from Exercise 4 is a *panel dataset* as it has two observations for each fast food restaurant $i=1,\\dots,410$, at time periods $t=0,1$.
Thus we may write down the simple regression model
$$employment_{i,t} = \\beta_0 + \\beta_1 D_t + \\varepsilon_{i,t},$$
where $D_t$ is a dummy variable which equals $0$ if the observation was made before the minimum wage change ($t=0$) and $1$ after the minimum wage change ($t=1$), i.e.,