# Regression with Panel Data {#rwpd}
Regression using panel data may mitigate omitted variable bias when no data are available on variables that correlate with both the regressors of interest and the dependent variable, provided these variables are constant over time or across entities.
When panel data is available, panel regression methods can be used to improve upon multiple regression models. This is because multiple regression models may produce results that lack internal validity in such a setting, as discussed in Chapter \@ref(asbomr).
This chapter covers the following topics:
- notation for panel data,
- fixed effects regression using time and/or entity fixed effects,
- computation of standard errors in fixed effects regression models.
Following the book, for applications we make use of the dataset `r ttcode("Fatalities")` from the `r ttcode("AER")` package [@R-AER] which is a panel dataset reporting annual state level observations on U.S. traffic fatalities for the period 1982 through 1988. The applications analyze if there are effects of alcohol taxes and drunk driving laws on road fatalities and, if present, *how strong* these effects are.
We introduce `r ttcode("plm()")`, a convenient `r ttcode("R")` function for estimating linear panel regression models, which comes with the package `r ttcode("plm")` [@R-plm]. Usage of `r ttcode("plm()")` is very similar to that of the function `r ttcode("lm()")`, which we have used throughout the previous chapters for the estimation of simple and multiple regression models.
The following packages and their dependencies are needed for reproduction of the code chunks presented throughout this chapter on your computer:
+ `r ttcode("AER")`
+ `r ttcode("plm")`
+ `r ttcode("stargazer")`
Check whether the following code chunk runs without any errors.
```{r, warning=FALSE, message=FALSE, eval=FALSE}
library(AER)
library(plm)
library(stargazer)
```
## Panel Data
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC10.1">
<h3 class = "right"> Key Concept 10.1 </h3>
<h3 class = "left"> Notation for Panel Data </h3>
In contrast to cross-section data where we have observations on $n$ subjects (entities), panel data has observations on $n$ entities at $T\\geq2$ time periods. This is denoted as
$$(X_{it},Y_{it}), \\ i=1,\\dots,n \\ \\ \\ \\text{and} \\ \\ \\ t=1,\\dots,T $$
where the index $i$ refers to the entity while $t$ refers to the time period.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Notation for Panel Data]{10.1}
In contrast to cross-section data where we have observations on $n$ subjects (entities), panel data has observations on $n$ entities at $T\\geq2$ time periods. This is denoted as $$(X_{it},Y_{it}), \\ i=1,\\dots,n \\ \\ \\ \\text{and} \\ \\ \\ t=1,\\dots,T $$
where the index $i$ refers to the entity while $t$ refers to the time period.
\\end{keyconcepts}
')
```
Sometimes panel data is also called longitudinal data as it adds a temporal dimension to cross-sectional data. Let us have a look at the dataset `r ttcode("Fatalities")` by checking its structure and listing the first few observations.
```{r, warning=FALSE, message=FALSE}
# load the packages and the dataset
library(AER)
library(plm)
data(Fatalities)
# pdata.frame() declares the data as panel data.
Fatalities <- pdata.frame(Fatalities, index = c("state", "year"))
```
```{r, warning=FALSE, message=FALSE}
# obtain the dimension and inspect the structure
is.data.frame(Fatalities)
dim(Fatalities)
```
```{r}
str(Fatalities)
```
```{r, warning=FALSE, message=FALSE}
# list the first few observations
head(Fatalities)
```
```{r, warning=FALSE, message=FALSE}
# summarize the variables 'state' and 'year'
summary(Fatalities[, c(1, 2)])
```
We find that the dataset consists of 336 observations on 34 variables. Notice that the variable `r ttcode("state")` is a factor variable with 48 levels (one for each of the 48 contiguous federal states of the U.S.).
The variable `r ttcode("year")` is also a factor variable that has 7 levels identifying the time period when the observation was made. This gives us $7\times48 = 336$ observations in total. Since all variables are observed for all entities and over all time periods, the panel is *balanced*. If there were missing data for at least one entity in at least one time period we would call the panel *unbalanced*.
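The declared panel structure can be inspected directly with two helpers from the `r ttcode("plm")` package; a minimal check (the calls below assume `r ttcode("plm")` is loaded and `r ttcode("Fatalities")` has been declared via `r ttcode("pdata.frame()")` as above):

```{r, eval=FALSE}
# report the panel dimensions: number of entities n, periods T, observations N
pdim(Fatalities)

# verify that the panel is balanced
is.pbalanced(Fatalities)
```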
#### Example: Traffic Deaths and Alcohol Taxes {-}
We start by reproducing Figure 10.1 of the book. To this end we estimate simple regressions using data for years 1982 and 1988 that model the relationship between beer tax (adjusted for 1988 dollars) and the traffic fatality rate, measured as the number of fatalities per 10000 inhabitants. Afterwards, we plot the data and add the corresponding estimated regression functions.
```{r}
# define the fatality rate
Fatalities$fatal_rate <- Fatalities$fatal / Fatalities$pop * 10000
# subset the data
Fatalities1982 <- subset(Fatalities, year == "1982")
Fatalities1988 <- subset(Fatalities, year == "1988")
```
```{r, warning=FALSE, message=FALSE}
# estimate simple regression models using 1982 and 1988 data
fatal1982_mod <- lm(fatal_rate ~ beertax, data = Fatalities1982)
fatal1988_mod <- lm(fatal_rate ~ beertax, data = Fatalities1988)
coeftest(fatal1982_mod, vcov. = vcovHC, type = "HC1")
coeftest(fatal1988_mod, vcov. = vcovHC, type = "HC1")
```
The estimated regression functions are
\begin{align*}
\widehat{FatalityRate} =& \, \underset{(0.15)}{2.01} + \underset{(0.13)}{0.15} \times BeerTax \quad (1982 \text{ data}), \\
\widehat{FatalityRate} =& \, \underset{(0.11)}{1.86} + \underset{(0.13)}{0.44} \times BeerTax \quad (1988 \text{ data}).
\end{align*}
```{r}
# plot the observations and add the estimated regression line for 1982 data
plot(x = as.double(Fatalities1982$beertax),
     y = as.double(Fatalities1982$fatal_rate),
     xlab = "Beer tax (in 1988 dollars)",
     ylab = "Fatality rate (fatalities per 10000)",
     main = "Traffic Fatality Rates and Beer Taxes in 1982",
     ylim = c(0, 4.5),
     pch = 20,
     col = "steelblue")

abline(fatal1982_mod, lwd = 1.5, col = "darkred")
legend("topright", lty = 1, col = "darkred", "Estimated Regression Line")

# plot observations and add estimated regression line for 1988 data
plot(x = as.double(Fatalities1988$beertax),
     y = as.double(Fatalities1988$fatal_rate),
     xlab = "Beer tax (in 1988 dollars)",
     ylab = "Fatality rate (fatalities per 10000)",
     main = "Traffic Fatality Rates and Beer Taxes in 1988",
     ylim = c(0, 4.5),
     pch = 20,
     col = "steelblue")

abline(fatal1988_mod, lwd = 1.5, col = "darkred")
legend("bottomright", lty = 1, col = "darkred", "Estimated Regression Line")
```
In both plots, each point represents observations of beer tax and fatality rate for a given state in the respective year. The regression results indicate a positive relationship between the beer tax and the fatality rate for both years. The estimated coefficient on beer tax for the 1988 data is almost three times as large as for the 1982 data. This is contrary to our expectations: alcohol taxes are supposed to *lower* the rate of traffic fatalities. As we know from Chapter \@ref(rmwmr), this is possibly due to omitted variable bias, since neither model includes any covariates, e.g., economic conditions. This could be corrected for by using a multiple regression approach. However, this cannot account for omitted *unobservable* factors that differ from state to state but can be assumed to be constant over the observation span, e.g., the populations' attitude towards drunk driving. As shown in the next section, panel data allow us to hold such factors constant.
## Panel Data with Two Time Periods: "Before and After" Comparisons {#PDWTTP}
Suppose there are only $T=2$ time periods $t=1982,1988$. This allows us to analyze differences in changes of the fatality rate from year 1982 to 1988. We start by considering the population regression model $$FatalityRate_{it} = \beta_0 + \beta_1 BeerTax_{it} + \beta_2 Z_{i} + u_{it}$$ where the $Z_i$ are state specific characteristics that differ between states but are *constant over time*. For $t=1982$ and $t=1988$ we have
\begin{align*}
FatalityRate_{i1982} =&\, \beta_0 + \beta_1 BeerTax_{i1982} + \beta_2 Z_i + u_{i1982}, \\
FatalityRate_{i1988} =&\, \beta_0 + \beta_1 BeerTax_{i1988} + \beta_2 Z_i + u_{i1988}.
\end{align*}
We can eliminate the $Z_i$ by regressing the difference in the fatality rate between 1988 and 1982 on the difference in beer tax between those years:
$$FatalityRate_{i1988} - FatalityRate_{i1982} = \beta_1 (BeerTax_{i1988} - BeerTax_{i1982}) + u_{i1988} - u_{i1982}.$$
This regression model, where the difference in fatality rate between 1988 and 1982 is regressed on the difference in beer tax between those years, yields an estimate for $\beta_1$ that is robust to a possible bias due to omission of $Z_i$, as these influences are eliminated from the model. Next we will use `r ttcode("R")` to estimate a regression based on the differenced data and to plot the estimated regression function.
```{r}
# compute the differences
diff_fatal_rate <- Fatalities1988$fatal_rate - Fatalities1982$fatal_rate
diff_beertax <- Fatalities1988$beertax - Fatalities1982$beertax
# estimate a regression using differenced data
fatal_diff_mod <- lm(diff_fatal_rate ~ diff_beertax)
coeftest(fatal_diff_mod, vcov = vcovHC, type = "HC1")
```
Including the intercept allows for a change in the mean fatality rate in the time between 1982 and 1988 in the absence of a change in the beer tax.
We obtain the OLS estimated regression function $$\widehat{FatalityRate_{i1988} - FatalityRate_{i1982}} = -\underset{(0.065)}{0.072} -\underset{(0.36)}{1.04} \times (BeerTax_{i1988}-BeerTax_{i1982}).$$
```{r, fig.align='center'}
# plot the differenced data
plot(x = as.double(diff_beertax),
     y = as.double(diff_fatal_rate),
     xlab = "Change in beer tax (in 1988 dollars)",
     ylab = "Change in fatality rate (fatalities per 10000)",
     main = "Changes in Traffic Fatality Rates and Beer Taxes in 1982-1988",
     cex.main = 1,
     xlim = c(-0.6, 0.6),
     ylim = c(-1.5, 1),
     pch = 20,
     col = "steelblue")

# add the regression line to the plot
abline(fatal_diff_mod, lwd = 1.5, col = "darkred")

# add a legend
legend("topright", lty = 1, col = "darkred", "Estimated Regression Line")
```
The estimated coefficient on beer tax is now negative and significantly different from zero at $5\%$. Its interpretation is that raising the beer tax by $\$1$ causes traffic fatalities to decrease by $1.04$ per $10000$ people. This is rather large as the average fatality rate is approximately $2$ persons per $10000$ people.
```{r}
# compute mean fatality rate over all states for all time periods
mean(Fatalities$fatal_rate)
```
Once more this outcome is likely to be a consequence of omitting factors in the single-year regression that influence the fatality rate and are correlated with the beer tax *and* change over time. The message is that we need to be more careful and control for such factors before drawing conclusions about the effect of a raise in beer taxes.
The approach presented in this section discards information for the years $1983$ to $1987$. The fixed effects method, which allows us to use data for more than $T = 2$ time periods, enables us to add control variables to the analysis.
## Fixed Effects Regression
Consider the panel regression model
$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}$$
where the $Z_i$ are unobserved time-invariant heterogeneities across the entities $i=1,\dots,n$. We aim to estimate $\beta_1$, the effect on $Y_{it}$ of a change in $X_{it}$ holding $Z_i$ constant. Letting $\alpha_i = \beta_0 + \beta_2 Z_i$, we obtain the model
\begin{align}
Y_{it} = \alpha_i + \beta_1 X_{it} + u_{it} (\#eq:femodel).
\end{align}
Since each of the individual-specific intercepts $\alpha_i$, $i=1,\dots,n$, can be understood as the fixed effect of entity $i$, this model is called the *fixed effects model*.
The variation in the $\alpha_i$, $i=1,\dots,n$, comes from the $Z_i$. Equation \@ref(eq:femodel) can be rewritten as a regression model containing $n-1$ dummy regressors and a constant:
\begin{align}
Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + u_{it} (\#eq:drmodel).
\end{align}
Model \@ref(eq:drmodel) has $n$ different intercepts --- one for every entity. \@ref(eq:femodel) and \@ref(eq:drmodel) are equivalent representations of the fixed effects model (note that $\beta_0$ is the intercept in the dummy regression model \@ref(eq:drmodel)).
The fixed effects model can be generalized to contain more than just one determinant of $Y$ that is correlated with $X$ and changes over time. Key Concept 10.2 presents the generalized fixed effects regression model.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC10.2">
<h3 class = "right"> Key Concept 10.2 </h3>
<h3 class = "left"> The Fixed Effects Regression Model </h3>
The fixed effects regression model is
\\begin{align}
Y_{it} = \\beta_1 X_{1,it} + \\cdots + \\beta_k X_{k,it} + \\alpha_i + u_{it} (\\#eq:gfemodel)
\\end{align}
with $i=1,\\dots,n$ and $t=1,\\dots,T$. The $\\alpha_i$ are entity-specific intercepts that capture heterogeneities across entities. An equivalent representation of this model is given by
\\begin{align}
Y_{it} = \\beta_0 + \\beta_1 X_{1,it} + \\cdots + \\beta_k X_{k,it} + \\gamma_2 D2_i + \\gamma_3 D3_i + \\cdots + \\gamma_n Dn_i + u_{it} (\\#eq:gdrmodel)
\\end{align}
where the $D2_i,D3_i,\\dots,Dn_i$ are dummy variables.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[The Fixed Effects Regression Model]{10.2}
The fixed effects regression model is
\\begin{align}
Y_{it} = \\beta_1 X_{1,it} + \\cdots + \\beta_k X_{k,it} + \\alpha_i + u_{it} \\label{eq:gfemodel}
\\end{align}
with $i=1,\\dots,n$ and $t=1,\\dots,T$. The $\\alpha_i$ are entity-specific intercepts that capture heterogeneities across entities. An equivalent representation of this model is given by
\\begin{align}
Y_{it} = \\beta_0 + \\beta_1 X_{1,it} + \\cdots + \\beta_k X_{k,it} + \\gamma_2 D2_i + \\gamma_3 D3_i + \\cdots + \\gamma_n Dn_i + u_{it} \\label{eq:gdrmodel}
\\end{align}
where the $D2_i,D3_i,\\dots,Dn_i$ are dummy variables.
\\end{keyconcepts}
')
```
### Estimation and Inference {-}
Software packages use a so-called "entity-demeaned" OLS algorithm which is computationally more efficient than estimating regression models with $k+n$ regressors as needed for models \@ref(eq:gfemodel) and \@ref(eq:gdrmodel).
Taking averages across time on both sides of \@ref(eq:femodel) we obtain
\begin{align*}
\frac{1}{T} \sum_{t=1}^T Y_{it} =& \, \beta_1 \frac{1}{T} \sum_{t=1}^T X_{it} + \alpha_i + \frac{1}{T} \sum_{t=1}^T u_{it} \\
\overline{Y}_i =& \, \beta_1 \overline{X}_i + \alpha_i + \overline{u}_i.
\end{align*}
Subtraction from \@ref(eq:femodel) yields
\begin{align}
\begin{split}
Y_{it} - \overline{Y}_i =& \, \beta_1(X_{it}-\overline{X}_i) + (u_{it} - \overline{u}_i) \\
\overset{\sim}{Y}_{it} =& \, \beta_1 \overset{\sim}{X}_{it} + \overset{\sim}{u}_{it}.
\end{split} (\#eq:edols)
\end{align}
In this model, the OLS estimate of the parameter of interest $\beta_1$ is equal to the estimate obtained using \@ref(eq:drmodel) --- without the need to estimate $n-1$ dummies and an intercept.
We conclude that there are two ways of estimating $\beta_1$ in the fixed effects regression:
1. OLS of the dummy regression model as shown in \@ref(eq:drmodel).
2. OLS using the entity demeaned data as in \@ref(eq:edols).
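The numerical equivalence of the two approaches can be checked on simulated data; the following sketch (all variable names and values are made up for illustration) estimates $\beta_1$ both ways:

```{r, eval=FALSE}
set.seed(1)
n <- 5; T <- 4
id <- factor(rep(1:n, each = T))          # entity identifier
alpha <- rnorm(n)[id]                     # entity fixed effects
x <- rnorm(n * T)
y <- alpha + 0.5 * x + rnorm(n * T)

# 1. OLS of the dummy regression model
beta_dummy <- coef(lm(y ~ x + id))["x"]

# 2. OLS on entity-demeaned data (no intercept)
x_dm <- x - ave(x, id)
y_dm <- y - ave(y, id)
beta_demeaned <- coef(lm(y_dm ~ x_dm - 1))["x_dm"]

all.equal(unname(beta_dummy), unname(beta_demeaned))  # TRUE
```

The two estimates coincide exactly, which is an instance of the Frisch-Waugh-Lovell theorem.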
Provided that the fixed effects regression assumptions stated in Key Concept 10.3 hold, the sampling distribution of the OLS estimator in the fixed effects regression model is normal in large samples. The variance of the estimates can be estimated and we can compute standard errors, $t$-statistics and confidence intervals for coefficients. In the next section, we see how to estimate a fixed effects model using `r ttcode("R")` and how to obtain a model summary that reports heteroskedasticity-robust standard errors. We leave aside complicated formulas of the estimators. See Chapter 10.5 and Appendix 10.2 of the book for a discussion of theoretical aspects.
### Application to Traffic Deaths {-}
Following Key Concept 10.2, the simple fixed effects model for estimation of the relation between traffic fatality rates and the beer taxes is
\begin{align}
FatalityRate_{it} = \beta_1 BeerTax_{it} + StateFixedEffects + u_{it}, (\#eq:fatsemod)
\end{align}
a regression of the traffic fatality rate on beer tax and 48 binary regressors --- one for each federal state.
We can simply use the function `r ttcode("lm()")` to obtain an estimate of $\beta_1$.
```{r}
fatal_fe_lm_mod <- lm(fatal_rate ~ beertax + state - 1, data = Fatalities)
fatal_fe_lm_mod
```
As discussed in the previous section, it is also possible to estimate $\beta_1$ by applying OLS to the demeaned data, that is, to run the regression
$$\overset{\sim}{FatalityRate}_{it} = \beta_1 \overset{\sim}{BeerTax}_{it} + \overset{\sim}{u}_{it}. $$
```{r, eval=F}
# obtain demeaned data
fatal_demeaned <- with(Fatalities,
                       data.frame(fatal_rate = fatal_rate - ave(fatal_rate, state),
                                  beertax = beertax - ave(beertax, state)))
# estimate the regression
summary(lm(fatal_rate ~ beertax - 1, data = fatal_demeaned))
```
The function `r ttcode("ave")` is convenient for computing group averages. We use it to obtain state specific averages of the fatality rate and the beer tax.
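To see what `r ttcode("ave()")` does, consider a small example with two groups (the vectors below are made up for illustration):

```{r, eval=FALSE}
x <- c(1, 2, 10, 20)
g <- c("a", "a", "b", "b")

# each value is replaced by the mean of its group
ave(x, g)        # 1.5 1.5 15.0 15.0

# hence x - ave(x, g) demeans x within groups
x - ave(x, g)    # -0.5 0.5 -5.0 5.0
```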
Alternatively one may use `r ttcode("plm()")` from the package with the same name.
```{r, eval=-2, message=F, warning=F}
# install and load the 'plm' package
install.packages("plm")
library(plm)
```
As for `r ttcode("lm()")`, we have to specify the regression formula and the data to be used in our call of `r ttcode("plm()")`. Additionally, it is required to pass a vector of names of entity and time ID variables to the argument `r ttcode("index")`. For `r ttcode("Fatalities")`, the ID variable for entities is named `r ttcode("state")` and the time ID variable is `r ttcode("year")`. Since the fixed effects estimator is also called the *within estimator*, we set `r ttcode('model = "within"')`. Finally, the function `r ttcode("coeftest()")` allows us to obtain inference based on robust standard errors.
```{r}
# estimate the fixed effects regression with plm()
fatal_fe_mod <- plm(fatal_rate ~ beertax,
                    data = Fatalities,
                    index = c("state", "year"),
                    model = "within")
coeftest(fatal_fe_mod, vcov. = vcovHC, type = "HC1")
```
The estimated coefficient is again $-0.6559$. Note that `r ttcode("plm()")` uses the entity-demeaned OLS algorithm and thus does not report dummy coefficients. The estimated regression function is
\begin{align}
\widehat{FatalityRate} = -\underset{(0.29)}{0.66} \times BeerTax + StateFixedEffects. (\#eq:efemod)
\end{align}
The coefficient on $BeerTax$ is negative and significant. The interpretation is that the estimated reduction in traffic fatalities due to an increase in the real beer tax by $\$1$ is $0.66$ per $10000$ people, which is still pretty high. Although including state fixed effects eliminates the risk of a bias due to omitted factors that vary across states but not over time, we suspect that there are other omitted variables that vary over time and thus cause a bias.
## Regression with Time Fixed Effects
Controlling for variables that are constant across entities but vary over time can be done by including time fixed effects. If there are *only* time fixed effects, the fixed effects regression model becomes $$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it},$$ where only $T-1$ dummies are included ($B1$ is omitted) since the model includes an intercept. This model eliminates omitted variable bias caused by excluding unobserved variables that evolve over time but are constant across entities.
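In `r ttcode("plm()")`, a specification with time fixed effects only is obtained by setting `r ttcode('effect = "time"')`. A sketch (the object name `r ttcode("fatal_te_mod")` is chosen here for illustration):

```{r, eval=FALSE}
# fixed effects regression with time fixed effects only
fatal_te_mod <- plm(fatal_rate ~ beertax,
                    data = Fatalities,
                    index = c("state", "year"),
                    model = "within",
                    effect = "time")

coeftest(fatal_te_mod, vcov = vcovHC, type = "HC1")
```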
In some applications it is meaningful to include both entity and time fixed effects. The *entity and time fixed effects model* is $$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}.$$ The combined model allows us to eliminate bias from unobservables that change over time but are constant across entities, and it controls for factors that differ across entities but are constant over time. Such models can be estimated using the OLS algorithm that is implemented in `r ttcode("R")`.
The following code chunk shows how to estimate the combined entity and time fixed effects model of the relation between fatalities and beer tax: $$FatalityRate_{it} = \beta_1 BeerTax_{it} + StateEffects + TimeFixedEffects + u_{it},$$ using both `r ttcode("lm()")` and `r ttcode("plm()")`. It is straightforward to estimate this regression with `r ttcode("lm()")` since it is just an extension of \@ref(eq:fatsemod) so we only have to adjust the `r ttcode("formula")` argument by adding the additional regressor `r ttcode("year")` for time fixed effects. In our call of `r ttcode("plm()")` we set another argument `r ttcode('effect = "twoways"')` for inclusion of entity *and* time dummies.
```{r}
# estimate a combined time and entity fixed effects regression model
# via lm()
fatal_tefe_lm_mod <- lm(fatal_rate ~ beertax + state + year - 1, data = Fatalities)
fatal_tefe_lm_mod
# via plm()
fatal_tefe_mod <- plm(fatal_rate ~ beertax,
                      data = Fatalities,
                      index = c("state", "year"),
                      model = "within",
                      effect = "twoways")
coeftest(fatal_tefe_mod, vcov = vcovHC, type = "HC1")
```
Before discussing the outcomes, we convince ourselves that `r ttcode("state")` and `r ttcode("year")` are of class `r ttcode("factor")`.
```{r}
# check the class of 'state' and 'year'
class(Fatalities$state)
class(Fatalities$year)
```
The function `r ttcode("lm()")` converts factors into dummies automatically. Since we exclude the intercept by adding `r ttcode("-1")` to the right-hand side of the regression formula, `r ttcode("lm()")` estimates coefficients for $n + (T-1) = 48 + 6 = 54$ binary variables (6 year dummies and 48 state dummies). Again, `r ttcode("plm()")` only reports the estimated coefficient on $BeerTax$.
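The count of estimated coefficients can be verified directly; a quick sanity check (not part of the book's code) on the `r ttcode("lm")` object estimated above:

```{r, eval=FALSE}
# 1 slope on beertax + 48 state dummies + 6 year dummies = 55 coefficients
length(coef(fatal_tefe_lm_mod))
```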
The estimated regression function is
\begin{align}
\widehat{FatalityRate} = -\underset{(0.35)}{0.64} \times BeerTax + StateEffects + TimeFixedEffects. (\#eq:cbnfemod)
\end{align}
The result of $-0.64$ is close to the estimated coefficient for the regression model including only entity fixed effects. Unsurprisingly, the coefficient is less precisely estimated but significantly different from zero at $10\%$.
In view of \@ref(eq:efemod) and \@ref(eq:cbnfemod), we conclude that the estimated relationship between traffic fatalities and the real beer tax is not affected by omitted variable bias due to factors that are constant either over time or across states.
## The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression {#tferaaseffer}
This section focuses on the entity fixed effects model and presents model assumptions that need to hold in order for OLS to produce unbiased estimates that are normally distributed in large samples. These assumptions are an extension of the assumptions made for the multiple regression model (see Key Concept 6.4) and are given in Key Concept 10.3. We also briefly discuss standard errors in fixed effects models which differ from standard errors in multiple regression as the regression error can exhibit serial correlation in panel models.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC10.3">
<h3 class = "right"> Key Concept 10.3 </h3>
<h3 class = "left"> The Fixed Effects Regression Assumptions </h3>
In the fixed effects model $$ Y_{it} = \\beta_1 X_{it} + \\alpha_i + u_{it} \\ \\ , \\ \\ i=1,\\dots,n, \\ t=1,\\dots,T, $$ we assume the following:
1. The error term $u_{it}$ has conditional mean zero, that is, $E(u_{it}|X_{i1}, X_{i2},\\dots, X_{iT}) = 0$.
2. $(X_{i1}, X_{i2}, \\dots, X_{iT}, u_{i1}, \\dots, u_{iT})$, $i=1,\\dots,n$ are i.i.d. draws from their joint distribution.
3. Large outliers are unlikely, i.e., $(X_{it}, u_{it})$ have nonzero finite fourth moments.
4. There is no perfect multicollinearity.
When there are multiple regressors, $X_{it}$ is replaced by $X_{1,it}, X_{2,it}, \\dots, X_{k,it}$.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[The Fixed Effects Regression Assumptions]{10.3}
In the fixed effects regression model $$ Y_{it} = \\beta_1 X_{it} + \\alpha_i + u_{it} \\ \\ , \\ \\ i=1,\\dots,n, \\ t=1,\\dots,T, $$ we assume the following:\\newline
\\begin{enumerate}
\\item The error term $u_{it}$ has conditional mean zero, that is, $E(u_{it}|X_{i1}, X_{i2},\\dots, X_{iT}) = 0$.
\\item $(X_{i1}, X_{i2}, \\dots, X_{iT}, u_{i1}, \\dots, u_{iT})$, $i=1,\\dots,n$ are i.i.d. draws from their joint distribution.
\\item Large outliers are unlikely, i.e., $(X_{it}, u_{it})$ have nonzero finite fourth moments.
\\item There is no perfect multicollinearity.
\\end{enumerate}\\vspace{0.5cm}
When there are multiple regressors, $X_{it}$ is replaced by $X_{1,it}, X_{2,it}, \\dots, X_{k,it}$.
\\end{keyconcepts}
')
```
The first assumption is that the error is uncorrelated with *all* observations of the variable $X$ for the entity $i$ over time. If this assumption is violated, we face omitted variables bias. The second assumption ensures that variables are i.i.d. *across* entities $i=1,\dots,n$. This does not require the observations to be uncorrelated *within* an entity. The $X_{it}$ are allowed to be *autocorrelated* within entities. This is a common property of time series data. The same is allowed for errors $u_{it}$. Consult Chapter 10.5 (Stock and Watson) of the book for a detailed explanation for why autocorrelation is plausible in panel applications. The second assumption is justified if the entities are selected by simple random sampling. The third and fourth assumptions are analogous to the multiple regression assumptions made in Key Concept 6.4.
#### Standard Errors for Fixed Effects Regression {-}
As with heteroskedasticity, autocorrelation invalidates the usual standard error formulas as well as heteroskedasticity-robust standard errors, since these are derived under the assumption that there is no autocorrelation. When there is both heteroskedasticity *and* autocorrelation, so-called *heteroskedasticity and autocorrelation-consistent (HAC) standard errors* need to be used. *Clustered standard errors* belong to this type of standard errors: they allow for heteroskedasticity and autocorrelated errors within an entity, but *not* for correlation across entities.
As shown in the examples throughout this chapter, it is fairly easy to specify usage of clustered standard errors in regression summaries produced by functions like `r ttcode("coeftest()")` in conjunction with `r ttcode("vcovHC()")` from the package `r ttcode("sandwich")`. Conveniently, `r ttcode("vcovHC()")` recognizes panel model objects (objects of class `r ttcode("plm")`) and computes clustered standard errors by default.
The regressions conducted in this chapter are good examples of why the use of clustered standard errors is crucial in empirical applications of fixed effects models. For example, consider the entity and time fixed effects model for fatalities. Since `r ttcode("fatal_tefe_lm_mod")` is an object of class `r ttcode("lm")`, `r ttcode("coeftest()")` does not compute clustered standard errors but uses robust standard errors that are only valid in the absence of autocorrelated errors.
```{r}
# check class of the model object
class(fatal_tefe_lm_mod)
# obtain a summary based on heteroskedasticity-robust standard errors
# (no adjustment for heteroskedasticity only)
coeftest(fatal_tefe_lm_mod, vcov = vcovHC, type = "HC1")[1, ]
# check class of the (plm) model object
class(fatal_tefe_mod)
# obtain a summary based on clustered standard errors
# (adjustment for autocorrelation + heteroskedasticity)
coeftest(fatal_tefe_mod, vcov = vcovHC, type = "HC1")
```
The outcomes differ rather strongly: imposing no autocorrelation, we obtain a standard error of $0.25$, which implies significance of $\hat\beta_1$, the coefficient on $BeerTax$, at the level of $5\%$. In contrast, using the clustered standard error of $0.35$ results in a failure to reject the null hypothesis $H_0: \beta_1 = 0$ at the same level, see equation \@ref(eq:cbnfemod). Consult Appendix 10.2 of the book (Stock and Watson) for insights on the computation of clustered standard errors.
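For `r ttcode("plm")` objects, the clustering behavior of `r ttcode("vcovHC()")` can also be spelled out explicitly; the call below should be equivalent to the one above, since it merely states the defaults of `r ttcode("vcovHC()")` for panel models:

```{r, eval=FALSE}
# clustered standard errors, stating the vcovHC() defaults explicitly
coeftest(fatal_tefe_mod,
         vcov = vcovHC(fatal_tefe_mod,
                       method = "arellano",  # robust to heteroskedasticity and
                                             # autocorrelation within entities
                       type = "HC1",
                       cluster = "group"))   # cluster at the entity level
```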
## Drunk Driving Laws and Traffic Deaths
There are two major sources of omitted variable bias that are not accounted for by any of the models of the relation between traffic fatalities and beer taxes that we have considered so far: economic conditions and driving laws. Fortunately, `r ttcode("Fatalities")` has data on state-specific legal drinking age (`r ttcode("drinkage")`), punishment (`r ttcode("jail")`, `r ttcode("service")`) and various economic indicators like the unemployment rate (`r ttcode("unemp")`) and per capita income (`r ttcode("income")`). We may use these covariates to extend the preceding analysis.
These covariates are defined as follows:
- `r ttcode("unemp")`: the state-specific unemployment rate.
- `r ttcode("log(income)")`: the logarithm of real per capita income (in 1988 dollars).
- `r ttcode("miles")`: the state average miles per driver.
- `r ttcode("drinkage")`: the state-specific minimum legal drinking age.
- `r ttcode("drinkagec")`: a discretized version of `r ttcode("drinkage")` that classifies states into four categories of minimum legal drinking age: $18$, $19$, $20$, and $21$ and older. `r ttcode("R")` denotes these as `r ttcode("[18,19)")`, `r ttcode("[19,20)")`, `r ttcode("[20,21)")` and `r ttcode("[21,22]")`. These categories are included as dummy regressors, where `r ttcode("[21,22]")` is chosen as the reference category.
- `r ttcode("punish")`: a dummy variable with levels `r ttcode("yes")` and `r ttcode("no")` that measures if drunk driving is severely punished by mandatory jail time or mandatory community service (first conviction).
At first, we define the variables according to the regression results presented in Table 10.1 of the book.
```{r}
# discretize the minimum legal drinking age
Fatalities$drinkagec <- cut(Fatalities$drinkage,
breaks = 18:22,
include.lowest = TRUE,
right = FALSE)
# set minimum drinking age [21, 22] to be the baseline level
Fatalities$drinkagec <- relevel(Fatalities$drinkagec, "[21,22]")
# mandatory jail or community service?
Fatalities$punish <- with(Fatalities, factor(jail == "yes" | service == "yes",
labels = c("no", "yes")))
# the set of observations on all variables for 1982 and 1988
fatal_1982_1988 <- Fatalities[with(Fatalities, year == 1982 | year == 1988), ]
```
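The interval construction performed by `r ttcode("cut()")` above can be checked in a small self-contained example (the toy ages below are made up for illustration):

```r
# how cut() bins values with right = FALSE and include.lowest = TRUE:
# half-open intervals [18,19), [19,20), [20,21), plus a closed last one [21,22]
ages   <- c(18, 18.5, 19, 20, 21, 22)
binned <- cut(ages, breaks = 18:22, include.lowest = TRUE, right = FALSE)

levels(binned)
#> [1] "[18,19)" "[19,20)" "[20,21)" "[21,22]"
```

With `r ttcode('right = FALSE')` each interval is closed on the left, and `r ttcode("include.lowest = TRUE")` closes the final interval on the right so that the highest break ($22$) is not dropped.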
Next, we estimate all seven models using `r ttcode("plm()")`.
```{r}
# estimate all seven models
fat_mod1 <- lm(fatal_rate ~ beertax, data = Fatalities)
fat_mod2 <- plm(fatal_rate ~ beertax + state, data = Fatalities)
fat_mod3 <- plm(fatal_rate ~ beertax + state + year,
index = c("state","year"),
model = "within",
effect = "twoways",
data = Fatalities)
fat_mod4 <- plm(fatal_rate ~ beertax + state + year + drinkagec
+ punish + miles + unemp + log(income),
index = c("state", "year"),
model = "within",
effect = "twoways",
data = Fatalities)
fat_mod5 <- plm(fatal_rate ~ beertax + state + year + drinkagec
+ punish + miles,
index = c("state", "year"),
model = "within",
effect = "twoways",
data = Fatalities)
fat_mod6 <- plm(fatal_rate ~ beertax + year + drinkage
+ punish + miles + unemp + log(income),
index = c("state", "year"),
model = "within",
effect = "twoways",
data = Fatalities)
fat_mod7 <- plm(fatal_rate ~ beertax + state + year + drinkagec
+ punish + miles + unemp + log(income),
index = c("state", "year"),
model = "within",
effect = "twoways",
data = fatal_1982_1988)
```
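It may help to see what `r ttcode('effect = "twoways"')` does under the hood: for a balanced panel, the two-way within estimator is numerically identical to OLS on the regressor plus entity and time dummies (the "least squares dummy variable" approach). A base-`r ttcode("R")` sketch on simulated data (our own toy example, not chapter code):

```r
# for a balanced panel, two-way demeaning reproduces the LSDV estimate
set.seed(42)
n_i <- 48; n_t <- 7                     # entities and periods, as in Fatalities
d <- data.frame(state = factor(rep(1:n_i, each = n_t)),
                year  = factor(rep(1:n_t, times = n_i)))
alpha  <- rnorm(n_i)[as.integer(d$state)]  # entity fixed effects
lambda <- rnorm(n_t)[as.integer(d$year)]   # time fixed effects
d$x <- rnorm(n_i * n_t)
d$y <- 2 - 0.6 * d$x + alpha + lambda + rnorm(n_i * n_t)

# LSDV: include both sets of dummies explicitly
beta_lsdv <- coef(lm(y ~ x + state + year, data = d))["x"]

# within transformation: demean by entity and by time, add back the grand mean
demean2 <- function(v, i, t) v - ave(v, i) - ave(v, t) + mean(v)
beta_within <- coef(lm(demean2(d$y, d$state, d$year) ~
                         demean2(d$x, d$state, d$year) - 1))[1]

all.equal(unname(beta_lsdv), unname(beta_within))   # TRUE
```

This is why adding `r ttcode("state")` and `r ttcode("year")` to the formulas above is redundant when the within estimator with two-way effects is used: the fixed effects are absorbed by the transformation.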
We again use `r ttcode("stargazer()")` [@R-stargazer] to generate a comprehensive tabular presentation of the results.
```{r, message=F, warning=F, results='asis', eval=F}
library(stargazer)
# gather clustered standard errors in a list
rob_se <- list(sqrt(diag(vcovHC(fat_mod1, type = "HC1"))),
sqrt(diag(vcovHC(fat_mod2, type = "HC1"))),
sqrt(diag(vcovHC(fat_mod3, type = "HC1"))),
sqrt(diag(vcovHC(fat_mod4, type = "HC1"))),
sqrt(diag(vcovHC(fat_mod5, type = "HC1"))),
sqrt(diag(vcovHC(fat_mod6, type = "HC1"))),
sqrt(diag(vcovHC(fat_mod7, type = "HC1"))))
# generate the table
stargazer(fat_mod1,
fat_mod2,
fat_mod3,
fat_mod4,
fat_mod5,
fat_mod6,
fat_mod7,
digits = 3,
header = FALSE,
type = "latex",
se = rob_se,
title = "Linear Panel Regression Models of Traffic Fatalities
due to Drunk Driving",
model.numbers = FALSE,
column.labels = c("(1)", "(2)", "(3)", "(4)", "(5)", "(6)", "(7)"))
```
<!--html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, eval=my_output == "html"}
library(stargazer)
rob_se <- list(
sqrt(diag(vcovHC(fat_mod1, type="HC1"))),
sqrt(diag(vcovHC(fat_mod2, type="HC1"))),
sqrt(diag(vcovHC(fat_mod3, type="HC1"))),
sqrt(diag(vcovHC(fat_mod4, type="HC1"))),
sqrt(diag(vcovHC(fat_mod5, type="HC1"))),
sqrt(diag(vcovHC(fat_mod6, type="HC1"))),
sqrt(diag(vcovHC(fat_mod7, type="HC1")))
)
stargazer(fat_mod1, fat_mod2, fat_mod3, fat_mod4, fat_mod5, fat_mod6, fat_mod7,
digits = 3,
type = "html",
header = FALSE,
se = rob_se,
dep.var.caption = "Dependent Variable: Fatality Rate",
model.numbers = FALSE,
column.labels = c("(1)", "(2)", "(3)", "(4)", "(5)", "(6)", "(7)")
)
stargazer_html_title("Linear Panel Regression Models of Traffic Fatalities due to Drunk Driving", "lprmotfdtdd")
```
<!--/html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, eval=my_output == "latex"}
library(stargazer)
rob_se <- list(
sqrt(diag(vcovHC(fat_mod1, type="HC1"))),
sqrt(diag(vcovHC(fat_mod2, type="HC1"))),
sqrt(diag(vcovHC(fat_mod3, type="HC1"))),
sqrt(diag(vcovHC(fat_mod4, type="HC1"))),
sqrt(diag(vcovHC(fat_mod5, type="HC1"))),
sqrt(diag(vcovHC(fat_mod6, type="HC1"))),
sqrt(diag(vcovHC(fat_mod7, type="HC1")))
)
stargazer(fat_mod1, fat_mod2, fat_mod3, fat_mod4, fat_mod5, fat_mod6, fat_mod7,
digits = 3,
type = "latex",
float.env = "sidewaystable",
column.sep.width = "-5pt",
se = rob_se,
header = FALSE,
model.names = FALSE,
column.labels = c('OLS','','','Linear Panel Regression'),
omit.stat = "f",
title = "\\label{tab:lprmotfdtdd} Linear Panel Regression Models of Traffic Fatalities due to Drunk Driving")
```
While columns (2) and (3) recap the results \@ref(eq:efemod) and \@ref(eq:cbnfemod), column (1) presents an estimate of the coefficient of interest in the naive OLS regression of the fatality rate on the beer tax without any fixed effects. We obtain a *positive* estimate for the coefficient on the beer tax that is likely to be upward biased, and the model fit is rather poor ($\bar{R}^2 = 0.091$). The sign of the estimate changes as we extend the model by both entity and time fixed effects in models (2) and (3), and $\bar{R}^2$ increases substantially as the fixed effects are included. Nonetheless, as discussed before, the magnitudes of both estimates may be too large.
The model specifications (4) to (7) include covariates that are meant to capture the overall state of the economy as well as the legal framework. Considering (4) as the baseline specification, we observe four interesting results:
1. Including the covariates does not lead to a major reduction of the estimated effect of the beer tax. The coefficient is not significantly different from zero at the level of $5\%$ as the estimate is rather imprecise.
2. The minimum legal drinking age *does not* have an effect on traffic fatalities: none of the three dummy variables is significantly different from zero at any common level of significance. Moreover, an $F$-test of the joint hypothesis that all three coefficients are zero does not reject the null hypothesis. The next code chunk shows how to test this hypothesis.
```{r}
# test if legal drinking age has no explanatory power
linearHypothesis(fat_mod4,
test = "F",
c("drinkagec[18,19)=0", "drinkagec[19,20)=0", "drinkagec[20,21)=0"),
vcov. = vcovHC, type = "HC1")
```
3. There is no evidence that punishment for first offenders has a deterrent effect on drunk driving: the corresponding coefficient is not significant at the $10\%$ level.
4. The economic variables significantly explain traffic fatalities. We can check that the unemployment rate and per capita income are jointly significant at the level of $0.1\%$.
```{r}
# test if economic indicators have no explanatory power
linearHypothesis(fat_mod4,
test = "F",
c("log(income)", "unemp"),
vcov. = vcovHC, type = "HC1")
```
Model (5) omits the economic factors. The result supports the notion that economic indicators should remain in the model as the coefficient on beer tax is sensitive to the inclusion of the latter.
Results for model (6) demonstrate that the legal drinking age has little explanatory power and that the coefficient of interest is not sensitive to changes in the functional form of the relation between drinking age and traffic fatalities.
Specification (7) reveals that reducing the amount of available information (only the 95 observations for the years 1982 and 1988 are used here) inflates standard errors but does not lead to drastic changes in the coefficient estimates.
#### Summary {-}
We have not found evidence that severe punishments and increasing the minimum legal drinking age reduce traffic fatalities due to drunk driving. Nonetheless, there seems to be a negative effect of alcohol taxes on traffic fatalities which, however, is estimated imprecisely and cannot be interpreted as the causal effect of interest, as a bias may remain: there may be omitted variables that differ across states *and* change over time, and this bias persists even though the panel approach controls for unobservables that are constant over time within states as well as those that are common to all states in a given year.
A powerful method that can be used if common panel regression approaches fail is instrumental variables regression. We will return to this concept in Chapter \@ref(ivr).
## Exercises {#exercises-10}
```{r, echo=F, purl=F, results='asis'}
if (my_output=="html"){
cat('
For the course of this section, you will work with <tt>Guns</tt>, a balanced panel containing observations on criminal and demographic variables for all US states and the years 1977-1999. The dataset comes with the package <tt>AER</tt> which is already installed for the interactive R exercises below.
<div class = "DCexercise">
#### 1. The Guns Dataset {-}
**Instructions:**
+ Load both the <tt>AER</tt> package and the <tt>Guns</tt> dataset.
+ Get an overview of the dataset using the <tt>summary()</tt> function. Use <tt>?Guns</tt> for detailed information on the variables.
+ Verify that <tt>Guns</tt> is a balanced panel: extract the number of years and states from the dataset and assign them to the predefined variables <tt>years</tt> and <tt>states</tt>, respectively. Afterwards use these variables for a logical comparison: check that the panel is balanced.
<iframe src="DCL/ex10_1.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
**Hints:**
+ Use <tt>library()</tt> and <tt>data()</tt> to attach the package and load the dataset, respectively.
+ Use <tt>summary()</tt> to obtain a comprehensive overview of the dataset.
+ Remember that in a balanced panel the number of entities times the number of years equals the total number of observations in the dataset. The basic functions <tt>levels()</tt>, <tt>length()</tt> and <tt>nrow()</tt> may be useful.
</div>')
} else {
cat('\\begin{center}\\textit{This interactive part of the book is only available in the HTML version.}\\end{center}')
}
```
```{r, echo=F, purl=F, results='asis'}
if (my_output=="html") {
cat('
<div class = "DCexercise">
#### 2. Strict or Loose? Gun Laws and the Effect on Crime I {-}
There is a controversial debate about whether, and if so to what extent, the right to carry a gun influences crime. Proponents of so-called "Carrying a Concealed Weapon" (CCW) laws argue that the deterrent effect of guns prevents crime, whereas opponents argue that the public availability of guns increases their usage and thus makes it easier to commit crimes. In the following exercises you will empirically investigate this topic.
To begin with consider the following estimated model
$$\\widehat{{\\log(violent_i)}} = 6.135 - 0.443 \\times law_i,$$
with $i=1,\\ldots,51$, where <tt>violent</tt> is the violent crime rate (incidents per 100,000 residents) and <tt>law</tt> is a binary variable indicating the implementation of a CCW law (1 = yes, 0 = no).
The estimated model is available as <tt>model</tt> in your working environment. The packages <tt>AER</tt> and <tt>plm</tt> have been loaded.
**Instructions:**
+ Extend and estimate the model by including state fixed effects using the function <tt>plm()</tt> and assign the model object to the predefined variable <tt>model_se</tt>. Can you think of an unobserved variable that is captured by this model specification?
+ Print a summary of the model which reports cluster robust standard errors.
+ Test whether the fixed state effects are jointly significant from zero. To do so use the function <tt>pFtest()</tt>. Use <tt>?pFtest</tt> for additional information.
<iframe src="DCL/ex10_2.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
**Hints:**
+ The function <tt>plm()</tt> allows you to conduct regressions with panel data and works very similar to <tt>lm()</tt>. You have to specify the entity and time indicators inside as a vector using the argument <tt>index</tt> and specify the estimator to be used with the argument <tt>model</tt> (for the fixed effects estimator this is <tt>"within"</tt>).
+ As usual you can use <tt>coeftest()</tt> in conjunction with appropriate arguments to obtain a summary output with robust standard errors.
+ <tt>pFtest()</tt> expects two model objects. The first model includes fixed effects, the second does not.
</div>')
}
```
```{r, echo=F, purl=F, results='asis'}
if (my_output=="html") {
cat('
<div class = "DCexercise">
#### 3. Strict or Loose? Gun Laws and the Effect on Crime II {-}
As touched upon at the end of the last exercise it is reasonable to also include time effects which is why we now consider the model
\\begin{align}\\log(violent_{it}) & = \\beta_1\\times law_{it} + \\alpha_i + \\lambda_t + u_{it},\\end{align}
for $i=1,\\ldots,51$ and $t=1977,\\ldots,1999$.
The models <tt>model</tt> and <tt>model_se</tt> from the previous exercises are available in your working environment. The packages <tt>AER</tt> and <tt>plm</tt> have been attached.
**Instructions:**
+ Estimate the model above and assign it to the variable <tt>model_sete</tt> using <tt>plm()</tt>.
+ Print a summary of the model which reports robust standard errors.
+ Test whether both state and time fixed effects are jointly significant.
<iframe src="DCL/ex10_3.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
**Hints:**
+ To additionally incorporate time fixed effects, one can set the argument <tt>effect="twoways"</tt> inside of <tt>plm()</tt>.
+ Note that we want to test whether the state *and* time fixed effects are jointly significant.
</div>')
}
```
```{r, echo=F, purl=F, results='asis'}
if (my_output=="html") {
cat('
<div class = "DCexercise">
#### 4. Strict or Loose? Gun Laws and the Effect on Crime III {-}
Despite the evidence for state as well as time effects found in exercise 3, there still might be a bias due to omitted variables such as sociodemographic characteristics. The following model accounts for the latter:
\\begin{align}\\log(violent_{it}) & = \\beta_1\\times law_{it} + \\beta_2\\times prisoners_{it} + \\beta_3\\times density_{it} + \\beta_4\\times income_{it} + \\beta_5\\times population_{it} \\\\&\\quad + \\beta_6\\times afam_{it} + \\beta_7\\times cauc_{it} + \\beta_8\\times male_{it} + \\alpha_i + \\lambda_t + u_{it}.\\end{align}
See <tt>?Guns</tt> for detailed information on the additional variables.
The packages <tt>AER</tt> and <tt>plm</tt> have been loaded.
**Instructions:**
+ Estimate the extended model and assign it to the predefined variable <tt>model_sete_ext</tt>.
+ Print a robust summary of the estimated model. What can you say about the effect of a CCW law?
<iframe src="DCL/ex10_4.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
</div>')
}
```
```{r, echo=F, purl=F, results='asis'}
if (my_output=="html") {
cat('
<div class = "DCexercise">
#### 5. Fixed Effects Regression - Two Time Periods {-}
Recall the fixed effects model from Exercise 10.2, but now assume that you only have observations for the years 1978 and 1984. Consider the two model specifications
\\begin{align}
\\log(violent_{i1984}) - \\log(violent_{i1978}) = \\beta_{BA}\\times(law_{i1984}-law_{i1978}) + (u_{i1984} - u_{i1978})
\\end{align}
and
\\begin{align}
\\log(violent_{it}) = \\beta_{FE}\\times law_{it} + \\alpha_i + u_{it},
\\end{align}
with $i=1,\\ldots,51$ and $t=1978,1984$.
In this exercise you need to show that $\\widehat{\\beta}_{BA}=\\widehat{\\beta}_{FE}$.
The subsets of <tt>Guns</tt> for the years 1978 and 1984 are already available as <tt>Guns78</tt> and <tt>Guns84</tt> in your working environment. The packages <tt>AER</tt> and <tt>plm</tt> have been loaded.
**Instructions:**
+ Compute the differences necessary to estimate the first model and assign them to the variables <tt>diff_logv</tt> and <tt>diff_law</tt>.
+ Estimate both models. Use the differenced data to estimate the first model and <tt>plm()</tt> for the second.
+ Verify with a logical comparison that both procedures numerically yield the same estimate. Use the variables <tt>coef_diff</tt> and <tt>coef_plm</tt> which contain the relevant coefficients rounded to the fourth decimal place.
<iframe src="DCL/ex10_5.html" frameborder="0" scrolling="no" style="width:100%;height:340px"></iframe>
**Hints:**
+ Keep in mind that the dependent variable is log-transformed.
+ You may use <tt>plm()</tt> as in the previous exercises. Note that you only need a subset of the original <tt>Guns</tt> dataset. The argument <tt>subset</tt> allows you to subset the dataset passed to the argument <tt>data</tt>. Alternatively, you can join the two datasets <tt>Guns78</tt> and <tt>Guns84</tt> using, e.g., <tt>rbind()</tt>.
+ Use the logical operator <tt>==</tt> to compare both estimates.
</div>')
}
```
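The equivalence claimed in Exercise 5 can also be checked offline with simulated data. The following is a base-`r ttcode("R")` sketch with made-up numbers, independent of the `r ttcode("Guns")` dataset:

```r
# with two periods, the before-after (differenced) estimator equals the
# entity fixed effects (within) estimator -- simulated data, base R only
set.seed(7)
n  <- 51
x1 <- rbinom(n, 1, 0.3)                # e.g., a law dummy in period 1
x2 <- rbinom(n, 1, 0.6)                # ... and in period 2
a  <- rnorm(n)                         # entity effects
y1 <- -0.4 * x1 + a + rnorm(n)
y2 <- -0.4 * x2 + a + rnorm(n)

# before-after: regress the differences without an intercept
beta_ba <- coef(lm(I(y2 - y1) ~ I(x2 - x1) - 1))[1]

# within: demean each entity's two observations around its own mean
y_w <- c(y1, y2) - rep((y1 + y2) / 2, 2)
x_w <- c(x1, x2) - rep((x1 + x2) / 2, 2)
beta_fe <- coef(lm(y_w ~ x_w - 1))[1]

all.equal(unname(beta_ba), unname(beta_fe))   # TRUE
```

The entity effects $a$ difference out in the first regression and are demeaned away in the second, so both procedures fit the same transformed data.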