# Hypothesis Tests and Confidence Intervals in SLR Model {#htaciitslrm}
This chapter continues our treatment of the simple linear regression model. The following subsections discuss how we may use our knowledge about the sampling distribution of the OLS estimator in order to make statements regarding its uncertainty.
These subsections cover the following topics:
- Testing Hypotheses regarding regression coefficients.
- Confidence intervals for regression coefficients.
- Regression when $X$ is a dummy variable.
- Heteroskedasticity and Homoskedasticity.
The packages `r ttcode('AER')` [@R-AER] and `r ttcode('scales')` [@R-scales] are required for reproduction of the code chunks presented throughout this chapter. The package `r ttcode('scales')` provides additional generic plot scaling methods. Make sure both packages are installed before you proceed. The safest way to do so is by checking whether the following code chunk executes without any errors.
```{r, warning=FALSE, message=FALSE, eval=FALSE}
library(AER)
library(scales)
```
## Testing Two-Sided Hypotheses concerning the Slope Coefficient
Using the fact that $\hat{\beta}_1$ is approximately normally distributed in large samples (see [Key Concept 4.4](#tsdotoe)), testing hypotheses about the true value $\beta_1$ can be done as in Chapter \@ref(potsm).
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC5.1">
<h3 class = "right"> Key Concept 5.1 </h3>
<h3 class = "left"> General Form of the $t$-Statistic </h3>
Remember from Chapter \\@ref(arosur) that a general $t$-statistic has the form
$$ t = \\frac{\\text{estimated value} - \\text{hypothesized value}}{\\text{standard error of the estimator}}.$$
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[General Form of the $t$-Statistic]{5.1}
Remember from Chapter \\ref{arosur} that a general $t$-statistic has the form
$$ t = \\frac{\\text{estimated value} - \\text{hypothesized value}}{\\text{standard error of the estimator}}.$$
\\end{keyconcepts}
')
```
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC5.2">
<h3 class = "right"> Key Concept 5.2 </h3>
<h3 class = "left"> Testing Hypotheses regarding $\\beta_1$ </h3>
For testing the hypothesis $H_0: \\beta_1 = \\beta_{1,0}$, we need to perform the following steps:
1. Compute the standard error of $\\hat{\\beta}_1$, $SE(\\hat{\\beta}_1)$
\\[ SE(\\hat{\\beta}_1) = \\sqrt{ \\hat{\\sigma}^2_{\\hat{\\beta}_1} } \\ \\ , \\ \\
\\hat{\\sigma}^2_{\\hat{\\beta}_1} = \\frac{1}{n} \\times \\frac{\\frac{1}{n-2} \\sum_{i=1}^n (X_i - \\overline{X})^2 \\hat{u_i}^2 }{ \\left[ \\frac{1}{n} \\sum_{i=1}^n (X_i - \\overline{X})^2 \\right]^2}.
\\]
2. Compute the $t$-statistic
\\[ t = \\frac{\\hat{\\beta}_1 - \\beta_{1,0}}{ SE(\\hat{\\beta}_1) }. \\]
3. Given a two sided alternative ($H_1:\\beta_1 \\neq \\beta_{1,0}$) we reject at the $5\\%$ level if $|t^{act}| > 1.96$ or, equivalently, if the $p$-value is less than $0.05$.<br>
Recall the definition of the $p$-value:
\\begin{align*}
p \\text{-value} =& \\, \\text{Pr}_{H_0} \\left[ \\left| \\frac{ \\hat{\\beta}_1 - \\beta_{1,0} }{ SE(\\hat{\\beta}_1) } \\right| > \\left| \\frac{ \\hat{\\beta}_1^{act} - \\beta_{1,0} }{ SE(\\hat{\\beta}_1) } \\right| \\right] \\\\
=& \\, \\text{Pr}_{H_0} (|t| > |t^{act}|) \\\\
\\approx& \\, 2 \\cdot \\Phi(-|t^{act}|).
\\end{align*}
The last transformation is due to the normal approximation for large samples.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Testing Hypotheses regarding $\\beta_1$]{5.2}
For testing the hypothesis $H_0: \\beta_1 = \\beta_{1,0}$, we need to perform the following steps:\\newline
\\begin{enumerate}
\\item Compute the standard error of $\\hat{\\beta}_1$, $SE(\\hat{\\beta}_1)$
\\begin{align*}
SE(\\hat{\\beta}_1) = \\sqrt{ \\hat{\\sigma}^2_{\\hat{\\beta}_1} }, \\quad
\\hat{\\sigma}^2_{\\hat{\\beta}_1} = \\frac{1}{n} \\times \\frac{\\frac{1}{n-2} \\sum_{i=1}^n (X_i - \\overline{X})^2 \\hat{u_i}^2 }{ \\left[ \\frac{1}{n} \\sum_{i=1}^n (X_i - \\overline{X})^2 \\right]^2}.
\\end{align*}
\\item Compute the $t$-statistic
\\[ t = \\frac{\\hat{\\beta}_1 - \\beta_{1,0}}{ SE(\\hat{\\beta}_1) }. \\]
\\item Given a two sided alternative ($H_1:\\beta_1 \\neq \\beta_{1,0}$) we reject at the $5\\%$ level if $|t^{act}| > 1.96$ or, equivalently, if the $p$-value is less than $0.05$.\\newline
Recall the definition of the $p$-value:
\\begin{align*}
p \\text{-value} =& \\, \\text{Pr}_{H_0} \\left[ \\left| \\frac{ \\hat{\\beta}_1 - \\beta_{1,0} }{ SE(\\hat{\\beta}_1) } \\right| > \\left| \\frac{ \\hat{\\beta}_1^{act} - \\beta_{1,0} }{ SE(\\hat{\\beta}_1) } \\right| \\right] \\\\
=& \\, \\text{Pr}_{H_0} (|t| > |t^{act}|) \\\\
\\approx& \\, 2 \\cdot \\Phi(-|t^{act}|).
\\end{align*}
The last transformation is due to the normal approximation for large samples.
\\end{enumerate}
\\end{keyconcepts}
')
```
Consider again the OLS regression stored in `r ttcode('linear_model')` from Chapter \@ref(lrwor) that gave us the regression line
\[ \widehat{TestScore} \ = \underset{(9.47)}{698.9} - \underset{(0.49)}{2.28} \times STR \ , \ R^2=0.051 \ , \ SER=18.6. \]
Copy and execute the following code chunk if the above model object is not available in your working environment.
```{r, message=F, warning=F}
# load the `CASchools` dataset
data(CASchools)
# add student-teacher ratio
CASchools$STR <- CASchools$students/CASchools$teachers
# add average test-score
CASchools$score <- (CASchools$read + CASchools$math)/2
# estimate the model
linear_model <- lm(score ~ STR, data = CASchools)
```
For testing a hypothesis concerning the slope parameter (the coefficient on $STR$), we need $SE(\hat{\beta}_1)$, the standard error of the respective point estimator. As is common in the literature, standard errors are presented in parentheses below the point estimates.
Key Concept 5.1 reveals that it is rather cumbersome to compute the standard error and thereby the $t$-statistic by hand. The question you should be asking yourself right now is: can we obtain these values with minimum effort using `r ttcode('R')`? Yes, we can. Let us first use `r ttcode('summary()')` to get a summary of the estimated coefficients in `r ttcode('linear_model')`.
**Note**: Throughout the textbook, robust standard errors are reported. We consider it instructive to keep things simple at the beginning and thus start out with simple examples that do not allow for robust inference. Standard errors that are robust to heteroskedasticity are introduced in Chapter \@ref(hah) where we also demonstrate how they can be computed using `r ttcode('R')`. A discussion of heteroskedasticity-autocorrelation robust standard errors takes place in Chapter \@ref(eodce).
```{r, warning=F, message=F}
# print the summary of the coefficients to the console
summary(linear_model)$coefficients
```
The second column of the coefficients' summary reports $SE(\hat\beta_0)$ and $SE(\hat\beta_1)$. Also, in the third column `r ttcode('t value')`, we find $t$-statistics $t^{act}$ suitable for tests of the separate hypotheses $H_0: \beta_0=0$ and $H_0: \beta_1=0$. Furthermore, the fourth column of the table provides us with the $p$-values corresponding to the tests against the two-sided alternatives $H_1:\beta_0\neq0$ and $H_1:\beta_1\neq0$, respectively.
Let us have a closer look at the test of
$$H_0: \beta_1=0 \ \ \ vs. \ \ \ H_1: \beta_1 \neq 0.$$
We have $$ t^{act} = \frac{-2.279808 - 0}{0.4798255} \approx - 4.75. $$
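This computation can be reproduced directly from the coefficient matrix returned by `r ttcode('summary()')`; a quick sketch (assuming `r ttcode('linear_model')` from above is in the workspace):
```{r}
# reproduce the t-statistic for H0: beta_1 = 0 from the coefficient summary
coefs <- summary(linear_model)$coefficients
(coefs["STR", "Estimate"] - 0) / coefs["STR", "Std. Error"]
```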
What does this tell us about the significance of the estimated coefficient? We reject the null hypothesis at the $5\%$ level of significance since $|t^{act}| > 1.96$. That is, the observed test statistic falls into the rejection region as $p\text{-value} = 2.78\cdot 10^{-6} < 0.05$. We conclude that the coefficient is significantly different from zero. In other words, we reject the hypothesis that the class size *has no influence* on the students' test scores at the $5\%$ level.
Note that although the difference is negligible in the present case as we will see later, `r ttcode('summary()')` does not perform the normal approximation but calculates $p$-values using the $t$-distribution instead. Generally, the degrees of freedom of the assumed $t$-distribution are determined in the following manner:
$$ \text{DF} = n - k - 1, $$
where $n$ is the number of observations used to estimate the model and $k$ is the number of regressors, excluding the intercept. In our case, we have $n=420$ observations and the only regressor is $STR$ so $k=1$. The simplest way to determine the model degrees of freedom is:
```{r}
# determine residual degrees of freedom
linear_model$df.residual
```
Hence, the $t$-statistic for testing $H_0: \beta_1 = 0$ is assumed to follow a $t$-distribution with $418$ degrees of freedom,
$$ t \sim t_{418}, $$
such that the $p$-value for a two-sided significance test can be obtained by executing the following code:
```{r}
2 * pt(-4.751327, df = 418)
```
The result is very close to the value provided by `r ttcode('summary()')`. However, since $n$ is sufficiently large, one could just as well use the standard normal density to compute the $p$-value:
```{r}
2 * pnorm(-4.751327)
```
The difference is indeed negligible. These findings tell us that, if $H_0: \beta_1 = 0$ is true and we were to repeat the whole process of gathering observations and estimating the model, observing an estimate with $|\hat\beta_1| \geq 2.28$ is very unlikely!
Using `r ttcode('R')` we may visualize how such a statement is made when using the normal approximation. This reflects the principles depicted in Figure 5.1 in the book. Do not let the following code chunk deter you: while it is somewhat longer than the usual examples, most of it is repetition since color shadings and annotations are added on both tails of the normal distribution. We recommend executing the code step by step in order to see how the graph is augmented with the annotations.
```{r, fig.align='center'}
# Plot the standard normal on the support [-6,6]
t <- seq(-6, 6, 0.01)

plot(x = t,
     y = dnorm(t, 0, 1),
     type = "l",
     col = "steelblue",
     lwd = 2,
     yaxs = "i",
     axes = F,
     ylab = "",
     main = expression("Calculating the p-value of a Two-sided Test when"
                       ~ t^act ~ "=-4.75"),
     cex.lab = 0.7,
     cex.main = 1)

tact <- -4.75

axis(1, at = c(0, -1.96, 1.96, -tact, tact), cex.axis = 0.7)

# Shade the critical regions using polygon():

# critical region in left tail
polygon(x = c(-6, seq(-6, -1.96, 0.01), -1.96),
        y = c(0, dnorm(seq(-6, -1.96, 0.01)), 0),
        col = 'orange')

# critical region in right tail
polygon(x = c(1.96, seq(1.96, 6, 0.01), 6),
        y = c(0, dnorm(seq(1.96, 6, 0.01)), 0),
        col = 'orange')

# Add arrows and texts indicating critical regions and the p-value
arrows(-3.5, 0.2, -2.5, 0.02, length = 0.1)
arrows(3.5, 0.2, 2.5, 0.02, length = 0.1)

arrows(-5, 0.16, -4.75, 0, length = 0.1)
arrows(5, 0.16, 4.75, 0, length = 0.1)

text(-3.5, 0.22,
     labels = expression("0.025"~"="~over(alpha, 2)),
     cex = 0.7)
text(3.5, 0.22,
     labels = expression("0.025"~"="~over(alpha, 2)),
     cex = 0.7)

text(-5, 0.18,
     labels = expression(paste("-|",t[act],"|")),
     cex = 0.7)
text(5, 0.18,
     labels = expression(paste("|",t[act],"|")),
     cex = 0.7)

# Add ticks indicating critical values at the 0.05-level, t^act and -t^act
rug(c(-1.96, 1.96), ticksize = 0.145, lwd = 2, col = "darkred")
rug(c(-tact, tact), ticksize = -0.0451, lwd = 2, col = "darkgreen")
```
The $p$-value is the area under the curve to the left of $-4.75$ plus the area under the curve to the right of $4.75$. As we already know from the calculations above, this value is very small.
## Confidence Intervals for Regression Coefficients {#cifrc}
As we already know, estimates of the regression coefficients $\beta_0$ and $\beta_1$ are subject to sampling uncertainty, see Chapter \@ref(lrwor). Therefore, we will *never* exactly estimate the true value of these parameters from sample data in an empirical application. However, we may construct confidence intervals for the intercept and the slope parameter.
A $95\%$ confidence interval for $\beta_i$ has two equivalent definitions:
- The interval is the set of hypothesized values for $\beta_i$ that cannot be rejected by a two-sided hypothesis test at the $5\%$ level.
- The interval contains the true value of $\beta_i$ with a probability of $95\%$: in $95\%$ of all samples that could be drawn, the confidence interval will cover the true value of $\beta_i$.
We also say that the interval has a confidence level of $95\%$. The idea of the confidence interval is summarized in Key Concept 5.3.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC5.3">
<h3 class = "right"> Key Concept 5.3 </h3>
<h3 class = "left"> A Confidence Interval for $\\beta_i$ </h3>
Imagine you could draw all possible random samples of given size. The interval that contains the true value $\\beta_i$ in $95\\%$ of all samples is given by the expression
\\[ \\text{CI}_{0.95}^{\\beta_i} = \\left[ \\hat{\\beta}_i - 1.96 \\times SE(\\hat{\\beta}_i) \\, , \\, \\hat{\\beta}_i + 1.96 \\times SE(\\hat{\\beta}_i) \\right]. \\]
Equivalently, this interval can be seen as the set of null hypotheses for which a $5\\%$ two-sided hypothesis test does not reject.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[A Confidence Interval for $\\beta_i$]{5.3}
Imagine you could draw all possible random samples of given size. The interval that contains the true value $\\beta_i$ in $95\\%$ of all samples is given by the expression
\\[ \\text{CI}_{0.95}^{\\beta_i} = \\left[ \\hat{\\beta}_i - 1.96 \\times SE(\\hat{\\beta}_i) \\, , \\, \\hat{\\beta}_i + 1.96 \\times SE(\\hat{\\beta}_i) \\right]. \\]
Equivalently, this interval can be seen as the set of null hypotheses for which a $5\\%$ two-sided hypothesis test does not reject.
\\end{keyconcepts}
')
```
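As a preview of what follows, the interval from Key Concept 5.3 is easily computed by hand. Here is a minimal sketch for the slope coefficient in `r ttcode('linear_model')`, using the large-sample critical value $1.96$ (the built-in alternative is discussed below):
```{r}
# large-sample 95% confidence interval for the slope as in Key Concept 5.3
coefs <- summary(linear_model)$coefficients
c(lower = coefs[2, 1] - 1.96 * coefs[2, 2],
  upper = coefs[2, 1] + 1.96 * coefs[2, 2])
```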
### Simulation Study: Confidence Intervals {-}
To get a better understanding of confidence intervals we conduct another simulation study. For now, assume that we have the following sample of $n=100$ observations on a single variable $Y$ where
$$ Y_i \overset{i.i.d}{\sim} \mathcal{N}(5,25), \ i = 1, \dots, 100.$$
```{r, fig.align='center'}
# set seed for reproducibility
set.seed(4)
# generate and plot the sample data
Y <- rnorm(n = 100,
mean = 5,
sd = 5)
plot(Y,
pch = 19,
col = "steelblue")
```
We assume that the data is generated by the model
$$ Y_i = \mu + \epsilon_i, $$
where $\mu$ is an unknown constant and we know that $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0,25)$. In this model, the OLS estimator for $\mu$ is given by $$ \hat\mu = \overline{Y} = \frac{1}{n} \sum_{i=1}^n Y_i, $$ i.e., the sample average of the $Y_i$. It further holds that
$$ SE(\hat\mu) = \frac{\sigma_{\epsilon}}{\sqrt{n}} = \frac{5}{\sqrt{100}} $$
(see Chapter [2](#SVSSDASE)). A large-sample $95\%$ confidence interval for $\mu$ is then given by
\begin{equation}
CI^{\mu}_{0.95} = \left[\hat\mu - 1.96 \times \frac{5}{\sqrt{100}} \ , \ \hat\mu + 1.96 \times \frac{5}{\sqrt{100}} \right]. (\#eq:KI)
\end{equation}
It is fairly easy to compute this interval in `r ttcode('R')` by hand. The following code chunk generates a named vector containing the interval bounds:
```{r}
cbind(CIlower = mean(Y) - 1.96 * 5 / 10, CIupper = mean(Y) + 1.96 * 5 / 10)
```
Knowing that $\mu = 5$ we see that, for our example data, the confidence interval covers the true value.
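Instead of eyeballing the bounds, we may also let `r ttcode('R')` check the coverage; a one-line sketch:
```{r}
# check whether the interval covers the true value mu = 5
(mean(Y) - 1.96 * 5 / 10 <= 5) & (5 <= mean(Y) + 1.96 * 5 / 10)
```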
In contrast to real-world examples, we can use `r ttcode('R')` to get a better understanding of confidence intervals by repeatedly sampling data, estimating $\mu$ and computing the confidence interval for $\mu$ as in \@ref(eq:KI).
The procedure is as follows:
- We initialize the vectors `r ttcode('lower')` and `r ttcode('upper')` in which the simulated interval limits are to be saved. We want to simulate $10000$ intervals so both vectors are set to have this length.
- We use a `r ttcode('for()')` loop to sample $100$ observations from the $\mathcal{N}(5,25)$ distribution and compute $\hat\mu$ as well as the boundaries of the confidence interval in every iteration of the loop.
- Finally, we join `r ttcode('lower')` and `r ttcode('upper')` in a matrix.
```{r}
# set seed
set.seed(1)

# initialize vectors of lower and upper interval boundaries
lower <- numeric(10000)
upper <- numeric(10000)

# loop sampling / estimation / CI
for(i in 1:10000) {
  Y <- rnorm(100, mean = 5, sd = 5)
  lower[i] <- mean(Y) - 1.96 * 5 / 10
  upper[i] <- mean(Y) + 1.96 * 5 / 10
}

# join vectors of interval bounds in a matrix
CIs <- cbind(lower, upper)
```
According to Key Concept 5.3 we expect that the fraction of the $10000$ simulated intervals saved in the matrix `r ttcode('CIs')` that contain the true value $\mu=5$ should be roughly $95\%$. We can easily check this using logical operators.
```{r}
mean(CIs[, 1] <= 5 & 5 <= CIs[, 2])
```
The simulation shows that the fraction of intervals covering $\mu=5$, i.e., those intervals for which $H_0: \mu = 5$ cannot be rejected, is close to the theoretical value of $95\%$.
Let us draw a plot of the first $100$ simulated confidence intervals and indicate those which *do not* cover the true value of $\mu$. We do this by plotting the confidence intervals as horizontal lines, stacked on top of each other.
```{r, fig.align='center', fig.height=5, fig.width=4}
# identify intervals not covering mu
# (4 intervals out of 100)
ID <- which(!(CIs[1:100, 1] <= 5 & 5 <= CIs[1:100, 2]))

# initialize the plot
plot(0,
     xlim = c(3, 7),
     ylim = c(1, 100),
     ylab = "Sample",
     xlab = expression(mu),
     main = "Confidence Intervals")

# set up color vector
colors <- rep(gray(0.6), 100)
colors[ID] <- "red"

# draw reference line at mu=5
abline(v = 5, lty = 2)

# add horizontal bars representing the CIs
for(j in 1:100) {
  lines(c(CIs[j, 1], CIs[j, 2]),
        c(j, j),
        col = colors[j],
        lwd = 2)
}
```
For the first $100$ samples, the true null hypothesis is rejected in four cases, so these intervals do not cover $\mu=5$. We have marked the intervals that lead to a rejection of the null in red.
Let us now come back to the example of test scores and class sizes. The regression model from Chapter \@ref(lrwor) is stored in `r ttcode('linear_model')`. An easy way to get $95\%$ confidence intervals for $\beta_0$ and $\beta_1$, the coefficients on `r ttcode('(Intercept)')` and `r ttcode('STR')`, is to use the function `r ttcode('confint()')`. We only have to provide a fitted model object as an input to this function. The confidence level is set to $95\%$ by default but can be modified by setting the argument `r ttcode('level')`, see `r ttcode('?confint')`.
```{r}
# compute 95% confidence interval for coefficients in 'linear_model'
confint(linear_model)
```
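For instance, a $90\%$ confidence interval is obtained by adjusting the `r ttcode('level')` argument:
```{r}
# compute a 90% confidence interval instead
confint(linear_model, level = 0.9)
```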
Let us check if the calculation is done as we expect it to be for $\beta_1$, the coefficient on `r ttcode('STR')`.
```{r}
# compute 95% confidence interval for coefficients in 'linear_model' by hand
lm_sum <- summary(linear_model)
c("lower" = lm_sum$coef[2,1] - qt(0.975, df = lm_sum$df[2]) * lm_sum$coef[2, 2],
"upper" = lm_sum$coef[2,1] + qt(0.975, df = lm_sum$df[2]) * lm_sum$coef[2, 2])
```
The upper and the lower bounds coincide. We have used the $0.975$-quantile of the $t_{418}$ distribution to get the exact result reported by `r ttcode("confint")`. Obviously, this interval *does not* contain the value zero which, as we have already seen in the previous section, leads to the rejection of the null hypothesis $\beta_{1,0} = 0$.
## Regression when X is a Binary Variable {#rwxiabv}
Instead of using a continuous regressor $X$, we might be interested in running the regression
\[ Y_i = \beta_0 + \beta_1 D_i + u_i, \tag{5.2} \]
where $D_i$ is a binary variable, a so-called *dummy variable*. For example, we may define $D_i$ as follows:
\[ D_i = \begin{cases}
1 \ \ \text{if $STR$ in $i^{th}$ school district < 20} \\
0 \ \ \text{if $STR$ in $i^{th}$ school district $\geq$ 20}. \\
\end{cases} \tag{5.3} \]
The regression model now is
\[ TestScore_i = \beta_0 + \beta_1 D_i + u_i. \tag{5.4} \]
Let us see what these data look like in a scatter plot:
```{r, eval=F}
# Create the dummy variable as defined above
CASchools$D <- CASchools$STR < 20

# Compute the average score when D=1 (low STR)
mean.score.for.D.1 <- mean(CASchools$score[CASchools$D == TRUE])

# Compute the average score when D=0 (high STR)
mean.score.for.D.0 <- mean(CASchools$score[CASchools$D == FALSE])

plot(CASchools$score ~ CASchools$D,  # provide the data to be plotted
     pch = 19,                       # use filled circles as plot symbols
     cex = 0.5,                      # set size of plot symbols to 0.5
     col = "Steelblue",              # set the symbols' color to "Steelblue"
     xlab = expression(D[i]),        # set title and axis names
     ylab = "Test Score",
     main = "Dummy Regression")

# Add the average for each group
points(y = mean.score.for.D.0, x = 0, col = "red", pch = 19)
points(y = mean.score.for.D.1, x = 1, col = "red", pch = 19)
```
```{r, fig.align='center', echo=F}
# Create the dummy variable as defined above
CASchools$D <- CASchools$STR < 20

# estimate the dummy regression
dummy_model <- lm(score ~ D, data = CASchools)

# Plot the data
plot(CASchools$D, CASchools$score,
     pch = 20, cex = 0.5, col = "Steelblue",
     xlab = expression(D[i]), ylab = "Test Score",
     main = "Dummy Regression")

# add the group-specific averages predicted by the model
points(CASchools$D, predict(dummy_model), col = "red", pch = 20)
```
With $D$ as the regressor, it is not useful to think of $\beta_1$ as a slope parameter since $D_i \in \{0,1\}$, i.e., we only observe two discrete values instead of a continuum of regressor values. There is no continuous line depicting the conditional expectation function $E(TestScore_i | D_i)$ since this function is solely defined for $x$-positions $0$ and $1$.
Therefore, the interpretation of the coefficients in this regression model is as follows:
- $E(Y_i | D_i = 0) = \beta_0$, so $\beta_0$ is the expected test score in districts where $D_i=0$ and $STR$ is above $20$.
- $E(Y_i | D_i = 1) = \beta_0 + \beta_1$ or, using the result above, $\beta_1 = E(Y_i | D_i = 1) - E(Y_i | D_i = 0)$. Thus, $\beta_1$ is *the difference in group-specific expectations*, i.e., the difference in expected test score between districts with $STR < 20$ and those with $STR \geq 20$.
We will now use `r ttcode('R')` to estimate the dummy regression model as defined by the equations (<a href="#mjx-eqn-5.2">5.2</a>) and (<a href="#mjx-eqn-5.3">5.3</a>) .
```{r}
# estimate the dummy regression model
dummy_model <- lm(score ~ D, data = CASchools)
summary(dummy_model)
```
```{block2, precision, type='rmdnote'}
The <tt>summary()</tt> function reports the $p$-value for testing whether the coefficient on <tt>(Intercept)</tt> is zero as <tt>< 2e-16</tt>. This scientific notation means that the $p$-value is extremely small, smaller than $2 \cdot 10^{-16}$. This threshold is close to the machine epsilon, the smallest positive number $x$ for which <tt>R</tt>'s double-precision arithmetic can distinguish $1 + x$ from $1$; $p$-values below it are simply reported as <tt>< 2e-16</tt>.
```
The vector `r ttcode("CASchools$D")` has the type `r ttcode("logical")` (to see this, use `r ttcode("typeof(CASchools$D)")`), which is shown in the output of `r ttcode("summary(dummy_model)")`: the label `r ttcode("DTRUE")` states that all entries `r ttcode("TRUE")` are coded as `r ttcode("1")` and all entries `r ttcode("FALSE")` are coded as `r ttcode("0")`. Thus, the interpretation of the coefficient `r ttcode("DTRUE")` is the same as stated above for $\beta_1$.
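A quick check of the type mentioned above:
```{r}
# the dummy variable is stored as a logical vector
typeof(CASchools$D)
```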
One can see that the expected test score in districts with $STR < 20$ ($D_i = 1$) is predicted to be $650.1 + 7.17 = 657.27$ while districts with $STR \geq 20$ ($D_i = 0$) are expected to have an average test score of only $650.1$.
Group-specific predictions can be added to the plot by executing the following code chunk.
```{r, eval=F}
# add group specific predictions to the plot
points(x = CASchools$D,
       y = predict(dummy_model),
       col = "red",
       pch = 20)
```
Here we use the function `r ttcode('predict()')` to obtain estimates of the group-specific means. The red dots represent these sample group averages. Accordingly, $\hat{\beta}_1 = 7.17$ can be seen as the difference in group averages.
`r ttcode('summary(dummy_model)')` also answers the question whether there is a statistically significant difference in group means. This in turn would support the hypothesis that students perform differently when they are taught in small classes. We can assess this by a two-tailed test of the hypothesis $H_0: \beta_1 = 0$. Conveniently, the $t$-statistic and the corresponding $p$-value for this test are computed by `r ttcode('summary()')`.
Since `r ttcode('t value')` $= 3.88 > 1.96$, we reject the null hypothesis at the $5\%$ level of significance. The same conclusion follows from the $p$-value of about $0.00012$, which is far smaller than $0.05$.
As done with `r ttcode("linear_model")`, we may alternatively use the function `r ttcode('confint()')` to compute a $95\%$ confidence interval for the true difference in means and see if the hypothesized value is an element of this confidence set.
```{r}
# confidence intervals for coefficients in the dummy regression model
confint(dummy_model)
```
We reject the hypothesis that there is no difference between group means at the $5\%$ significance level since $\beta_{1,0} = 0$ lies outside of $[3.54, 10.8]$, the $95\%$ confidence interval for the coefficient on $D$.
## Heteroskedasticity and Homoskedasticity {#hah}
All inference made in the previous chapters relies on the assumption that the error variance does not vary as regressor values change. But this will often not be the case in empirical applications.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC5.4">
<h3 class = "right"> Key Concept 5.4 </h3>
<h3 class = "left"> Heteroskedasticity and Homoskedasticity </h3>
- The error term of our regression model is homoskedastic if the variance of the conditional distribution of $u_i$ given $X_i$, $Var(u_i|X_i=x)$, is constant *for all* observations in our sample:
\\[ \\text{Var}(u_i|X_i=x) = \\sigma^2 \\ \\forall \\ i=1,\\dots,n. \\]
- If instead there is dependence of the conditional variance of $u_i$ on $X_i$, the error term is said to be heteroskedastic. We then write
\\[ \\text{Var}(u_i|X_i=x) = \\sigma_i^2 \\ \\forall \\ i=1,\\dots,n. \\]
- Homoskedasticity is a *special case* of heteroskedasticity.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Heteroskedasticity and Homoskedasticity]{5.4}
\\begin{itemize}
\\item The error term of our regression model is homoskedastic if the variance of the conditional distribution of $u_i$ given $X_i$, $Var(u_i|X_i=x)$, is constant \\textit{for all} observations in our sample:
\\[ \\text{Var}(u_i|X_i=x) = \\sigma^2 \\ \\forall \\ i=1,\\dots,n. \\]
\\item If instead there is dependence of the conditional variance of $u_i$ on $X_i$, the error term is said to be heteroskedastic. We then write
\\[ \\text{Var}(u_i|X_i=x) = \\sigma_i^2 \\ \\forall \\ i=1,\\dots,n. \\]
\\item Homoskedasticity is a \\textit{special case} of heteroskedasticity.
\\end{itemize}
\\end{keyconcepts}
')
```
For a better understanding of heteroskedasticity, we generate some bivariate heteroskedastic data, estimate a linear regression model and then use box plots to depict the conditional distributions of the residuals.
```{r, fig.align='center', warning=FALSE}
# load scales package for adjusting color opacities
library(scales)

# generate some heteroskedastic data:

# set seed for reproducibility
set.seed(123)

# set up vector of x coordinates
x <- rep(c(10, 15, 20, 25), each = 25)

# initialize vector of errors
e <- c()

# sample 100 errors such that the variance increases with x
e[1:25] <- rnorm(25, sd = 10)
e[26:50] <- rnorm(25, sd = 15)
e[51:75] <- rnorm(25, sd = 20)
e[76:100] <- rnorm(25, sd = 25)

# set up y
y <- 720 - 3.3 * x + e

# Estimate the model
mod <- lm(y ~ x)

# Plot the data
plot(x = x,
     y = y,
     main = "An Example of Heteroskedasticity",
     xlab = "Student-Teacher Ratio",
     ylab = "Test Score",
     cex = 0.5,
     pch = 19,
     xlim = c(8, 27),
     ylim = c(600, 710))

# Add the regression line to the plot
abline(mod, col = "darkred")

# Add boxplots to the plot
boxplot(formula = y ~ x,
        add = TRUE,
        at = c(10, 15, 20, 25),
        col = alpha("gray", 0.4),
        border = "black")
```
We have used the `r ttcode("formula")` argument `r ttcode("y ~ x")` in `r ttcode("boxplot()")` to specify that we want to split up the vector `r ttcode("y")` into groups according to `r ttcode("x")`. `r ttcode("boxplot(y ~ x)")` generates a boxplot for each of the groups in `r ttcode("y")` defined by `r ttcode("x")`.
For this artificial data it is clear that the conditional error variances differ. Specifically, we observe that the variance in test scores (and therefore the variance of the errors committed) *increases* with the student-teacher ratio.
### A Real-World Example for Heteroskedasticity {-}
Think about the economic value of education: if there were no expected economic value-added to receiving university education, you probably would not be reading this script right now. A starting point to empirically verify such a relation is to have data on working individuals. More precisely, we need data on wages and education of workers in order to estimate a model like
$$ wage_i = \beta_0 + \beta_1 \cdot education_i + u_i. $$
What can be presumed about this relation? It is likely that, on average, higher educated workers earn more than workers with less education, so we expect to estimate an upward sloping regression line. Also, it seems plausible that earnings of better educated workers have a higher dispersion than those of low-skilled workers: solid education is not a guarantee for a high salary so even highly qualified workers take on low-income jobs. However, they are more likely to meet the requirements for the well-paid jobs than workers with less education for whom opportunities in the labor market are much more limited.
To verify this empirically we may use real data on hourly earnings and the number of years of education of employees. Such data can be found in `r ttcode('CPSSWEducation')`. This data set is part of the package `r ttcode('AER')` and comes from the Current Population Survey (CPS) which is conducted periodically by the [Bureau of Labor Statistics](http://www.bls.gov/) in the United States.
The subsequent code chunks demonstrate how to import the data into `r ttcode('R')` and how to produce a plot in the fashion of Figure 5.3 in the book.
```{r, fig.align='center'}
# load package and attach data
library(AER)
data("CPSSWEducation")
attach(CPSSWEducation)

# get an overview
summary(CPSSWEducation)

# estimate a simple regression model
labor_model <- lm(earnings ~ education)

# plot observations and add the regression line
plot(education,
     earnings,
     xlab = "Education",
     ylab = "Earnings",
     ylim = c(0, 150))

abline(labor_model,
       col = "steelblue",
       lwd = 2)
```
The plot reveals that the mean of the distribution of earnings increases with the level of education. This is also supported by a formal analysis: the estimated regression model stored in `r ttcode('labor_model')` shows that there is a positive relation between years of education and earnings.
```{r}
# print the contents of labor_model to the console
labor_model
```
The estimated regression equation states that, on average, an additional year of education increases a worker's hourly earnings by about $\$ 1.47$. Once more we use `r ttcode('confint()')` to obtain a $95\%$ confidence interval for both regression coefficients.
```{r}
# compute a 95% confidence interval for the coefficients in the model
confint(labor_model)
```
Since the interval is $[1.33, 1.60]$ we can reject the hypothesis that the coefficient on `r ttcode('education')` is zero at the $5\%$ level.
Furthermore, the plot indicates that there is heteroskedasticity: if we assume the regression line to be a reasonably good representation of the conditional mean function $E(earnings_i\vert education_i)$, the dispersion of hourly earnings around that function clearly increases with the level of education, i.e., the variance of the distribution of earnings increases. In other words: the variance of the errors (the errors made in explaining earnings by education) increases with education so that the regression errors are heteroskedastic.
This example makes a case that the assumption of homoskedasticity is doubtful in economic applications. Should we care about heteroskedasticity? Yes, we should. As explained in the next section, heteroskedasticity can have serious negative consequences for hypothesis testing if we ignore it.
### Should We Care About Heteroskedasticity? {-}
To answer the question whether we should worry about heteroskedasticity being present, consider the variance of $\hat\beta_1$ under the assumption of homoskedasticity. In this case we have
$$ \sigma^2_{\hat\beta_1} = \frac{\sigma^2_u}{n \cdot \sigma^2_X}, \tag{5.5} $$
which is a simplified version of the general equation (<a href="#mjx-eqn-4.1">4.1</a>) presented in Key Concept 4.4. See Appendix 5.1 of the book for details on the derivation. `r ttcode('summary()')` estimates (<a href="#mjx-eqn-5.5">5.5</a>) by
$$ \overset{\sim}{\sigma}^2_{\hat\beta_1} = \frac{SER^2}{\sum_{i=1}^n (X_i - \overline{X})^2} \ \ \text{where} \ \ SER^2=\frac{1}{n-2} \sum_{i=1}^n \hat u_i^2. $$
Thus `r ttcode('summary()')` estimates the *homoskedasticity-only* standard error
\[ \sqrt{ \overset{\sim}{\sigma}^2_{\hat\beta_1} } = \sqrt{ \frac{SER^2}{\sum_{i=1}^n(X_i - \overline{X})^2} }. \]
This is in fact an estimator of the standard deviation of $\hat{\beta}_1$ that is *inconsistent* for the true value $\sigma_{\hat\beta_1}$ when there is heteroskedasticity. The implication is that $t$-statistics computed in the manner of Key Concept 5.1 do not follow a standard normal distribution, even in large samples. This issue may invalidate inference when using the previously treated tools for hypothesis testing: we should be cautious when making statements about the significance of regression coefficients on the basis of $t$-statistics as computed by `r ttcode('summary()')` or confidence intervals produced by `r ttcode('confint()')` if the assumption of homoskedasticity is in doubt!
We will now use `r ttcode('R')` to compute the homoskedasticity-only standard error for $\hat{\beta}_1$ in the regression model `r ttcode('labor_model')` by hand and see that it matches the value produced by `r ttcode('summary()')`.
```{r}
# Store model summary in 'model'
model <- summary(labor_model)

# Extract the standard error of the regression from model summary
SER <- model$sigma

# Compute the variation in 'education'
V <- (nrow(CPSSWEducation) - 1) * var(education)

# Compute the standard error of the slope parameter's estimator and print it
SE.beta_1.hat <- sqrt(SER^2 / V)
SE.beta_1.hat

# Use logical operators to see if the value computed by hand matches the one provided
# in model$coefficients. Round estimates to four decimal places
round(model$coefficients[2, 2], 4) == round(SE.beta_1.hat, 4)
```
Indeed, the estimated values are equal.
### Computation of Heteroskedasticity-Robust Standard Errors {-}
Consistent estimation of $\sigma_{\hat{\beta}_1}$ under heteroskedasticity is guaranteed when the following *robust* estimator is used.
\[ SE(\hat{\beta}_1) = \sqrt{ \frac{1}{n} \cdot \frac{ \frac{1}{n} \sum_{i=1}^n (X_i - \overline{X})^2 \hat{u}_i^2 }{ \left[ \frac{1}{n} \sum_{i=1}^n (X_i - \overline{X})^2 \right]^2} }. \tag{5.6} \]
Standard error estimates computed this way are also referred to as [Eicker-Huber-White standard errors](https://en.wikipedia.org/wiki/Heteroscedasticity-consistent_standard_errors), the most frequently cited paper on this is @white1980.
It can be quite cumbersome to do this calculation by hand. Luckily, certain `r ttcode('R')` functions exist that serve this purpose. One example is `r ttcode('vcovHC()')`, which is part of the package `r ttcode("sandwich")`.^[The package `r ttcode('sandwich')` is a dependency of the package `r ttcode('AER')`, meaning that it is attached automatically if you load `r ttcode('AER')`.] This function can compute a variety of standard errors. The one brought forward in (<a href="#mjx-eqn-5.6">5.6</a>) is computed when the argument `r ttcode('type')` is set to `r ttcode('"HC0"')`. Most of the examples presented in the book rely on a slightly different formula which is the default in the statistics package *STATA*:
\begin{align}
SE(\hat{\beta}_1)_{HC1} = \sqrt{ \frac{1}{n} \cdot \frac{ \frac{1}{n-2} \sum_{i=1}^n (X_i - \overline{X})^2 \hat{u}_i^2 }{ \left[ \frac{1}{n} \sum_{i=1}^n (X_i - \overline{X})^2 \right]^2} }. (\#eq:hc1)
\end{align}
The difference is that the numerator of \@ref(eq:hc1) uses the factor $\frac{1}{n-2}$ instead of $\frac{1}{n}$. This is a degrees of freedom correction and was considered by @mackinnon1985. To get `r ttcode('vcovHC()')` to use \@ref(eq:hc1), we have to set `r ttcode('type = "HC1"')`.
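Before turning to the package function, it may be instructive to evaluate \@ref(eq:hc1) by hand. The following minimal sketch does so for the slope in `r ttcode('linear_model')` (assuming the model object and `r ttcode('CASchools')` from above are still in the workspace); the result should match the corresponding entry produced by `r ttcode('vcovHC()')` below, up to rounding:
```{r}
# compute the HC1 robust standard error for the slope by hand
u_hat <- residuals(linear_model)
X <- CASchools$STR
n <- length(X)

numerator <- 1 / (n - 2) * sum((X - mean(X))^2 * u_hat^2)
denominator <- (1 / n * sum((X - mean(X))^2))^2

sqrt(1 / n * numerator / denominator)
```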
Let us now compute robust standard error estimates for the coefficients in `r ttcode('linear_model')`.
```{r}
# compute heteroskedasticity-robust standard errors
vcov <- vcovHC(linear_model, type = "HC1")
vcov
```
The output of `r ttcode('vcovHC()')` is the variance-covariance matrix of coefficient estimates. We are interested in the square root of the diagonal elements of this matrix, i.e., the standard error estimates.
```{block2, vcovmatrix, type='rmdknit'}
When we have k > 1 regressors, writing down the equations for a regression model becomes very messy. A more convenient way to denote and estimate so-called multiple regression models (see Chapter 6) is by using matrix algebra. This is why functions like <tt>vcovHC()</tt> produce matrices. In the simple linear regression model, the variances and covariances of the estimators can be gathered in the symmetric variance-covariance matrix
\begin{equation}
\text{Var}
\begin{pmatrix}
\hat\beta_0 \\
\hat\beta_1
\end{pmatrix} =
\begin{pmatrix}
\text{Var}(\hat\beta_0) & \text{Cov}(\hat\beta_0,\hat\beta_1) \\
\text{Cov}(\hat\beta_0,\hat\beta_1) & \text{Var}(\hat\beta_1)
\end{pmatrix},
\end{equation}
so <tt>vcovHC()</tt> gives us $\widehat{\text{Var}}(\hat\beta_0)$, $\widehat{\text{Var}}(\hat\beta_1)$ and $\widehat{\text{Cov}}(\hat\beta_0,\hat\beta_1)$, but most of the time we are interested in the diagonal elements of the estimated matrix.
```
```{r}
# compute the square root of the diagonal elements in vcov
robust_se <- sqrt(diag(vcov))
robust_se
```
Now assume we want to generate a coefficient summary as provided by `r ttcode('summary()')` but with *robust* standard errors of the coefficient estimators, robust $t$-statistics and corresponding $p$-values for the regression model `r ttcode("linear_model")`. This can be done using `r ttcode('coeftest()')` from the package `r ttcode('lmtest')`, see `?coeftest`. Further, we pass `r ttcode("vcov")`, the Eicker-Huber-White estimate of the variance matrix computed before, to the argument `r ttcode('vcov.')` so that it is used for inference.
```{r}
# we invoke the function `coeftest()` on our model
coeftest(linear_model, vcov. = vcov)
```
We see that the values reported in the column `r ttcode('Std. Error')` are equal to those from `r ttcode('sqrt(diag(vcov))')`.
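The robust standard errors may also be used to construct confidence intervals. A minimal sketch based on the large-sample normal approximation, using `r ttcode('robust_se')` from above:
```{r}
# large-sample 95% confidence intervals based on robust standard errors
cbind(lower = coef(linear_model) - 1.96 * robust_se,
      upper = coef(linear_model) + 1.96 * robust_se)
```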
How severe are the implications of using homoskedasticity-only standard errors in the presence of heteroskedasticity? The answer is: it depends. As mentioned above we face the risk of drawing wrong conclusions when conducting significance tests. <br> Let us illustrate this by generating another example of a heteroskedastic data set and using it to estimate a simple regression model. We take
$$ Y_i = \beta_1 \cdot X_i + u_i \ \ , \ \ u_i \overset{i.i.d.}{\sim} \mathcal{N}(0,0.36 \cdot X_i^2) $$
with $\beta_1=1$ as the data generating process. Clearly, the assumption of homoskedasticity is violated here since the variance of the errors is a nonlinear, increasing function of $X_i$ but the errors have zero mean and are i.i.d. such that the assumptions made in Key Concept 4.3 are not violated. As before, we are interested in estimating $\beta_1$.
```{r}
set.seed(905)
# generate heteroskedastic data
X <- 1:500
Y <- rnorm(n = 500, mean = X, sd = 0.6 * X)
# estimate a simple regression model
reg <- lm(Y ~ X)
```
We plot the data and add the regression line.
```{r, fig.align='center'}
# plot the data
plot(x = X, y = Y,
     pch = 19,
     col = "steelblue",
     cex = 0.8)

# add the regression line to the plot
abline(reg,
       col = "red",
       lwd = 1.5)

legend("topleft", "Regression line", col = "red", lwd = 1.5)
```
The plot shows that the data are heteroskedastic as the variance of $Y$ grows with $X$. We next conduct a significance test of the (true) null hypothesis $H_0: \beta_1 = 1$ twice, once using the homoskedasticity-only standard error formula and once with the robust version (<a href="#mjx-eqn-5.6">5.6</a>). An easy way to do this in `r ttcode('R')` is the function `r ttcode('linearHypothesis()')` from the package `r ttcode('car')`, see `?linearHypothesis`. It allows us to test linear hypotheses about parameters in linear models in a similar way as done with a $t$-statistic and offers various robust covariance matrix estimators. We test by comparing the tests' $p$-values to the significance level of $5\%$.
```{block2, linearhypothesis, type='rmdknit'}
<tt>linearHypothesis()</tt> computes a test statistic that follows an $F$-distribution under the null hypothesis. We will not focus on the details of the underlying theory. In general, the idea of the $F$-test is to compare the fit of different models. When testing a hypothesis about a *single* coefficient using an $F$-test, one can show that the test statistic is simply the square of the corresponding $t$-statistic:
$$F = t^2 = \left(\frac{\hat\beta_i - \beta_{i,0}}{SE(\hat\beta_i)}\right)^2 \sim F_{1,n-k-1}.$$
In <tt>linearHypothesis()</tt>, there are different ways to specify the hypothesis to be tested, e.g., using a vector of the type <tt>character</tt> (as done in the next code chunk), see <tt>?linearHypothesis</tt> for alternatives. The function returns an object of class <tt>anova</tt> which contains further information on the test that can be accessed using the <tt>$</tt> operator.
```
```{r}
# test hypothesis using the default standard error formula
linearHypothesis(reg, hypothesis.matrix = "X = 1")$'Pr(>F)'[2] < 0.05

# test hypothesis using the robust standard error formula
linearHypothesis(reg, hypothesis.matrix = "X = 1",
                 white.adjust = "hc1")$'Pr(>F)'[2] < 0.05
```
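As an aside, we can verify the relation $F = t^2$ stated in the note above for the single restriction $H_0: \beta_1 = 1$; a quick sketch using the homoskedasticity-only standard error:
```{r}
# compare the squared t-statistic for H0: beta_1 = 1 with the F-statistic
b <- coef(summary(reg))["X", ]
t_act <- unname((b["Estimate"] - 1) / b["Std. Error"])

c(t_squared = t_act^2,
  F_statistic = linearHypothesis(reg, "X = 1")$F[2])
```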
This is a good example of what can go wrong if we ignore heteroskedasticity: for the data set at hand, the default method rejects the null hypothesis $\beta_1 = 1$ although it is true. When using the robust standard error formula, the test does not reject the null. Of course, this might just be a coincidence, and both tests might do equally well in maintaining the type I error rate of $5\%$. This can be investigated further by computing *Monte Carlo* estimates of the rejection frequencies of both tests on the basis of a large number of random samples. We proceed as follows:
- We initialize the vectors `r ttcode('t')` and `r ttcode('t.rob')`.
- Using a `r ttcode('for()')` loop, we generate $10000$ heteroskedastic random samples of size $1000$, estimate the regression model and check whether the tests falsely reject the null at the level of $5\%$ using comparison operators. The results are stored in the respective vectors `r ttcode('t')` and `r ttcode('t.rob')`.
- After the simulation, we compute the fraction of false rejections for both tests.
```{r, cache=T, echo=-1}
set.seed(905)

# initialize vectors t and t.rob
t <- c()
t.rob <- c()

# loop sampling and estimation
for (i in 1:10000) {

  # sample data
  X <- 1:1000
  Y <- rnorm(n = 1000, mean = X, sd = 0.6 * X)

  # estimate regression model
  reg <- lm(Y ~ X)

  # homoskedasticity-only significance test
  t[i] <- linearHypothesis(reg, "X = 1")$'Pr(>F)'[2] < 0.05

  # robust significance test
  t.rob[i] <- linearHypothesis(reg, "X = 1", white.adjust = "hc1")$'Pr(>F)'[2] < 0.05

}

# compute the fraction of false rejections
round(cbind(t = mean(t), t.rob = mean(t.rob)), 3)
```
These results reveal the increased risk of falsely rejecting the null using the homoskedasticity-only standard error for the testing problem at hand: with the common standard error, $`r round(sum(t)/length(t)*100,3)`\%$ of all tests falsely reject the null hypothesis. In contrast, with the robust test statistic we are closer to the nominal level of $5\%$.
## The Gauss-Markov Theorem
When estimating regression models, we know that the results of the estimation procedure are random. However, when using unbiased estimators, at least on average, we estimate the true parameter. When comparing different unbiased estimators, it is therefore interesting to know which one has the highest precision: being aware that the likelihood of estimating the *exact* value of the parameter of interest is $0$ in an empirical application, we want to make sure that the likelihood of obtaining an estimate very close to the true value is as high as possible. This means we want to use the estimator with the lowest variance of all unbiased estimators, provided we care about unbiasedness. The Gauss-Markov theorem states that, in the class of conditionally unbiased linear estimators, the OLS estimator has this property under certain conditions.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC5.5">
<h3 class = "right"> Key Concept 5.5 </h3>
<h3 class = "left"> The Gauss-Markov Theorem for $\\hat{\\beta}_1$ </h3>
Suppose that the assumptions made in Key Concept 4.3 hold and that the errors are *homoskedastic*. The OLS estimator is the best (in the sense of smallest variance) linear conditionally unbiased estimator (BLUE) in this setting.
Let us have a closer look at what this means:
- Estimators of $\\beta_1$ that are linear functions of the $Y_1, \\dots, Y_n$ and that are unbiased conditionally on the regressors $X_1, \\dots, X_n$ can be written as \\[ \\overset{\\sim}{\\beta}_1 = \\sum_{i=1}^n a_i Y_i, \\] where the $a_i$ are weights that are allowed to depend on the $X_i$ but *not* on the $Y_i$.
- We already know that $\\overset{\\sim}{\\beta}_1$ has a sampling distribution: $\\overset{\\sim}{\\beta}_1$ is a linear function of the $Y_i$ which are random variables. If now \\[ E(\\overset{\\sim}{\\beta}_1 | X_1, \\dots, X_n) = \\beta_1, \\] $\\overset{\\sim}{\\beta}_1$ is a linear unbiased estimator of $\\beta_1$, conditionally on the $X_1, \\dots, X_n$.
- We may ask if $\\overset{\\sim}{\\beta}_1$ is also the *best* estimator in this class, i.e., the most efficient one of all linear conditionally unbiased estimators where "most efficient" means smallest variance. The weights $a_i$ play an important role here and it turns out that OLS uses just the right weights to have the BLUE property.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[The Gauss-Markov Theorem for $\\hat{\\beta}_1$]{5.5}
Suppose that the assumptions made in Key Concept 4.3 hold and that the errors are \\textit{homoskedastic}. The OLS estimator is the best (in the sense of smallest variance) linear conditionally unbiased estimator (BLUE) in this setting.\\newline
Let us have a closer look at what this means:\\newline
\\begin{itemize}
\\item Estimators of $\\beta_1$ that are linear functions of the $Y_1, \\dots, Y_n$ and that are unbiased conditionally on the regressors $X_1, \\dots, X_n$ can be written as \\[ \\overset{\\sim}{\\beta}_1 = \\sum_{i=1}^n a_i Y_i, \\] where the $a_i$ are weights that are allowed to depend on the $X_i$ but \\textit{not} on the $Y_i$.
\\item We already know that $\\overset{\\sim}{\\beta}_1$ has a sampling distribution: $\\overset{\\sim}{\\beta}_1$ is a linear function of the $Y_i$ which are random variables. If now \\[ E(\\overset{\\sim}{\\beta}_1 | X_1, \\dots, X_n) = \\beta_1, \\] $\\overset{\\sim}{\\beta}_1$ is a linear unbiased estimator of $\\beta_1$, conditionally on the $X_1, \\dots, X_n$.
\\item We may ask if $\\overset{\\sim}{\\beta}_1$ is also the \\textit{best} estimator in this class, i.e., the most efficient one of all linear conditionally unbiased estimators where most efficient means smallest variance. The weights $a_i$ play an important role here and it turns out that OLS uses just the right weights to have the BLUE property.
\\end{itemize}
\\end{keyconcepts}
')
```
### Simulation Study: BLUE Estimator {-}
Consider the case of a regression of $Y_1,\dots,Y_n$ only on a constant. Here, the $Y_i$ are assumed to be a random sample from a population with mean $\mu$ and variance $\sigma^2$. The OLS estimator in this model is simply the sample mean, see Chapter \@ref(potsm):
\begin{equation}
\hat{\beta}_1 = \sum_{i=1}^n \underbrace{\frac{1}{n}}_{=a_i} Y_i. (\#eq:bluemean)
\end{equation}
Clearly, each observation is weighted by
$$a_i = \frac{1}{n},$$
and we also know that $\text{Var}(\hat{\beta}_1)=\frac{\sigma^2}{n}$.
We now use `r ttcode('R')` to conduct a simulation study that demonstrates what happens to the variance of \@ref(eq:bluemean) if different weights \[ w_i = \frac{1 \pm \epsilon}{n} \] are assigned to either half of the sample $Y_1, \dots, Y_n$ instead of using $\frac{1}{n}$, the OLS weights.
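In fact, the variance inflation can be derived analytically: since half of the weights equal $\frac{1+\epsilon}{n}$ and the other half equal $\frac{1-\epsilon}{n}$, we have, for i.i.d. $Y_i$ with variance $\sigma^2$,
$$ \text{Var}\left(\sum_{i=1}^n w_i Y_i\right) = \sigma^2 \sum_{i=1}^n w_i^2 = \frac{\sigma^2}{2n}\left[(1+\epsilon)^2 + (1-\epsilon)^2\right] = \frac{\sigma^2}{n}\left(1+\epsilon^2\right), $$
which exceeds $\text{Var}(\hat{\beta}_1) = \frac{\sigma^2}{n}$ for any $\epsilon \neq 0$. With $\epsilon = 0.8$ as chosen below, the weighted estimator's variance is $1.64$ times that of the OLS estimator.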
```{r, fig.align='center', cache=T}
# set sample size and number of repetitions
n <- 100
reps <- 1e5

# choose epsilon and create a vector of weights as defined above
epsilon <- 0.8
w <- c(rep((1 + epsilon) / n, n / 2),
       rep((1 - epsilon) / n, n / 2))

# draw a random sample y_1,...,y_n from the standard normal distribution,
# use both estimators 1e5 times and store the result in the vectors 'ols' and
# 'weightedestimator'
ols <- rep(NA, reps)
weightedestimator <- rep(NA, reps)

for (i in 1:reps) {
  y <- rnorm(n)
  ols[i] <- mean(y)
  weightedestimator[i] <- crossprod(w, y)
}

# plot kernel density estimates of the estimators' distributions:

# OLS
plot(density(ols),
     col = "red",
     lwd = 2,
     main = "Density of OLS and Weighted Estimator",
     xlab = "Estimates")

# weighted
lines(density(weightedestimator),
      col = "steelblue",
      lwd = 2)

# add a dashed line at 0 and add a legend to the plot
abline(v = 0, lty = 2)

legend('topright',
       c("OLS", "Weighted"),
       col = c("red", "steelblue"),
       lwd = 2)
```
What conclusion can we draw from the result?
- Both estimators seem to be unbiased: the means of their estimated distributions are zero.
- The estimator using weights that deviate from those implied by OLS is less efficient than the OLS estimator: there is higher dispersion when weights are $w_i = \frac{1 \pm 0.8}{100}$ instead of $w_i=\frac{1}{100}$ as required by the OLS solution.
Hence, the simulation results support the Gauss-Markov Theorem.
## Using the t-Statistic in Regression when the Sample Size Is Small
The three OLS assumptions discussed in Chapter \@ref(lrwor) (see Key Concept 4.3) are the foundation for the results on the large sample distribution of the OLS estimators in the simple regression model. What can be said about the distribution of the estimators and their $t$-statistics when the sample size is small and the population distribution of the data is unknown? Provided that the three least squares assumptions hold and the errors are normally distributed and homoskedastic (we refer to these conditions as the homoskedastic normal regression assumptions), we have normally distributed estimators and $t$-distributed test statistics in small samples.
Recall the [definition](#thetdist) of a $t$-distributed variable
$$ \frac{Z}{\sqrt{W/M}} \sim t_M,$$
where $Z$ is a standard normal random variable, $W$ is $\chi^2$ distributed with $M$ degrees of freedom and $Z$ and $W$ are independent. See section 5.6 in the book for a more detailed discussion of the small sample distribution of $t$-statistics in regression methods.
Let us simulate the distribution of regression $t$-statistics based on a large number of small random samples, say $n=20$, and compare the simulated distributions to the theoretical distributions which should be $t_{18}$, the $t$-distribution with $18$ degrees of freedom (recall that $\text{DF}=n-k-1$).
```{r, fig.align="center", cache=T}
# initialize two vectors
beta_0 <- c()
beta_1 <- c()

# loop sampling / estimation / t statistics
for (i in 1:10000) {

  X <- runif(20, 0, 20)
  Y <- rnorm(n = 20, mean = X)
  reg <- summary(lm(Y ~ X))