diff --git a/06-StdErrors.Rmd b/06-StdErrors.Rmd
index 1caffe4e..3160d31b 100644
--- a/06-StdErrors.Rmd
+++ b/06-StdErrors.Rmd
@@ -1,11 +1,14 @@
# Standard Errors {#std-errors}
-In the previous chapters we have seen how the OLS method can produce estimates about intercept and slope coefficients from data. You have seen this method at work in `R` by using the `lm` function as well. It is now time to introduce the notion that given that $b_0$, $b_1$ and $b_2$ are *estimates* of some unkown *population parameters*, there is some degree of **uncertainty** about their values. An other way to say this is that we want some indication about the *precision* of those estimates.
+In the previous chapters we have seen how the OLS method can produce estimates about intercept and slope coefficients from data.
+You have seen this method at work in `R` by using the `lm` function as well.
+It is now time to introduce the notion that as $b_0$, $b_1$ and $b_2$ are *estimates* of some unknown *population parameters*, there is some degree of **uncertainty** about how well these estimates reflect the true parameter values.
+Another way to say this is that we want some indication about the *precision* of those estimates.
```{block,type="note"}
-How *confident* should we be about the estimated values $b$?
+How *confident* should we be that $b$ is a good estimate of the true parameter?
```
@@ -17,18 +20,18 @@ library(Ecdat)
p <- ggplot(mapping = aes(x = str, y = testscr), data = Caschool) # base plot
p <- p + geom_point() # add points
p <- p + geom_smooth(method = "lm", size=1, color="red") # add regression line
-p <- p + scale_y_continuous(name = "Average Test Score") +
+p <- p + scale_y_continuous(name = "Average Test Score") +
scale_x_continuous(name = "Student/Teacher Ratio")
p + theme_bw() + ggtitle("Testscores vs Student/Teacher Ratio")
```
-
-The shaded area shows us the region within which the **true** red line will lie with 95% probability. The fact that there is an unknown true line (i.e. a *true* slope coefficient $\beta_1$) that we wish to uncover from a sample of data should remind you immediately of our first tutorial. There, we wanted to estimate the true population mean from a sample of data, and we saw that as the sample size $N$ increased, our estimate got better and better - fundamentally this is the same idea here.
-## What is *true*? What are Statistical Models?
+The shaded area shows us the region within which the **true** red line will lie with 95% probability. The fact that there is an unknown true line (i.e. a *true* slope coefficient $\beta_1$) that we wish to uncover from a sample of data should remind you immediately of our first tutorial. There, we wanted to estimate the true population mean from a sample of data, and we saw that as the sample size $N$ increased, our estimate got better and better---fundamentally this is the same idea here.
+
+## What is *true*? What are statistical models?
+
+A **statistical model** is simply a set of assumptions about how some data have been generated. As such, it models the data-generating process (DGP) as we imagine it. Once we define a DGP, we could simulate data from it and see how this compares to the data we observe in the real world. Or, we could change the parameters of the DGP so as to understand how the real-world data *would* change, should we (or some policy) change the corresponding parameters in reality. Let us now consider one particular statistical model, which in fact we have seen many times already.
-A **statistical model** is simply a set of assumptions about how some data have been generated. As such, it models the data-generating process (DGP), as we have it in mind. Once we define a DGP, we could simulate data from it and see how this compares to the data we observe in the real world. Or, we could change the parameters of the DGP so as to understand how the real world data *would* change, could we (or some policy) change the corresponding parameters in reality. Let us now consider one particular statistical model, which in fact we have seen so many times already.
-
## The Classical Regression Model {#class-reg}
@@ -39,7 +42,7 @@ Let's bring back our simple model \@ref(eq:abline) to explain this concept.
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i (\#eq:abline-5)
\end{equation}
-The smallest set of assumptions used to define the *classical regression model* as in \@ref(eq:abline-5) are the following:
+The smallest set of assumptions required to define the *classical regression model* as in \@ref(eq:abline-5) are the following:
1. The data are **not linearly dependent**: Each variable provides new information for the outcome, and it cannot be replicated as a linear combination of other variables. We have seen this in section \@ref(multicol). In the particular case of one regressor, as here, we require that $x$ exhibit some variation in the data, i.e. $Var(x)\neq 0$.
1. The mean of the residuals conditional on $x$ should be zero, $E[\varepsilon|x] = 0$. Notice that this also means that $Cov(\varepsilon,x) = 0$, i.e. that the errors and our explanatory variable(s) should be *uncorrelated*. It is said that $x$ should be **strictly exogenous** to the model.
@@ -48,9 +51,9 @@ These assumptions are necessary to successfully (and correctly!) run an OLS regr
3. The data are drawn from a **random sample** of size $n$: observation $(x_i,y_i)$ comes from the exact same distribution, and is independent of observation $(x_j,y_j)$, for all $i\neq j$.
4. The variance of the error term $\varepsilon$ is the same for each value of $x$: $Var(\varepsilon|x) = \sigma^2$. This property is called **homoskedasticity**.
-5. The error is normally distributed, i.e. $\varepsilon \sim \mathcal{N}(0,\sigma^2)$
+5. The errors are **normally distributed**, i.e. $\varepsilon \sim \mathcal{N}(0,\sigma^2)$
-Invoking assumption 5. in particular defines what is commonly called the *normal* linear regression model.
+Invoking assumption 5 in particular defines what is commonly called the *normal* linear regression model.
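As a sketch (all parameter values below are made up for illustration), we could simulate one dataset from this normal linear regression DGP in `R` and check that `lm` recovers something close to the truth:

```r
set.seed(1)
n     <- 500
beta0 <- 1; beta1 <- 2; sigma <- 0.5   # made-up "true" parameters
x <- runif(n)                          # Var(x) != 0 (assumption 1)
e <- rnorm(n, mean = 0, sd = sigma)    # E[e|x] = 0, homoskedastic, normal (assumptions 2, 4, 5)
y <- beta0 + beta1 * x + e             # the DGP
coef(lm(y ~ x))                        # b0 and b1: close to, but not equal to, beta0 and beta1
```

Note that the estimates are close to, but not identical to, the true parameters: that gap is exactly the sampling variation this chapter is about.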
### $b$ is not $\beta$!
@@ -60,24 +63,34 @@ Let's talk about the small but important modifications we applied to model \@re
* $\beta_0$ and $\beta_1$ are intercept and slope parameters
* $\varepsilon$ is the error term.
-First, we *assumed* that \@ref(eq:abline-5) is the correct represenation of the DGP. With that assumption in place, the values $\beta_0$ and $\beta_1$ are the *true parameter values* which generated the data. Notice that $\beta_0$ and $\beta_1$ are potentially different from $b_0$ and $b_1$ in \@ref(eq:abline) for a given sample of data - they could in practice be very close to each other, but $b_0$ and $b_1$ are *estimates* of $\beta_0$ and $\beta_1$. And, crucially, those estimates are generated from a sample of data. Now, the fact that our data $\{y_i,x_i\}_{i=1}^N$ are a sample from a larger population, means that there will be *sampling variation* in our estimates - exactly like in the case of the sample mean estimating the population average as mentioned above. One particular sample of data will generate one particular set of estimates $b_0$ and $b_1$, whereas another sample of data will generate estimates which will in general be different - by *how much* those estimates differ across samples is the question in this chapter. In general, the more observations we have the greater the precision of our estimates, hence, the closer the estimates from different samples will lie together.
+First, we *assumed* that \@ref(eq:abline-5) reflects the true DGP.
+With that assumption in place, the values $\beta_0$ and $\beta_1$ are the *true parameter values* which generated the data.
+Notice that $\beta_0$ and $\beta_1$ are potentially different from $b_0$ and $b_1$ in \@ref(eq:abline) for a given sample of data---they could in practice be very close to each other, but $b_0$ and $b_1$ are *estimates* of $\beta_0$ and $\beta_1$.
+And, crucially, those estimates are generated from a sample of data.
+Now, the fact that our data $\{y_i,x_i\}_{i=1}^N$ are a sample from a larger population means that there will be *sampling variation* in our estimates---exactly like in the case of the sample mean estimating the population average as mentioned above.
+One particular sample of data will generate one particular set of estimates $b_0$ and $b_1$, whereas another sample of data will generate estimates which will in general be different---by *how much* those estimates differ across samples is the question of this chapter.
+In general, the more observations we have, the greater the precision of our estimates, and hence the closer the estimates from different samples will lie together.
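This point can be made concrete with a small simulation (the DGP below is made up): each simulated sample yields its own $b_1$, and the spread of those estimates shrinks as the sample size grows:

```r
set.seed(1)
draw_b1 <- function(n) {
  x <- runif(n)
  y <- 1 + 2 * x + rnorm(n, sd = 0.5)  # made-up true DGP with beta1 = 2
  coef(lm(y ~ x))[2]                   # slope estimate from this one sample
}
sd(replicate(500, draw_b1(n = 50)))    # sampling spread of b1 with small samples
sd(replicate(500, draw_b1(n = 500)))   # much smaller spread with more data
```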
### Standard Errors in Theory {#se-theory}
-The standard deviation of the OLS parameters is generally called *standard error*. As such, it is just the square root of the parameter's variance.
-Under assumptions 1. through 4. we can define the formula for the variance of our slope coefficient in the context of our single regressor model \@ref(eq:abline-5) as follows:
+The standard deviation of the OLS parameters is generally called *standard error*.
+As such, it is just the square root of the parameter's variance.
+Under assumptions 1 through 4 we can define the formula for the variance of our slope coefficient in the context of our single regressor model \@ref(eq:abline-5) as follows:
\begin{equation}
Var(b_1|x_i) = \frac{\sigma^2}{\sum_i^N (x_i - \bar{x})^2} (\#eq:var-ols)
\end{equation}
-In pratice, we don't know the theoretical variance of $\varepsilon$, i.e. $\sigma^2$, but we form an estimate about it from our sample of data. A widely used estimate uses the already encountered SSR (sum of squared residuals), and is denoted $s^2$:
+In practice, we don't know the theoretical variance of $\varepsilon$, i.e. $\sigma^2$, but we form an estimate of it from our sample of data.
+A widely used estimate uses the already encountered SSR (sum of squared residuals), and is denoted $s^2$:
$$
s^2 = \frac{SSR}{n-p} = \frac{\sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2}{n-p} = \frac{\sum_{i=1}^n e_i^2}{n-p}
$$
-where $n-p$ are the *degrees of freedom* available in this estimation. $p$ is the number of parameters we wish to estimate (here: 1). So, the variance formula would become
+where $n-p$ is the number of *degrees of freedom* available in this estimation.
+$p$ is the number of parameters we estimate (here: 2, namely $b_0$ and $b_1$).
+So, the variance formula would become
\begin{equation}
Var(b_1|x_i) = \frac{SSR}{(n-p)\sum_i^N (x_i - \bar{x})^2} (\#eq:var-ols2)
@@ -117,63 +130,85 @@ ctval = qt(0.95,df=n)
cxbar = ctval * (s/sqrt(n)) + mu
```
-Imagine we were tasked by the Director of our school to provide him with our best guess of the *mean body height* $\mu$ amongst all SciencesPo students in order to assess which height the new desks should have. Of course, we are econometricians and don't *guess* things: we **estimate** them! How would we go about this task and estimate $\mu$?
+Imagine we were tasked by the director of our school to provide him with our best guess of the *mean body height* $\mu$ amongst all SciencesPo students in order to assess which height the new desks should have.
+Of course, we are econometricians and don't *guess* things: we **estimate** them!
+How would we go about this task and estimate $\mu$?
-You may want to ask: Why bother with this estimation business at all, and not just measure all students' height, compute $\mu$, and that's it? That's a good question! In most cases, we cannot do this, either because we do not have access to the entire population (think of computing the mean height of all Europeans!), or it's too costly to measure everyone, or it's impractical. That's why we take *samples* from the wider population, to make inference. In our example, suppose we'd randomly measure students coming out of the SciencesPo building at 27 Rue Saint Guillaume until we have $`r n`$ measurements on any given Monday. Suppose further that we found a sample mean height $\bar{x} = `r xbar`$, and that the sample standard deviation was $s=`r s`$. In short, we found the data summarized in figure \@ref(fig:heightdata)
+You may want to ask: Why bother with this estimation business at all, and not just measure all students' height, compute $\mu$, and that's it?
+That's a good question!
+In most cases, we cannot do this, either because we do not have access to the entire population (think of computing the mean height of all Europeans!), or it's too costly to measure everyone, or it's impractical.
+That's why we take *samples* from the wider population, to make inference.
+In our example, suppose we randomly measure students coming out of the SciencesPo building at 27 rue Saint-Guillaume until we have $`r n`$ measurements on any given Monday.
+Suppose further that we found a sample mean height $\bar{x} = `r xbar`$, and that the sample standard deviation was $s=`r s`$. In short, we found the data summarized in figure \@ref(fig:heightdata).
-```{r heightdata,echo=FALSE,fig.cap="Our ficitious sample of SciencesPo students' body height. The small ticks indicate the location of each measurement.",fig.align='center'}
+```{r heightdata,echo=FALSE,fig.cap="Our fictitious sample of SciencesPo students' body height. The small ticks indicate the location of each measurement.",fig.align='center'}
hist(height)
rug(height)
```
-What are we going to tell *Monsieur le Directeur* now, with those two numbers and figure \@ref(fig:heightdata) in hand? Before we address this issue, we need to make a short detour into *test statistics*.
+What are we going to tell *Monsieur le Directeur* now, with those two numbers and figure \@ref(fig:heightdata) in hand?
+Before we address this issue, we need to make a short detour into *test statistics*.
-### Test Statistics
+### Test statistics
-We have encountered many statistics already: think of the sample mean, or the standard deviation. Statistics are just functions of data. *Test* statistics are used to perform statistical tests.
+We have encountered many statistics already: think of the sample mean, or the standard deviation.
+Statistics are just functions of data.
+*Test* statistics are used to perform statistical tests.
-Many test statistics rely on some notion of *standardizing* the sample data so that it becomes comparable to a theoretical distribution. We encountered this idea already in section \@ref(reg-standard), where we talked about a standardized regression. The most common standardization is the so-called *z-score*, which says that
+Many test statistics rely on some notion of *standardizing* the sample data so that it becomes comparable to a standard distribution.
+We encountered this idea already in section \@ref(reg-standard), where we talked about a standardized regression.
+The most common standardization is the so-called *z-score*:
+*If a random variable $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, i.e. $X \sim \mathcal{N}(\mu, \sigma^2)$, then*
\begin{equation}
-\frac{x - \mu}{\sigma}\equiv z\sim \mathcal{N}(0,1), (\#eq:zscore)
+\sqrt{n} \frac{\bar{X} - \mu}{\sigma}\equiv Z \sim \mathcal{N}(0,1), (\#eq:zscore)
\end{equation}
-in other words, substracting the population mean from random variable $x$ and dividing by it's population standard deviation yields a standard normally distributed random variable, commonly called $z$.
+in other words, the sample mean of $n$ random draws from a normal distribution is itself normally distributed around the population mean $\mu$, with standard deviation $\sigma / \sqrt{n}$.
+(For draws from a non-normal distribution this holds approximately in large samples, as a consequence of the [central limit theorem](https://en.wikipedia.org/wiki/Central_limit_theorem).)
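A quick simulation (with made-up values for $\mu$, $\sigma$ and $n$) illustrates this: the standard deviation of the simulated sample means is close to $\sigma/\sqrt{n}$:

```r
set.seed(1)
n <- 100; mu <- 170; sigma <- 10    # made-up population parameters
# draw 10000 samples of size n and record each sample's mean
xbars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
mean(xbars)                         # close to mu
sd(xbars)                           # close to sigma / sqrt(n) = 1
```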
-A very similar idea applies if we *don't know* the population variance (which is our case here!). The corresponding standardization gives rise to the *t-statistic*, and it looks very similar to \@ref(eq:zscore):
+A very similar idea applies if we *don't know* the population variance (which is our case here!).
+The corresponding standardization gives rise to the *t-statistic*, and it looks very similar to \@ref(eq:zscore):
\begin{equation}
-\sqrt{n} \frac{\bar{x} - \mu}{s} \equiv T \sim t_{n-1} (\#eq:tscore)
+\sqrt{n} \frac{\bar{x} - \mu}{s} \equiv T \sim t_{n-1} (\#eq:tscore)
\end{equation}
-
Several things to note:
* We observe the same standardization as above: dividing by the sample standard deviation $s$ brings $\bar{x} - \mu$ to a *unit free* scale.
-* We use $\bar{x}$ and $s$ instead of $x$ and $\sigma$
-* We multiply by $\sqrt{n}$ because we expect $\bar{x} - \mu$ to be a small number: we need to *rescale* it again to make it compatible with the $t_{n-1}$ distribution.
-* $t_{n-1}$ is the [Student's T](https://en.wikipedia.org/wiki/Student's_t-distribution) distribution with $n-1$ degrees of freedom. We don't have $n$ degrees of freedom because we already had to estimate one statistic ($\bar{x}$) in order to construct $T$.
+* We use $s$, the sample estimate of the unknown population parameter $\sigma$.
+* We multiply by $\sqrt{n}$ because the dispersion of $\bar{x}$ around $\mu$ shrinks as the sample grows: rescaling keeps $T$ on the scale of the $t_{n-1}$ distribution.
+* $t_{n-1}$ is the [Student's t](https://en.wikipedia.org/wiki/Student's_t-distribution) distribution with $n-1$ degrees of freedom.
+* We don't have $n$ degrees of freedom because we already had to estimate one statistic ($\bar{x}$) in order to construct $T$.
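With made-up numbers for $\bar{x}$, $s$ and $n$ (stand-ins for the chapter's simulated `height` data), the statistic and its critical value are one line each in `R`:

```r
n <- 100; xbar <- 171.5; s <- 8.5; mu <- 170   # made-up sample numbers
T_stat <- sqrt(n) * (xbar - mu) / s
T_stat                       # the realized t statistic
qt(0.975, df = n - 1)        # two-sided 5% critical value from t_{n-1}
```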
### Confidence Intervals {#CI}
-Back to our example now! We are clearly in need of some measure of *confidence* about our sample statistic $\bar{x} = `r xbar`$ before we communicate our result. It seems reasonable to inform the Director about $\bar{x}$, but surely we also need to tell him that there was considerable *dispersion* in the data: Some people were as short as `r round(min(height),2)`cm, while others were as tall as `r round(max(height),2)`cm!
+Back to our example now!
+We are clearly in need of some measure of *confidence* about our sample statistic $\bar{x} = `r xbar`$ before we communicate our result.
+It seems reasonable to inform the director about $\bar{x}$, but surely we also need to tell him that there was considerable *dispersion* in the data: Some people were as short as `r round(min(height),2)`cm, while others were as tall as `r round(max(height),2)`cm!
-The way to proceed is to construct a *confidence interval* about the true population mean $\mu$, based on $\bar{x}$, which will take this uncertainty into account. We will use the *t* statistic from above. We want to have a *symmetric interval* around $\bar{x}$ which contains the true value $\mu$ with probability $1-\alpha$. One very popular choice of $\alpha$ is $0.05$, hence we cover $\mu$ with 95% probability. After computing our statistic $T$ as defind in \@ref(eq:tscore), this interval is defined as follows:
+The way to proceed is to construct a *confidence interval* for the true population mean $\mu$, based on $\bar{x}$, which will take this uncertainty into account.
+We will use the *t* statistic from above.
+We want to have a *symmetric interval* around $\bar{x}$ which contains the true value $\mu$ with probability $1-\alpha$.
+A popular choice of $\alpha$ is $0.05$, hence we cover $\mu$ with 95% probability, provided our distributional assumptions hold, and given the sample data.
+After computing our statistic $T$ as defined in \@ref(eq:tscore), this interval is defined as follows:
\begin{align}
\Pr \left(-c \leq T \leq c \right) = 1-\alpha (\#eq:ci)
\end{align}
-where $c$ stands for *critical value*, which we need to choose. This is illustrated in figure \@ref(fig:cifig).
+where $c$ stands for the *critical value*, which is straightforward to calculate (or look up) given $\alpha$.
+This is illustrated in figure \@ref(fig:cifig).
```{r cifig, echo=FALSE, engine='tikz', out.width='90%', fig.ext=if (knitr:::is_latex_output()) 'pdf' else 'png', fig.cap='Confidence Interval Construction. The blue area is called *coverage region* which contains the true $\\mu$ with probability $1-\\alpha$.',fig.align='center',engine.opts = list(convert = 'convert', convert.opts = '-density 600')}
\begin{tikzpicture}[scale=2, y=5cm]
\draw[domain=-3.15:-2] (-3.15,0) plot[id=gauss1,samples=50]
-(\x,{1/sqrt(2*pi)*exp(-0.5*(\x)^2)}) -- (-2,0);
-\draw[domain=-2:2,fill=blue,opacity=0.4] (-2,0) -- plot[id=gauss1, samples=100] (\x,{1/sqrt(2*pi)*exp(-0.5*(\x)^2)}) -- (2,0);
+(\x,{1/sqrt(2*pi)*exp(-0.5*(\x)^2)}) -- (-2,0);
+\draw[domain=-2:2,fill=blue,opacity=0.4] (-2,0) -- plot[id=gauss1, samples=100] (\x,{1/sqrt(2*pi)*exp(-0.5*(\x)^2)}) -- (2,0);
\draw[domain=2:3.2] (2,0) --
plot[id=gauss3, samples=50] (\x,{1/sqrt(2*pi)*exp(-0.5*(\x)^2)});
\draw (2,-0.01) -- (2,0.01); % ticks
@@ -222,24 +257,31 @@ Here $\mathcal{T}_{df}^{-1}$ stands for the *quantile function*, i.e. the invers
\begin{align}
0.95 = 1-\alpha &= \Pr \left(-c \leq T \leq c \right) \\(\#eq:ci2)
&= \Pr \left(-`r round(qt(0.975,df=n-1),3)` \leq \sqrt{n} \frac{\bar{x} - \mu}{s} \leq `r round(qt(0.975,df=n-1),3)` \right) \\
- &= \Pr \left(\bar{x} -`r round(qt(0.975,df=n-1),3)` \frac{s}{\sqrt{n}} \leq \mu \leq \bar{x} + `r round(qt(0.975,df=n-1),3)` \frac{s}{\sqrt{n}} \right)
+ &= \Pr \left(\bar{x} -`r round(qt(0.975,df=n-1),3)` \frac{s}{\sqrt{n}} \leq \mu \leq \bar{x} + `r round(qt(0.975,df=n-1),3)` \frac{s}{\sqrt{n}} \right)
\end{align}
-Finally, filling in our numbers for $s$ etc, this implies that a 95% confidence interval about the location of the true average height of all SciencesPo students, $\mu$, is given by:
+Finally, filling in our numbers for $s$ etc, this implies that a 95% confidence interval for the true average height of all SciencesPo students, $\mu$, is given by:
\begin{equation}
CI = \left[`r round(xbar - qt(0.975,df=n-1)* s/sqrt(n),3)` , `r round(xbar + qt(0.975,df=n-1)* s/sqrt(n),3)` \right]
\end{equation}
-We would tell the director that with 95% probability, the true average height of all students comes to lie within those two bounds.
+We would tell the director that with 95% probability, the true average height of all students lies within those two bounds.
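In `R`, the interval follows directly from the derivation in \@ref(eq:ci2); the numbers below are made up and stand in for the chapter's `xbar`, `s` and `n`:

```r
n <- 100; xbar <- 171.5; s <- 8.5   # made-up sample statistics
# 95% confidence interval: xbar +/- t-critical-value * standard error
ci <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)
ci
```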
-Finally, looking back at figure \@ref(fig:confint) above, the shaded area is just the 95% confidence interval *about the true value $\beta_1$*. We would say that *the true regression line* is contained within the shaded region with 95% probability. Very similarly to our example of $\bar{x}$, in that picture we have instead an estimate $b_1$, with an associated standard error $SE(b_1)$. The shaded area is called *confidence band*, and it is just plotting the confidence interval *for each value $x$* in the data. You can see how the band becomes narrower (i.e. the estimate becomes more precise) if there is more data associated to a certain $x$.
+Finally, looking back at figure \@ref(fig:confint) above, the shaded area is just the 95% confidence interval *about the true value $\beta_1$*.
+We would say that *the true regression line* is contained within the shaded region with 95% probability.
+Very much as in our example with $\bar{x}$, in that picture we instead have an estimate $b_1$, with an associated standard error $SE(b_1)$.
+The shaded area is called a *confidence band*, and it simply plots the confidence interval *for each value of $x$* in the data.
+You can see how the band becomes narrower (i.e. the estimate becomes more precise) where more data are associated with a certain value of $x$.
## Hypothesis Testing
-Now know by now how the standard errors of an OLS estimate are computed, and what they stand for. We can now briefly^[We will not go into great detail here. Please refer back to your statistics course from last spring semester (chapters 8 and 9), or the short note I [wrote while ago](images/hypothesis.pdf) ] discuss a very common usage of this information, in relation to which variables we should include in our regression. There is a statistical proceedure called *hypothesis testing* which helps us to make such decisions.
-In [hypothesis testing](https://en.wikipedia.org/wiki/Statistical_hypothesis_testing), we have a baseline, or *null* hypothesis $H_0$, which we want to confront with a competing *alternative* hypthesis $H_1$. Continuing with our example of the mean height of SciencesPo students ($\mu$), one potential hypothesis could be
+We have seen how the standard errors of an OLS estimate are computed, and what they stand for.
+We can now briefly^[We will not go into great detail here. Please refer back to your statistics course from last spring semester (chapters 8 and 9), or the short note I [wrote a while ago](images/hypothesis.pdf).] discuss a very common use of this information, in relation to which variables we should include in our regression.
+There is a statistical procedure called *hypothesis testing* which helps us to make such decisions.
+In [hypothesis testing](https://en.wikipedia.org/wiki/Statistical_hypothesis_testing), we have a baseline, or *null*, hypothesis $H_0$, which we want to confront with a competing *alternative* hypothesis $H_1$.
+Continuing with our example of the mean height of SciencesPo students ($\mu$), one potential hypothesis could be
\begin{align}
H_0:& \mu = `r mu`\\
@@ -253,18 +295,22 @@ H_0:& \mu = `r mu`\\
H_1:& \mu > `r mu`.
\end{align}
-which would mean: under the null hypothesis, the average of all ScPo students' body height is `r mu`cm. Under the alternative, it is larger. You can immediately see that this is very similar to confidence interval construction.
+which would mean: under the null hypothesis, the average of all ScPo students' body height is `r mu`cm. Under the alternative, it is larger.
+You can immediately see that this is very similar to confidence interval construction.
-Suppose as above that we found $\bar{x} = `r xbar`$, and that the sample standard deviation is still $s=`r s`$. Would you regard this as strong or weak evidence against $H_0$ and in favor of $H_1$?
+Suppose as above that we found $\bar{x} = `r xbar`$, and that the sample standard deviation is still $s=`r s`$.
+Would you regard this as strong or weak evidence against $H_0$ and in favor of $H_1$?
-You should now remember what you saw when you did `launchApp("estimate")`. Look again at this app and set the slider to a sample size of $`r n`$, just as in our running example. You can see that the app draws one hundred (100) samples for you, locates their sample mean on the x-axis, and estimates the red density.
+You should now remember what you saw when you did `launchApp("estimate")`.
+Look again at this app and set the slider to a sample size of $`r n`$, just as in our running example.
+You can see that the app draws one hundred (100) samples for you, locates their sample mean on the x-axis, and estimates the red density.
```{block type='note'}
The crucial thing to note here is that, given we are working with a **random sample** from a population with a certain distribution of *height*, our sample statistic $\bar{x}$ is **also a random variable**. Every new set of randomly drawn students would yield a different $\bar{x}$, and all of them together would follow the red density in the app. In reality we often only get to draw one single sample, and we can use knowledge about the sampling distribution to make inference.
```
-Our task is now to decide if given that particular sampling distribution, given our estimate $\bar{x}$ and given an observed sample variance $s^2$, whether $\bar{x} = `r xbar`$ is *far away* from $\bar{x} = `r mu`$, or not. The way to proceed is by computing a *test statistic*, which is to be compared to a *critical value*: if the test statistic exceeds that value, we reject $H_0$, otherwise we cannot. The critical value depends on the sampling distribution, and the size of the test. We talk about this next.
+Our task is now to decide, given that particular sampling distribution, our estimate $\bar{x}$ and the observed sample variance $s^2$, whether $\bar{x} = `r xbar`$ is *far away* from the hypothesized value $\mu = `r mu`$ or not.
+The way to proceed is to compute a *test statistic*, which is then compared to a *critical value*: if the test statistic exceeds that value, we reject $H_0$; otherwise we cannot.
+The critical value depends on the sampling distribution and the size of the test. We talk about this next.
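A sketch of this procedure in `R` (the numbers are made up; with the raw measurements one would simply call `t.test`):

```r
n <- 100; xbar <- 171.5; s <- 8.5; mu0 <- 170  # made-up numbers; H0: mu = mu0
T_stat <- sqrt(n) * (xbar - mu0) / s
crit   <- qt(0.95, df = n - 1)   # one-sided 5% critical value (H1: mu > mu0)
T_stat > crit                    # TRUE means we reject H0
# equivalently, with a vector of raw measurements `height`:
# t.test(height, mu = mu0, alternative = "greater")
```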
### Making Errors
@@ -512,8 +558,12 @@ This shows that lotsize and the number of bathrooms is indeed positively related
```{block type='note'}
**Direction of Omitted Variable Bias**
-If there is an omitted variable $z$ that is *positively* correlated with our explanatory variable $x$, then our estimate of effect of $x$ on $y$ will be too large (or, *biased upwards*). The correlation between $x$ and $z$ means that we attribute part of the impact of $z$ on $y$ mistakenly to $x$! And, of course, vice versa for *negatively* correlated omitted variables.
-```
-
+The direction of the bias in $b_1$ due to omitting $z$ depends on two covariances, $Cov(x, z)$ and $Cov(z, y)$:
+- if $Cov(x,z) > 0$ **and** $Cov(z,y) > 0$, the omitted variable bias is positive: $b_1 > \beta_1$;
+- if one of $Cov(x,z)$ and $Cov(z,y)$ is positive and the other negative (it doesn't matter which), the bias is negative;
+- if $Cov(x,z) < 0$ **and** $Cov(z,y) < 0$, the bias is again positive: $b_1 > \beta_1$.
+The correlation between $x$ and $z$ means that we mistakenly attribute part of the impact of $z$ on $y$ to $x$!
+```
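These sign rules are easy to check by simulation (all parameter values below are made up): here both covariances are positive, so omitting $z$ biases $b_1$ upward:

```r
set.seed(1)
n <- 10000
z <- rnorm(n)
x <- 0.8 * z + rnorm(n)          # Cov(x, z) > 0
y <- 1 * x + 1 * z + rnorm(n)    # true beta1 = 1; Cov(z, y) > 0
coef(lm(y ~ x))["x"]             # biased upward: noticeably larger than 1
coef(lm(y ~ x + z))["x"]         # close to 1 once z is controlled for
```

Flipping the sign of the `0.8` in the line defining `x` flips the sign of $Cov(x,z)$, and with it the direction of the bias.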
+