
Commit

add test for correlations to CH 17. Add opensafely to CH 13. Remove Twitter link
Lakens committed Jul 7, 2024
1 parent e332e5d commit 8e85109
Showing 47 changed files with 156 additions and 64 deletions.
4 changes: 2 additions & 2 deletions 13-prereg.qmd
@@ -20,7 +20,7 @@ In the past, researchers have proposed solutions to prevent bias in the literatu
De Groot [-@degroot_methodology_1969] already pointed out that it is important to "work out in advance the investigative procedure (or experimental design) on paper to the fullest possible extent" which should include "a statement of the confirmation criteria, including formulation of null hypotheses, if any, choice of statistical test(s), significance level and resulting confirmation intervals" and "for each of the details mentioned, a brief note on their rationale, i.e., a justification of the investigator's particular choices."

The rise of the internet has made it possible to create online [registries](https://en.wikipedia.org/wiki/List_of_clinical_trial_registries) that allow researchers to specify their study design, the sample plan, and statistical analysis plan before the data is collected. A time-stamp, and sometimes even a dedicated Digital Object Identifier (DOI) transparently communicates to peers that the research question and analysis plan were specified before looking at the data. This is important, because you can’t *test* a hypothesis on the data that is used to generate it. If you come up with a hypothesis by looking at data, the hypothesis might be true, but nothing has been done to severely test the hypothesis yet. When exploring data, you can perform a hypothesis test, but you cannot *test* a hypothesis.
The rise of the internet has made it possible to create online [registries](https://en.wikipedia.org/wiki/List_of_clinical_trial_registries) that allow researchers to specify their study design, the sample plan, and statistical analysis plan before the data is collected. A time-stamp, and sometimes even a dedicated Digital Object Identifier (DOI) transparently communicates to peers that the research question and analysis plan were specified before looking at the data. Some tools go even further, such as [OpenSafely](https://www.opensafely.org/about/#transparency-and-public-logs), which logs all analyses that are performed and all changes to the analysis code. This is important, because you can’t *test* a hypothesis on the data that is used to generate it. If you come up with a hypothesis by looking at data, the hypothesis might be true, but nothing has been done to severely test the hypothesis yet. When exploring data, you can perform a hypothesis test, but you cannot *test* a hypothesis.

In some fields, such as medicine, it is now required to register certain studies, such as clinical trials. For example, the [International Committee of Journal Editors](https://www.icmje.org/icmje-recommendations.pdf) writes:

@@ -150,7 +150,7 @@ For example, in the verbal description of a statistical hypothesis in the previo

In @lakens_improving_2020a we discuss how a good way to remove ambiguity in a hypothesis test described in a preregistration document is to make sure it is [machine readable](https://en.wikipedia.org/wiki/Machine-readable_document). Machines are notoriously bad at dealing with ambiguous descriptions, so if the hypothesis is understandable to a machine, it will be clearly specified. A *hypothesis* is tested in an *analysis* that takes *data* as input and returns test *results*. Some of these test results will be compared to *criteria*, used in the *evaluation* of the test result. For example, imagine a *hypothesis* predicts that the mean in one group will be higher than the mean in another group. The *data* is *analyzed* with Welch's *t*-test, and if the *resulting* *p*-value is smaller than a specified *criterion* alpha (e.g., 0.01), the prediction is *evaluated* as being *corroborated*. Our prediction is *falsified* if we can reject effects deemed large enough to matter in an equivalence test, and the result is *inconclusive* otherwise. In a clear preregistration of a hypothesis test, all these components (the analysis, the way results will be compared to criteria, and how results will be evaluated in terms of corroborating or falsifying a prediction) will be clearly specified.
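
As a minimal sketch of what such a machine-readable specification could look like in R (the simulated data, the sample sizes, and the equivalence bound of 0.5 below are hypothetical choices for illustration only), the analysis, the criterion, and the evaluation can all be written down before any data is collected:

```{r}
# Minimal sketch of a machine-readable hypothesis test: analysis, criterion,
# and evaluation are all specified in advance. The data below are simulated
# placeholders so that the script runs; they stand in for the data you plan
# to collect.
set.seed(1)
group1 <- rnorm(50, mean = 0.5, sd = 1)
group2 <- rnorm(50, mean = 0.0, sd = 1)

alpha <- 0.01   # preregistered significance criterion
bound <- 0.5    # smallest raw difference deemed large enough to matter

nhst <- t.test(group1, group2, alternative = "greater")  # Welch's t-test
# Equivalence test (TOST) against the preregistered bounds:
tost_low  <- t.test(group1, group2, mu = -bound, alternative = "greater")
tost_high <- t.test(group1, group2, mu =  bound, alternative = "less")
tost_p    <- max(tost_low$p.value, tost_high$p.value)

if (nhst$p.value < alpha) {
  "prediction corroborated"
} else if (tost_p < alpha) {
  "prediction falsified"
} else {
  "inconclusive"
}
```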

The most transparent way to specify the statistical hypothesis is in **analysis code**. The gold standard for a preregistration is to create a simulated dataset that looks like the data you plan to collect, and write an analysis script that can be run on the dataset you plan to collect. Simulating data might sound difficult, but there are [great packages](https://debruine.github.io/faux/) for this in R, and an increasing number of tutorials. Since you will need to perform the analyses anyway, doing so before you collect the data helps you to carefully think through your experiment. By preregistering the analysis code, you make sure all steps in the data analysis are clear, including assumption checks, exclusion of outliers, and the exact analysis you plan to run (including any parameters that need to be specified for the test).
The most transparent way to specify the statistical hypothesis is in **analysis code**. The gold standard for a preregistration is to create a simulated dataset that looks like the data you plan to collect, and write an analysis script that can be run on the dataset you plan to collect. Simulating data might sound difficult, but there are [great packages](https://debruine.github.io/faux/) for this in R, and an increasing number of tutorials. Since you will need to perform the analyses anyway, doing so before you collect the data helps you to carefully think through your experiment. By preregistering the analysis code, you make sure all steps in the data analysis are clear, including assumption checks, exclusion of outliers, and the exact analysis you plan to run (including any parameters that need to be specified for the test). For some examples, see https://osf.io/un3zx, https://osf.io/c4t28, and section 25 of https://osf.io/gjsft/.
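
A minimal sketch of this workflow (the sample size, effect size, and outlier rule below are illustrative assumptions; the faux package linked above supports more complex designs) could look as follows:

```{r}
# Minimal sketch: simulate a dataset with the planned structure, then run the
# preregistered analysis script on it before any real data are collected.
set.seed(2)
n <- 50   # planned sample size per group
planned <- data.frame(
  condition = rep(c("control", "treatment"), each = n),
  dv = c(rnorm(n, mean = 0, sd = 1),     # assumed control distribution
         rnorm(n, mean = 0.5, sd = 1))   # assumed treatment distribution
)

# Preregistered outlier rule: exclude observations more than 3 SD from the mean.
planned <- planned[abs(scale(planned$dv)) < 3, ]

# Preregistered analysis: Welch's t-test (the default for t.test in R).
t.test(dv ~ condition, data = planned)
```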

In addition to sharing the analysis code, you will need to specify how you will **evaluate** the test result when the analysis code is run on the data you will collect. This is often not made explicit in preregistrations, but it is an essential part of a hypothesis test, especially when there are multiple primary hypotheses, such as in our prediction that "Researchers who have read this text will become better at controlling their alpha level *and* more clearly specify what would corroborate or falsify their prediction". If our hypothesis really predicts that both of these outcomes should occur, then the evaluation of our hypothesis should specify that the prediction is falsified if only one of these two effects occurs.
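
For the example above, the evaluation rule for multiple primary hypotheses can also be made explicit in a few lines of code (the *p*-values below are hypothetical placeholders):

```{r}
# Minimal sketch: the joint prediction is only corroborated if both primary
# tests are significant; the p-values are hypothetical placeholders.
alpha <- 0.05
p_alpha_control <- 0.003   # test of 'better at controlling the alpha level'
p_specification <- 0.120   # test of 'more clearly specify corroboration or falsification'

if (p_alpha_control < alpha && p_specification < alpha) {
  "joint prediction corroborated"
} else {
  "joint prediction not corroborated"
}
```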

31 changes: 31 additions & 0 deletions 17-replication.qmd
@@ -5,6 +5,8 @@
library(metafor)
library(ggplot2)
library(pwr)
library(cocor)
library(pwrss)
```

In 2015 a team of 270 authors published the results of a research project in which they replicated 100 studies [@opensciencecollaboration_estimating_2015]. The original studies had all been published in three psychology journals in the year 2008. The authors of the replication project selected the last study from papers that could feasibly be replicated, performed a study with high power to detect the observed effect size, and attempted to design the best possible replication study. They stayed close to the original study where possible, but deviated where this was deemed necessary. Of the 100 original studies published in 2008, 97 were interpreted as significant. Given an estimated 92% power for the effect sizes observed in the original studies, 97 $\times$ 0.92 = 89 of the replication studies could be expected to observe a significant effect, if the effects in the original studies were at least as large as reported. Yet, only 35 out of the 97 original studies that were significant replicated, for a replication rate of 36%. This result was a surprise for most researchers, and led to the realization that it is much more difficult to replicate findings than one might intuitively think. This result solidified the idea of a *replication crisis*, a sudden loss of confidence in the reliability of published results, which led to confusion and uncertainty about how scientists worked. Since 2015 the field of **metascience** has emerged, which uses empirical methods to study science itself, identify some of the causes of low replicability rates, and develop possible solutions to increase replicability.
@@ -136,6 +138,35 @@ metafor::forest(res_h,
```

It is also possible to test the difference between two independent correlations. Each correlation is first transformed with Fisher's *r*-to-*z* transformation:

$$
Z = \frac{\ln(1+r)-\ln(1-r)}{2}
$$
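
The difference between the two Fisher-transformed correlations, $Z_1$ and $Z_2$, can then be tested with the standard *z*-statistic for two independent correlations, which takes the sample size of each study into account:

$$
z = \frac{Z_1 - Z_2}{\sqrt{\frac{1}{n_1-3}+\frac{1}{n_2-3}}}
$$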

This test can, for example, be performed with the cocor package in R:

```{r}
library(cocor)
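# original study: r = .4 with n = 30; replication study: r = .01 with n = 200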
cocor.indep.groups(n1 = 30, r1.jk = .4, n2 = 200, r2.hm = .01)
```

We see that the difference between an original study with 30 participants that observed an effect of *r* = 0.4 and a replication study with 200 participants that observed an effect of *r* = 0.01 is just statistically significant. We can use the pwrss package [@bulus_pwrss_2023] to examine the sample sizes we would need to detect such a difference between correlations with 90% power.

```{r}
library(pwrss)
pwrss.z.2corrs(r1 = 0.4, r2 = 0.01,
               power = .90, alpha = 0.05,
               alternative = "not equal")
```

The same calculation can be performed in G\*Power.

```{r fig-powerdifcor, echo=FALSE}
#| fig-cap: "Power analysis for the difference between two independent correlations in G\\*Power."
knitr::include_graphics("images/powerdifcor.png")
```

Although a statistical difference between effect sizes is one coherent approach to deciding whether a study has been replicated, researchers are sometimes interested in a different question: Was there a significant result in the replication study? In this approach to analyzing replication studies there is no direct comparison with the effect observed in the original study. The question is therefore not so much 'is the original effect replicated?' but 'if we repeat the original study, is a statistically significant effect observed?'. In other words, we are not testing whether an effect has been replicated, but whether a predicted effect has been observed. Another way of saying this is that we are not asking whether the observed effect replicated, but whether the original ordinal claim of the presence of a non-zero effect is replicated. In the example in @fig-rep-1 the replication study has an effect size of 0, so there is no statistically significant effect, and the original effect did not replicate (in the sense that repeating the procedure did not yield a significant result).

Let's take a step back, and consider which statistical test best reflects the question 'did this study replicate'. On the one hand it seems reasonable to consider an effect 'not replicated' if there is a statistically significant difference between the effect sizes. However, this can mean that a non-significant effect in the replication study leads to a 'replication' just because the effect size estimate is not statistically smaller than the original effect size. On the other hand, it seems reasonable to consider an effect replicated if it is statistically significant in the replication study. These two approaches can lead to a conflict, however. Some statistically significant effects in a replication study are statistically smaller than the effect in the original study. Should these be considered 'replicated' or not? We might want to combine both statistical tests, and consider a study a replication if the effect is both statistically different from 0 (i.e., *p* < .05 in a traditional significance test), and the difference in effect sizes is not statistically different from 0 (i.e., *p* > .05 for a test of heterogeneity in a meta-analysis of both effect sizes). We can perform both tests, and only consider a finding replicated if both conditions are met. Logically, a finding should then be considered a non-replication if the opposite is true (i.e., *p* > .05 for the significance test of the replication study, and *p* < .05 for the test of heterogeneity).
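
As a rough sketch of this combined decision rule (with hypothetical effect size estimates and sampling variances), both tests can be performed with the metafor package loaded earlier:

```{r}
# Minimal sketch: under the combined rule a finding counts as 'replicated' only
# if (1) the replication effect differs from zero and (2) there is no
# statistically significant heterogeneity between the two effect sizes.
yi <- c(0.40, 0.25)     # hypothetical effect sizes: original, replication
vi <- c(0.030, 0.008)   # hypothetical sampling variances

# Significance test of the replication effect on its own:
p_replication <- 2 * (1 - pnorm(abs(yi[2]) / sqrt(vi[2])))

# Test of heterogeneity between the two effects (Q-test in a fixed effect model):
res <- metafor::rma(yi = yi, vi = vi, method = "FE")
p_heterogeneity <- res$QEp

p_replication < 0.05 & p_heterogeneity > 0.05   # TRUE = replicated under this rule
```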
Binary file added 17-replication_files/figure-epub/fig-rep-1-1.png
Binary file added 17-replication_files/figure-pdf/fig-rep-1-1.pdf
2 changes: 1 addition & 1 deletion _quarto.yml
@@ -25,7 +25,7 @@ book:
repo-branch: master
repo-actions: [edit, issue, source]
downloads: [pdf, epub]
sharing: [twitter]
# sharing: [twitter]
# sidebar:
# style: docked
# background: light
1 change: 0 additions & 1 deletion docs/01-pvalue.html
@@ -194,7 +194,6 @@
</li>
</ul>
</div>
<a href="https://twitter.com/intent/tweet?url=%7Curl%7C" title="Twitter" class="quarto-navigation-tool px-1" aria-label="Twitter"><i class="bi bi-twitter"></i></a>
<a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a>
</div>
</div>
1 change: 0 additions & 1 deletion docs/02-errorcontrol.html
@@ -193,7 +193,6 @@
</li>
</ul>
</div>
<a href="https://twitter.com/intent/tweet?url=%7Curl%7C" title="Twitter" class="quarto-navigation-tool px-1" aria-label="Twitter"><i class="bi bi-twitter"></i></a>
<a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a>
</div>
</div>
1 change: 0 additions & 1 deletion docs/03-likelihoods.html
@@ -193,7 +193,6 @@
</li>
</ul>
</div>
<a href="https://twitter.com/intent/tweet?url=%7Curl%7C" title="Twitter" class="quarto-navigation-tool px-1" aria-label="Twitter"><i class="bi bi-twitter"></i></a>
<a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a>
</div>
</div>
1 change: 0 additions & 1 deletion docs/04-bayes.html
@@ -193,7 +193,6 @@
</li>
</ul>
</div>
<a href="https://twitter.com/intent/tweet?url=%7Curl%7C" title="Twitter" class="quarto-navigation-tool px-1" aria-label="Twitter"><i class="bi bi-twitter"></i></a>
<a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a>
</div>
</div>
1 change: 0 additions & 1 deletion docs/05-questions.html
@@ -159,7 +159,6 @@
</li>
</ul>
</div>
<a href="https://twitter.com/intent/tweet?url=%7Curl%7C" title="Twitter" class="quarto-navigation-tool px-1" aria-label="Twitter"><i class="bi bi-twitter"></i></a>
<a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a>
</div>
</div>
1 change: 0 additions & 1 deletion docs/06-effectsize.html
@@ -159,7 +159,6 @@
</li>
</ul>
</div>
<a href="https://twitter.com/intent/tweet?url=%7Curl%7C" title="Twitter" class="quarto-navigation-tool px-1" aria-label="Twitter"><i class="bi bi-twitter"></i></a>
<a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a>
</div>
</div>
1 change: 0 additions & 1 deletion docs/07-CI.html
@@ -193,7 +193,6 @@
</li>
</ul>
</div>
<a href="https://twitter.com/intent/tweet?url=%7Curl%7C" title="Twitter" class="quarto-navigation-tool px-1" aria-label="Twitter"><i class="bi bi-twitter"></i></a>
<a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a>
</div>
</div>
1 change: 0 additions & 1 deletion docs/08-samplesizejustification.html
@@ -193,7 +193,6 @@
</li>
</ul>
</div>
<a href="https://twitter.com/intent/tweet?url=%7Curl%7C" title="Twitter" class="quarto-navigation-tool px-1" aria-label="Twitter"><i class="bi bi-twitter"></i></a>
<a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a>
</div>
</div>
1 change: 0 additions & 1 deletion docs/09-equivalencetest.html
@@ -193,7 +193,6 @@
</li>
</ul>
</div>
<a href="https://twitter.com/intent/tweet?url=%7Curl%7C" title="Twitter" class="quarto-navigation-tool px-1" aria-label="Twitter"><i class="bi bi-twitter"></i></a>
<a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a>
</div>
</div>
1 change: 0 additions & 1 deletion docs/10-sequential.html
@@ -193,7 +193,6 @@
</li>
</ul>
</div>
<a href="https://twitter.com/intent/tweet?url=%7Curl%7C" title="Twitter" class="quarto-navigation-tool px-1" aria-label="Twitter"><i class="bi bi-twitter"></i></a>
<a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a>
</div>
</div>
1 change: 0 additions & 1 deletion docs/11-meta.html
@@ -194,7 +194,6 @@
</li>
</ul>
</div>
<a href="https://twitter.com/intent/tweet?url=%7Curl%7C" title="Twitter" class="quarto-navigation-tool px-1" aria-label="Twitter"><i class="bi bi-twitter"></i></a>
<a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a>
</div>
</div>
1 change: 0 additions & 1 deletion docs/12-bias.html
@@ -193,7 +193,6 @@
</li>
</ul>
</div>
<a href="https://twitter.com/intent/tweet?url=%7Curl%7C" title="Twitter" class="quarto-navigation-tool px-1" aria-label="Twitter"><i class="bi bi-twitter"></i></a>
<a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a>
</div>
</div>