render design doc from Rmd to check correctness
sbfnk committed Nov 22, 2024
1 parent ceb13e7 commit 9a8ef8e
Showing 2 changed files with 119 additions and 17 deletions.
63 changes: 63 additions & 0 deletions inst/dev/accumulation.Rmd
@@ -0,0 +1,63 @@
---
output: github_document
---

```{r setup, echo = FALSE}
library("knitr")
```

# Supporting missing data

We want to support reporting patterns that include *accumulation* (i.e. batch reporting of data from multiple dates, for example weekly) and *missingness* (dates which are lacking reports) in incidence and prevalence data.

## Proposed interface

### Missing data

Any date between the minimum and maximum date in the data that is either absent, or present with an `NA` value in the data column (currently called `confirm`), is interpreted as missing and ignored in the likelihood.
All other data points are used in the likelihood.
This matches the current default behaviour, introduced in version 1.5.0.

### Accumulation

If instead modelled values on these days should be accumulated onto the next reporting date, the passed `data.frame` must have an additional logical column, `accumulate`.
If `accumulate` is TRUE then the modelled value of the observed latent variable on that day is added to any existing accumulation and stored for later observation.
If `accumulate` is FALSE the modelled value is added to any stored accumulated variables before potentially being used in the likelihood on that day (if not `NA`).
Subsequently the stored accumulated variable is reset to zero.

### Example

```{r results = "asis"}
df <- data.frame(
  date = as.Date(c("2024-10-23", "2024-10-24", "2024-10-26", "2024-10-27", "2024-10-28")),
  confirm = c(NA, 10, NA, NA, 17),
  accumulate = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)
df |>
  kable(align = "l")
```

The likelihood is evaluated on two days, 24 October and 28 October.
On 24 October the data (10) is compared to (modelled value on 23 October) + (modelled value on 24 October).
On 28 October the data (17) is compared to (modelled value on 26 October) + (modelled value on 27 October) + (modelled value on 28 October).
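
The accumulation rule above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package implementation: `accumulate_sketch()` is a hypothetical helper and the `modelled` values are made up. Note that 25 October is absent from the data, so it is treated as missing and its modelled value is not accumulated.

```r
# Hypothetical helper (not part of any package): applies the accumulation
# rule to modelled daily values for the dates present in the data.
accumulate_sketch <- function(modelled, accumulate) {
  stopifnot(length(modelled) == length(accumulate))
  stored <- 0
  out <- rep(NA_real_, length(modelled))
  for (i in seq_along(modelled)) {
    # the modelled value is always added to the stored accumulation
    stored <- stored + modelled[i]
    if (!accumulate[i]) {
      # this total is what would be compared with the data on this day
      out[i] <- stored
      # the stored accumulation is then reset to zero
      stored <- 0
    }
  }
  out
}

df <- data.frame(
  date = as.Date(c("2024-10-23", "2024-10-24", "2024-10-26", "2024-10-27", "2024-10-28")),
  confirm = c(NA, 10, NA, NA, 17),
  accumulate = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Made-up modelled values, one per row of df (assumption for illustration)
modelled <- c(4, 5, 6, 7, 8)

df$expected <- accumulate_sketch(modelled, df$accumulate)
# A row enters the likelihood only if it is a reporting day with data
df$in_likelihood <- !df$accumulate & !is.na(df$confirm)
```

With these made-up values, the total compared with the data is 4 + 5 = 9 on 24 October and 6 + 7 + 8 = 21 on 28 October; all other rows are excluded from the likelihood.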

## Helper functions

A helper function, `fill_missing()`, can be used to convert weekly data to the required format. For example,

```{r, results = "asis"}
df <- data.frame(
  date = as.Date(c("2024-10-24", "2024-10-31", "2024-11-07")),
  confirm = c(10, 17, 11)
)
df |>
  kable(align = "l")
```

can be converted with `fill_missing(missing_dates = "accumulate", initial_accumulate = 7)` to

```{r, results = "asis"}
df |>
  fill_missing(missing_dates = "accumulate", initial_accumulate = 7) |>
  kable(align = "l")
```
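
To make the conversion concrete, the following base R sketch produces a table of the same shape as the rendered output for this weekly example. It is not the implementation of `fill_missing()`: `expand_to_daily_sketch()` is a hypothetical function, and it assumes that `initial_accumulate = 7` means the first report covers the seven days ending on the first reporting date, and that every date present in the input is a reporting date.

```r
# Hypothetical stand-in for illustration only; the real helper may differ.
expand_to_daily_sketch <- function(df, initial_accumulate) {
  # first day covered by the first report (assumed interpretation)
  first <- min(df$date) - (initial_accumulate - 1)
  all_dates <- data.frame(date = seq(first, max(df$date), by = "day"))
  # left join: dates without a report get confirm = NA
  out <- merge(all_dates, df, by = "date", all.x = TRUE)
  # filled-in dates accumulate; dates from the input are reporting dates
  out$accumulate <- !(out$date %in% df$date)
  out
}

weekly <- data.frame(
  date = as.Date(c("2024-10-24", "2024-10-31", "2024-11-07")),
  confirm = c(10, 17, 11)
)
daily <- expand_to_daily_sketch(weekly, initial_accumulate = 7)
```

Here `daily` has 21 rows, from 2024-10-18 to 2024-11-07, with `accumulate` set to `FALSE` only on the three reporting dates.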
73 changes: 56 additions & 17 deletions inst/dev/accumulation.md
@@ -1,50 +1,89 @@

# Supporting missing data

We want to support reporting patterns that include *accumulation* (i.e. batch reporting of data from multiple dates, for example weekly) and *missingness* (dates which are lacking reports) in incidence and prevalence data.
We want to support reporting patterns that include *accumulation*
(i.e. batch reporting of data from multiple dates, for example weekly)
and *missingness* (dates which are lacking reports) in incidence and
prevalence data.

## Proposed interface

### Missing data

Any dates between the minimum date and maximum date in the data that is either absent, or present with an `NA` value (currently called `confirm`) is interpreted as missing and ignored in the likelihood.
All other data points are used in the likelihood.
This matches the current default behaviour, introduced in version 1.5.0.
Any date between the minimum and maximum date in the data that is
either absent, or present with an `NA` value in the data column
(currently called `confirm`), is interpreted as missing and ignored in
the likelihood. All other data points are used in the likelihood. This
matches the current default behaviour, introduced in version 1.5.0.

### Accumulation

If instead modelled values on these days should be accumulated onto the next reporting date, the passed `data.frame` must have an additional logical column, `accumulate`.
If `accumulate` is TRUE then the modelled value of the observed latent variable on that day is added to any existing accumulation and stored for later observation.
If `accumulate` is FALSE the modelled value is added to any stored accumulated variables before potentially being used in the likelihood on that day (if not `NA`).
Subsequently the stored accumulated variable is reset to zero.
If instead modelled values on these days should be accumulated onto the
next reporting date, the passed `data.frame` must have an additional
logical column, `accumulate`. If `accumulate` is TRUE then the modelled
value of the observed latent variable on that day is added to any
existing accumulation and stored for later observation. If `accumulate`
is FALSE the modelled value is added to any stored accumulated variables
before potentially being used in the likelihood on that day (if not
`NA`). Subsequently the stored accumulated variable is reset to zero.

### Example

``` r
df <- data.frame(
  date = as.Date(c("2024-10-23", "2024-10-24", "2024-10-26", "2024-10-27", "2024-10-28")),
  confirm = c(NA, 10, NA, NA, 17),
  accumulate = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)
df |>
  kable(align = "l")
```

| date | confirm | accumulate |
|------------|---------|------------|
|:-----------|:--------|:-----------|
| 2024-10-23 | NA | TRUE |
| 2024-10-24 | 10 | FALSE |
| 2024-10-26 | NA | TRUE |
| 2024-10-27 | NA | TRUE |
| 2024-10-28 | 17 | FALSE |

The likelihood is evaluated on two days, 24 October and 28 October.
On 24 October the data (10) is compared to (modelled value on 23 October) + (modelled value on 24 October).
On 28 October the data (17) is compared to (modelled value on 26 October) + (modelled value on 27 October) + (modelled value on 28 October).
The likelihood is evaluated on two days, 24 October and 28 October. On
24 October the data (10) is compared to (modelled value on 23 October) +
(modelled value on 24 October). On 28 October the data (17) is compared
to (modelled value on 26 October) + (modelled value on 27 October) +
(modelled value on 28 October).

## Helper functions

A helper function, `fill_missing_dates()` can be used to convert weekly data to the required format, i.e.
A helper function, `fill_missing()`, can be used to convert weekly data
to the required format. For example,

``` r
df <- data.frame(
  date = as.Date(c("2024-10-24", "2024-10-31", "2024-11-07")),
  confirm = c(10, 17, 11)
)
df |>
  kable(align = "l")
```

| date | confirm |
|------------|---------|
|:-----------|:--------|
| 2024-10-24 | 10 |
| 2024-10-31 | 17 |
| 2024-11-07 | 11 |

can be converted with `fill_missing_dates(missing = "accumulate", initial = 7)` to
can be converted with
`fill_missing(missing_dates = "accumulate", initial_accumulate = 7)` to

``` r
df |>
  fill_missing(missing_dates = "accumulate", initial_accumulate = 7) |>
  kable(align = "l")
```

| date | confirm | accumulate |
|------------|---------|------------|
|:-----------|:--------|:-----------|
| 2024-10-18 | NA | TRUE |
| 2024-10-19 | NA | TRUE |
| 2024-10-20 | NA | TRUE |
@@ -65,4 +104,4 @@ can be converted with `fill_missing_dates(missing = "accumulate", initial = 7)`
| 2024-11-04 | NA | TRUE |
| 2024-11-05 | NA | TRUE |
| 2024-11-06 | NA | TRUE |
| 2024-11-07 | 11 | TRUE |
| 2024-11-07 | 11 | FALSE |
