-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
161 lines (127 loc) · 6.75 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# msnz
The goal of the `msnz` package is to provide functions to simplify data
analysis from the New Zealand Multiple Sclerosis Prevalence and related studies,
for researchers associated with the New Zealand Brain Research Institute
(NZBRI).
The package also provides a function to estimate life expectancy from the New
Zealand cohort life tables, which can be of use more widely. These individual
estimates are either:
- Deterministic - the function simply looks up the expected year of death for
an individual of a given sex, year of birth, and conditional age, or
- Probabilistic - the function runs an individual-level random simulation for a
person of given characteristics. This means that the generated sample will have
a realistic level of variability across individuals, rather than people with the
same demographics having an identical expected year of death.
## Installation
This package is not available via CRAN and is hosted in a repository on Github
at https://github.com/nzbri/msnz
Therefore, the usual installation route using `install.packages('msnz')` is
not possible, and will yield a not-useful error message that the package is not
available for your version of R. Instead, install `msnz` from its development
repository on Github as follows:
```{r gh-installation, eval = FALSE}
# install.packages('remotes')
remotes::install_github('nzbri/msnz')
```
If `install_github('nzbri/msnz')` is invoked subsequently, the package will be
downloaded and installed only if the version on Github is newer than the one
installed locally.
If problems arise in a new release, you can downgrade to a previous version by
specifying the name of a particular release to revert to, e.g.
```{r gh-revert, eval = FALSE}
remotes::install_github('nzbri/msnz@v0.3.0')
```
## Functions specific to the MS prevalence study
The package provides three convenience functions to extract info from the unique
identifier (UIN) assigned to each participant (see Richardson _et.al._ (2012)
Method for identifying eligible individuals for a prevalence survey in the
absence of a disease register or population register.
_Internal Medicine Journal,_ ***42,*** 1207-1212):
```{r uni-functions, eval = TRUE}
library(msnz)
msnz::dob_from_uin('01011970FAB')
msnz::sex_from_uin('01011970FAB')
msnz::initials_from_uin('01011970FAB')
```
It also provides two package-wide constants, giving the original census date of
the New Zealand MS Prevalence study, and the censoring date for survival
analyses, set to 15 years later:
```{r date-constants, eval = TRUE}
msnz::census_date
msnz::censoring_date
```
## Calculating New Zealand life expectancy
This function is the only one of use more generally to external researchers. The
package incorporates the New Zealand Cohort Life Tables provided by
Statistics New Zealand, as released in March 2021. (See https://www.stats.govt.nz/information-releases/new-zealand-cohort-life-tables-march-2022-update
for source data).
The `msnz::expected_year_of_death()` function allows one to calculate the life
expectancy of a New Zealander, given their year of birth, sex, and some
conditional age (see below for explanation).
```{r example-1, eval = TRUE}
msnz::expected_year_of_death(year_of_birth = 1970,
sex = 'female',
conditional_age = 50)
# calculate life expectancy (in years from the conditional age year):
year_of_birth = 1970
conditional_age = 50
expected_year_of_death(year_of_birth = year_of_birth,
sex = 'female',
conditional_age = conditional_age) -
(year_of_birth + conditional_age) # = 2020
```
It is important to specify a conditional age that the person has reached. If
`0`, then the returned value will reflect life expectancy at birth. For example,
the expected year of death of a person is greater if it is known that they have
managed to reach a given age, compared to their life expectancy at birth (when
they would be subject to child mortality and early adulthood risks):
```{r example-2, eval = TRUE}
# life expectancy at birth, which is subject to infant mortality and elevated
# risks in teenage years/early adulthood (e.g. traffic accidents and suicide):
msnz::expected_year_of_death(year_of_birth = 1970, 'female',
conditional_age = 0)
# given that we know that a person has lived to a certain age, their expected
# year of death should be greater, as they have survived through a period of
# mortality risks:
msnz::expected_year_of_death(year_of_birth = 1970, 'female',
conditional_age = 50)
```
The values returned can either be deterministic (extracted directly from the
life table) or the result of a random simulation process (to better approximate
the distribution of values that would be seen in an actual population). The
simulation approach is particularly useful in a survival analysis where one
wishes to compare the survival in a given sample against a synthetic sample
derived from the population. That is, for each person in the actual sample, we
generate a synthetic comparison person, randomly simulated from the population,
conditional on the target person's year of birth, sex, and age at a censoring
date. The simulation approach gives a much more natural-looking population
comparison survival distribution, compared to each synthetic person having
precisely the mean life expectancy conditional on those values.
The simulation process is invoked by specifying the parameter
`method = 'sample'`, rather than the default value `method = 'median'`, which
simply returns the tabulated median value from the cohort life tables. Using the
`'sample'` method runs a simulation for a person of the given year of birth,
sex, and conditional age (e.g. the age they reached at the census date). The
simulation proceeds by iterating up from that conditional age, one year at a
time. At each year of age, a random number is generated (uniformly distributed
between 0.0 and 1.0). If that number is less than the life table probability of
such a person surviving to the next year, the age is incremented and the
simulation continues for that individual. If the random number exceeds that
probability, then the current age is assigned as the synthetic person’s age of
death.
For fuller details on other function parameters, look up the help via
`?expected_year_of_death`. These include control over what estimate is returned
(i.e. the median or the 5th or 95th percentile), the number of simulated samples
returned per individual, and a seed to allow for reproducibility of the randomly
simulated values.