The goal of the msnz
package is to provide functions to simplify data
analysis from the New Zealand Multiple Sclerosis Prevalence and related
studies, for researchers associated with the New Zealand Brain Research
Institute (NZBRI).
The package also provides a function to estimate life expectancy from the New Zealand cohort life tables, which can be of use more widely. These individual esteimates are either:
- Deterministic - the function simply looks up the expected year of death for an individual of a given sex, year of birth, and conditional age, or
- Probabilistic - the function runs an individual-level random simulation for a person of given characteristics. This means that the generated sample will have a realistic level of variability across individuals, rather than people with the same demographics having an identical expected year of death.
This package is not available via CRAN and is hosted in a repository on Github at https://github.com/nzbri/msnz
Therefore, the usual installation route using install.packages('msnz')
is not possible, and will yield a not-useful error message that the
package is not available for your version of R. Instead, install msnz
from its development repository on Github as follows:
# install.packages('remotes')
remotes::install_github('nzbri/msnz')
If install_github('nzbri/msnz')
is invoked subsequently, the package
will be downloaded and installed only if the version on Github is newer
than the one installed locally.
If problems arise in a new release, you can downgrade to a previous version by specifying the name of a particular release to revert to, e.g.
remotes::install_github('nzbri/msnz@v0.3.0')
The package provides three convenience functions to extract info from the unique identifier (UIN) assigned to each participant (see Richardson et.al. (2012) Method for identifying eligible individuals for a prevalence survey in the absence of a disease register or population register. Internal Medicine Journal, 42, 1207-1212):
library(msnz)
msnz::dob_from_uin('01011970FAB')
#> [1] "1970-01-01"
msnz::sex_from_uin('01011970FAB')
#> [1] Female
#> Levels: Female Male
msnz::initials_from_uin('01011970FAB')
#> [1] "AB"
It also provides two package-wide constants, giving the original census date of the New Zealand MS Prevalence study, and the censoring date for survival analyses, set to 15 years later:
msnz::census_date
#> [1] "2006-03-07"
msnz::censoring_date
#> [1] "2021-03-07"
This function is the only one of use more generally to external researchers. The package incorporates the New Zealand Cohort Life Tables provided by Statistics New Zealand, as released in March 2021. (See https://www.stats.govt.nz/information-releases/new-zealand-cohort-life-tables-march-2022-update for source data).
The msnz::expected_year_of_death()
function allows one to calculate
the life expectancy of a New Zealander, given their year of birth, sex,
and some conditional age (see below for explanation).
msnz::expected_year_of_death(year_of_birth = 1970,
sex = 'female',
conditional_age = 50)
#> [1] 2058.5
# calculate life expectancy (in years from the conditional age year):
year_of_birth = 1970
conditional_age = 50
expected_year_of_death(year_of_birth = year_of_birth,
sex = 'female',
conditional_age = conditional_age) -
(year_of_birth + conditional_age) # = 2020
#> [1] 38.5
It is important to specify a conditional age that the person has
reached. If 0
, then the returned value will reflect life expectancy at
birth. For example, the expected year of death of a person is greater if
it is known that they have managed to reach a given age, compared to
their life expectancy at birth (when they would be subject to child
mortality and early adulthood risks):
# life expectancy at birth, which is subject to infant mortality and elevated
# risks in teenage years/early adulthood (e.g. traffic accidents and suicide):
msnz::expected_year_of_death(year_of_birth = 1970, 'female',
conditional_age = 0)
#> [1] 2055.2
# given that we know that a person has lived to a certain age, their expected
# year of death should be greater, as they have survived through a period of
# mortality risks:
msnz::expected_year_of_death(year_of_birth = 1970, 'female',
conditional_age = 50)
#> [1] 2058.5
The values returned can either be deterministic (extracted directly from the life table) or the result of a random simulation process (to better approximate the distribution of values that would be seen in an actual population). The simulation approach is particularly useful in a survival analysis where one wishes to compare the survival in a given sample against a synthetic sample derived from the population. That is, for each person in the actual sample, we generate a synthetic comparison person, randomly simulated from the population, conditional on the target person’s year of birth, sex, and age at a censoring date. The simulation approach gives a much more natural-looking population comparison survival distribution, compared to each synthetic person having precisely the mean survival conditional on those values.
The simulation process is invoked by specifying the parameter
method = 'sample'
, rather than the default value method = 'median'
,
which simply returns the tabulated median value from the cohort life
tables. Using the 'sample'
method runs a simulation for a person of
the given year of birth, sex, and conditional age (e.g. the age they
reached at the census date). The simulation proceeds by iterating up
from that conditional age age, one year at a time. At each year of age,
a random number is generated (uniformly distributed between 0.0 and
1.0). If that number is less than the life table probability of such a
person surviving to the next year, the age is incremented and the
simulation continues for that individual. If the random number exceeds
that probability, then the current age is assigned as the synthetic
person’s age of death.
For fuller details on other function parameters, look up the help via
?expected_year_of_death
. These include control over what estimate is
returned (i.e. the median or the 5th or 95th percentile), the number of
simulated samples returned per individual, and a seed to allow for
reproducibility of the randomly simulated values.