This repository contins all necessary R scripts to implement the Approximate EM algorithm on joint models of survival and multivariate joint data as they appear in Murray & Philipson (2022). Note that the work presented here has been succeeded by an R package on a broader class of generalised linear mixed models with more flexible random effects structure, gmvjoint.
This README briefly sets out file structure. The working directory needs to be set to ~/.../ApproximateEM
in order for the structure set out here to work without a hitch.
Necessary packages are
dplyr
nlme
Rcpp
andRcppArmadillo
survival
ucminf
statmod
The file EM.R
fits a multivariate model with K longitudinal responses via approximate EM algorithm. The user can determine the number of responses via the nK
argument in the call to em()
.
Necessary arguments here are:
data
: a dataset that matches the required data format detailed in the next section. Currently only completely balanced data is fit by the algorithm;ph
: a Cox proportional-hazards model fit with a continuouscont
and binarybin
variable which are additionally included in the longitudinal model and data;nK
: The number of longitudinal responses to fit. (if nK=3 then variables named Y.1 -> Y.3 are fit in a trivariate joint model. Optional arguments are:gh.nodes
: The number of weights and abscissae to use in Gauss-Hermite quadrature (defaults to three);collect.hist
: Should parameter estimates at each iteration be stored? (defaults toTRUE
);max.iter
: How many iterations to attempt before exiting and returning the max.iter'th as final parameter estimates (defaults to 200);tol
: Tolerance that difference between iterations across all parameters must be less than (defaults to 0.01);diff.type
: Should relative difference (the defualt, 'abs.rel') be used, or absolute ('abs') be used to monitor convergence?post.process
: Should posterior estimates of random effects be calculated at final parameter estimates and standard errors calculated? (defaults toTRUE
);verbose
: IfTRUE
then prints all parameter estimates to the console at each iteration (defaults toFALSE
).
REs
: A matrix of the random effects. Ifpost.process
isTRUE
then these are calculated post-hoc at final parameter values;coeffs
: A list of parameter estimates;EMtime
: The time taken for the approximate EM algorithm to converge (s);mvlme.time
: The time taken for the MVLME step to obtain initial conditions to converge (s);comp.time
: The time taken for theem()
call to complete (i.e. EM, MVLME, data functions, all initial conditions and post-processing;history
: ifcollect.hist
isTRUE
then a matrix containing parameter estimates at each iteration.SEs
: ifpost.process
isTRUE
then a vector of approximate standard errors calculated using the observed empirical information matrix.postprocess.time
: ifpost.process
isTRUE
then the time taken for RE calculation and standard error calculation in post processing step.
Each longitudinal sub-model is fit with the following formulation:
To that end, the algorithm requires data with a column for subject ID called 'id'; time variable 'time'; continuous variable 'cont' and binary 'bin'. The K longitudinal outcomes should be named 'Y.1, ..., Y.K'. Currently implementation is predicated upon a balanced design only.
Simulations are carried out using the function simData
R
package joineRML
(Hickey et al., 2018). This function generates data in the format as explained above.
The TrivariateSimulations.R
file generates simulated data using this function under three covariates. These are then fit using the EM.R
file in the parent folder. The coresponding simulation pro for five (K5-
) and ten-covariate (K10-sims.R
) simulate data for the number of covariates using true values from univariate lme4
models fit to the ADNI data (see Supplementary Materials for further detail). These five and ten-variate version are then fit using the em()
function with argument nK
set to 5 or 10.