estRodis is an R package for statistical analyses in the field of infectious disease dynamics. More precisely, for estimating the effective reproduction number
At the heart of the R-package estRodis are functions to simulate the size distribution of identical sequence clusters and to estimate parameters related to transmission dynamics from the sequence cluster size distribution. The simulation is based on a mathematical model of the size distribution of identical sequence clusters that takes into account the viral transmission process, the mutation of the virus and incomplete case-detection. We described the probability that a case was detected as the product of the probability that a case was confirmed by a test and the probability that the viral genome of a confirmed case is sequenced. There are six Bayesian inference models to estimate the effective reproduction number, the dispersion parameter, the probability a mutation occurs at a case before onward transmission or the yearly mutation rate of the virus and the testing probability to choose from.
All models use weakly informative prior distributions for the effective reproduction number
model one
informative prior distribution for the yearly mutation rate (normal distribution with mean 14 and variance 0.25) and weakly informative prior distribution for the probability that a case is confirmed by a test (scaled beta distribution on the interval [0.05, 1] with parameters (1,3)), see below for more details on the scaled beta distribution)model two
informative prior distribution for the yearly mutation rate (normal distribution with mean 14 and variance 0.25) and constant value for the probability that a case is confirmed by a testmodel three
constant value for the probability a mutation occurs at a case before onward transmission and weakly informative prior distribution for the probability that a case is confirmed by a test (scaled beta distribution on the interval [0.05, 1] with parameters (1,3)), see below for more details on the scaled beta distribution)model four
constant value for the probability a mutation occurs at a case before onward transmission and constant value for the probability that a case is confirmed by a testmodel five
informative prior distribution for the probability a mutation occurs at a case before onward transmission (beta distribution with parameters (27,68)) and weakly informative prior distribution for the probability that a case is confirmed by a test (scaled beta distribution on the interval [0.05, 1] with parameters (1,3)), see below for more details on the scaled beta distribution)model six
informative prior distribution for the probability a mutation occurs at a case before onward transmission (beta distribution with parameters (27,68)) and constant value for the probability that a case is confirmed by a test
The code used for the statistical analysis of the paper "Estimating
Whenever "model five" is mentioned in comments in the code, this refers to the standard model developed in the paper. model one", "model two", "model three", "model four" and "model six" refer to the alternative models described in the section "Sensitivity analysis" of the supplementary material of the paper "Estimating
R code to install the latest stable release of the estRodis package:
devtools::install_github("mwohlfender/estRodis@v1.0.0", ref = "main", force = TRUE)
R code to install the newest development version of the estRodis package:
devtools::install_github("mwohlfender/estRodis", ref = "main", force = TRUE)
In the following we present code examples how to use the main functionalities of the R package estRodis.
We simulate 1000 identical sequence clusters.
simulated_clusters <- estRodis_simulate_cluster_sizes_v2(n_clusters = 1000,
max_cluster_size = 2500,
R = 0.8,
k = 0.3,
mutation_proba = 0.2,
testing_proba = 0.6,
sequencing_proba = 0.4)
Result:
size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 13 | 14 | 15 | 18 | 19 | 21 | 25 | 28 | 31 | |
frequency | 742 | 104 | 49 | 31 | 22 | 13 | 9 | 6 | 2 | 4 | 2 | 6 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | |
percentage | 0.742 | 0.104 | 0.049 | 0.031 | 0.022 | 0.013 | 0.009 | 0.006 | 0.002 | 0.004 | 0.002 | 0.006 | 0.001 | 0.002 | 0.002 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |
We apply model five to estimate the effective reproduction number, the dispersion parameter, the mutation probability and the testing probability from the clusters we simulated in section (D1).
options(mc.cores = parallelly::availableCores())
estRodis_estimate_parameters_five(clusters_size = simulated_clusters$size,
clusters_freq = simulated_clusters$frequency,
sequencing_proba = 0.44)
Result:
Inference for Stan model: estRodis_stan_model_estimate_parameters_five.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
R 0.89 0.00 0.09 0.74 0.82 0.88 0.94 1.08 1144 1
k 0.29 0.00 0.07 0.16 0.24 0.28 0.33 0.44 1905 1
mutation_proba 0.29 0.00 0.05 0.20 0.25 0.29 0.32 0.38 1559 1
testing_proba 0.65 0.01 0.23 0.20 0.49 0.68 0.84 0.98 1204 1
detection_proba 0.29 0.00 0.10 0.09 0.21 0.30 0.37 0.43 1204 1
lp__ -1165.95 0.04 1.48 -1169.70 -1166.68 -1165.62 -1164.86 -1164.08 1199 1
Samples were drawn using NUTS(diag_e) at Tue Oct 1 15:42:56 2024.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
We apply model six to estimate the effective reproduction number, the dispersion parameter, the mutation probability and the testing probability from the clusters we simulated in section (D1).
options(mc.cores = parallelly::availableCores())
estRodis_estimate_parameters_six(clusters_size = simulated_clusters$size,
clusters_freq = simulated_clusters$frequency,
testing_proba = 0.55,
sequencing_proba = 0.44)
Result:
Inference for Stan model: estRodis_stan_model_estimate_parameters_six.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
R 0.90 0.00 0.06 0.79 0.86 0.90 0.94 1.04 1439 1
k 0.28 0.00 0.06 0.18 0.23 0.27 0.31 0.42 1748 1
mutation_proba 0.29 0.00 0.04 0.20 0.26 0.29 0.32 0.38 1467 1
lp__ -1163.88 0.03 1.19 -1166.78 -1164.44 -1163.58 -1163.00 -1162.51 1345 1
Samples were drawn using NUTS(diag_e) at Tue Oct 1 15:47:19 2024.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The R package estRodis contains 18 functions. Short introductions of all functions are provided below, more detailed information on input and output values can be found in the documentation of the respective function within the R package.
Simulate the size of identical sequence clusters implementing the mutation process via the yearly mutation rate and the mean generation interval.
Simulate the size of identical sequence clusters implementing the mutation process directly via the probability a mutation occurs at a case before onward transmission.
Simulate a transmission chain.
Create a plot of a transmission chain.
Create a plot of a transmission chain including mutations of the viral genome.
Apply model one, two, three, four, five or six to estimate parameters from the sequence cluster size distribution.
Determine initial values for model one, two, three, four, five or six.
Probability density function of a scaled version of the beta distribution.
This distribution is used as prior distribution of the testing probability in models one, three ad five.