P2C2M2 provides functions to read default output from BEAST 2 (Bouckaert et al. 2019) and StarBEAST2 (Ogilvie et al. 2017) and conduct posterior predictive checks of coalescent models (Reid et al. 2014) based on the functions originally provided in the P2C2M package (Gruenstaeudl et al. 2016) to parse tree and xml files from BEAST and *BEAST. It also implements estimation of type-I (false-positive) error rates using pseudo-observed datasets (“pods”) sampled from the posterior predictive distribution.
You can install the development version of P2C2M2 from GitHub with:
# install.packages("devtools")
devtools::install_github("arleyc/P2C2M2")
In order to generate the tree files annotated with the required branch rate estimates, the XML file has to be modified by hand to include the following code before running BEAST 2:
#For strict clocks:
Insert the following:
substitutions=“true”
branchratemodel=“@StrictClock.c:Locus_Name”
after
id=“TreeWithMetaDataLogger.t:Locus_Name” and before
spec=“beast.base.evolution.TreeWithMetaDataLogger”
#For UCLN or RLC clocks:
Insert the following:
substitutions=“true”
after
id=“TreeWithMetaDataLogger.t:Locus_Name” and before
spec=“beast.base.evolution.TreeWithMetaDataLogger”
Basic example showing how to run the full pipeline with a single function using the provided dataset to test the multispecies coalescent model and to estimate false positive error rates per locus and across loci:
library(P2C2M2)
# The absolute path to the input directory is set
inPath <- system.file("extdata", "thomomys/", package="P2C2M2")
# The name of the xml-file generated by BEAUTi and located in
# "inPath" is set
inFile <- "thomomys.xml"
# Posterior predictive simulations with a setting of 2 simulation
# replicates are preformed
suppressWarnings(thomomys <- p2c2m2.complete(inPath, inFile, num.reps=2, error.rate = TRUE))
#>
#> bindings, loci, and alignments read
#>
#> locus 26 is done
#>
#> locus 29 is done
#>
#> locus 47 is done
#>
#> locus 53 is done
#>
#> locus 59 is done
#>
#> locus 64 is done
#>
#> locus 72 is done
#>
#> gene trees done
#>
#> species trees done
#>
#> Calculating metrics for 3 trees of the posterior distribution
#>
#> Calculating metrics for the trees of the post. predictive distribution
#>
#> Calculating differences btw. the posterior and the post. predictive distribution
#>
#> Calculating differences btw. 1000 pods and the post. predictive distribution to estimate error rates
thomomys$results$alpha0.01
#> $perGene
#> LCWT[2] NDC[2]
#> 26 "-14.27 (±37.45) n.s." "-2 (±7.21) n.s."
#> 29 "-20.41 (±38.11) n.s." "-2.33 (±10.41) n.s."
#> 47 "9.85 (±2.45) *" "2.33 (±4.16) n.s."
#> 53 "-20.49 (±2.29) *" "5.33 (±2.08) *"
#> 59 "-3.14 (±7.53) n.s." "-2.33 (±3.06) n.s."
#> 64 "-22.06 (±7.76) *" "10 (±6) *"
#> 72 "10.64 (±28.26) n.s." "-8.67 (±7.23) *"
#>
#> $acrGenes
#> LCWT[2] NDC[2]
#> Sum "-43.29 (±93.14) n.s." "-0.57 (±20.15) n.s."
#> Mean "-6.18 (±13.31) n.s." "-0.08 (±2.88) n.s."
#> Median "-14.21 (±18.06) n.s." "0.86 (±5.52) n.s."
#> Mode "8.04 (±26.61) n.s." "-2.57 (±11.84) n.s."
#> CV "0.8 (±2.36) n.s." "-5.77 (±6.53) n.s."
#>
#> $perGene.error
#> LCWT[2] NDC[2]
#> 26 0.657 0.678
#> 29 0.689 0.678
#> 47 0.689 0.665
#> 53 0.689 0.657
#> 59 0.654 0.678
#> 64 0.654 0.665
#> 72 0.689 0.678
#>
#> $acrGenes.error
#> LCWT[2] NDC[2]
#> Sum 0.657 0.665
#> Mean 0.657 0.665
#> Median 0.000 0.000
#> Mode 0.000 0.000
#> CV 0.657 0.665
thomomys$results$legend
#> [1] "Differences between the posterior and the posterior predictive distributions per locus and across loci. Each cell contains the following information in said order: mean, standard deviation, significance level. Error rates (if estimated with option error.rate=TRUE) are based on differences between the pods and the posterior predictive distributions. Codes in square brackets indicate the number of tails. Alpha values are automatically adjusted for the number of tails."
Bouckaert, R., Vaughan, T.G., Barido-Sottani, J., Duchêne, S., Fourment, M., Gavryushkina, A., Heled, J., Jones, G., Kühnert, D., De Maio, N., Matschiner, M., Mendes, F.K., Müller, N.F., Ogilvie, H.A., du Plessis, L., Popinga, A., Rambaut, A., Rasmussen, D., Siveroni, I., Suchard, M.A., Wu, C.H., Xie, D., Zhang, C., Stadler, T., Drummond, A.J. (2019) BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Computational Biology, 15(4), e1006650.
Gruenstaeudl, M., Reid, N.M., Wheeler, G.R., Carstens, B.C. (2016) Posterior predictive checks of coalescent models: P2C2M, an R package. Molecular Ecology Resources, 6, 193–205.
Ogilvie, H.A., Bouckaert, R.R., Drummond, A.J. (2017) StarBEAST2 brings faster species tree inference and accurate estimates of substitution rates. Molecular Biology and Evolution, 34(8), 2101-–2114.
Reid, N.M., Brown, J.M., Satler, J.D., Pelletier, T.A., McVay, J.D., Hird, S.M., Carstens, B.C. (2014) Poor fit to the multi-species coalescent model is widely detectable in empirical data. Systematic Biology, 63, 322–333.
Villamil, J., Morando, M., Avila, L.J., Lanna, F.M., Fonseca, E.M., Sites, Jr., J.W., Camargo, A. Revisiting the problem of Multispecies Coalescent Model fit with an example from a complete molecular phylogeny of the Liolaemus wiegmannii species group (Squamata: Liolaemidae), in review.