Models describing evolution at the codon level allow the estimation of measures of the selective forces acting on proteins (Kosiol et. al. 2000). The available implementation is only M0 currently, also called as GY94, first published by Goldman & Yang 1994, and late designated to be one of the M-series (M0-M13) proposed by Yang et al. 2000.
The relative instantaneous substitution rate qij from codon i to codon j can be calculated as:
The above two figures are copied from Yang et al. 2000.
The model implementation in Java was ported from BEAST 1 project. The code was modified (including bug-fixing) to fit in BEAST 2 and also tested comparing with the codeml results.
The tutorial is available in wiki.
The codon substitution model takes codon alignment as the input, where the codon alignment wraps the nucleotide alignment and has to use codon data type, for example:
<data id="codon.alignment" data="@alignment" dataType="codon"
geneticCode="vertebrateMitochondrial" spec="CodonAlignment"/>
The attribute geneticCode includes "universal", "vertebrateMitochondrial", "yeast", "moldProtozoanMitochondrial", "mycoplasma", "invertebrateMitochondrial", "ciliate", "echinodermMitochondrial", "euplotidNuclear", "bacterial", "alternativeYeast", "ascidianMitochondrial", "flatwormMitochondrial", "blepharismaNuclear", "noStops".
If the sequences contain partial nucleotide ambiguities in the code, such as -TA, then turn off the flag unknownCodeException to replace any unmapped triplets to ---.
<data id="codon.alignment" data="@alignment" dataType="codon"
geneticCode="vertebrateMitochondrial" spec="CodonAlignment" unknownCodeException="false"/>
The xml to use M0 looks like:
<parameter id="m0.omega" value="0.04484"/>
<parameter id="m0.kappa" value="20.41545"/>
<substModel spec="codonmodels.M0Model" id="m0" verbose="true">
<omega idref="m0.omega"/>
<kappa idref="m0.kappa"/>
<frequencies id="m0.freqs" spec="CodonFrequencies" pi="F3X4">
<data idref="codon.alignment"/>
</frequencies>
</substModel>
Note: the tree likelihood has to use the codon alignment not the nucleotide alignment.
<distribution id="treeLikelihood" spec="ThreadedTreeLikelihood">
<data name="data" idref="codon.alignment"/>
<tree name="tree" idref="tree"/>
<siteModel name="siteModel" idref="siteModel"/>
<branchRateModel id="StrictClock.m0" spec="StrictClockModel" clock.rate="@clockRate"/>
</distribution>
The type of equilibrium codon frequencies πj consists of
"equal", "F1X4", "F3X4", and "F60/F61"
(Yang 2006).
Setting verbose="true"
will print the rate category matrix determining
which formula to use to calculate qij.
The codeml results (available here) are used to compare with the tree likelihood from this package given the same input.
The BEAST 2 xml to test tree likelihood is also available here.
Table 1: tree likelihood using M0
Frequencies | Omega | Kappa | Codeml | BEAST 2 codon model |
---|---|---|---|---|
equal | 0.14593 | 10.69924 | -2098.864211 | -2098.8642114431645 |
F1X4 | 0.10195 | 13.49811 | -2009.834314 | -2009.8343141884977 |
F3X4 | 0.08000 | 15.34858 | -1960.766171 | -1960.7661713238033 |
F60 | 0.08327 | 15.54039 | -1915.225157 | -1915.2251567137826 |
The small differences between two likelihoods are caused by rounding error. As you can see, their differences are slightly increased when the number of free parameters describing codon frequencies is increased.
Dong Xie & Remco Bouckaert, (2018, April 12), Codon substitution models v1.1.0, Zenodo, http://doi.org/10.5281/zenodo.1217169
This package dependencies are in version.xml.