This repository contains a simulator, implemented in a Jupyter notebook, to test the effects of selection, recombination and linkage disequilibrium (LD) on haploid and diploid versions of the Wright-Fisher model of genetic drift (Figure A). For the haploid model, the simulator demonstrates that the speed and rate of fixation of an allele in a population can be increased through positive selection. For the diploid model, the simulator demonstrates how LD can be broken down faster with higher recombination and negative selection.
Figure (A) Diagram illustrating genetic drift of an allele over time for the standard Wright-Fisher model (left) and the modified two-locus model (right).
The classic Wright-Fisher model simulates a haploid, asexual, panmictic population of size $N$:
- A population is initialised with an $a$ allele count of $n$.
- For each subsequent generation, the number of $a$ alleles $n'$ is sampled independently with replacement from a binomial distribution. Selection can be introduced via the selection coefficient $s \neq 0$. As alleles $a$ and $A$ have relative fitness $w_{a} = 1 + s$ and $w_{A} = 1$, an $s > 0$ gives allele $a$ a fitness advantage, while $s < 0$ means the allele is deleterious and less likely to be passed to the next generation.
- Drift is modelled as a stochastic process in which the variant allele is lost if its count in a generation reaches zero, fixed if its count reaches $N$, or still fluctuating for any other $n$. If either extreme is encountered, the simulator ends early as only one allele remains.
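The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's code: the function names, the `max_generations` cap and the `seed` argument are hypothetical, and the sampling probability $p = n(1+s) / (n(1+s) + (N-n))$ is the usual fitness-weighted frequency implied by $w_{a} = 1 + s$ and $w_{A} = 1$, assumed here rather than taken from the source.

```python
import random


def next_allele_count(n, N, s, rng):
    """Draw the next-generation count of allele a by binomial sampling."""
    # Fitness-weighted probability that a sampled offspring carries allele a
    # (assumed form; requires s > -1 so the weights stay positive).
    p = n * (1 + s) / (n * (1 + s) + (N - n))
    # The sum of N Bernoulli(p) trials is one Binomial(N, p) draw.
    return sum(rng.random() < p for _ in range(N))


def simulate_haploid(N, n0, s=0.0, max_generations=100_000, seed=None):
    """Run one trajectory until allele a is lost, fixed, or the cap is hit."""
    rng = random.Random(seed)
    n, generations = n0, 0
    while 0 < n < N and generations < max_generations:
        n = next_allele_count(n, N, s, rng)
        generations += 1
    outcome = "fixed" if n == N else "lost" if n == 0 else "segregating"
    return outcome, generations


outcome, generations = simulate_haploid(N=100, n0=1, s=0.02, seed=1)
print(outcome, generations)
```

Repeating `simulate_haploid` many times with different seeds and counting the `"fixed"` and `"lost"` outcomes gives the kind of summary a fixation/loss table reports.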
A variant of the Wright-Fisher model was created to model genetic drift between two linked loci, $A$ and $B$:
- The number of individuals with the $A$ allele is initialised by randomly sampling a distribution $S(n)$, while the rest of the population is set to have the $a$ allele.
- Then a singleton variant allele is introduced at random at site $B$, which can occur on either a chromosome with the $A$ allele or the $a$ allele. This is equivalent to introducing a new mutation at $B$.
- For each generation, a new population is filled by selecting haplotypes from the previous generation with probability $1 - r$, and creating recombinant haplotypes with probability $r$. In the case of recombination, two alleles, one per locus, are selected at random with probability proportional to the allele frequencies of the previous generation. All generations are kept at a constant size $N$ and the simulator iterates until an allele at one locus becomes fixed.
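A rough sketch of this two-locus procedure follows. Several details are assumptions made for illustration: the initial $A$ count is passed in as `n_A` instead of being drawn from $S(n)$, the new mutation is written as a single `B` on a background of `b`, selection is omitted, and all function names are hypothetical rather than taken from the notebook.

```python
import random
from collections import Counter


def init_population(N, n_A, rng):
    """Build N haplotypes: n_A carry A, the rest a, plus one B singleton."""
    population = [("A", "b")] * n_A + [("a", "b")] * (N - n_A)
    carrier = rng.randrange(N)  # chromosome that receives the new mutation
    population[carrier] = (population[carrier][0], "B")
    return population


def next_generation(population, r, rng):
    """Sample N offspring haplotypes, recombining with probability r."""
    offspring = []
    for _ in range(len(population)):
        parent = rng.choice(population)
        if rng.random() < r:
            # Recombinant: the B-site allele comes from a second random parent,
            # so each allele is drawn in proportion to its previous frequency.
            other = rng.choice(population)
            offspring.append((parent[0], other[1]))
        else:
            offspring.append(parent)
    return offspring


def is_fixed(population):
    """True once either locus carries only one allele."""
    alleles_a = {h[0] for h in population}
    alleles_b = {h[1] for h in population}
    return len(alleles_a) == 1 or len(alleles_b) == 1


def simulate_two_locus(N, n_A, r, max_generations=100_000, seed=None):
    """Iterate generations until an allele at one locus is fixed."""
    rng = random.Random(seed)
    population = init_population(N, n_A, rng)
    generations = 0
    while not is_fixed(population) and generations < max_generations:
        population = next_generation(population, r, rng)
        generations += 1
    return Counter("".join(h) for h in population), generations


counts, generations = simulate_two_locus(N=100, n_A=20, r=0.1, seed=1)
print(dict(counts), generations)
```

Drawing the two alleles of a recombinant from two independently chosen parents is one way to sample each allele in proportion to its frequency in the previous generation; other implementations of the same rule are possible.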
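Linkage disequilibrium (LD) is the non-random association of alleles at different loci. One method to quantify LD between two alleles $A$ and $B$ is the coefficient of linkage disequilibrium $D$, the difference between the observed haplotype frequency and the frequency expected if the loci were independent:

$$D = p_{AB} - p_{A}p_{B}$$

The range of potential values for $D$ depends on the allele frequencies, so $D$ is commonly normalised by its theoretical maximum $D_{max}$ to give $D'$:

$$D' = \frac{D}{D_{max}}, \qquad D_{max} = \begin{cases} \min\left(p_{A}(1 - p_{B}),\ (1 - p_{A})p_{B}\right) & D > 0 \\ \min\left(p_{A}p_{B},\ (1 - p_{A})(1 - p_{B})\right) & D < 0 \end{cases}$$

Another normalised measure of LD is the genetic correlation $r^2$:

$$r^2 = \frac{D^2}{p_{A}(1 - p_{A})p_{B}(1 - p_{B})}$$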
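As a hedged illustration, the three LD measures can be computed directly from the four haplotype counts. The function below assumes the standard textbook definitions of $D$, $D'$ and $r^2$ (spelled out in the docstring); `ld_metrics` is a hypothetical name, not a function from the notebook.

```python
def ld_metrics(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) from the four haplotype counts.

    Assumed definitions: D = p_AB - p_A * p_B, D' = D / D_max and
    r^2 = D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B)).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    D = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        # One locus is monomorphic, so D' and r^2 are undefined.
        return D, float("nan"), float("nan")
    if D >= 0:
        D_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        D_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return D, D / D_max, D * D / denom


# Haplotype counts AB=4, Ab=1, aB=95, ab=0 (run 1 of Table (B)).
print([round(abs(x), 4) for x in ld_metrics(4, 1, 95, 0)])
# approximately [0.0095, 1.0, 0.1919]
```

With these definitions $D$ and $D'$ are negative for this example, so the printed values are absolute values.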
For the single-locus model, the effects of introducing positive/negative selection on genetic drift of an allele with initial frequency
Selection Coefficient ($s$) | Selection Effect | Number Fixed | Number Lost | Mean Fixation Time (generations) | Mean Loss Time (generations) |
---|---|---|---|---|---|
0.02 | Strongly beneficial | 40 | 960 | 171.475 | 7.554 |
0.005 | Weakly beneficial | 29 | 970 | 198.931 | 8.737 |
0 | Alleles have equal fitness | 12 | 988 | 171.0 | 8.694 |
-0.005 | Weakly deleterious | 7 | 993 | 194.143 | 9.986 |
-0.02 | Strongly deleterious | 1 | 999 | 114.0 | 8.269 |
For the linked loci model, the initial values for three common linkage disequilibrium metrics
Run ID | Singleton Haplotype | AB | Ab | aB | ab | $\lvert D \rvert$ | $\lvert D' \rvert$ | $r^2$ |
---|---|---|---|---|---|---|---|---|
1 | Ab | 4 | 1 | 95 | 0 | 0.0095 | 1 | 0.1919 |
2 | ab | 27 | 0 | 72 | 1 | 0.0027 | 1 | 0.0037 |
3 | AB | 1 | 1 | 0 | 98 | 0.0098 | 1 | 0.4949 |
4 | aB | 0 | 3 | 1 | 96 | 0.0003 | 1 | 0.0003 |
5 | Ab | 22 | 1 | 77 | 0 | 0.0077 | 1 | 0.0338 |
Table (B) Measuring linkage disequilibrium for the initial population across 5 runs.
Figure (B) Initial distribution of
For each generation in the diploid simulator, recombinant haplotypes are created between two loci with probability $r$.
The simulator was run 1,000 times for
Figure (C) Three measures of linkage disequilibrium for different recombination rates. On the left
The distribution of average allele frequency after a fixation event occurred for different
When negative selection is stronger, the
Figure (D) Bar chart of allele frequency in the final generation, averaged across 1,000 simulations, for different selection coefficients.