Home
Created by Ariella Gladstein, based on code from Consuelo Quinto Cortes and Krishna Veeramah.
Also worked on by David Christy, Logan Gantner, and Mack Skodiak.
agladstein@email.arizona.edu
SimPrily runs genome simulations with user-defined parameters or parameters randomly drawn from prior distributions, and computes genomic statistics on the simulation output.
Version 1
- Run a genome simulation with a model defined by prior distributions of parameters and a demographic model structure.
- Account for SNP array ascertainment bias by creating a pseudo array based on priors for the number of samples in the discovery populations and the allele frequency cut-off.
- Calculate genomic summary statistics on the simulated genomes and pseudo arrays.
This is ideal for use with Approximate Bayesian Computation on whole genome or SNP array data.
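The pseudo array step in the list above can be illustrated with a minimal sketch. This is not SimPrily's own code: the function name, the list-of-lists data layout, and the use of minor allele frequency as the cut-off are assumptions made for illustration only.

```python
def ascertain_sites(haplotypes, discovery_idx, min_freq):
    """Return indices of sites that pass a simple ascertainment filter.

    haplotypes: list of sites, each a list of 0/1 alleles (one per chromosome).
    discovery_idx: indices of the chromosomes sampled as the discovery panel.
    min_freq: minimum minor allele frequency required in the discovery panel.
    (All names and the data layout are hypothetical.)
    """
    kept = []
    for site, alleles in enumerate(haplotypes):
        panel = [alleles[i] for i in discovery_idx]
        freq_of_ones = sum(panel) / len(panel)
        minor_freq = min(freq_of_ones, 1 - freq_of_ones)
        if minor_freq >= min_freq:
            kept.append(site)
    return kept


# Three simulated sites, six chromosomes; the first four form the discovery panel.
sites = [
    [0, 0, 0, 1, 1, 1],  # allele 1 at frequency 1/4 in the panel
    [0, 0, 0, 0, 1, 1],  # monomorphic in the panel, so never "discovered"
    [1, 1, 0, 0, 0, 1],  # allele 1 at frequency 2/4 in the panel
]
pseudo_array = ascertain_sites(sites, discovery_idx=[0, 1, 2, 3], min_freq=0.25)
```

The second site varies in the full sample but is invisible to the discovery panel, which is the kind of bias a pseudo array is meant to reproduce.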
Uses the C++ programs macs and GERMLINE. For more information on these programs, see:
https://github.com/gchen98/macs
https://github.com/sgusev/GERMLINE
Real data must be in a PLINK .tped file with alleles coded as 0's and 1's.
Sites are in rows and individuals are in columns (the first 4 columns are chr, rsnumber, site_begin, site_end).
The populations must be in the same order as specified in the model file for the simulations.
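A quick way to check a file against the format described above is to validate each row before running the statistics. The sketch below is an illustration, not part of SimPrily: the function name is hypothetical, it relies on the PLINK convention of two allele columns per individual, and it assumes missing calls are coded as N, as in the plink command on this page.

```python
def parse_tped_line(line, n_individuals, missing="N"):
    """Split one .tped row into its 4 leading columns and its allele calls.

    Raises ValueError if the column count is wrong or an allele is not
    0, 1, or the missing code. (Hypothetical helper for illustration.)
    """
    fields = line.split()
    expected = 4 + 2 * n_individuals  # two allele columns per individual
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, found {len(fields)}")
    leading, alleles = fields[:4], fields[4:]
    unexpected = set(alleles) - {"0", "1", missing}
    if unexpected:
        raise ValueError(f"alleles not coded as 0/1: {sorted(unexpected)}")
    return leading, alleles


# One site typed in three individuals (made-up values).
row = "1 rs123 0 1000 0 1 1 1 0 0"
leading, alleles = parse_tped_line(row, n_individuals=3)
```

A row that still contains nucleotide letters such as A or G raises ValueError, which usually means the file was recoded without the 01 modifier.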
Put the individuals in the correct order:
https://www.cog-genomics.org/plink2/data#indiv_sort
plink --bfile bfile --indiv-sort f sample_order.txt --make-bed --out bfile_ordered
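The sample_order.txt file passed to --indiv-sort can be generated from a list of samples and the population order used in the model file. The sketch below assumes that PLINK expects one sample per line with the family ID followed by the individual ID; the sample IDs and the function name are made up for illustration.

```python
def sample_order_lines(samples, population_order):
    """Return "FID IID" lines sorted by the given population order.

    samples: list of (fid, iid, population) tuples.
    population_order: population labels in the order used by the model file.
    The sort is stable, so the original order within a population is kept.
    (Hypothetical helper; the FID IID layout is an assumption.)
    """
    rank = {pop: i for i, pop in enumerate(population_order)}
    ordered = sorted(samples, key=lambda s: rank[s[2]])
    return [f"{fid} {iid}" for fid, iid, _ in ordered]


# Made-up sample IDs, listed in arbitrary order.
samples = [
    ("F1", "CEU_1", "CEU"),
    ("F2", "YRI_1", "YRI"),
    ("F3", "CHB_1", "CHB"),
    ("F4", "YRI_2", "YRI"),
]
lines = sample_order_lines(samples, population_order=["YRI", "CEU", "CHB"])
order_file_text = "\n".join(lines) + "\n"
```

Writing order_file_text to sample_order.txt gives a file that can be used with the plink command above.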
To convert .bed, .bim, and .fam files to the .tped format with 0's and 1's, refer to
https://www.cog-genomics.org/plink2/formats#tped
plink --bfile bfile --recode transpose 01 --output-missing-genotype N --out tfile01
real_data_ss.py
takes 5 arguments:
model_file
param_file
output_dir
genome_file
array_file
e.g.
python real_data_ss.py examples/eg1/model_file_eg1.csv examples/eg1/param_file_eg1.txt out_dir ~/data/HapMap_example/test_10_YRI_CEU_CHB.tped ~/data/HapMap_example/test_10_YRI_CEU_CHB_KHV_hg18_ill_650.tped
If the number of simulated segregating sites is less than the number of sites on the template array, increase the size of the simulated locus.
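The comparison of segregating sites against template array sites can be sketched as follows. This is only an illustration of the check, not SimPrily's implementation: the function names and the list-of-lists layout are assumptions, and a site is counted as segregating when more than one allele is present among the simulated chromosomes.

```python
def count_segregating_sites(sites):
    """Count sites where the simulated chromosomes do not all carry the same allele."""
    return sum(1 for alleles in sites if len(set(alleles)) > 1)


def enough_sites(simulated_sites, n_array_sites):
    """Return True if the simulation has at least as many segregating sites as the array."""
    return count_segregating_sites(simulated_sites) >= n_array_sites


# Made-up simulation output: three sites, four chromosomes.
simulated = [
    [0, 0, 1, 1],  # segregating
    [0, 0, 0, 0],  # fixed, not segregating
    [1, 0, 0, 0],  # segregating
]
n_segregating = count_segregating_sites(simulated)
ok_for_small_array = enough_sites(simulated, n_array_sites=2)
ok_for_large_array = enough_sites(simulated, n_array_sites=3)
```

When the check fails, as it does here for an array with three sites, a longer simulated locus is needed so that more segregating sites are available.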