agladstein edited this page Jan 23, 2018 · 7 revisions

SimPrily

Created by Ariella Gladstein, based on code from Consuelo Quinto Cortes and Krishna Veeramah. Also worked on by David Christy, Logan Gantner, and Mack Skodiak.
agladstein@email.arizona.edu

About

SimPrily runs genome simulations with user-defined parameters, or with parameters drawn at random from prior distributions, and computes genomic statistics on the simulation output.
Version 1

  1. Run a genome simulation with a model defined by prior distributions of the parameters and the demographic model structure.
  2. Account for SNP array ascertainment bias by creating a pseudo array based on priors for the number of samples from the discovery populations and the allele frequency cut-off.
  3. Calculate genomic summary statistics on the simulated genomes and pseudo arrays.
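
The ascertainment idea in step 2 can be illustrated with a toy filter. This is only a sketch of the general concept, not SimPrily's implementation: the function name `pseudo_array_sites`, its arguments, and the use of a minor allele frequency threshold are all assumptions made for illustration.

```python
import random


def pseudo_array_sites(haplotypes, discovery_idx, n_discovery, maf_cutoff, rng=None):
    """Return indices of sites that pass a toy ascertainment filter.

    haplotypes    : list of sites, each a list of 0/1 alleles (one per chromosome)
    discovery_idx : column indices that belong to the discovery population
    n_discovery   : number of discovery chromosomes to sample
    maf_cutoff    : minimum minor allele frequency in the discovery sample
    """
    if rng is None:
        rng = random.Random()
    # Draw the discovery panel without replacement.
    panel = rng.sample(discovery_idx, n_discovery)
    kept = []
    for site, alleles in enumerate(haplotypes):
        freq = sum(alleles[i] for i in panel) / n_discovery
        maf = min(freq, 1.0 - freq)
        # Keep only sites that are common enough in the discovery panel.
        if maf >= maf_cutoff:
            kept.append(site)
    return kept
```

In an ABC setting, `n_discovery` and `maf_cutoff` would themselves be drawn from priors, so that each simulation produces a pseudo array under a different ascertainment scheme.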

This is ideal for use with Approximate Bayesian Computation on whole-genome or SNP array data.

SimPrily uses the C++ programs macs and GERMLINE. For more information on these programs, see:
https://github.com/gchen98/macs
https://github.com/sgusev/GERMLINE


Calculating summary statistics on real data

Data format

Real data must be in a PLINK .tped file with alleles coded as 0 and 1.
Sites are in rows and individuals are in columns; the first 4 columns are chr, rsnumber, site_begin, site_end. The populations must be in the same order as specified in the model file for the simulations.
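
Before running the statistics, it can help to check that a file matches the format described above. The following is a minimal sketch; `check_tped` is a hypothetical helper, not part of SimPrily. It accepts `N` as a missing-genotype code only because the PLINK recode command in this section writes missing genotypes as `N`. Whether real_data_ss.py tolerates missing data is not stated here.

```python
def check_tped(path, missing="N"):
    """Return (n_sites, n_allele_columns) for a 0/1-coded .tped file.

    Raises ValueError on the first row that does not fit the expected layout:
    4 leading columns followed by allele columns containing only 0, 1,
    or the missing code.
    """
    allowed = {"0", "1", missing}
    n_sites = 0
    n_alleles = None
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) <= 4:
                raise ValueError(f"line {line_no}: no allele columns")
            alleles = fields[4:]
            bad = set(alleles) - allowed
            if bad:
                raise ValueError(f"line {line_no}: unexpected alleles {sorted(bad)}")
            if n_alleles is None:
                n_alleles = len(alleles)
            elif len(alleles) != n_alleles:
                raise ValueError(f"line {line_no}: inconsistent column count")
            n_sites += 1
    return n_sites, n_alleles
```

The second value returned is the number of allele columns, which is twice the number of diploid individuals.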

Put the individuals in the correct order:
https://www.cog-genomics.org/plink2/data#indiv_sort

plink --bfile bfile --indiv-sort f sample_order.txt --make-bed --out bfile_ordered
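
One way to produce `sample_order.txt` is to sort the individuals in the .fam file by population, following the population order of the model file. The sketch below assumes the sort file lists a family ID and a within-family ID on each line, and that you supply your own mapping from individuals to populations. `write_sample_order`, `pop_of`, and `pop_order` are hypothetical names used for illustration.

```python
def write_sample_order(fam_path, pop_of, pop_order, out_path):
    """Write FID/IID pairs grouped by population, in the order given by pop_order.

    fam_path  : PLINK .fam file (FID and IID are the first two columns)
    pop_of    : dict mapping (FID, IID) -> population label (user supplied)
    pop_order : population labels in the order used by the model file
    out_path  : file to pass to `plink --indiv-sort f`
    """
    rank = {pop: i for i, pop in enumerate(pop_order)}
    with open(fam_path) as fam:
        samples = [tuple(line.split()[:2]) for line in fam if line.strip()]
    # sort() is stable, so the original order is kept within each population.
    samples.sort(key=lambda sample: rank[pop_of[sample]])
    with open(out_path, "w") as out:
        for fid, iid in samples:
            out.write(f"{fid}\t{iid}\n")
    return samples
```

For example, `write_sample_order("bfile.fam", pop_of, ["YRI", "CEU", "CHB"], "sample_order.txt")` would list all YRI individuals first, then CEU, then CHB.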

To convert .bed, .bim, and .fam files to the .tped format with 0's and 1's, refer to
https://www.cog-genomics.org/plink2/formats#tped

plink --bfile bfile --recode transpose 01 --output-missing-genotype N --out tfile01

Usage

real_data_ss.py takes 5 arguments:

  1. model_file
  2. param_file
  3. output_dir
  4. genome_file
  5. array_file

e.g.

python real_data_ss.py examples/eg1/model_file_eg1.csv examples/eg1/param_file_eg1.txt out_dir ~/data/HapMap_example/test_10_YRI_CEU_CHB.tped ~/data/HapMap_example/test_10_YRI_CEU_CHB_KHV_hg18_ill_650.tped
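
When the script is run from a pipeline, it can be convenient to assemble the five arguments in order and check the input files first. The helper below is a hypothetical sketch, not part of SimPrily. It assumes the four input paths must already exist and does not check the output directory, since the handling of `output_dir` is not described here.

```python
from pathlib import Path


def build_real_data_command(model_file, param_file, output_dir, genome_file,
                            array_file, python="python", script="real_data_ss.py"):
    """Return the argument list for real_data_ss.py, with the 5 arguments in order."""
    inputs = {
        "model_file": model_file,
        "param_file": param_file,
        "genome_file": genome_file,
        "array_file": array_file,
    }
    resolved = {}
    for name, value in inputs.items():
        # Expand "~" here, because no shell does it when a list is passed to subprocess.
        path = Path(value).expanduser()
        if not path.is_file():
            raise FileNotFoundError(f"{name} not found: {path}")
        resolved[name] = str(path)
    return [
        python,
        script,
        resolved["model_file"],
        resolved["param_file"],
        str(Path(output_dir).expanduser()),
        resolved["genome_file"],
        resolved["array_file"],
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains real_data_ss.py.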

Common Errors

The number of simulated segregating sites is less than the number of sites on the template array. To fix this, increase the size of the simulated locus.
