A really fast, simple SNP pre-processor and annotator. Millions of variants per minute.


bystrogenomics/bystro-snp


bystro-snp

TL;DR

A really fast, simple SNP pre-processor and annotator. Millions of variants per minute.

go get github.com/akotlar/bystro-snp && go install $_;

pigz -d -c in.snp.gz | bystro-snp --minGq .95 2> log.txt | pigz -c - > out.gz

Description

Performs several important functions:

  1. Splits multiallelics
  2. Performs QC on variants: checks that each allele is bases (ACTG), an insertion (+ACTG), or a deletion (-Int, a minus sign followed by an integer)
  3. Filters samples based on genotype quality
  4. Calculates whether site is transition, transversion, or neither
  5. Processes all available samples
    • calculates homozygosity, heterozygosity, missingness
    • labels samples as homozygous, heterozygous, or missing
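The allele QC and transition/transversion steps in the list above can be pictured with a minimal Go sketch. This is not the code bystro-snp runs: the names `classifyAllele`, `trTv`, and `alleleKind` are hypothetical, accepting multi-base strings as valid alleles is an assumption, and so is the numeric encoding (0 = neither, 1 = transition, 2 = transversion).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// alleleKind is a hypothetical label for the three allele shapes in the QC step.
type alleleKind int

const (
	invalidAllele   alleleKind = iota
	baseAllele                 // ACTG
	insertionAllele            // +ACTG
	deletionAllele             // -Int
)

// isBases reports whether s is non-empty and contains only A, C, T, or G.
func isBases(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if !strings.ContainsRune("ACTG", r) {
			return false
		}
	}
	return true
}

// classifyAllele sorts an allele string into one of the three accepted shapes.
func classifyAllele(alt string) alleleKind {
	switch {
	case strings.HasPrefix(alt, "+") && isBases(alt[1:]):
		return insertionAllele
	case strings.HasPrefix(alt, "-"):
		// ParseUint rejects a second sign, so "-+3" is not accepted.
		if n, err := strconv.ParseUint(alt[1:], 10, 64); err == nil && n > 0 {
			return deletionAllele
		}
		return invalidAllele
	case isBases(alt):
		return baseAllele
	}
	return invalidAllele
}

// trTv returns 1 for a transition, 2 for a transversion, and 0 otherwise.
// The numeric encoding is an assumption, not taken from the tool.
func trTv(ref, alt string) int {
	if len(ref) != 1 || len(alt) != 1 || ref == alt || !isBases(ref) || !isBases(alt) {
		return 0
	}
	purine := func(b string) bool { return b == "A" || b == "G" }
	if purine(ref) == purine(alt) {
		return 1 // purine<->purine or pyrimidine<->pyrimidine
	}
	return 2
}

func main() {
	for _, alt := range []string{"G", "+AC", "-3", "N"} {
		fmt.Println(alt, classifyAllele(alt))
	}
	fmt.Println(trTv("A", "G"), trTv("A", "C"), trTv("A", "+T"))
}
```

A transition swaps two purines (A, G) or two pyrimidines (C, T); any other single-base change is a transversion, and indels fall into the "neither" case.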

Publication

bystro-snp is used to pre-process SNP files for Bystro (GitHub).

If you use bystro-snp please cite https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1387-3


Performance

Millions of variants/rows per minute. Throughput depends on the number of samples.


Installation

go get github.com/akotlar/bystro-snp && go install $_;

Use

Via pipe:

pigz -d -c in.snp.gz | bystro-snp --minGq .95 | pigz -c - > out.gz

Via inPath argument:

bystro-snp --inPath in.snp --minGq .95 > out

Output

chrom <String>   pos <Int>   type <String[SNP|DEL|INS|MULTIALLELIC]>    ref <String>    alt <String>    trTv <Int[0|1|2]>     heterozygotes <String>     heterozygosity <Float64>    homozygotes <String>     homozygosity <Float64>     missingGenos <String>    missingness <Float64>    sampleMaf <Float64>
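For downstream code, one output row could be read into a typed record roughly as follows. This is a sketch under assumptions: the `Row` struct and `parseRow` function are hypothetical, rows are assumed to be tab-delimited with exactly the 13 columns listed above, and the numeric columns are assumed to always hold numbers. The sample line in `main` is made-up data.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Row mirrors the 13 output columns. The struct is hypothetical;
// it is not a type exported by bystro-snp.
type Row struct {
	Chrom, Type, Ref, Alt                    string
	Pos, TrTv                                int
	Heterozygotes, Homozygotes, MissingGenos string
	Heterozygosity, Homozygosity             float64
	Missingness, SampleMaf                   float64
}

// parseRow splits one line on tabs (an assumed delimiter) and converts
// the numeric columns.
func parseRow(line string) (Row, error) {
	f := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(f) != 13 {
		return Row{}, fmt.Errorf("expected 13 columns, got %d", len(f))
	}
	r := Row{
		Chrom: f[0], Type: f[2], Ref: f[3], Alt: f[4],
		Heterozygotes: f[6], Homozygotes: f[8], MissingGenos: f[10],
	}
	var err error
	if r.Pos, err = strconv.Atoi(f[1]); err != nil {
		return Row{}, fmt.Errorf("pos: %w", err)
	}
	if r.TrTv, err = strconv.Atoi(f[5]); err != nil {
		return Row{}, fmt.Errorf("trTv: %w", err)
	}
	floats := []struct {
		dst  *float64
		idx  int
		name string
	}{
		{&r.Heterozygosity, 7, "heterozygosity"},
		{&r.Homozygosity, 9, "homozygosity"},
		{&r.Missingness, 11, "missingness"},
		{&r.SampleMaf, 12, "sampleMaf"},
	}
	for _, c := range floats {
		if *c.dst, err = strconv.ParseFloat(f[c.idx], 64); err != nil {
			return Row{}, fmt.Errorf("%s: %w", c.name, err)
		}
	}
	return r, nil
}

func main() {
	// Made-up example row, not real bystro-snp output.
	line := "chr1\t100\tSNP\tA\tG\t1\ts1;s2\t0.5\ts3\t0.25\t!\t0\t0.5"
	row, err := parseRow(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", row)
}
```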

Optional arguments

--minGq <Float>

Minimum genotype quality to keep (0 - 1)


--inPath /path/to/uncompressedFile.snp

Path to an uncompressed SNP file. Defaults to stdin


--errPath /path/to/log.txt

Where to store log messages. Defaults to STDERR


--emptyField "!"

Which value to assign to missing data. Defaults to !


--fieldDelimiter ";"

Which delimiter to use when joining multiple values. Defaults to ;
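When reading the output back, the two options above suggest a small helper for the joined columns (heterozygotes, homozygotes, missingGenos). The sketch below is illustrative only: `splitField` is a hypothetical name, and it assumes the same `--emptyField` and `--fieldDelimiter` values that were used when the file was written.

```go
package main

import (
	"fmt"
	"strings"
)

// splitField turns one joined column back into a slice of values.
// It returns nil when the column holds the empty-field marker.
func splitField(value, emptyField, delimiter string) []string {
	if value == "" || value == emptyField {
		return nil
	}
	return strings.Split(value, delimiter)
}

func main() {
	// Defaults from the options above: "!" for missing data, ";" as the joiner.
	fmt.Println(splitField("s1;s2;s3", "!", ";"))
	fmt.Println(len(splitField("!", "!", ";")))
}
```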
