A really fast, simple SNP pre-processor and annotator. Millions of variants per minute.
go get github.com/akotlar/bystro-snp && go install $_;
pigz -d -c in.snp.gz | bystro-snp --minGq .95 2> log.txt | pigz -c - > out.gz
Performs several important functions:
- Splits multiallelics
- Performs QC on variants: checks that each allele is a single base (A, C, T, or G), an insertion (+ followed by bases, e.g. +ACTG), or a deletion (- followed by an integer length, e.g. -2)
- Filters samples based on genotype quality
- Calculates whether the site is a transition, transversion, or neither
- Processes all available samples
- Calculates homozygosity, heterozygosity, and missingness
- Labels samples as homozygous, heterozygous, or missing
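The per-allele steps in the list above (splitting multiallelics, allele QC, and transition/transversion calls) can be sketched in Go as follows. This is a minimal illustration, not bystro-snp's actual code: the comma-separated allele list passed to `splitAlleles`, the helper names, and the trTv codes (0 = neither, 1 = transition, 2 = transversion) are all assumptions.

```go
package main

import (
	"strconv"
	"strings"
)

// Assumed trTv codes; the output format only documents the range 0, 1, 2.
const (
	neither      = 0
	transition   = 1
	transversion = 2
)

// allIn reports whether every rune of s appears in set.
func allIn(s, set string) bool {
	for _, c := range s {
		if !strings.ContainsRune(set, c) {
			return false
		}
	}
	return true
}

// splitAlleles turns a hypothetical comma-separated allele list into one alt
// allele per output row, dropping empty entries and the reference itself.
func splitAlleles(ref, alleles string) []string {
	var alts []string
	for _, a := range strings.Split(alleles, ",") {
		a = strings.TrimSpace(a)
		if a != "" && a != ref {
			alts = append(alts, a)
		}
	}
	return alts
}

// isValidAllele accepts a single base (A, C, T, G), an insertion written as
// "+" followed by bases, or a deletion written as "-" followed by a positive length.
func isValidAllele(alt string) bool {
	switch {
	case len(alt) == 1:
		return allIn(alt, "ACGT")
	case len(alt) > 1 && alt[0] == '+':
		return allIn(alt[1:], "ACGT")
	case len(alt) > 1 && alt[0] == '-':
		// Require plain digits so inputs like "-+3" are rejected.
		if !allIn(alt[1:], "0123456789") {
			return false
		}
		n, err := strconv.Atoi(alt[1:])
		return err == nil && n > 0
	}
	return false
}

// trTv classifies a single-base substitution. Purine<->purine (A<->G) and
// pyrimidine<->pyrimidine (C<->T) changes are transitions, other base changes
// are transversions, and indels or non-ACGT bases are neither.
func trTv(ref, alt string) int {
	if len(ref) != 1 || len(alt) != 1 || ref == alt ||
		!allIn(ref, "ACGT") || !allIn(alt, "ACGT") {
		return neither
	}
	isPurine := func(b string) bool { return b == "A" || b == "G" }
	if isPurine(ref) == isPurine(alt) {
		return transition
	}
	return transversion
}
```

For example, `splitAlleles("A", "A,G,T")` yields `G` and `T`, each of which would then be checked and classified on its own row; `trTv("A", "G")` is a transition and `trTv("C", "A")` a transversion.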
bystro-snp is used to pre-process SNP files for Bystro (GitHub).
If you use bystro-snp, please cite https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1387-3
Millions of variants (rows) per minute; throughput depends on the number of samples.
go get github.com/akotlar/bystro-snp && go install $_;
Via pipe:
pigz -d -c in.snp.gz | bystro-snp --minGq .95 | pigz -c - > out.gz
Via the --inPath argument:
bystro-snp --inPath in.snp --minGq .95 > out
Output columns, in order:
- chrom <String>
- pos <Int>
- type <String[SNP|DEL|INS|MULTIALLELIC]>
- ref <String>
- alt <String>
- trTv <Int[0|1|2]>
- heterozygotes <String>
- heterozygosity <Float64>
- homozygotes <String>
- homozygosity <Float64>
- missingGenos <String>
- missingness <Float64>
- sampleMaf <Float64>
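The sample columns (heterozygotes through sampleMaf) could be derived roughly as below. This is a hedged sketch, not bystro-snp's implementation: the `Call` type, treating genotypes below `--minGq` as missing, using all samples as the denominator for the fractions, and computing `sampleMaf` as the alt-allele frequency among called samples are assumptions. The `delim` and `empty` parameters stand in for `--fieldDelimiter` and `--emptyField`.

```go
package main

import "strings"

// Call is a hypothetical per-sample genotype: alt-allele copies (0, 1, or 2)
// and a genotype quality between 0 and 1.
type Call struct {
	Sample    string
	AltCopies int
	Quality   float64
}

// SiteStats mirrors the per-sample columns of one output row.
type SiteStats struct {
	Heterozygotes, Homozygotes, MissingGenos             string
	Heterozygosity, Homozygosity, Missingness, SampleMaf float64
}

// summarize labels each sample as heterozygous, homozygous (alt), or missing.
// Assumptions: calls below minGq count as missing, fractions are over all
// samples, and SampleMaf is the alt-allele frequency among called samples.
func summarize(calls []Call, minGq float64, delim, empty string) SiteStats {
	var hets, homs, missing []string
	altAlleles, calledAlleles := 0, 0
	for _, c := range calls {
		if c.Quality < minGq {
			missing = append(missing, c.Sample)
			continue
		}
		switch c.AltCopies {
		case 1:
			hets = append(hets, c.Sample)
		case 2:
			homs = append(homs, c.Sample)
		}
		altAlleles += c.AltCopies
		calledAlleles += 2 // assumes diploid genotypes
	}
	// join writes the empty-field marker when no sample has the label.
	join := func(names []string) string {
		if len(names) == 0 {
			return empty
		}
		return strings.Join(names, delim)
	}
	frac := func(k int) float64 {
		if len(calls) == 0 {
			return 0
		}
		return float64(k) / float64(len(calls))
	}
	maf := 0.0
	if calledAlleles > 0 {
		maf = float64(altAlleles) / float64(calledAlleles)
	}
	return SiteStats{
		Heterozygotes:  join(hets),
		Homozygotes:    join(homs),
		MissingGenos:   join(missing),
		Heterozygosity: frac(len(hets)),
		Homozygosity:   frac(len(homs)),
		Missingness:    frac(len(missing)),
		SampleMaf:      maf,
	}
}
```

With four samples where one heterozygote falls below a 0.95 quality threshold, that sample moves into missingGenos and each of heterozygosity, homozygosity, and missingness becomes 0.25 under these assumptions.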
--minGq <Float>
Minimum genotype quality to keep (0 - 1)
--inPath /path/to/uncompressedFile.snp
Path to an uncompressed SNP file. Defaults to STDIN
--errPath /path/to/log.txt
Where to store log messages. Defaults to STDERR
--emptyField "!"
Which value to assign to missing data. Defaults to !
--fieldDelimiter ";"
Which delimiter to use when joining multiple values. Defaults to ;
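Downstream tools have to undo `--fieldDelimiter` and `--emptyField` when reading the sample columns. A minimal Go sketch, assuming the output is tab-separated with the column order listed in the output format; `Row`, `parseRow`, and `splitSamples` are hypothetical names, not part of bystro-snp.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Row holds a subset of one output row's columns.
type Row struct {
	Chrom          string
	Pos            int
	Heterozygotes  []string
	Heterozygosity float64
	Homozygotes    []string
}

// splitSamples reverses the joining of sample names: a value equal to
// emptyField means no samples; anything else is split on delim.
func splitSamples(field, emptyField, delim string) []string {
	if field == emptyField {
		return nil
	}
	return strings.Split(field, delim)
}

// parseRow assumes tab-separated columns in the documented order:
// chrom, pos, type, ref, alt, trTv, heterozygotes, heterozygosity,
// homozygotes, homozygosity, missingGenos, missingness, sampleMaf.
func parseRow(line, emptyField, delim string) (Row, error) {
	cols := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(cols) != 13 {
		return Row{}, fmt.Errorf("expected 13 columns, got %d", len(cols))
	}
	pos, err := strconv.Atoi(cols[1])
	if err != nil {
		return Row{}, fmt.Errorf("bad pos %q: %w", cols[1], err)
	}
	het, err := strconv.ParseFloat(cols[7], 64)
	if err != nil {
		return Row{}, fmt.Errorf("bad heterozygosity %q: %w", cols[7], err)
	}
	return Row{
		Chrom:          cols[0],
		Pos:            pos,
		Heterozygotes:  splitSamples(cols[6], emptyField, delim),
		Heterozygosity: het,
		Homozygotes:    splitSamples(cols[8], emptyField, delim),
	}, nil
}
```

With the defaults, a heterozygotes value of `s1;s2` becomes two sample names and `!` becomes an empty list. This only round-trips cleanly if no sample name contains the chosen delimiter, which is one reason the delimiter is configurable.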