Population stats
Say you have an aggregated VCF like this:
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 100 . A T . . AF=0.3;AF_AFR=0.4;AF_EUR=0.1
The pipeline for aggregated VCFs will only load the AF=0.3 statistic. However, it's possible to tell the pipeline how to parse the other frequencies and store them as population frequencies.
First, you have to write a mapping file (e.g. stats-mapping.properties) like this:
AFR.AF=AF_AFR
EUR.AF=AF_EUR
ALL.AF=AF
where:
- The first string until the dot (e.g. AFR) is the population name as it will appear in the EVA DB and website. It can be chosen by the EVA operator, except for the fixed ALL population (the whole sample set).
- The string from the dot until the = character (e.g. AF) is the variable. It can only be one of AN (allele number), AC (allele count), or AF (allele frequency). Providing AC requires providing AN too, so that the frequency AC/AN can be computed.
- The string from the = character until the end (e.g. AF_AFR) is the tag as it appears in the VCF, and can be whatever the submitter used.
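The three rules above can be checked before running the pipeline. The following is a minimal Python sketch, not the pipeline's own code: the function name parse_mapping and the error messages are hypothetical, and it assumes one POPULATION.VARIABLE=TAG entry per line, with blank lines and lines starting with # ignored.

```python
ALLOWED_VARIABLES = {"AN", "AC", "AF"}


def parse_mapping(text):
    """Parse lines like 'AFR.AF=AF_AFR' into {population: {variable: vcf_tag}}."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Left of '=' is POPULATION.VARIABLE, right of '=' is the VCF tag.
        key, sep, tag = line.partition("=")
        population, dot, variable = key.strip().partition(".")
        tag = tag.strip()
        if not sep or not dot or not population or not tag:
            raise ValueError(f"malformed mapping line: {raw!r}")
        if variable not in ALLOWED_VARIABLES:
            raise ValueError(f"unsupported variable {variable!r} in {raw!r}")
        mapping.setdefault(population, {})[variable] = tag
    # AC alone is not enough: the frequency is computed as AC/AN.
    for population, variables in mapping.items():
        if "AC" in variables and "AN" not in variables:
            raise ValueError(f"{population}: AC requires AN to compute AC/AN")
    return mapping


example = """AFR.AF=AF_AFR
EUR.AF=AF_EUR
ALL.AF=AF
"""
mapping = parse_mapping(example)
print(mapping)
```

A mapping that gives AC for a population without AN (for instance a single line AFR.AC=AC_AFR) makes this sketch raise a ValueError.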
Note that the tag from the VCF is arbitrary. It could appear in the VCF as FREQ_AFRICAN=0.4 and the line in the mapping file would be AFR.AF=FREQ_AFRICAN.
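To illustrate how arbitrary VCF tags end up as population frequencies, here is a rough Python sketch of applying a mapping to one INFO field. It is an illustration under assumptions, not the pipeline's implementation: the function names are hypothetical, the tags AC_EUR and AN_EUR are made up for the example, the record is assumed to have a single ALT allele, and preferring AF over AC/AN when both are present is a choice made here only for simplicity.

```python
def parse_info(info):
    """Split a VCF INFO string such as 'AF=0.3;AF_AFR=0.4' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if sep:  # flag-style entries without '=' are skipped
            fields[key] = value
    return fields


def population_frequencies(info, mapping):
    """Return {population: allele frequency} for a single-ALT record."""
    fields = parse_info(info)
    result = {}
    for population, variables in mapping.items():
        af_tag = variables.get("AF")
        ac_tag = variables.get("AC")
        an_tag = variables.get("AN")
        if af_tag in fields:
            # The frequency is given directly in the VCF.
            result[population] = float(fields[af_tag])
        elif ac_tag in fields and an_tag in fields and int(fields[an_tag]) > 0:
            # Otherwise derive it as allele count / allele number.
            result[population] = int(fields[ac_tag]) / int(fields[an_tag])
    return result


# Mapping equivalent to: AFR.AF=FREQ_AFRICAN, EUR.AC=AC_EUR, EUR.AN=AN_EUR, ALL.AF=AF
mapping = {
    "AFR": {"AF": "FREQ_AFRICAN"},
    "EUR": {"AC": "AC_EUR", "AN": "AN_EUR"},
    "ALL": {"AF": "AF"},
}
info = "AF=0.3;FREQ_AFRICAN=0.4;AC_EUR=5;AN_EUR=50"
frequencies = population_frequencies(info, mapping)
print(frequencies)
```

The population names on the left (AFR, EUR, ALL) come from the mapping, so the same code works regardless of what the submitter called the tags.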
Once the mapping file is ready, you have to put its path in the pipeline parameter properties file as:
input.vcf.aggregation.mapping-path=/path/to/stats-mapping.properties
Note that the job has to be the one for aggregated VCFs (spring.batch.job.names=aggregated-vcf-job), and the aggregation type has to be one of input.vcf.aggregation=BASIC, =EVS, or =EXAC (in other words, it can't be =NONE).
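These three settings are easy to get out of sync, so a quick pre-flight check on the parameter properties file can help. The Python sketch below is a hypothetical helper, not part of the pipeline: it uses a simplified reader of key=value lines (no line continuations or escapes) and only checks the three properties named above.

```python
def load_properties(text):
    """Read simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_population_stats_settings(props):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if props.get("spring.batch.job.names") != "aggregated-vcf-job":
        problems.append("spring.batch.job.names must be aggregated-vcf-job")
    if props.get("input.vcf.aggregation") not in {"BASIC", "EVS", "EXAC"}:
        problems.append("input.vcf.aggregation must be BASIC, EVS or EXAC")
    if not props.get("input.vcf.aggregation.mapping-path"):
        problems.append("input.vcf.aggregation.mapping-path is not set")
    return problems


parameters = """
spring.batch.job.names=aggregated-vcf-job
input.vcf.aggregation=BASIC
input.vcf.aggregation.mapping-path=/path/to/stats-mapping.properties
"""
problems = check_population_stats_settings(load_properties(parameters))
print(problems or "settings look consistent")
```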
For more details about this feature, look at the source code at https://github.com/EBIvariation/eva-pipeline/blob/master/src/main/java/uk/ac/ebi/eva/pipeline/io/mappers/VariantAggregatedVcfFactory.java#L65 . If that file doesn't exist anymore, then it's likely that the one being used is https://github.com/EBIvariation/variation-commons/blob/master/variation-commons-core/src/main/java/uk/ac/ebi/eva/commons/core/models/factories/VariantAggregatedVcfFactory.java#L69 .