PacBio WGS Variant Pipeline v2.0.0
This is a major restructuring of the v1 workflow. Please read the documentation before filing issues.
Structural changes
- There are two entry points, singleton.wdl and family.wdl.
  - singleton.wdl has a flattened input/output structure that should have better compatibility with platforms like Terra.
  - family.wdl includes joint calling tasks for small variants and structural variants.
- The family.wdl entry point can be used for both single-sample (singleton) and multi-sample (duo, trio, quad, etc.) inputs, allowing a single workflow to be used for all analyses. The per-sample outputs will be arrays in the same order as the sample input. The singleton.wdl entry point will be maintained for backends that need flattened inputs and outputs.
- The phenotype field has been changed from Array[String] to a single comma-delimited String, e.g., "HP:0000118,HP:0000001".
- Static inputs like reference FASTA and BED files are now referenced through new "map" files to simplify the inputs.json structure.
- Workflow inputs.json files have been greatly simplified.
- Most tasks have been moved to the wdl-common submodule for reuse.
- AWS AGC has been deprecated by AWS, and support has been removed.
- AWS HealthOmics support has been added (the documentation still needs improvement). A script has been added to deploy containers to a private ECR repository for HealthOmics.
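Because the phenotype input changed from Array[String] to a comma-delimited String, a v1 inputs.json list has to be collapsed into one string when migrating. The following is a minimal Python sketch of that conversion. The helper name phenotypes_to_string is hypothetical and not part of the pipeline. The check that each term is "HP:" followed by seven digits is also our own assumption, added only to catch obvious typos.

```python
import re
from typing import Iterable

# Assumed HPO term ID shape: "HP:" followed by exactly seven digits.
HPO_PATTERN = re.compile(r"HP:\d{7}")


def phenotypes_to_string(terms: Iterable[str]) -> str:
    """Collapse a v1-style list of HPO terms into a v2 comma-delimited string.

    Hypothetical migration helper: strips whitespace, drops empty entries,
    and rejects anything that does not look like an HPO term ID.
    """
    cleaned = [t.strip() for t in terms if t.strip()]
    for term in cleaned:
        if not HPO_PATTERN.fullmatch(term):
            raise ValueError(f"not an HPO term ID: {term!r}")
    return ",".join(cleaned)
```

For example, phenotypes_to_string(["HP:0000118", "HP:0000001"]) returns "HP:0000118,HP:0000001", which matches the example format given for the new field.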
New features
- If aligned BAMs are provided as input to the workflow, alignment and phasing information will be stripped and the reads will be realigned. If the input BAM has consensus kinetics tags, these will be stripped as well.
- Sex (more specifically, the presence or absence of chrY) is inferred from relative chrY aligned depth. The inferred value never overrides a user-defined sex; it is used only when the user does not provide one.
- HiPhase now jointly phases small variants (DeepVariant), structural variants (PBSV), and tandem repeats (TRGT).
- A merged TRGT VCF is generated by the family workflow.
- Pharmacogenomics analysis with StarPhase and PharmCAT.
- Updated tertiary analysis with gnomAD v4.1 and CoLoRSdb population datasets.
- High-level summary statistics (e.g., mean depth, variant counts by type) are output directly by the workflow, both as workflow metadata outputs (e.g., miniwdl outputs.json) and as a flatstats.txt TSV.
- Many QC plots have been added:
  - read length histogram
  - read quality histogram
  - aligned depth distribution and cumulative depth distribution
  - alignment MAPQ histogram
  - alignment gap-compressed identity histogram
  - SNV distribution heatmap
  - small indel size histogram
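The sex-inference rule described under New features has two parts. A user-defined value always wins. When no value is given, chrY presence is decided from chrY aligned depth relative to the rest of the genome. The sketch below shows that rule under stated assumptions and is not the pipeline's implementation:

- Depth is normalized against the autosomal mean depth. This could come from something like mosdepth's per-chromosome means.
- The 0.1 ratio cutoff is hypothetical.
- The "MALE"/"FEMALE" labels are hypothetical.

```python
from typing import Optional

# Hypothetical cutoff: chrY/autosome depth ratio at or above which chrY is
# treated as present. The pipeline's actual threshold is not stated here.
CHRY_RATIO_THRESHOLD = 0.1


def infer_sex(
    user_sex: Optional[str],
    chry_mean_depth: float,
    autosomal_mean_depth: float,
    threshold: float = CHRY_RATIO_THRESHOLD,
) -> str:
    """Return the sex value to use for a sample.

    A user-provided value is returned unchanged; otherwise the result is
    based on chrY depth relative to autosomal depth. The returned labels
    ("MALE"/"FEMALE") are hypothetical.
    """
    if user_sex:
        # Inference never overrides a user-defined sex.
        return user_sex
    if autosomal_mean_depth <= 0:
        raise ValueError("autosomal mean depth must be positive")
    ratio = chry_mean_depth / autosomal_mean_depth
    # A single-copy chrY is expected near 0.5x autosomal depth; without chrY
    # the ratio stays close to 0, apart from a small number of mismapped reads.
    return "MALE" if ratio >= threshold else "FEMALE"
```

For example, infer_sex(None, 15.2, 30.0) returns "MALE", infer_sex(None, 0.4, 30.0) returns "FEMALE", and infer_sex("FEMALE", 15.0, 30.0) returns "FEMALE" regardless of depth. A ratio cutoff, rather than an absolute depth cutoff, keeps the decision stable across samples sequenced to different coverage.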
Tool updates
- pbmm2 1.16.0
- mosdepth v0.3.9
- DeepVariant v1.6.1
- pbsv v2.10.0
- Paraphase v3.1.1
- TRGT v1.2.0
- HiPhase v1.4.5
- HiFiCNV v1.0.1
- pb-StarPhase v1.0.0
- PharmCAT v2.15.4
- slivar v0.3.1
- CoLoRSdb v1.1.0
Thanks to: