feat: update cnvkit pons #1465

Merged
merged 32 commits into from
Oct 11, 2024

Conversation

mathiasbio
Collaborator

@mathiasbio mathiasbio commented Jul 12, 2024

Description

This PR was blocked by: https://github.com/Clinical-Genomics/target_capture_bed/issues/133
That is now solved by: #1469

The TGA workflows now use UMIs to remove duplicates, which changes the BAM files, so we need to rebuild the PONs for all TGA workflows:

  • GMCKSolid v4.1
  • GMSMyeloid v5.3
  • GMSlymphoid v7.3

At the same time, there are reports of noisy results from the current GMSmyeloid PON. This may be because the PON was built from tumor samples, mixed with samples from an earlier version of the panel. While rebuilding the PON we can therefore take the time to choose better samples, if any are available.

For the other panels it would also be good to re-evaluate which samples we have and can use, to make sure the samples are up to date with our methods.

Finally, we can check whether there are any other panels for which we have enough samples to create a PON, such as the one mentioned here: #1460

Tasks:

  • Re-evaluate samples used in current PON.

Added a Google Sheet here: https://docs.google.com/spreadsheets/d/18vs_2MKk-IyByjGMqEptdSlfcn9mB9bbpx6DahCA_a4/edit?gid=1393363517#gid=1393363517

Added

  • Added extension of target bed regions to a minimum size of 100 for CNV analysis
  • PON for: Exome comprehensive 10.2
  • PON for: GMSsolid 15.2
  • PON for: GMCKsolid 4.2
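
As a rough illustration of the first item above (extending target bed regions to a minimum size of 100), here is a minimal sketch. It is not the code in `BALSAMIC/assets/scripts/extend_bedfile.py`; the function names, the symmetric padding strategy, and the clamping at position 0 are assumptions made for this example.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical helper names; the real script in this PR may work differently.
MIN_REGION_SIZE = 100  # minimum size mentioned in the PR description


def extend_region(
    chrom: str, start: int, end: int, min_size: int = MIN_REGION_SIZE
) -> Tuple[str, int, int]:
    """Pad a BED region (0-based, half-open) so it spans at least min_size bases."""
    size = end - start
    if size >= min_size:
        return chrom, start, end
    missing = min_size - size
    # Assumption: pad roughly symmetrically, never going below position 0.
    new_start = max(0, start - missing // 2)
    new_end = new_start + min_size
    # Note: chromosome lengths are not checked in this sketch.
    return chrom, new_start, new_end


def extend_bed_lines(
    lines: Iterable[str], min_size: int = MIN_REGION_SIZE
) -> Iterator[str]:
    """Yield BED lines with short regions extended; extra columns are kept as-is."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            yield line.rstrip("\n")
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = extend_region(
            fields[0], int(fields[1]), int(fields[2]), min_size
        )
        yield "\t".join([chrom, str(start), str(end), *fields[3:]])
```

For example, `extend_region("chr1", 1000, 1040)` pads the 40 bp region by 30 bp on each side. Regions that already meet the minimum size are returned unchanged.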

Changed

  • updated PON for GMCKSolid v4.1
  • updated PON for GMSMyeloid v5.3
  • updated PON for GMSlymphoid v7.3

Documentation

  • N/A
  • Updated Balsamic documentation to reflect the changes as needed for this PR.
    • [Document Name]

Tests

Feature Tests

  • N/A
  • Test [Description]
    • [Screenshot]

Pipeline Integrity Tests

  • Report deliver (generation of the .hk file)
    • N/A
    • Verified
  • TGA T/O Workflow
    • N/A
    • Verified
  • TGA T/N Workflow
    • N/A
    • Verified
  • UMI T/O Workflow
    • N/A
    • Verified
  • UMI T/N Workflow
    • N/A
    • Verified
  • WGS T/O Workflow
    • N/A
    • Verified
  • WGS T/N Workflow
    • N/A
    • Verified
  • QC Workflow
    • N/A
    • Verified
  • PON Workflow
    • N/A
    • Verified

Clinical Genomics Stockholm

Documentation

  • Atlas documentation
    • N/A
    • Updated: [Link]
  • Web portal for Clinical Genomics
    • N/A
    • Updated: [Link]

Panel of Normal specific criteria

User Changes

  • N/A
  • This PR affects the output files or results.
    • User feedback is considered unnecessary because [Justification].
    • Affected users have been included in the development process and given a chance to provide feedback.

Infrastructure Changes

  • Stored files in Housekeeper
    • N/A
    • Updated: [Link]
  • CG (CLI and delivered/uploaded files)
    • N/A
    • Updated: [Link]
  • Servers (configuration files on Hasta)
    • N/A
    • Updated: [Link]
  • Scout interface
    • N/A
    • Updated: [Link]

Checklist

Important

Ensure that all checkboxes below are ticked before merging.

For Developers

  • PR Description
    • Provided a comprehensive description of the PR.
    • Linked relevant user stories or issues to the PR.
  • Documentation
    • Verified and updated documentation if necessary.
  • Tests
    • Described and tested the functionality addressed in the PR.
    • Ensured integration of the new code with existing workflows.
    • Confirmed that meaningful unit tests were added for the changes introduced.
    • Checked that the PR has successfully passed all relevant code smells and coverage checks.
  • Review
    • Addressed and resolved all the feedback provided during the code review process.
    • Obtained final approval from designated reviewers.

For Reviewers

  • Code
    • Code implements the intended features or fixes the reported issue.
    • Code follows the project's coding standards and style guide.
  • Documentation
    • Pipeline changes are well-documented in the CHANGELOG and relevant documentation.
  • Tests
    • The author provided a description of their manual testing, including consideration of edge cases and boundary
      conditions where applicable, with satisfactory results.
  • Review
    • Confirmed that the developer has addressed all the comments during the code review.

@mathiasbio mathiasbio linked an issue Jul 12, 2024 that may be closed by this pull request
3 tasks
@mathiasbio mathiasbio mentioned this pull request Jul 12, 2024
57 tasks

codecov bot commented Jul 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.49%. Comparing base (a50ff90) to head (f7f2d1d).
Report is 1 commits behind head on deduplicate_with_umi.

Additional details and impacted files
@@                  Coverage Diff                  @@
##           deduplicate_with_umi    #1465   +/-   ##
=====================================================
  Coverage                 99.48%   99.49%           
=====================================================
  Files                        40       40           
  Lines                      1957     1976   +19     
=====================================================
+ Hits                       1947     1966   +19     
  Misses                       10       10           
Flag Coverage Δ
unittests 99.49% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mathiasbio mathiasbio changed the base branch from master to deduplicate_with_umi July 12, 2024 12:57
@mathiasbio mathiasbio mentioned this pull request Jul 16, 2024
3 tasks
Contributor

@ivadym ivadym left a comment

Looks great, neat implementation 🧙‍♂️ 🪄

BALSAMIC/assets/scripts/extend_bedfile.py Outdated Show resolved Hide resolved
BALSAMIC/constants/cluster_cache.json Outdated Show resolved Hide resolved
BALSAMIC/models/params.py Outdated Show resolved Hide resolved
BALSAMIC/snakemake_rules/variant_calling/extend_bed.rule Outdated Show resolved Hide resolved
docs/balsamic_pon.rst Outdated Show resolved Hide resolved
@mathiasbio mathiasbio mentioned this pull request Aug 2, 2024
58 tasks
@mathiasbio mathiasbio added this to the Release 16 milestone Aug 15, 2024
@mathiasbio mathiasbio self-assigned this Aug 15, 2024
@mathiasbio mathiasbio linked an issue Aug 15, 2024 that may be closed by this pull request
3 tasks
@mathiasbio mathiasbio linked an issue Aug 15, 2024 that may be closed by this pull request
3 tasks
@mathiasbio mathiasbio linked an issue Aug 15, 2024 that may be closed by this pull request
6 tasks

sonarcloud bot commented Aug 20, 2024

@mathiasbio mathiasbio mentioned this pull request Sep 2, 2024
66 tasks
@mathiasbio mathiasbio marked this pull request as ready for review September 27, 2024 10:06
@mathiasbio mathiasbio requested a review from a team as a code owner September 27, 2024 10:06
This PR adds post-processing steps to CNVkit results from TGA to facilitate upload to GENS, which has previously only been possible for WGS via post-processing of the GATK CollectReadCounts output.

Because the gnomad VCF is also required to create the BAF visualisation track in GENS, the config and the GENS rule assignment have been modified so that these rules and references can be used in TGA as well.

An additional small script was added to convert the CNVkit file tumor.merged.cnr into a GENS-accepted format with different resolutions.
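
To give an idea of what producing "different resolutions" from a `.cnr` file can look like, here is a minimal sketch that averages log2 ratios over fixed-size windows. It is not the script added in this PR, and the output layout is purely illustrative: the exact format GENS expects is not described in this thread. The function name `bin_cnr_log2` and the unweighted mean are assumptions; only the `.cnr` column names `chromosome`, `start`, `end` and `log2` are taken from CNVkit.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def bin_cnr_log2(cnr_text: str, bin_size: int) -> List[Tuple[str, int, int, float]]:
    """Average CNVkit log2 ratios over fixed-size windows (hypothetical helper).

    Each .cnr row is assigned to the window containing its midpoint.
    Returns (chromosome, window_start, window_end, mean_log2) tuples
    in order of first appearance.
    """
    bins: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(cnr_text), delimiter="\t")
    for row in reader:
        midpoint = (int(row["start"]) + int(row["end"])) // 2
        window_start = (midpoint // bin_size) * bin_size
        bins[(row["chromosome"], window_start)].append(float(row["log2"]))
    # Assumption: a plain mean; a real script might weight by the "weight" column.
    return [
        (chrom, start, start + bin_size, round(mean(values), 4))
        for (chrom, start), values in bins.items()
    ]
```

Calling the function several times with different `bin_size` values would give one track per resolution.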

#### Added

- Script to post-process CNVkit output to GENS-format
- DNAscope gnomad calling to TGA for GENS

#### Changed

- Parsing of GENS arguments changed to account for TGA

sonarcloud bot commented Oct 11, 2024

@mathiasbio mathiasbio merged commit 0df968c into deduplicate_with_umi Oct 11, 2024
5 checks passed
@mathiasbio mathiasbio deleted the update_cnvkit_pons branch October 11, 2024 11:10
mathiasbio added a commit that referenced this pull request Oct 16, 2024
#### Added

- UMI extraction and deduplication to TGA workflow
- Adapter trimming of fastqs to UMI workflow
- Cap base quality in bam for Manta input

#### Changed

- Refactored multi workflow rule-files to separate files to decrease complexity
- Refactored output files to generally comply with the format {sample_type}.{sample_name}
- Replaced Picard QC tools with matching Sentieon QC tools

#### Removed

- UMI specific rules for UMI-extraction and alignment (using new TGA-rules instead) 
- Fastq and UMI trimming command-line options


Merged this PR into this one: #1465

#### Added

- Added extension of target bed regions to a minimum size of 100 for CNV analysis
- PON for: Exome comprehensive 10.2 
- PON for: GMSsolid 15.2 
- PON for: GMCKsolid 4.2

#### Changed

- updated PON for GMCKSolid v4.1 
- updated PON for GMSMyeloid v5.3 
- updated PON for GMSlymphoid v7.3

Merged this PR into this one: #1448

#### Added

- Script to post-process CNVkit output to GENS-format
- DNAscope gnomad calling to TGA for GENS

#### Changed

- Parsing of GENS arguments changed to account for TGA

Merged this PR: #1475 into this one

#### Changed

- Refactored rules for bcftools filters
- Renamed final UMI bamfile to ensure hsmetrics are collected in multiqc json
- Changed ranked VCF from research to clinical
- Lowered min AF for TGA from 0.007 to 0.005
- Lowered maximal SOR for TNscope in TGA tumor only cases from 3 to 2.7
- Changed filter settings for research TNscope vcf, now either PASS or triallelic_site (fixing this issue: #1293)

#### Added

- TNscope for TGA workflows, merged with VarDict results
- New filter for VarDict for tumor in normal contamination
- Export TMP environment variables to rules that lack them
- Added genmod ranked VCFs to be delivered
- Added family-id to genmod in order to get ranked variants to Scout (solved this: #1045)
- Added DP and AF to INFO-field of TNscope vcfs for ranking model
- Raw TNscope calls and unfiltered research-annotated SNVs to delivery

#### Removed

- ML-model for TNscope is removed due to license issue with new version of Sentieon
- All code associated with TNhaplotyper
- Removed research.filtered.pass VCFs from delivery and storage list
@mathiasbio mathiasbio mentioned this pull request Oct 17, 2024
15 tasks
Labels
None yet
Projects
Status: Completed
2 participants