Sarek bcftools normalization #1682

Open
wants to merge 51 commits into base: dev

@Patricie34 Patricie34 commented Oct 9, 2024

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@Patricie34
Author

Hi all,

I've modified the normalization step to include all VCFs, not just the germline ones. For this, I used the pull request from JC-Delmas as a base. I am aware that this still requires a lot of work, and I would greatly appreciate any advice or feedback you can provide.

Thank you!

Patricie

Member

Can you run nf-core modules update bcftools/norm? That's an old version of the module.

github-actions bot commented Oct 10, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit e877ed4

  • ✅ 215 tests passed
  • ❔ 11 tests were ignored
  • ❗ 4 tests had warnings

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-11-27 12:40:57

@maxulysse
Member

@nf-core-bot fix linting pretty please 🙏

@maxulysse
Member

We're missing CHANGELOG + tests + subway map

@maxulysse
Member

@nf-core-bot fix linting pretty please 🙏

nextflow Outdated
Member

You don't need to commit this file

Member

this file is still there

CHANGELOG.md Outdated
Patricie34 and others added 7 commits November 20, 2024 06:39
Co-authored-by: Maxime U Garcia <maxime.garcia@seqera.io>
Co-authored-by: Friederike Hanssen <friederike.hanssen@seqera.io>
Co-authored-by: Friederike Hanssen <friederike.hanssen@seqera.io>
@maxulysse
Member

Issues we still need to assess:

Why do we output vcfs_tbi from the concatenate subworkflow when we only need vcf for vcftools, and we don't seem to remove the tbis anywhere?
I think we should probably output just vcf from there, or keep the tbis somewhere and map them out for the downstream processes.
I have little idea why it's not failing.

We also need a variant caller id from concatenate.

I'm guessing we might need to output vcfs = VCFS_NORM_SORT.out.vcf from the normalization subworkflows and something similar from the concatenate one.

versions = versions.mix(TABIX_VCFS_NORM_SORT.out.versions)

emit:
vcfs = VCFS_NORM_SORT.out.vcf // normalized vcfs
Member

I think that one should do the trick, but we need to figure out what to do with the tbis, and what's happening with the tbis on the concatenate side

Author

Now, when I run the tests for concatenate_vcfs and normalize_vcfs separately, they complete without any errors. The resulting outputs are vcf.gz and vcf.gz.tbi files (variant_calling/normalized/testN/testN.norm.vcf.gz, testN.norm.vcf.gz.tbi, testN.vcf.gz and variant_calling/concat/testN/testN.germline.vcf.gz, testN.germline.vcf.gz.tbi). For normalized, there is an extra vcf.gz file (testN.vcf.gz) in the outdir; I don't know where it comes from or whether it causes any issues.
But when I run the test for both concatenate and normalize together, it fails with the following warnings and errors. I think they are the same ones we encountered during Monday's meeting with Maxime.
[Screenshot: Snímek obrazovky 2024-11-27 v 10 39 48 (2)]
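
One possible way to track down the extra testN.vcf.gz mentioned above is to list every vcf.gz in the outdir that has no .tbi next to it. This is only a minimal shell sketch, assuming the directory layout quoted in the comment; the function name list_unindexed_vcfs and the results/ outdir are hypothetical and not part of the pipeline.

```shell
# Print every *.vcf.gz under a directory that has no matching .tbi index.
# Assumption: indexes sit next to their VCF as <file>.vcf.gz.tbi,
# as in the output paths quoted in the comment.
list_unindexed_vcfs() {
    local dir="$1"
    find "$dir" -type f -name '*.vcf.gz' | sort | while IFS= read -r vcf; do
        # Report the VCF only when its index file is missing.
        [ -e "${vcf}.tbi" ] || printf '%s\n' "$vcf"
    done
}

# Example call (hypothetical outdir):
# list_unindexed_vcfs results/variant_calling/normalized
```

With the files listed in the comment, this should report only testN.vcf.gz under normalized, which may help narrow down which process publishes it.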

4 participants