
Sfitz concat vcf #213

Merged: 18 commits, Aug 10, 2023


@sorelfitzgibbon (Contributor) commented Jul 27, 2023

Description

Add a BCFtools process to concatenate the consensus variants (those called by two or more tools) into one VCF. The output header is a de-duplicated concatenation of all input headers. The INFO, FORMAT, NORMAL and TUMOR fields are taken from the first listed VCF that contains the variant.
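The merge rule described above can be illustrated with a minimal Python sketch. It is not the BCFtools implementation and does not reproduce the command used in this PR. It assumes each VCF is already loaded as a list of tab-separated text lines (a hypothetical in-memory form), identifies a variant by CHROM, POS, REF and ALT (an assumption), and sorts records by contig name and position, which simplifies real contig ordering.

```python
from typing import Dict, List, Optional, Tuple

# A variant is identified here by (CHROM, POS, REF, ALT); this key is an assumption.
VariantKey = Tuple[str, int, str, str]


def concat_vcfs(vcfs: List[List[str]]) -> List[str]:
    """Merge VCFs, each given as a list of tab-separated text lines (hypothetical input form).

    - Meta-information lines (##...) from all inputs are kept once each,
      in the order they are first seen.
    - The #CHROM column line is taken from the first input.
    - Each variant appears once; its INFO, FORMAT and sample columns come
      from the first input (in list order) that contains it.
    """
    meta: List[str] = []
    seen_meta = set()
    column_line: Optional[str] = None
    records: Dict[VariantKey, str] = {}

    for lines in vcfs:
        for line in lines:
            if line.startswith("##"):
                if line not in seen_meta:
                    seen_meta.add(line)
                    meta.append(line)
            elif line.startswith("#CHROM"):
                if column_line is None:
                    column_line = line
            elif line.strip():
                fields = line.split("\t")
                key = (fields[0], int(fields[1]), fields[3], fields[4])
                # setdefault keeps the record from the first VCF that has this variant
                records.setdefault(key, line)

    # Simplification: sort by contig name (lexically), then position, REF, ALT
    body = [records[key] for key in sorted(records)]
    header = meta + ([column_line] if column_line is not None else [])
    return header + body


if __name__ == "__main__":
    cols = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                      "FILTER", "INFO", "FORMAT", "NORMAL", "TUMOR"])
    tool_a = ["##fileformat=VCFv4.2", "##source=toolA", cols,
              "chr1\t100\t.\tA\tT\t.\tPASS\tSRC=A\tGT\t0/0\t0/1"]
    tool_b = ["##fileformat=VCFv4.2", "##source=toolB", cols,
              "chr1\t50\t.\tG\tC\t.\tPASS\tSRC=B\tGT\t0/0\t0/1",
              "chr1\t100\t.\tA\tT\t.\tPASS\tSRC=B\tGT\t0/0\t0/1"]
    for line in concat_vcfs([tool_a, tool_b]):
        print(line)
```

Running it prints one `##fileformat` line, both `##source` lines, and the chr1:100 record with `SRC=A`, because `tool_a` is listed first; swapping the input order would keep the `SRC=B` record instead.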

Testing Results

nftest run a_mini_n2-all-tools-std-input
log: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/unreleased/sfitz-concat-vcf/log-nftest-20230810T214722Z.log
output: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/unreleased/sfitz-concat-vcf/a_mini_n2-all-tools-std-input

Checklist

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have reviewed the Nextflow pipeline standards.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the GitHub standards before opening this pull request.

  • I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request; I am listed already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the pipeline on at least one A-mini sample.

@sorelfitzgibbon (Contributor Author)

Converting to a draft: I realized the output needs to be uncompressed for the next step, and some checksums still need to be added.

@sorelfitzgibbon marked this pull request as draft July 27, 2023 23:29
@sorelfitzgibbon marked this pull request as ready for review July 28, 2023 00:35
publishDir path: "${params.workflow_output_dir}/intermediate/${task.process.split(':')[-1]}",
mode: "copy",
pattern: "*concat.vcf",
enabled: params.save_intermediate_files
@sorelfitzgibbon (Contributor Author) commented Jul 28, 2023

This intermediate file will be used by vcf2maf, so it has to be uncompressed.
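Why the publishDir glob ends in `concat.vcf` can be checked with a small Python sketch. Python's `fnmatchcase` only approximates the glob matching Nextflow applies to `pattern`, and the file names below are hypothetical, so treat this as an illustration rather than a test of the pipeline.

```python
from fnmatch import fnmatchcase

# Glob from the publishDir directive quoted in this thread
PATTERN = "*concat.vcf"


def published(files):
    """Return the names a '*concat.vcf' glob selects (fnmatchcase approximates Nextflow's glob)."""
    return [name for name in files if fnmatchcase(name, PATTERN)]


# Hypothetical output file names, for illustration only
candidates = ["SAMPLE_concat.vcf", "SAMPLE_concat.vcf.gz", "SAMPLE_concat.vcf.gz.tbi"]
print(published(candidates))  # ['SAMPLE_concat.vcf']
```

Only the uncompressed file matches: the glob must match the whole name, and `.gz` adds characters after `.vcf`.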

@maotian06 (Contributor) left a comment

One minor naming suggestion! Otherwise looks good to me!
Anything else, @yashpatel6?

main.nf (outdated, resolved)
@sorelfitzgibbon changed the base branch from sfitz-plot-intersections to main August 2, 2023 23:14
Contributor Author

I haven't done any runs with large samples since adding plot_VennDiagram_R or concat_VCFs_BCFtools, so these resource settings are just guesses. These two processes will run concurrently, but only after everything else is done. I doubt they use much memory, so I don't think it matters much. The next PR, add maf, will add one more process and may be the last PR before release. After that I can test with large samples and look at memory usage as well as which processes would benefit from more CPUs.

module/intersect-processes.nf (resolved)
@yashpatel6 (Contributor) left a comment

A couple of comments:

Comment on lines 55 to 57
publishDir path: "${params.workflow_output_dir}/output",
mode: "copy",
pattern: "isec-1-or-more/*.txt"
Contributor

Any reason this was moved here from the intersect process? Generally, we want to publish files from the process that generated them.

module/intersect-processes.nf (resolved)
module/intersect-processes.nf (resolved)
@yashpatel6 (Contributor) left a comment

Couple of minor edits to make but otherwise looks good!

main.nf (outdated, resolved)
r-scripts/plot-venn.R (outdated, resolved)
@tyamaguchi-ucla (Contributor) left a comment

Looks good! I've added a few comments/questions.

@@ -78,4 +78,24 @@ process {
}
}
}
withName: plot_VennDiagram_R {
cpus = 2
Contributor

Can VennDiagram make use of 2 CPUs?

Contributor Author

As I mentioned in the PR description, these are just placeholders until I do a test run with a large BAM before the release. I will adjust these, along with the processes added in sfitz-add-maf, at that time.

}
}
withName: concat_VCFs_BCFtools {
cpus = 2
Contributor

It looks like this process doesn't use 2 CPUs?

module/intersect-processes.nf (resolved)
r-scripts/plot-venn.R (resolved)
@yashpatel6 (Contributor) left a comment

Looks good! The resource allocations can be tuned after running the large sample in a future PR. Anything else to add, @tyamaguchi-ucla?

@sorelfitzgibbon marked this pull request as draft August 9, 2023 17:04
@sorelfitzgibbon marked this pull request as ready for review August 10, 2023 20:27
module/intersect.nf (outdated, resolved)
@yashpatel6 (Contributor) left a comment

Looks good! Anything else to add, @tyamaguchi-ucla?

@tyamaguchi-ucla (Contributor) left a comment

Looks good to me, although we might want to think about the structure under intersect-BCFtools-1.17 before the next release. Excellent work!

@sorelfitzgibbon merged commit 0ceb6e8 into main Aug 10, 2023
1 check passed
@sorelfitzgibbon deleted the sfitz-concat-vcf branch August 10, 2023 23:17
Labels
None yet
Projects
None yet

4 participants