Adding files for Peddy analysis #63

Closed
wants to merge 21 commits into from
21 commits
f044be3
adding files for CODEC pipeline
stellaning1120 Jan 30, 2024
20c00d9
change fgbio version in the wdl to match on prem
stellaning1120 Feb 6, 2024
d585698
make eval_genome_interval/bed tunable in multiple tasks
stellaning1120 Feb 13, 2024
bcc80bb
add Dockerfile that builds the codec docker
stellaning1120 Feb 16, 2024
801bdc4
add inputs.json and modify .dockstore.yml to add 2 wdls to Dockstore
stellaning1120 Feb 16, 2024
110f8ac
change duplication rate calculation, add duplex efficiency calculatio…
stellaning1120 Feb 28, 2024
7bb7ede
add ceil() to help with disk size allocation
stellaning1120 Feb 29, 2024
0feeada
minor changes to task name and QC metrics collection
stellaning1120 Mar 1, 2024
97bcfef
minor changes to disk size input
stellaning1120 Mar 1, 2024
d71532f
change variant calling parameters and remove MarkDuplicated task
stellaning1120 May 3, 2024
c2347f1
change docker images since it is now public
stellaning1120 May 23, 2024
a73fe6c
not yet pushing to public
stellaning1120 May 23, 2024
ca095be
docker images change to public ones
stellaning1120 May 25, 2024
21cf843
switch to public docker image
stellaning1120 May 25, 2024
dd0d44e
correct docker images
stellaning1120 May 29, 2024
852be3e
change to public docker from tag-public
stellaning1120 May 30, 2024
aaa259e
switch to public dockers
stellaning1120 May 30, 2024
e0dc64b
Add files for new module Signature Profiler of CODEC pipeline and mod…
stellaning1120 Jul 11, 2024
f81db0d
add files for Peddy analysis pipeline
stellaning1120 Jul 31, 2024
8b08990
adding path to .dockstore.yml
stellaning1120 Jul 31, 2024
ad06c1f
Merge branch 'master' into PeddyAnalysis
stellaning1120 Jul 31, 2024
25 changes: 25 additions & 0 deletions .dockstore.yml
@@ -119,6 +119,31 @@ workflows:
    primaryDescriptorPath: /CleanupFailedSubmissions/Cleanup_Failed_Submissions.wdl
    testParameterFiles:
      - /CleanupFailedSubmissions/Cleanup_Failed_Submissions.inputs.json
  - name: codec_bcl2fastq
    subclass: WDL
    primaryDescriptorPath: /CODEC/codec_bcl2fastq.wdl
    testParameterFiles:
      - /CODEC/codec_bcl2fastq.inputs.json
  - name: demux_CODEC
    subclass: WDL
    primaryDescriptorPath: /CODEC/demux_CODEC.wdl
    testParameterFiles:
      - /CODEC/demux_CODEC.inputs.json
  - name: SingleSampleCODEC
    subclass: WDL
    primaryDescriptorPath: /CODEC/SingleSampleCODEC.wdl
    testParameterFiles:
      - /CODEC/SingleSampleCODEC.inputs.json
  - name: SigProfiler
    subclass: WDL
    primaryDescriptorPath: /CODEC/SigProfiler.wdl
    testParameterFiles:
      - /CODEC/SigProfiler.inputs.json
  - name: Peddy_AnalysisFamiliarRelatedness
    subclass: WDL
    primaryDescriptorPath: /PeddyAnalysis/Peddy_AnalyzeFamilialRelateness.wdl
    testParameterFiles:
      - /PeddyAnalysis/Peddy_AnalyzeFamilialRelateness.inputs.json
  - name: TAG_Mop
    subclass: WDL
    primaryDescriptorPath: /TAG_Mop/TAG_Mop.wdl
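
Review note: four of the five entries added in this hunk use a `name` equal to the stem of their WDL file, while the Peddy entry does not. The sketch below is one way to surface that kind of inconsistency. It assumes the entries are copied by hand into plain dicts (so no YAML parser is needed), and `check_entry` is a hypothetical helper. The "name equals file stem" convention is only the pattern the other entries follow, not a rule taken from the Dockstore documentation.

```python
from pathlib import PurePosixPath

# Two of the entries added in this diff, copied as plain dicts (assumption:
# hand-copied so the sketch does not depend on a YAML library).
ENTRIES = [
    {
        "name": "SigProfiler",
        "primaryDescriptorPath": "/CODEC/SigProfiler.wdl",
        "testParameterFiles": ["/CODEC/SigProfiler.inputs.json"],
    },
    {
        "name": "Peddy_AnalysisFamiliarRelatedness",
        "primaryDescriptorPath": "/PeddyAnalysis/Peddy_AnalyzeFamilialRelateness.wdl",
        "testParameterFiles": ["/PeddyAnalysis/Peddy_AnalyzeFamilialRelateness.inputs.json"],
    },
]


def check_entry(entry):
    """Return naming inconsistencies for one workflow entry (hypothetical helper)."""
    problems = []
    descriptor = PurePosixPath(entry["primaryDescriptorPath"])
    # Convention seen in the other entries: name == WDL file stem.
    if entry["name"] != descriptor.stem:
        problems.append(
            f"name {entry['name']!r} differs from descriptor stem {descriptor.stem!r}"
        )
    # Convention seen in every entry: <stem>.inputs.json next to the WDL.
    expected = descriptor.with_name(descriptor.stem + ".inputs.json")
    for test_file in entry["testParameterFiles"]:
        if PurePosixPath(test_file) != expected:
            problems.append(f"test file {test_file!r} is not {str(expected)!r}")
    return problems


for entry in ENTRIES:
    print(entry["name"], check_entry(entry))
```

Whether the Peddy `name` should be changed is a question for the author; the paths themselves are internally consistent.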
19 changes: 19 additions & 0 deletions CODEC/README.md
@@ -0,0 +1,19 @@
# Signature Profiling WDL for CODEC Mutlist Output
The SingleSampleCODEC pipeline writes the mutations it discovers to text files (the mutlist output).

This workflow summarizes and plots mutation spectra in the 96 trinucleotide contexts and generates a mutation matrix, which is then used to fit SBS (single base substitution) signatures to the SNVs, using the COSMIC reference signatures (https://cancer.sanger.ac.uk/signatures/).

The signature profiling tool is SigProfilerAssignment (https://github.com/AlexandrovLab/SigProfilerAssignment), which is installed in the Docker image.

The output of this WDL includes:
1) SpectrumPlots
2) MutationMetrics
3) SignatureCount
4) SignatureProportionPDF
5) SignatureStackedPlot
6) TMBPlot
7) DecomposedSignatureProbabilities
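
For readers unfamiliar with the 96 trinucleotide contexts mentioned above, here is a minimal Python sketch of how they are conventionally enumerated: six substitution classes written from the pyrimidine reference base, times four 5' bases, times four 3' bases. It is an illustration only and is not the code in `/scripts/96_contexts_mutations.R`, which is not part of this diff. The function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Six substitution classes, written from the pyrimidine (C or T) reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def all_96_contexts():
    """Return the 96 labels in the form 5'[REF>ALT]3', e.g. 'A[C>T]G'."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_snv(trinucleotide, alt):
    """Map a reference trinucleotide and an alternate base to one of the 96 labels."""
    ref = trinucleotide[1]
    if ref in "AG":
        # Purine reference: report the mutation on the opposite strand.
        trinucleotide = reverse_complement(trinucleotide)
        alt = reverse_complement(alt)
        ref = trinucleotide[1]
    return f"{trinucleotide[0]}[{ref}>{alt}]{trinucleotide[2]}"


print(len(all_96_contexts()))
print(classify_snv("TGA", "A"))
```

A G>A change in the context TGA is reported as `T[C>T]A`, because the purine reference base is flipped to the complementary strand before counting.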


### Citation
Díaz-Gay et al. 2023 Bioinformatics and Tate et al. 2019 Nucleic Acids Research
1 change: 1 addition & 0 deletions CODEC/SigProfiler.inputs.json
@@ -0,0 +1 @@
{"SigProfiler.GenomeFasta":"${workspace.referenceData_hg38_ref_fasta}","SigProfiler.MutlistFiles":"${this.samples.variants_called}","SigProfiler.mutlist_to_96_contexts.GenomeFastaIndex":"${workspace.referenceData_hg38_ref_fasta_index}"}
145 changes: 145 additions & 0 deletions CODEC/SigProfiler.wdl
@@ -0,0 +1,145 @@
version 1.0

workflow SigProfiler {
    input {
        Array[File] MutlistFiles
        File GenomeFasta
    }

    call mutlist_to_96_contexts {
        input:
            MutlistFiles = MutlistFiles,
            GenomeFasta = GenomeFasta
    }
    call sigprofiler_analysis {
        input:
            MutationMetrics = mutlist_to_96_contexts.MutationMetrics
    }
    call PlotSignatures {
        input:
            SignatureCount = sigprofiler_analysis.SignatureCount
    }

    output {
        File MutationMetrics = mutlist_to_96_contexts.MutationMetrics
        File SpectrumPlots = mutlist_to_96_contexts.SpectrumPlots
        File DecomposedSignatureProbabilities = sigprofiler_analysis.DecomposedSignatureProbabilities
        File SignatureStackedPlot = sigprofiler_analysis.SignatureStackedPlot
        File TMBPlot = sigprofiler_analysis.TMBPlot
        File SignatureCount = sigprofiler_analysis.SignatureCount
        File SignatureProportionPDF = PlotSignatures.signature_proportions_pdf
    }
}



task mutlist_to_96_contexts {
    input {
        Array[File] MutlistFiles
        File GenomeFasta
        File GenomeFastaIndex
    }

    command {
        Rscript /scripts/96_contexts_mutations.R "~{sep=' ' MutlistFiles}" ~{GenomeFasta}
    }

    output {
        File MutationMetrics = "trinuc_mutation_metrics.txt"
        File SpectrumPlots = "all_sample_spectrums.pdf"
    }

    runtime {
        docker: "us.gcr.io/tag-public/sigprofiler:v1"
        memory: "8 GB"
        disks: "local-disk 20 HDD"
    }
}

task sigprofiler_analysis {
    input {
        File MutationMetrics
        String OutputFolder = "SigProfiler-output"
    }

    command {
        python3 <<EOF
        import sys
        from SigProfilerMatrixGenerator import install as genInstall
        genInstall.install("GRCh38")

        from SigProfilerAssignment import Analyzer as Analyze
        Analyze.cosmic_fit(samples="~{MutationMetrics}",
                           output="~{OutputFolder}",
                           input_type="matrix",
                           genome_build="GRCh38",
                           cosmic_version=3.3)
        EOF
    }

    output {
        File DecomposedSignatureProbabilities = "~{OutputFolder}/Assignment_Solution/Activities/Decomposed_MutationType_Probabilities.txt"
        File SignatureStackedPlot = "~{OutputFolder}/Assignment_Solution/Activities/Assignment_Solution_Activity_Plots.pdf"
        File TMBPlot = "~{OutputFolder}/Assignment_Solution/Activities/Assignment_Solution_TMB_plot.pdf"
        File SignatureCount = "~{OutputFolder}/Assignment_Solution/Activities/Assignment_Solution_Activities.txt"
    }

    runtime {
        docker: "us.gcr.io/tag-public/sigprofiler:v1"
        memory: "8 GB"
        disks: "local-disk 20 HDD"
    }
}


task PlotSignatures {
    input {
        File SignatureCount
    }

    command {
        python3 <<EOF
        import pandas as pd
        import matplotlib.pyplot as plt
        import seaborn as sns

        # read_csv already returns a DataFrame: one row per sample, one column per signature
        SigCounts = pd.read_csv("~{SignatureCount}", sep='\t', header=0)

        # Calculate proportions: divide each row by its total signature count
        signature_cols = SigCounts.columns[1:]  # Exclude the 'Samples' column
        SigCounts[signature_cols] = SigCounts[signature_cols].div(SigCounts[signature_cols].sum(axis=1), axis=0)

        # Reshape to long format and keep only signatures present in a sample
        SigCounts_long = SigCounts.melt(id_vars=["Samples"], var_name="Signature", value_name="Proportion")
        SigCounts_long = SigCounts_long[SigCounts_long["Proportion"] > 0]

        # Plot the data: marker size encodes the proportion
        plt.figure(figsize=(16, 9))
        sns.scatterplot(data=SigCounts_long, x="Samples", y="Signature", size="Proportion", sizes=(20, 200), legend=False)
        plt.xticks(rotation=90)
        plt.xlabel("Sample Name", fontsize=16)
        plt.ylabel("Signature", fontsize=16)
        plt.title("Signature Proportions by Sample", fontsize=20, pad=20)
        plt.grid(axis='y')
        ax = plt.gca()
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        ax.spines['left'].set_visible(False)
        ax.spines['bottom'].set_visible(False)
        plt.tight_layout()
        plt.savefig("signature_proportions.pdf", format="pdf")
        EOF
    }

    output {
        File signature_proportions_pdf = "signature_proportions.pdf"
    }

    runtime {
        docker: "us.gcr.io/tag-public/sigprofiler:v1"
        memory: "8 GB"
        disks: "local-disk 20 HDD"
    }
}
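
Review note: the proportion step in `PlotSignatures` divides each sample's signature counts by that sample's total, reshapes to long format, and drops zero entries. The sketch below reproduces that logic with the standard library only, so it can be read and tested without pandas. The sample names and counts are made up for illustration, and `to_long_proportions` is a hypothetical helper, not a function in the WDL.

```python
def to_long_proportions(rows):
    """Turn wide per-sample signature counts into long (sample, signature, proportion) rows.

    rows: list of dicts with a 'Samples' key plus one numeric key per signature.
    Samples with a zero total are skipped; in the pandas version they become NaN
    and are removed by the `Proportion > 0` filter.
    """
    long_rows = []
    for row in rows:
        counts = {key: value for key, value in row.items() if key != "Samples"}
        total = sum(counts.values())
        if total == 0:
            continue
        for signature, count in counts.items():
            if count > 0:
                long_rows.append(
                    {
                        "Samples": row["Samples"],
                        "Signature": signature,
                        "Proportion": count / total,
                    }
                )
    return long_rows


# Made-up activity counts for two hypothetical samples.
example = [
    {"Samples": "sample_A", "SBS1": 30, "SBS5": 70, "SBS40": 0},
    {"Samples": "sample_B", "SBS1": 0, "SBS5": 0, "SBS40": 0},
]
for entry in to_long_proportions(example):
    print(entry)
```

A sample with no assigned mutations contributes no points to the plot, so it will be missing from the x-axis rather than shown as an empty column.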
63 changes: 63 additions & 0 deletions CODEC/SingleSampleCODEC.inputs.json
@@ -0,0 +1,63 @@
{
"SingleSampleCODEC.AlignMolecularConsensusReads.cpu_cores": "${2}",
"SingleSampleCODEC.AlignMolecularConsensusReads.disk_size": "${}",
"SingleSampleCODEC.AlignMolecularConsensusReads.memory": "${}",
"SingleSampleCODEC.AlignMolecularConsensusReads.threads": "${8}",
"SingleSampleCODEC.AlignRawTrimmed.disk": "${64}",
"SingleSampleCODEC.AlignRawTrimmed.disk_size": "${}",
"SingleSampleCODEC.AlignRawTrimmed.mem": "${}",
"SingleSampleCODEC.AlignRawTrimmed.memory": "${64}",
"SingleSampleCODEC.CDSByProduct.disk_size": "${}",
"SingleSampleCODEC.CDSByProduct.mem": "${}",
"SingleSampleCODEC.CSS_SFC_ErrorMetrics.disk_size": "${800}",
"SingleSampleCODEC.CSS_SFC_ErrorMetrics.memory": "${}",
"SingleSampleCODEC.CollectConsensusWgsMetrics.disk_size": "${160}",
"SingleSampleCODEC.CollectConsensusWgsMetrics.memory": "${}",
"SingleSampleCODEC.CollectInsertSizeMetrics.disk_size": "${100}",
"SingleSampleCODEC.CollectInsertSizeMetrics.memory": "${32}",
"SingleSampleCODEC.CollectRawWgsMetrics.disk_size": "${160}",
"SingleSampleCODEC.CollectRawWgsMetrics.memory": "${}",
"SingleSampleCODEC.FgbioCollapseReadFamilies.disk_size": "${200}",
"SingleSampleCODEC.FgbioCollapseReadFamilies.memory": "${}",
"SingleSampleCODEC.GroupReadByUMI.disk_size": "${400}",
"SingleSampleCODEC.GroupReadByUMI.memory": "${32}",
"SingleSampleCODEC.MarkRawDuplicates.disk_size": "${200}",
"SingleSampleCODEC.MarkRawDuplicates.memory": "${64}",
"SingleSampleCODEC.MergeAndSortMoleculeConsensusReads.disk_size": "${160}",
"SingleSampleCODEC.MergeAndSortMoleculeConsensusReads.memory": "${64}",
"SingleSampleCODEC.MergeLogSplit.disk_size": "${}",
"SingleSampleCODEC.MergeLogSplit.mem": "${}",
"SingleSampleCODEC.MergeSplit.disk_size": "${200}",
"SingleSampleCODEC.MergeSplit.memory": "${32}",
"SingleSampleCODEC.RAW_SFC_ErrorMetrics.disk_size": "${800}",
"SingleSampleCODEC.RAW_SFC_ErrorMetrics.memory": "${32}",
"SingleSampleCODEC.ReplaceRawReadGroup.disk_size": "${200}",
"SingleSampleCODEC.ReplaceRawReadGroup.memory": "${32}",
"SingleSampleCODEC.SortBam.disk_size": "${200}",
"SingleSampleCODEC.SortBam.mem": "${}",
"SingleSampleCODEC.SplitFastq1.disk_size": "${400}",
"SingleSampleCODEC.SplitFastq1.memory": "${32}",
"SingleSampleCODEC.SplitFastq2.disk_size": "${400}",
"SingleSampleCODEC.SplitFastq2.memory": "${32}",
"SingleSampleCODEC.Trim.disk_size": "${64}",
"SingleSampleCODEC.Trim.mem": "${32}",
"SingleSampleCODEC.ZipperBamAlignment.disk_size": "${200}",
"SingleSampleCODEC.ZipperBamAlignment.mem": "${32}",
"SingleSampleCODEC.eval_genome_bed": "gs://gptag/CODEC/ddbtp_codec_easy_regions.hg38.bed",
"SingleSampleCODEC.eval_genome_interval": "gs://gptag/CODEC/ddbtp_codec_easy_regions.hg38.interval_list",
"SingleSampleCODEC.fastq1": "${this.fastq1}",
"SingleSampleCODEC.fastq2": "${this.fastq2}",
"SingleSampleCODEC.germline_bam": "${this.germline_bam}",
"SingleSampleCODEC.germline_bam_index": "${this.germline_bam_index}",
"SingleSampleCODEC.num_parallel": "${40}",
"SingleSampleCODEC.reference_amb": "${workspace.referenceData_hg38_ref_amb}",
"SingleSampleCODEC.reference_ann": "${workspace.referenceData_hg38_ref_ann}",
"SingleSampleCODEC.reference_bwt": "${workspace.referenceData_hg38_ref_bwt}",
"SingleSampleCODEC.reference_dict": "${workspace.referenceData_hg38_ref_dict}",
"SingleSampleCODEC.reference_fasta": "${workspace.referenceData_hg38_ref_fasta}",
"SingleSampleCODEC.reference_fasta_index": "${workspace.referenceData_hg38_ref_fasta_index}",
"SingleSampleCODEC.reference_pac": "${workspace.referenceData_hg38_ref_pac}",
"SingleSampleCODEC.reference_sa": "${workspace.referenceData_hg38_ref_sa}",
"SingleSampleCODEC.sample_id": "${this.sample_id}",
"SingleSampleCODEC.sort_memory": "2G"
}
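
Review note: the values in this inputs file fall into four shapes: an empty `${}` placeholder, a `${...}` placeholder wrapping a literal such as `${64}`, a `${this...}` or `${workspace...}` reference, and a plain string such as `2G`. The sketch below groups keys by shape, which makes it easy to spot pairs like `AlignRawTrimmed.disk` / `AlignRawTrimmed.disk_size` where only one of two similar inputs is filled in. The category names and the `classify_inputs` helper are assumptions made for illustration, and how the workflow platform treats an empty placeholder is not stated in the diff.

```python
import json
import re

# Matches a value that is entirely one ${...} placeholder.
PLACEHOLDER = re.compile(r"^\$\{(.*)\}$")


def classify_inputs(inputs):
    """Group input keys by the shape of their value (hypothetical helper)."""
    groups = {"empty": [], "literal": [], "reference": [], "plain": []}
    for key, value in inputs.items():
        match = PLACEHOLDER.match(str(value))
        if match is None:
            groups["plain"].append(key)
        elif match.group(1) == "":
            groups["empty"].append(key)
        elif match.group(1).startswith(("this.", "workspace.")):
            groups["reference"].append(key)
        else:
            groups["literal"].append(key)
    return groups


# A few entries copied from the file above.
example = json.loads("""
{
  "SingleSampleCODEC.AlignRawTrimmed.disk": "${64}",
  "SingleSampleCODEC.AlignRawTrimmed.disk_size": "${}",
  "SingleSampleCODEC.fastq1": "${this.fastq1}",
  "SingleSampleCODEC.reference_fasta": "${workspace.referenceData_hg38_ref_fasta}",
  "SingleSampleCODEC.sort_memory": "2G"
}
""")

for group, keys in classify_inputs(example).items():
    print(group, keys)
```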