Commit

Merge pull request #320 from uclahs-cds/nwiltsie-update-reference-paths
Update cluster reference paths
yashpatel6 authored Oct 28, 2024
2 parents 51d781a + 56cec65 commit 0e834a5
Showing 15 changed files with 61 additions and 61 deletions.
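The path changes in this commit appear to follow a small set of prefix substitutions. The sketch below restates them in Python, assuming only the prefixes visible in the hunks on this page. `PREFIX_MAP` and `update_path` are hypothetical names used for illustration and are not part of the repository.

```python
# Prefix substitutions inferred from the hunks shown in this commit.
# Assumption: this list covers only the paths visible in the diff; other
# locations on the cluster may have moved differently.
PREFIX_MAP = [
    ("/hot/ref/reference/", "/hot/resource/reference-genome/"),
    ("/hot/ref/tool-specific-input/", "/hot/resource/tool-specific-input/"),
    ("/hot/ref/database/", "/hot/resource/database/"),
    ("/hot/resource/SMC-HET/", "/hot/data/unregistered/SMC-HET/"),
]


def update_path(path: str) -> str:
    """Return `path` with the first matching old prefix replaced.

    Paths that match no old prefix are returned unchanged.
    """
    for old, new in PREFIX_MAP:
        if path.startswith(old):
            return new + path[len(old):]
    return path
```

Only the first matching prefix is applied, so a path is rewritten at most once and a path that was already updated passes through unchanged.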
8 changes: 4 additions & 4 deletions README.md
@@ -194,7 +194,7 @@ input:
| `docker_container_registry` | no | string | Registry containing tool Docker images, optional. Default: `ghcr.io/uclahs-cds` |
| `base_resource_update` | optional | namespace | Namespace of parameters to update base resource allocations in the pipeline. Usage and structure are detailed in `template.config` and below. |

*Providing `intersect_regions` is required and will limit the final output to just those regions. All regions of the reference genome could be provided as a `bed` file with all contigs, however it is HIGHLY recommended to remove `decoy` contigs from the human reference genome. Including these thousands of small contigs will require the user to increase available memory for `Mutect2` and will cause a very long runtime for `Strelka2`. See [Discussion here](https://github.com/uclahs-cds/pipeline-call-sSNV/discussions/216). For `uclahs-cds` users, a GRCh38 `bed.gz` file can be found here: `/hot/ref/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz`.
*Providing `intersect_regions` is required and will limit the final output to just those regions. All regions of the reference genome could be provided as a `bed` file with all contigs, however it is HIGHLY recommended to remove `decoy` contigs from the human reference genome. Including these thousands of small contigs will require the user to increase available memory for `Mutect2` and will cause a very long runtime for `Strelka2`. See [Discussion here](https://github.com/uclahs-cds/pipeline-call-sSNV/discussions/216). For `uclahs-cds` users, a GRCh38 `bed.gz` file can be found here: `/hot/resource/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz`.

### Base resource allocation updaters
To optionally update the base resource (cpus or memory) allocations for processes, use the following structure and add the necessary parts to the [input.config](config/template.config) file. The default allocations can be found in the [node-specific config files](./config/)
@@ -258,8 +258,8 @@ base_resource_update {
| filter_mutect_calls_extra_args | no | string | Additional arguments for the FilterMutectCalls command |
| gatk_command_mem_diff | yes | nextflow.util.MemoryUnit | How much to subtract from the task's allocated memory where the remainder is the Java heap max. (should not be changed unless task fails for memory related reasons) |
| scatter_count | yes | int | Number of intervals to split the desired interval into. Mutect2 will call each interval seperately. |
| germline_resource_gnomad_vcf | no | path | A stripped down version of the [gnomAD VCF](https://gnomad.broadinstitute.org/) stripped of all unneeded INFO fields, keeping only AF, currently available for GRCh38:`/hot/ref/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz` and GRCh37: `/hot/ref/tool-specific-input/GATK/GRCh37/af-only-gnomad.raw.sites.vcf`. |
| panel_of_normals_vcf | no | path | VCF file of sites observed in normal. Currently available for GRCh38: `/hot/ref/tool-specific-input/GATK/GRCh38/1000g_pon.hg38.vcf.gz`. This could be useful for tumor only mode. |
| germline_resource_gnomad_vcf | no | path | A stripped down version of the [gnomAD VCF](https://gnomad.broadinstitute.org/) stripped of all unneeded INFO fields, keeping only AF, currently available for GRCh38:`/hot/resource/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz` and GRCh37: `/hot/resource/tool-specific-input/GATK/GRCh37/af-only-gnomad.raw.sites.vcf`. |
| panel_of_normals_vcf | no | path | VCF file of sites observed in normal. Currently available for GRCh38: `/hot/resource/tool-specific-input/GATK/GRCh38/1000g_pon.hg38.vcf.gz`. This could be useful for tumor only mode. |
#### MuSE Specific Configuration
| Input | Required | Type | Description |
@@ -348,7 +348,7 @@ Tumor BAM: `/hot/resource/pipeline_testing_set/WGS/GRCh38/A/full/CPCG0000000196-
|call_sIndel_Manta |1h 35m 25s |1848.6% |11.7 GB |
|call_sSNV_Strelka2 |59m 19s |3234.0% |8.2 GB |
Therefore, we strongly suggest to use the `--callRegions` if the non-canonical region is unnecessary. `-callRegions`'s input `bed.gz` file can be found here: `/hot/ref/tool-specific-input/Strelka2/GRCh38/strelka2_call_region.bed.gz`. For other genome version, you can use [UCSC Liftover](https://genome.ucsc.edu/cgi-bin/hgLiftOver) to convert.
Therefore, we strongly suggest to use the `--callRegions` if the non-canonical region is unnecessary. `-callRegions`'s input `bed.gz` file can be found here: `/hot/resource/tool-specific-input/Strelka2/GRCh38/strelka2_call_region.bed.gz`. For other genome version, you can use [UCSC Liftover](https://genome.ucsc.edu/cgi-bin/hgLiftOver) to convert.

#### MuSE v2.0
MuSE v2.0 was tested with a normal/tumor paired CPCG0196 WGS sample on a F32 slurm-dev node.
8 changes: 4 additions & 4 deletions config/template.config
@@ -12,8 +12,8 @@ includeConfig "${projectDir}/config/methods.config"

params {
algorithm = [] // 'somaticsniper', 'strelka2', 'mutect2', 'muse'
reference = '/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/ref/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
reference = '/hot/resource/reference-genome/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/resource/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
output_dir = ''
dataset_id = ''
// set params.exome to TRUE will add the '--exome' option when running Manta and Strelka2
@@ -31,10 +31,10 @@ params {
filter_mutect_calls_extra_args = ''
gatk_command_mem_diff = 500.MB
scatter_count = 50
germline_resource_gnomad_vcf = '/hot/ref/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'
germline_resource_gnomad_vcf = '/hot/resource/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'

// MuSE options
dbSNP = '/hot/ref/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'
dbSNP = '/hot/resource/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'

// Variant Intersection options
ncbi_build = 'GRCh38'
6 changes: 3 additions & 3 deletions input/example-test-multi-sample.yaml
@@ -3,9 +3,9 @@ patient_id: 'TWGSAMIN000001'
# For multi samples, list the BAMs under the corresponding state (normal or tumor).
input:
normal:
- BAM: /hot/resource/SMC-HET/normal/bams/A-mini/n2/output/HG002.N-n2.bam
- BAM: /hot/data/unregistered/SMC-HET/normal/bams/A-mini/n2/output/HG002.N-n2.bam
tumor:
- BAM: /hot/resource/SMC-HET/tumours/A-mini/bams/n2/output/S2.T-n2.bam
- BAM: /hot/data/unregistered/SMC-HET/tumours/A-mini/bams/n2/output/S2.T-n2.bam
contamination_table: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/input/data/A-mini/S2.T-n2_getpileupsummaries_calculatecontamination.table
- BAM: /hot/resource/SMC-HET/tumours/A-mini/bams/n1/output/S2.T-n1.bam
- BAM: /hot/data/unregistered/SMC-HET/tumours/A-mini/bams/n1/output/S2.T-n1.bam
contamination_table: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/input/data/A-mini/S2.T-n1_getpileupsummaries_calculatecontamination.table
2 changes: 1 addition & 1 deletion input/example-test-tumor-only.yaml
@@ -2,5 +2,5 @@
patient_id: 'TWGSAMIN000001'
input:
tumor:
- BAM: /hot/resource/SMC-HET/tumours/A-mini/bams/n2/output/S2.T-n2.bam
- BAM: /hot/data/unregistered/SMC-HET/tumours/A-mini/bams/n2/output/S2.T-n2.bam
contamination_table: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/input/data/A-mini/S2.T-n1_getpileupsummaries_calculatecontamination.table
4 changes: 2 additions & 2 deletions input/example-test.yaml
@@ -3,7 +3,7 @@ patient_id: 'TWGSAMIN000001'
# For multi samples, just list all the bams under the normal or tumor.
input:
normal:
- BAM: /hot/resource/SMC-HET/normal/bams/A-mini/n2/output/HG002.N-n2.bam
- BAM: /hot/data/unregistered/SMC-HET/normal/bams/A-mini/n2/output/HG002.N-n2.bam
tumor:
- BAM: /hot/resource/SMC-HET/tumours/A-mini/bams/n2/output/S2.T-n2.bam
- BAM: /hot/data/unregistered/SMC-HET/tumours/A-mini/bams/n2/output/S2.T-n2.bam
contamination_table: /hot/software/pipeline/pipeline-call-sSNV/Nextflow/development/input/data/A-mini/S2.T-n1_getpileupsummaries_calculatecontamination.table
8 changes: 4 additions & 4 deletions test/config/a_mini-all-tools.config
@@ -12,8 +12,8 @@ includeConfig "${projectDir}/config/methods.config"

params {
algorithm = ['somaticsniper', 'strelka2', 'mutect2', 'muse']
reference = '/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/ref/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
reference = '/hot/resource/reference-genome/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/resource/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
dataset_id = 'TWGSAMIN'
// setting params.exome to TRUE will add the '--exome' option when running manta and strelka2 and the -E option when running MuSE
exome = false
@@ -29,10 +29,10 @@ params {
filter_mutect_calls_extra_args = ''
gatk_command_mem_diff = 500.MB
scatter_count = 50
germline_resource_gnomad_vcf = '/hot/ref/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'
germline_resource_gnomad_vcf = '/hot/resource/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'

// MuSE options
dbSNP = '/hot/ref/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'
dbSNP = '/hot/resource/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'

// Intersect options
ncbi_build = 'GRCh38'
8 changes: 4 additions & 4 deletions test/config/a_mini-muse.config
@@ -12,8 +12,8 @@ includeConfig "${projectDir}/config/methods.config"

params {
algorithm = ['muse'] // 'somaticsniper', 'strelka2', 'mutect2', 'muse'
reference = '/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/ref/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
reference = '/hot/resource/reference-genome/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/resource/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
dataset_id = 'TWGSAMIN'
// set params.exome to TRUE will add the '--exome' option when running manta and strelka2
// set params.exome to TRUE will add the '-E' option when running MuSE
@@ -30,10 +30,10 @@ params {
filter_mutect_calls_extra_args = ''
gatk_command_mem_diff = 500.MB
scatter_count = 50
germline_resource_gnomad_vcf = '/hot/ref/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'
germline_resource_gnomad_vcf = '/hot/resource/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'

// MuSE options
dbSNP = '/hot/ref/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'
dbSNP = '/hot/resource/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'

// Intersect options
ncbi_build = 'GRCh38'
8 changes: 4 additions & 4 deletions test/config/a_mini-mutect2.config
@@ -12,8 +12,8 @@ includeConfig "${projectDir}/config/methods.config"

params {
algorithm = ['mutect2'] // 'somaticsniper', 'strelka2', 'mutect2', 'muse'
reference = '/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/ref/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
reference = '/hot/resource/reference-genome/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/resource/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
dataset_id = 'TWGSAMIN'
// set params.exome to TRUE will add the '--exome' option when running manta and strelka2
// set params.exome to TRUE will add the '-E' option when running MuSE
@@ -30,10 +30,10 @@ params {
filter_mutect_calls_extra_args = ''
gatk_command_mem_diff = 500.MB
scatter_count = 50
germline_resource_gnomad_vcf = '/hot/ref/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'
germline_resource_gnomad_vcf = '/hot/resource/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'

// MuSE options
dbSNP = '/hot/ref/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'
dbSNP = '/hot/resource/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'

// Intersect options
ncbi_build = 'GRCh38'
8 changes: 4 additions & 4 deletions test/config/a_mini-somaticsniper.config
@@ -12,8 +12,8 @@ includeConfig "${projectDir}/config/methods.config"

params {
algorithm = ['somaticsniper'] // 'somaticsniper', 'strelka2', 'mutect2', 'muse'
reference = '/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/ref/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
reference = '/hot/resource/reference-genome/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/resource/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
dataset_id = 'TWGSAMIN'
// set params.exome to TRUE will add the '--exome' option when running manta and strelka2
// set params.exome to TRUE will add the '-E' option when running MuSE
@@ -30,10 +30,10 @@ params {
filter_mutect_calls_extra_args = ''
gatk_command_mem_diff = 500.MB
scatter_count = 50
germline_resource_gnomad_vcf = '/hot/ref/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'
germline_resource_gnomad_vcf = '/hot/resource/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'

// MuSE options
dbSNP = '/hot/ref/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'
dbSNP = '/hot/resource/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'

// Intersect options
ncbi_build = 'GRCh38'
8 changes: 4 additions & 4 deletions test/config/a_mini-strelka2.config
@@ -12,8 +12,8 @@ includeConfig "${projectDir}/config/methods.config"

params {
algorithm = ['strelka2'] // 'somaticsniper', 'strelka2', 'mutect2', 'muse'
reference = '/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/ref/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
reference = '/hot/resource/reference-genome/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/resource/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
dataset_id = 'TWGSAMIN'
// set params.exome to TRUE will add the '--exome' option when running manta and strelka2
// set params.exome to TRUE will add the '-E' option when running MuSE
@@ -30,10 +30,10 @@ params {
filter_mutect_calls_extra_args = ''
gatk_command_mem_diff = 500.MB
scatter_count = 50
germline_resource_gnomad_vcf = '/hot/ref/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'
germline_resource_gnomad_vcf = '/hot/resource/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'

// MuSE options
dbSNP = '/hot/ref/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'
dbSNP = '/hot/resource/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'

// Intersect options
ncbi_build = 'GRCh38'
8 changes: 4 additions & 4 deletions test/config/a_mini-two-tools.config
@@ -12,8 +12,8 @@ includeConfig "${projectDir}/config/methods.config"

params {
algorithm = ['somaticsniper', 'strelka2']
reference = '/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/ref/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
reference = '/hot/resource/reference-genome/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
intersect_regions = '/hot/resource/tool-specific-input/pipeline-call-sSNV-6.0.0/GRCh38-BI-20160721/Homo_sapiens_assembly38_no-decoy.bed.gz'
dataset_id = 'TWGSAMIN'
// setting params.exome to TRUE will add the '--exome' option when running manta and strelka2 and the -E option when running MuSE
exome = false
@@ -29,10 +29,10 @@ params {
filter_mutect_calls_extra_args = ''
gatk_command_mem_diff = 500.MB
scatter_count = 50
germline_resource_gnomad_vcf = '/hot/ref/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'
germline_resource_gnomad_vcf = '/hot/resource/tool-specific-input/GATK/GRCh38/af-only-gnomad.hg38.vcf.gz'

// MuSE options
dbSNP = '/hot/ref/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'
dbSNP = '/hot/resource/database/dbSNP-155/original/GRCh38/GCF_000001405.39.gz'

// Intersect options
ncbi_build = 'GRCh38'