Merge pull request #187 from nf-core/variant_calling

address suggestions from Chris
nf-core · Jun 20, 2022 · bc81eeb
2 parents 3255a97 + f3319f1
commit bc81eeb

Showing 5 changed files with 17 additions and 14 deletions.
diff --git a/CITATIONS.md b/CITATIONS.md
@@ -64,7 +64,7 @@
 
 - [pycoQC](https://doi.org/10.21105/joss.01236)
 
-  > Leger A, Leonardi T, (2019). pycoQC, interactive quality control for Oxford Nanopore Sequencing. Journal of Open Source Software, 4(34), 1236.
+  > Leger A, Leonardi T, (2019). pycoQC, interactive quality control for Oxford Nanopore Sequencing. Journal of Open Source Software, 4(34), 1236, https://doi.org/10.21105/joss.01236
 
 - [qcat](https://github.com/nanoporetech/qcat)
 

diff --git a/conf/test_nobc_nodx_vc.config b/conf/test_nobc_nodx_vc.config
@@ -24,6 +24,6 @@ params {
     skip_demultiplexing = true
     call_variants       = true
 
-    variant_caller      = 'deepvariant'
-    structural_variant_caller = 'cutesv'
+    variant_caller      = 'medaka'
+    structural_variant_caller = 'sniffles'
 }
diff --git a/docs/output.md b/docs/output.md
@@ -8,7 +8,7 @@ This document describes the output produced by the nf-core/nanoseq pipeline. Mos
 
 ## Pipeline overview
 
-The nf-core/nanoseq pipeline is built using [Nextflow](https://www.nextflow.io/). There are many different potential outputs for the pipeline depending on what file inputs and parameters you use. Please see [`main README.md`](../README.md) for a condensed overview of the steps in the pipeline, and the bioinformatics tools used at each step.
+The nf-core/nanoseq pipeline is built using [Nextflow](https://www.nextflow.io/). There are many different potential outputs for the pipeline depending on what file inputs and parameters you use. Please see [`main README.md`](../README.md) for a condensed overview of the steps in the pipeline and the bioinformatics tools used at each step.
 
 See [Oxford NanoPore website](https://nanoporetech.com/) for more information regarding the sequencing technology, protocol, and for an extensive list of additional resources.
 
@@ -54,7 +54,7 @@ _Documentation_:
 [NanoLyse](https://github.com/wdecoster/nanolyse)
 
 _Description_:
-If you would like to run NanoLyse on the raw FASTQ files you can provide `--run_nanolyse` when running the pipeline. By default, the pipeline will filter lambda phage reads, however you can provide your own FASTA file of "contaminants" with `--nanolyse_fasta`. The filtered FASTQ files will contain raw reads without the specified reference sequences (default: lambda phage sequences).
+If you would like to run NanoLyse on the raw FASTQ files you can provide `--run_nanolyse` when running the pipeline. By default, the pipeline will filter lambda phage reads. However, you can provide your own FASTA file of "contaminants" with `--nanolyse_fasta`. The filtered FASTQ files will contain raw reads without the specified reference sequences (default: lambda phage sequences).
 
 ## Sequencing QC
 
@@ -71,7 +71,7 @@ _Documentation_:
 [PycoQC](https://github.com/a-slide/pycoQC), [NanoPlot](https://github.com/wdecoster/NanoPlot)
 
 _Description_:
-_PycoQC_ and _NanoPlot_ can compute QC metrics and generate plots using the sequencing summary information generated by _Guppy_ e.g. distribution of read length, read length over time, number of reads per barcode and other general stats. _NanoPlot_ can also generates QC metrics directly from FASTQ files as described in the next section.
+_PycoQC_ and _NanoPlot_ can compute QC metrics and generate plots using the sequencing summary information generated by _Guppy_, e.g., distribution of read length, read length over time, number of reads per barcode and other general stats. _NanoPlot_ can also generate QC metrics directly from FASTQ files as described in the next section.
 
 ![PycoQC - Number of reads per barcode](images/pycoqc_readsperbarcode.png)
 
@@ -90,11 +90,11 @@ _Documentation_:
 [NanoPlot](https://github.com/wdecoster/NanoPlot), [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/)
 
 _Description_:
-_NanoPlot_ can be used to produce general quality metrics from the per barcode FASTQ files generated by _Guppy_ e.g. quality score distribution, read lengths and other general stats.
+_NanoPlot_ can be used to produce general quality metrics from the per-barcode FASTQ files generated by _Guppy_, e.g., quality score distribution, read lengths, and other general stats.
 
 ![Nanoplot - Read quality vs read length](images/nanoplot_readlengthquality.png)
 
-_FastQC_ can give general quality metrics about your reads. It can provide information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T). You can also generate information about adapter contamination and other over-represented sequences.
+_FastQC_ can give general quality metrics about your reads. It can provide information about the quality score distribution across your reads, and the per-base sequence content (%A/C/G/T). You can also generate information about adapter contamination and other over-represented sequences.
 
 ## Alignment
 
@@ -131,7 +131,7 @@ _Documentation_:
 [BEDTools](https://bedtools.readthedocs.io/en/latest/), [bedGraphToBigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html#Ex3), [`bedToBigBed`](https://genome.ucsc.edu/goldenPath/help/bigBed.html#Ex2)
 
 _Description_:
-The [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format is in an indexed binary format useful for displaying dense, continuous data in Genome Browsers such as the [UCSC](https://genome.ucsc.edu/cgi-bin/hgTracks) and [IGV](http://software.broadinstitute.org/software/igv/). This mitigates the need to load the much larger BAM file for data visualisation purposes which will be slower and result in memory issues. The bigWig format is also supported by various bioinformatics software for downstream processing such as meta-profile plotting.
+The [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format is an indexed binary format useful for displaying dense, continuous data in Genome Browsers such as the [UCSC](https://genome.ucsc.edu/cgi-bin/hgTracks) and [IGV](http://software.broadinstitute.org/software/igv/). This mitigates the need to load the much larger BAM file for data visualisation purposes which will be slower and result in memory issues. The bigWig format is also supported by various bioinformatics software for downstream processing such as meta-profile plotting.
 
 [bigBed](https://genome.ucsc.edu/goldenPath/help/bigBed.html) are more useful for displaying distribution of reads across exon intervals as is typically observed for RNA-seq data. Therefore, these files will only be generated if `--protocol directRNA` or `--protocol cDNA` are defined.
 
@@ -176,7 +176,7 @@ _Documentation_:
 [Sniffles](https://github.com/fritzsedlazeck/Sniffles), [cuteSV](https://github.com/tjiangHIT/cuteSV)
 
 _Description_:
-If the `--protocol DNA` and the `--call_variants` parameters are defined small and structural variants.
+If the `--protocol DNA` and `--call_variants` parameters are defined, both small and structural variant calls can be generated.
 Short variants can be called using _medaka_, _deepvariant_ or _pepper_margin_deepvariant_. The short variant caller is specified using the `--variant_caller` parameter.
 Structural variants can be called using either _cuteSV_ or _sniffles_. The structural variant caller is specified using the `--structural_variant_caller` parameter.
 The short variant and/or structural variant calling steps are skipped when the `--skip_vc` and `--skip_sniffles` flags are used.
@@ -189,9 +189,9 @@ The short variant and/or structural variant calling steps is skipped if using th
 If bambu is used:
 
 - `bambu/`
-  - `extended_annotations.gtf`: a GTF file that contains both annotated and novel transcripts
-  - `counts_gene.txt`: a TXT file containing gene expression estimates
-  - `counts_transcript.txt`: a TXT file containing transcript expression estimates
+  - `extended_annotations.gtf`: a GTF file that contains both annotated and novel transcripts.
+  - `counts_gene.txt`: a TXT file containing gene expression estimates.
+  - `counts_transcript.txt`: a TXT file containing transcript expression estimates.
 
 If StringTie2 is used:
 

diff --git a/modules/local/bam_rename.nf b/modules/local/bam_rename.nf
@@ -5,6 +5,9 @@ process BAM_RENAME {
     //container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
     //    'https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img' :
     //    'quay.io/biocontainers/biocontainers:latest' }"
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/sed:4.7.0' :
+        'quay.io/biocontainers/sed:4.7.0' }"
 
     input:
     tuple val(meta), path(bam)

diff --git a/workflows/nanoseq.nf b/workflows/nanoseq.nf
@@ -92,7 +92,7 @@ if (params.call_variants) {
     if (!params.skip_sv && params.structural_variant_caller != 'sniffles' && params.structural_variant_caller != 'cutesv') {
         exit 1, "Invalid structural variant caller option: ${params.structural_variant_caller}. Valid options: 'sniffles', 'cutesv'"
     }
-    if (params.enable_conda && params.variant_caller != 'medaka') {
+    if (!params.skip_vc && params.enable_conda && params.variant_caller != 'medaka') {
         exit 1, "Conda environments cannot be used when using the deepvariant or pepper_margin_deepvariant tools. Valid options: 'docker', 'singularity'"
     }
 }
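
Review note on the last hunk in `workflows/nanoseq.nf`: the effect of the added `!params.skip_vc` guard is easier to see when the two visible checks are modelled outside Nextflow. The following is a minimal Python sketch, not the pipeline's code. The function name `validate_variant_params` and the dict standing in for `params` are hypothetical, missing keys are assumed to default to false, and only the checks shown in the hunk are modelled (any validation above the hunk is not visible in this diff and is omitted).

```python
def validate_variant_params(params):
    """Return error messages for the two checks visible in the hunk.

    `params` is a plain dict standing in for Nextflow's `params` object
    (an assumption for illustration only).
    """
    errors = []

    # Both checks sit inside `if (params.call_variants) { ... }`.
    if not params.get("call_variants", False):
        return errors

    # Unchanged check: the structural variant caller must be a known tool
    # unless structural variant calling is skipped.
    sv_caller = params.get("structural_variant_caller")
    if not params.get("skip_sv", False) and sv_caller not in ("sniffles", "cutesv"):
        errors.append(
            f"Invalid structural variant caller option: {sv_caller}. "
            "Valid options: 'sniffles', 'cutesv'"
        )

    # Changed check: the conda restriction now applies only when short
    # variant calling will actually run (skip_vc is false).
    if (
        not params.get("skip_vc", False)
        and params.get("enable_conda", False)
        and params.get("variant_caller") != "medaka"
    ):
        errors.append(
            "Conda environments cannot be used when using the deepvariant "
            "or pepper_margin_deepvariant tools."
        )

    return errors
```

Under these assumptions, a run with `enable_conda`, `variant_caller = 'deepvariant'` and `skip_vc` set yields no conda error, whereas the condition before this commit would have stopped the run even though the short variant caller was never going to execute.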