Merge pull request #187 from nf-core/variant_calling
address suggestions from Chris
yuukiiwa authored Jun 20, 2022
2 parents 3255a97 + f3319f1 commit bc81eeb
Showing 5 changed files with 17 additions and 14 deletions.
2 changes: 1 addition & 1 deletion CITATIONS.md
@@ -64,7 +64,7 @@
- [pycoQC](https://doi.org/10.21105/joss.01236)

> Leger A, Leonardi T, (2019). pycoQC, interactive quality control for Oxford Nanopore Sequencing. Journal of Open Source Software, 4(34), 1236.
> Leger A, Leonardi T, (2019). pycoQC, interactive quality control for Oxford Nanopore Sequencing. Journal of Open Source Software, 4(34), 1236, https://doi.org/10.21105/joss.01236
- [qcat](https://github.com/nanoporetech/qcat)

4 changes: 2 additions & 2 deletions conf/test_nobc_nodx_vc.config
@@ -24,6 +24,6 @@ params {
skip_demultiplexing = true
call_variants = true

variant_caller = 'deepvariant'
structural_variant_caller = 'cutesv'
variant_caller = 'medaka'
structural_variant_caller = 'sniffles'
}
20 changes: 10 additions & 10 deletions docs/output.md
@@ -8,7 +8,7 @@ This document describes the output produced by the nf-core/nanoseq pipeline. Mos

## Pipeline overview

The nf-core/nanoseq pipeline is built using [Nextflow](https://www.nextflow.io/). There are many different potential outputs for the pipeline depending on what file inputs and parameters you use. Please see [`main README.md`](../README.md) for a condensed overview of the steps in the pipeline, and the bioinformatics tools used at each step.
The nf-core/nanoseq pipeline is built using [Nextflow](https://www.nextflow.io/). There are many different potential outputs for the pipeline depending on what file inputs and parameters you use. Please see [`main README.md`](../README.md) for a condensed overview of the steps in the pipeline and the bioinformatics tools used at each step.

See [Oxford NanoPore website](https://nanoporetech.com/) for more information regarding the sequencing technology, protocol, and for an extensive list of additional resources.

@@ -54,7 +54,7 @@ _Documentation_:
[NanoLyse](https://github.com/wdecoster/nanolyse)

_Description_:
If you would like to run NanoLyse on the raw FASTQ files you can provide `--run_nanolyse` when running the pipeline. By default, the pipeline will filter lambda phage reads, however you can provide your own FASTA file of "contaminants" with `--nanolyse_fasta`. The filtered FASTQ files will contain raw reads without the specified reference sequences (default: lambda phage sequences).
If you would like to run NanoLyse on the raw FASTQ files you can provide `--run_nanolyse` when running the pipeline. By default, the pipeline will filter lambda phage reads. However, you can provide your own FASTA file of "contaminants" with `--nanolyse_fasta`. The filtered FASTQ files will contain raw reads without the specified reference sequences (default: lambda phage sequences).

## Sequencing QC

@@ -71,7 +71,7 @@ _Documentation_:
[PycoQC](https://github.com/a-slide/pycoQC), [NanoPlot](https://github.com/wdecoster/NanoPlot)

_Description_:
_PycoQC_ and _NanoPlot_ can compute QC metrics and generate plots using the sequencing summary information generated by _Guppy_ e.g. distribution of read length, read length over time, number of reads per barcode and other general stats. _NanoPlot_ can also generates QC metrics directly from FASTQ files as described in the next section.
_PycoQC_ and _NanoPlot_ can compute QC metrics and generate plots using the sequencing summary information generated by _Guppy_, e.g., distribution of read length, read length over time, number of reads per barcode and other general stats. _NanoPlot_ can also generate QC metrics directly from FASTQ files as described in the next section.

![PycoQC - Number of reads per barcode](images/pycoqc_readsperbarcode.png)

@@ -90,11 +90,11 @@ _Documentation_:
[NanoPlot](https://github.com/wdecoster/NanoPlot), [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/)

_Description_:
_NanoPlot_ can be used to produce general quality metrics from the per barcode FASTQ files generated by _Guppy_ e.g. quality score distribution, read lengths and other general stats.
_NanoPlot_ can be used to produce general quality metrics from the per barcode FASTQ files generated by _Guppy_ e.g. quality score distribution, read lengths, and other general stats.

![Nanoplot - Read quality vs read length](images/nanoplot_readlengthquality.png)

_FastQC_ can give general quality metrics about your reads. It can provide information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T). You can also generate information about adapter contamination and other over-represented sequences.
_FastQC_ can give general quality metrics about your reads. It can provide information about the quality score distribution across your reads, and the per-base sequence content (%A/C/G/T). You can also generate information about adapter contamination and other over-represented sequences.

## Alignment

@@ -131,7 +131,7 @@ _Documentation_:
[BEDTools](https://bedtools.readthedocs.io/en/latest/), [bedGraphToBigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html#Ex3), [`bedToBigBed`](https://genome.ucsc.edu/goldenPath/help/bigBed.html#Ex2)

_Description_:
The [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format is in an indexed binary format useful for displaying dense, continuous data in Genome Browsers such as the [UCSC](https://genome.ucsc.edu/cgi-bin/hgTracks) and [IGV](http://software.broadinstitute.org/software/igv/). This mitigates the need to load the much larger BAM file for data visualisation purposes which will be slower and result in memory issues. The bigWig format is also supported by various bioinformatics software for downstream processing such as meta-profile plotting.
The [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format is an indexed binary format useful for displaying dense, continuous data in Genome Browsers such as the [UCSC](https://genome.ucsc.edu/cgi-bin/hgTracks) and [IGV](http://software.broadinstitute.org/software/igv/). This mitigates the need to load the much larger BAM file for data visualisation purposes which will be slower and result in memory issues. The bigWig format is also supported by various bioinformatics software for downstream processing such as meta-profile plotting.

[bigBed](https://genome.ucsc.edu/goldenPath/help/bigBed.html) are more useful for displaying distribution of reads across exon intervals as is typically observed for RNA-seq data. Therefore, these files will only be generated if `--protocol directRNA` or `--protocol cDNA` are defined.

@@ -176,7 +176,7 @@ _Documentation_:
[Sniffles](https://github.com/fritzsedlazeck/Sniffles), [cuteSV](https://github.com/tjiangHIT/cuteSV)

_Description_:
If the `--protocol DNA` and the `--call_variants` parameters are defined small and structural variants.
If the `--protocol DNA` and the `--call_variants` parameters are defined, then both small and structural variant calls can be generated.
Short variants can be called using _medaka_, _deepvariant_ or _pepper_margin_deepvariant_. The short variant caller is specified using the `--variant_caller` parameter.
Structural variants can be called using either _cuteSV_ or _sniffles_. The structural variant caller is specified using the `--structural_variant_caller` parameter.
The short variant and/or structural variant calling steps are skipped if using the `--skip_vc` and `--skip_sniffles` flags.
@@ -189,9 +189,9 @@ The short variant and/or structural variant calling steps is skipped if using th
If bambu is used:

- `bambu/`
- `extended_annotations.gtf`: a GTF file that contains both annotated and novel transcripts
- `counts_gene.txt`: a TXT file containing gene expression estimates
- `counts_transcript.txt`: a TXT file containing transcript expression estimates
- `extended_annotations.gtf`: a GTF file that contains both annotated and novel transcripts.
- `counts_gene.txt`: a TXT file containing gene expression estimates.
- `counts_transcript.txt`: a TXT file containing transcript expression estimates.

If StringTie2 is used:

3 changes: 3 additions & 0 deletions modules/local/bam_rename.nf
@@ -5,6 +5,9 @@ process BAM_RENAME {
//container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
// 'https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img' :
// 'quay.io/biocontainers/biocontainers:latest' }"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/sed:4.7.0' :
'quay.io/biocontainers/sed:4.7.0' }"

input:
tuple val(meta), path(bam)
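
The `container` directive added in this hunk picks an image with a ternary on the container engine. The selection can be sketched in Python as follows. This is only an illustration of the branching, not part of the pipeline: `select_container` and its arguments are hypothetical names standing in for `workflow.containerEngine` and `task.ext.singularity_pull_docker_container`.

```python
# Hypothetical helper mirroring the ternary in the `container` directive.
SINGULARITY_IMAGE = "https://depot.galaxyproject.org/singularity/sed:4.7.0"
DOCKER_IMAGE = "quay.io/biocontainers/sed:4.7.0"


def select_container(container_engine, singularity_pull_docker_container=False):
    """Return the image the directive would resolve to for the given settings."""
    # The Singularity image is used only when the engine is Singularity AND
    # the user has not asked Singularity to pull the Docker image instead.
    if container_engine == "singularity" and not singularity_pull_docker_container:
        return SINGULARITY_IMAGE
    # Every other case (Docker, other engines, or the pull-docker override)
    # falls through to the quay.io image.
    return DOCKER_IMAGE


if __name__ == "__main__":
    print(select_container("singularity"))
    print(select_container("singularity", singularity_pull_docker_container=True))
    print(select_container("docker"))
```

Note that the quay.io image is the fallback for anything that is not plain Singularity, which matches the `cond ? a : b` shape of the directive.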
2 changes: 1 addition & 1 deletion workflows/nanoseq.nf
@@ -92,7 +92,7 @@ if (params.call_variants) {
if (!params.skip_sv && params.structural_variant_caller != 'sniffles' && params.structural_variant_caller != 'cutesv') {
exit 1, "Invalid structural variant caller option: ${params.structural_variant_caller}. Valid options: 'sniffles', 'cutesv'"
}
if (params.enable_conda && params.variant_caller != 'medaka') {
if (!params.skip_vc && params.enable_conda && params.variant_caller != 'medaka') {
exit 1, "Conda environments cannot be used when using the deepvariant or pepper_margin_deepvariant tools. Valid options: 'docker', 'singularity'"
}
}
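
The effect of adding `!params.skip_vc` to the conda check can be sketched in Python. This is a minimal model of the two checks visible in this hunk only; it assumes the parameters are held in a plain dict, and `validate_variant_calling` is a hypothetical name. Unlike the Groovy code, which exits on the first failure, the sketch collects messages so the behaviour is easy to inspect.

```python
VALID_SV_CALLERS = ("sniffles", "cutesv")


def validate_variant_calling(params):
    """Return a list of error messages for the two checks shown in the hunk."""
    errors = []
    # Both checks sit inside `if (params.call_variants) { ... }`.
    if not params.get("call_variants", False):
        return errors

    # Structural variant caller check (unchanged by this commit).
    sv_caller = params.get("structural_variant_caller")
    if not params.get("skip_sv", False) and sv_caller not in VALID_SV_CALLERS:
        errors.append("Invalid structural variant caller option: %s" % sv_caller)

    # Conda check: after this commit it is bypassed when short variant
    # calling is skipped, so conda users can still run structural variant calling.
    if (
        not params.get("skip_vc", False)
        and params.get("enable_conda", False)
        and params.get("variant_caller") != "medaka"
    ):
        errors.append("Conda environments cannot be used with this variant caller")
    return errors


if __name__ == "__main__":
    base = {
        "call_variants": True,
        "enable_conda": True,
        "variant_caller": "deepvariant",
        "structural_variant_caller": "sniffles",
    }
    print(validate_variant_calling(base))                         # conda error reported
    print(validate_variant_calling(dict(base, skip_vc=True)))     # no errors after the fix
```

Before the change, the second call would also have failed, because the conda check ignored `skip_vc`.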