Issue 162 add assembler qc report #167

Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -10,3 +10,4 @@ testing*
.screenrc
eggnog
kofam/
eukulele/
7 changes: 6 additions & 1 deletion CITATIONS.md
@@ -45,7 +45,7 @@

- [Prodigal](https://github.com/hyattpd/Prodigal)

- [BBmap] (https://sourceforge.net/projects/bbmap/)
- [BBmap](https://sourceforge.net/projects/bbmap/)

- [FeatureCounts](https://subread.sourceforge.net)

@@ -73,8 +73,13 @@
- [EUKulele](https://github.com/AlexanderLabWHOI/EUKulele)

- [CAT](https://github.com/dutilh/CAT)

> von Meijenfeldt FAB, Arkhipova K, Cambuy DD, Coutinho FH, Dutilh BE. Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome Biology. 2019;20:217.

- [transrate](https://hibberdlab.com/transrate/)

> TransRate: reference free quality assessment of de-novo transcriptome assemblies (2016). Richard D Smith-Unna, Chris Boursnell, Rob Patro, Julian M Hibberd, Steven Kelly. Genome Research doi: [http://dx.doi.org/10.1101/gr.196469.115](http://dx.doi.org/10.1101/gr.196469.115)

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
14 changes: 14 additions & 0 deletions assets/multiqc_config.yml
@@ -11,3 +11,17 @@ report_section_order:
order: -1002

export_plots: true

custom_data:
  megahit_assemblies:
    description: "Describes assembly statistics, generated by TransRate."
    plot_type: table
  rnaspades_assemblies:
    description: "Describes assembly statistics, generated by TransRate."
    plot_type: table

custom_plot_config:
  megahit_assemblies-plot:
    col1_header: "File Name"
  rnaspades_assemblies-plot:
    col1_header: "File Name"
41 changes: 41 additions & 0 deletions modules/local/transrate.nf
@@ -0,0 +1,41 @@
process TRANSRATE {
    tag "$meta.id"
    label 'process_low'

    conda "bioconda::transrate=1.0.3"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/transrate:1.0.3--hec16e2b_4':
        'biocontainers/transrate:1.0.3--hec16e2b_4' }"

    input:
    tuple val(meta), path(assembly)

    output:
    tuple val(meta), path("*assemblies_mqc.csv") , emit: assembly_qc
    path "versions.yml"                          , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"

    """

    transrate \\
        --threads $task.cpus \\
        --assembly $assembly \\
        --output ${prefix}_transrate \\
        $args

    mv ${prefix}_transrate/assemblies.csv ${prefix}_assemblies_mqc.csv

    # transrate flashes a warning about a ruby gem being out of date, so call the version before it is being piped into the yaml
    transrate --version > version.txt
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        transrate: \$(cat version.txt)
    END_VERSIONS
    """
}
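
The process above reads `task.ext.args` and `task.ext.prefix`, which in nf-core style pipelines are normally set from a configuration file rather than in the module itself. A minimal sketch of what such a fragment could look like follows. It is not part of this PR: the file location (`conf/modules.config`) and the values are assumptions for illustration only. The `custom_data` keys added to `assets/multiqc_config.yml` (`megahit_assemblies`, `rnaspades_assemblies`) suggest that the prefix is expected to resolve to the assembler name, but that is an inference from the diff, not something the PR states.

```groovy
// Hypothetical conf/modules.config fragment (assumed, not taken from this PR).
process {
    withName: 'TRANSRATE' {
        // Extra command-line options, appended verbatim to the transrate call
        // through task.ext.args. Left empty here.
        ext.args   = ''

        // Determines the output names used by the module:
        //   <prefix>_transrate/            (TransRate output directory)
        //   <prefix>_assemblies_mqc.csv    (table collected for MultiQC)
        // This mirrors the module's own default of "${meta.id}".
        ext.prefix = { "${meta.id}" }
    }
}
```

Because the module falls back to `''` and `"${meta.id}"` when neither value is set, a fragment like this is only needed to override those defaults.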
19 changes: 16 additions & 3 deletions workflows/metatdenovo.nf
@@ -112,6 +112,7 @@ include { FORMATSPADES } from '../modules/local/formatspades
include { UNPIGZ as UNPIGZ_CONTIGS } from '../modules/local/unpigz'
include { UNPIGZ as UNPIGZ_GFF } from '../modules/local/unpigz'
include { MERGE_TABLES } from '../modules/local/merge_summary_tables'
include { TRANSRATE } from '../modules/local/transrate'

//
// SUBWORKFLOW: Consisting of a mix of local and nf-core/modules
@@ -385,7 +386,7 @@ workflow METATDENOVO {
BAM_SORT_STATS_SAMTOOLS ( BBMAP_ALIGN.out.bam, ch_assembly_contigs )
ch_versions = ch_versions.mix(BAM_SORT_STATS_SAMTOOLS.out.versions)

// if ( orf_caller ==
// if ( orf_caller ==
BAM_SORT_STATS_SAMTOOLS.out.bam
.combine(ch_gff.map { it[1] } )
.set { ch_featurecounts }
@@ -448,6 +449,12 @@ workflow METATDENOVO {
.set { ch_merge_tables }

}

// set up contig channel to use in CAT and TransRate
UNPIGZ_CONTIGS(ch_assembly_contigs)
ch_unzipped_contigs = UNPIGZ_CONTIGS.out.unzipped
ch_versions = ch_versions.mix(UNPIGZ_CONTIGS.out.versions)

//
// CAT: Contig Annotation Tool (CAT) and Bin Annotation Tool (BAT) are pipelines for the taxonomic classification of long DNA sequences and metagenome assembled genomes (MAGs/bins)
//
@@ -460,9 +467,8 @@ workflow METATDENOVO {
CAT_DB_GENERATE ()
ch_cat_db = CAT_DB_GENERATE.out.db
}
UNPIGZ_CONTIGS(ch_assembly_contigs)
CAT_CONTIGS (
UNPIGZ_CONTIGS.out.unzipped,
ch_unzipped_contigs,
ch_cat_db
)
CAT_SUMMARY(
@@ -472,6 +478,12 @@ workflow METATDENOVO {
ch_versions = ch_versions.mix(CAT_SUMMARY.out.versions)
}

//
// MODULE: Use TransRate to judge assembly quality, piped into MultiQC
//
TRANSRATE(ch_unzipped_contigs)
ch_versions = ch_versions.mix(TRANSRATE.out.versions)

//
// SUBWORKFLOW: Eukulele
//
@@ -526,6 +538,7 @@ workflow METATDENOVO {

ch_multiqc_files = ch_multiqc_files.mix(CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect())
ch_multiqc_files = ch_multiqc_files.mix(FASTQC_TRIMGALORE.out.trim_zip.collect{it[1]}.ifEmpty([]))
ch_multiqc_files = ch_multiqc_files.mix(TRANSRATE.out.assembly_qc.collect{it[1]}.ifEmpty([]))
ch_multiqc_files = ch_multiqc_files.mix(BAM_SORT_STATS_SAMTOOLS.out.idxstats.collect{it[1]}.ifEmpty([]))
ch_multiqc_files = ch_multiqc_files.mix(FEATURECOUNTS_CDS.out.summary.collect{it[1]}.ifEmpty([]))
