These open-source metagenomics workflows analyze the biological contents of complex environmental samples. The expected input is paired-end Illumina FASTQ files. Current outputs include filtered reads, assembled contigs, MultiQC reports for FastQC and QUAST results, metagenome comparison estimates, taxonomic classifications, and functional predictions.
The wiki for this project has helpful instructions for installing and running the workflows.
These workflows have been tested to run offline on Linux distributions, including CentOS, Red Hat, and Ubuntu.
The workflows have been tested with a subsampled dataset from the Shakya et al. 2013 publication.
The original Shakya et al. 2013 dataset is available online under accession SRR606249. The subsampled dataset, which is the default example in our metagenomics workflows, can be downloaded here:
Please read CONTRIBUTING.md for details on our code of conduct and how to contribute to this project.
This software is licensed under the BSD 3-Clause License.
This project builds on work that began in the Dahak project. The workflows use a variety of open-source tools; more information about those tools is available in the DEPENDENCY_LICENSES file.