
docs: Improve docs (#20)
* docs: Update output doc

* docs: Add params documentation

* docs: Add mkdocs
jvfe authored Aug 28, 2023
1 parent 2384147 commit 5185c46
Showing 8 changed files with 234 additions and 36 deletions.
31 changes: 31 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,31 @@
# This workflow will automatically deploy the mkdocs documentation
# See https://parkererickson.github.io/portfolio/blog/MkDocsCD/

name: docs

on:
  push:
    branches: [master]
  workflow_dispatch:

jobs:
  build:
    name: Build and Deploy Documentation
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Master
        uses: actions/checkout@v2

      - name: Set up Python 3
        uses: actions/setup-python@v2
        with:
          python-version: "3.10.8"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install mkdocs

      - name: Deploy
        run: |
          git pull
          mkdocs gh-deploy
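
The `mkdocs gh-deploy` step builds the site from an `mkdocs.yml` at the repository root. A minimal sketch of what that configuration might look like follows; the site name, repo URL, and nav entries are illustrative assumptions, not necessarily what the commit adds:

```yaml
# Illustrative mkdocs.yml sketch; actual values in the repository may differ.
site_name: euryale
repo_url: https://github.com/dalmolingroup/euryale
nav:
  - Home: README.md
  - Usage: usage.md
  - Output: output.md
```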
48 changes: 32 additions & 16 deletions README.md
@@ -36,22 +36,42 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool

3. Download the pipeline and test it on a minimal dataset with a single command:

```bash
nextflow run dalmolingroup/euryale -profile test,YOURPROFILE --outdir <OUTDIR>
```

Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (`YOURPROFILE` in the example command above). You can chain multiple config profiles in a comma-separated string.

> - The pipeline comes with config profiles called `docker`, `singularity`, `podman`, `shifter`, `charliecloud` and `conda` which instruct the pipeline to use the named tool for software management. For example, `-profile test,docker`.
> - Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
> - If you are using `singularity`, please use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to download images first, before running the pipeline. Setting the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
> - If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs.
- Start running your own analysis!

```bash
nextflow run dalmolingroup/euryale \
--input samplesheet.csv \
--outdir <OUTDIR> \
--kaiju_db kaiju_reference \
--diamond_db diamond_db \
--reference_fasta diamond_fasta \
--host_fasta host_reference_fasta \
--id_mapping id_mapping_file \
-profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
```
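
The `--input` samplesheet follows the nf-core convention. Assuming the usual `sample,fastq_1,fastq_2` columns (the usage documentation has the authoritative schema), it might look like:

```csv
sample,fastq_1,fastq_2
CONTROL_REP1,control_R1.fastq.gz,control_R2.fastq.gz
TREATMENT_REP1,treatment_R1.fastq.gz,treatment_R2.fastq.gz
```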

## Documentation

The dalmolingroup/euryale documentation is split into the following pages:

- [Usage](usage.md)
  - An overview of how the pipeline works, how to run it, and a description of all of the different command-line flags.
- [Output](output.md)
  - An overview of the different results produced by the pipeline and how to interpret them.

## Credits

@@ -61,10 +81,6 @@ We thank the following people for their extensive assistance in the development

- Diego Morais (for developing the original [MEDUSA](https://github.com/dalmolingroup/medusa) pipeline)

## Contributions and Support

If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).

## Citations

> Morais DAA, Cavalcante JVF, Monteiro SS, Pasquali MAB and Dalmolin RJS (2022)
1 change: 1 addition & 0 deletions docs/CITATIONS.md
8 changes: 0 additions & 8 deletions docs/README.md

This file was deleted.

1 change: 1 addition & 0 deletions docs/README.md
83 changes: 72 additions & 11 deletions docs/output.md
@@ -10,32 +10,93 @@ The directories listed below will be created in the results directory after the

## Pipeline overview

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps (steps in **italics** don't run by default):

- [FastQC](#fastqc) - Raw read QC.
- [Kaiju](#kaiju) - Taxonomically classify reads or contigs.
- [Krona](#krona) - Visualize the taxonomic classification for each sample.
- [Diamond](#diamond) - Align reads and contigs against a reference database (such as NCBI-nr).
- [Annotate](#annotate) - Functionally annotate alignment matches.
- [MEGAHIT](#megahit) - Assemble reads into contigs.
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline.
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution.

### Kaiju

<details markdown="1">
<summary>Output files</summary>

- `taxonomy/`
  - `${sample}.tsv`: Kaiju classification output.
  - `${sample}.txt`: Kaiju2Table TSV output.

</details>

[Kaiju](https://github.com/bioinformatics-centre/kaiju/) is a tool for fast taxonomic classification of metagenomic sequencing reads using a protein reference database.
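
The Kaiju2Table TSV (`${sample}.txt`) is straightforward to post-process. As a sketch, assuming the usual `kaiju2table` columns (`file`, `percent`, `reads`, `taxon_id`, `taxon_name`), the most abundant taxa per sample can be pulled out like this; the helper is ours, not part of the pipeline:

```python
import csv

def top_taxa(kaiju_table_path, n=5):
    """Return the n most abundant taxa from a Kaiju2Table TSV.

    Assumes the usual kaiju2table header:
    file, percent, reads, taxon_id, taxon_name.
    """
    with open(kaiju_table_path) as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    # Sort by relative abundance, highest first.
    rows.sort(key=lambda row: float(row["percent"]), reverse=True)
    return [(row["taxon_name"], float(row["percent"])) for row in rows[:n]]
```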

### Krona

<details markdown="1">
<summary>Output files</summary>

- `taxonomy/`
  - `${sample}.html`: Krona visualization for the sample.

</details>

- [Krona](https://github.com/marbl/Krona/) is a tool to interactively explore metagenomes and more from a web browser.

### Diamond

<details markdown="1">
<summary>Output files</summary>

- `alignment/${sample}/`
  - `${sample}.txt`: Alignment matches in BLAST tabular output format.
  - `${sample}.log`: DIAMOND execution log.

</details>

- [DIAMOND](https://github.com/bbuchfink/diamond) is an accelerated BLAST compatible local sequence aligner.
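
The alignment files are easy to filter downstream. A sketch, assuming DIAMOND's default 12-column BLAST tabular layout (`--outfmt 6`: `qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore`) and hypothetical thresholds; the helper is ours, not part of the pipeline:

```python
def best_hits(blast_tab_path, min_identity=50.0, max_evalue=1e-5):
    """Keep the best-scoring hit per query from a BLAST/DIAMOND tabular file.

    Assumes the default 12-column tabular format (--outfmt 6).
    Returns a dict: query id -> (subject id, percent identity, bitscore).
    """
    best = {}
    with open(blast_tab_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            qseqid, sseqid = fields[0], fields[1]
            pident = float(fields[2])
            evalue = float(fields[10])
            bitscore = float(fields[11])
            # Discard weak matches, then keep the highest bitscore per query.
            if pident < min_identity or evalue > max_evalue:
                continue
            if qseqid not in best or bitscore > best[qseqid][2]:
                best[qseqid] = (sseqid, pident, bitscore)
    return best
```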

### Annotate

<details markdown="1">
<summary>Output files</summary>

- `functional/${sample}/`
  - `${sample}_annotated.txt`: Alignment matches annotated with the chosen functional database (e.g. GO).

</details>

- [Annotate](https://github.com/dalmolingroup/annotate) is a tool to annotate each query using the best alignment for which a mapping is known.
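
Conceptually, this step joins each query's best alignment against an ID-mapping table (subject ID to functional terms, e.g. GO IDs). The toy sketch below illustrates that join only; Annotate's actual implementation and file formats differ:

```python
def annotate_hits(best_hits, id_mapping):
    """Attach functional terms to alignment hits.

    best_hits: dict mapping query id -> best-matching subject id.
    id_mapping: dict mapping subject id -> list of functional terms.
    Queries whose subject has no known mapping are skipped.
    """
    annotated = {}
    for query, subject in best_hits.items():
        terms = id_mapping.get(subject)
        if terms:
            annotated[query] = terms
    return annotated
```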

### MEGAHIT

<details markdown="1">
<summary>Output files</summary>

- `assembly/${sample}/`
  - `${sample}.contigs.fa.gz`: Contigs assembled for the sample.

</details>

- [MEGAHIT](https://github.com/voutcn/megahit) is an ultra-fast and memory-efficient (meta-)genome assembler.

### _DIAMOND database_

<details markdown="1">
<summary>Output files</summary>

- `diamond_db/`
  - `${database_name}.dmnd`: DIAMOND database for the reference FASTA file.

</details>

- This output is only present if you add the `--save_db` parameter.

- [DIAMOND](https://github.com/bbuchfink/diamond) is an accelerated BLAST compatible local sequence aligner.

### MultiQC

