
docs: Improve docs (#20)
* docs: Update output doc

* docs: Add params documentation

* docs: Add mkdocs
jvfe authored Aug 28, 2023
1 parent 2384147 commit 5185c46
Showing 8 changed files with 234 additions and 36 deletions.
31 changes: 31 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,31 @@
# This workflow will automatically deploy the mkdocs documentation
# See https://parkererickson.github.io/portfolio/blog/MkDocsCD/

name: docs

on:
  push:
    branches: [master]
  workflow_dispatch:

jobs:
  build:
    name: Build and Deploy Documentation
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Master
        uses: actions/checkout@v2

      - name: Set up Python 3
        uses: actions/setup-python@v2
        with:
          python-version: "3.10.8"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install mkdocs

      - name: Deploy
        run: |
          git pull
          mkdocs gh-deploy
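
The `mkdocs gh-deploy` step builds the site from an `mkdocs.yml` at the repository root. A minimal sketch of what that configuration might look like follows; the site name, repo URL, and nav entries are illustrative assumptions, not necessarily what the commit adds:

```yaml
# Illustrative mkdocs.yml sketch; actual values in the repository may differ.
site_name: euryale
repo_url: https://github.com/dalmolingroup/euryale
nav:
  - Home: README.md
  - Usage: usage.md
  - Output: output.md
```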
48 changes: 32 additions & 16 deletions README.md
@@ -36,22 +36,42 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool

3. Download the pipeline and test it on a minimal dataset with a single command:

```bash
nextflow run dalmolingroup/euryale -profile test,YOURPROFILE --outdir <OUTDIR>
```

Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (`YOURPROFILE` in the example command above). You can chain multiple config profiles in a comma-separated string.

> - The pipeline comes with config profiles called `docker`, `singularity`, `podman`, `shifter`, `charliecloud` and `conda` which instruct the pipeline to use the named tool for software management. For example, `-profile test,docker`.
> - Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
> - If you are using `singularity`, please use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to download images first, before running the pipeline. Setting the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
> - If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs.
- Start running your own analysis!

```bash
nextflow run dalmolingroup/euryale \
--input samplesheet.csv \
--outdir <OUTDIR> \
--kaiju_db kaiju_reference \
--diamond_db diamond_db \
--reference_fasta diamond_fasta \
--host_fasta host_reference_fasta \
--id_mapping id_mapping_file \
-profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
```
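
The `--input` samplesheet follows the nf-core convention. Assuming the usual `sample,fastq_1,fastq_2` columns (the usage documentation has the authoritative schema), it might look like:

```csv
sample,fastq_1,fastq_2
CONTROL_REP1,control_R1.fastq.gz,control_R2.fastq.gz
TREATMENT_REP1,treatment_R1.fastq.gz,treatment_R2.fastq.gz
```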

## Documentation

The dalmolingroup/euryale documentation is split into the following pages:

- [Usage](usage.md)
  - An overview of how the pipeline works, how to run it, and a description of all of the different command-line flags.
- [Output](output.md)
  - An overview of the different results produced by the pipeline and how to interpret them.

## Credits

@@ -61,10 +81,6 @@ We thank the following people for their extensive assistance in the development

- Diego Morais (for developing the original [MEDUSA](https://github.com/dalmolingroup/medusa) pipeline)

## Contributions and Support

If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).

## Citations

> Morais DAA, Cavalcante JVF, Monteiro SS, Pasquali MAB and Dalmolin RJS (2022)
1 change: 1 addition & 0 deletions docs/CITATIONS.md
8 changes: 0 additions & 8 deletions docs/README.md

This file was deleted.

1 change: 1 addition & 0 deletions docs/README.md
83 changes: 72 additions & 11 deletions docs/output.md
@@ -10,32 +10,93 @@ The directories listed below will be created in the results directory after the

## Pipeline overview

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps (steps in **italics** don't run by default):

- [FastQC](#fastqc) - Raw read QC.
- [Kaiju](#kaiju) - Taxonomically classify reads or contigs.
- [Krona](#krona) - Visualize the taxonomic classification for each sample.
- [Diamond](#diamond) - Align reads and contigs against a reference database (such as NCBI-nr).
- [Annotate](#annotate) - Functionally annotate alignment matches.
- [MEGAHIT](#megahit) - Assemble reads into contigs.
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline.
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution.

### Kaiju

<details markdown="1">
<summary>Output files</summary>

- `taxonomy/`
  - `${sample}.tsv`: Kaiju classification output.
  - `${sample}.txt`: Kaiju2Table TSV output.

</details>

[Kaiju](https://github.com/bioinformatics-centre/kaiju/) is a tool for fast taxonomic classification of metagenomic sequencing reads using a protein reference database.
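
The Kaiju2Table TSV (`${sample}.txt`) is straightforward to post-process. As a sketch, assuming the usual `kaiju2table` columns (`file`, `percent`, `reads`, `taxon_id`, `taxon_name`), the most abundant taxa per sample can be pulled out like this; the helper is ours, not part of the pipeline:

```python
import csv

def top_taxa(kaiju_table_path, n=5):
    """Return the n most abundant taxa from a Kaiju2Table TSV.

    Assumes the usual kaiju2table header:
    file, percent, reads, taxon_id, taxon_name.
    """
    with open(kaiju_table_path) as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    # Sort by relative abundance, highest first.
    rows.sort(key=lambda row: float(row["percent"]), reverse=True)
    return [(row["taxon_name"], float(row["percent"])) for row in rows[:n]]
```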

### Krona

<details markdown="1">
<summary>Output files</summary>

- `taxonomy/`
  - `${sample}.html`: Krona visualization for the sample.

</details>

- [Krona](https://github.com/marbl/Krona/) is a tool to interactively explore metagenomes and more from a web browser.

### Diamond

<details markdown="1">
<summary>Output files</summary>

- `alignment/${sample}/`
  - `${sample}.txt`: Alignment matches in BLAST tabular output format.
  - `${sample}.log`: DIAMOND execution log.

</details>

- [DIAMOND](https://github.com/bbuchfink/diamond) is an accelerated BLAST compatible local sequence aligner.
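
The alignment files are easy to filter downstream. A sketch, assuming DIAMOND's default 12-column BLAST tabular layout (`--outfmt 6`: `qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore`) and hypothetical thresholds; the helper is ours, not part of the pipeline:

```python
def best_hits(blast_tab_path, min_identity=50.0, max_evalue=1e-5):
    """Keep the best-scoring hit per query from a BLAST/DIAMOND tabular file.

    Assumes the default 12-column tabular format (--outfmt 6).
    Returns a dict: query id -> (subject id, percent identity, bitscore).
    """
    best = {}
    with open(blast_tab_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            qseqid, sseqid = fields[0], fields[1]
            pident = float(fields[2])
            evalue = float(fields[10])
            bitscore = float(fields[11])
            # Discard weak matches, then keep the highest bitscore per query.
            if pident < min_identity or evalue > max_evalue:
                continue
            if qseqid not in best or bitscore > best[qseqid][2]:
                best[qseqid] = (sseqid, pident, bitscore)
    return best
```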

### Annotate

<details markdown="1">
<summary>Output files</summary>

- `functional/${sample}/`
  - `${sample}_annotated.txt`: Alignment matches annotated with the chosen functional database (e.g. GO).

</details>

- [Annotate](https://github.com/dalmolingroup/annotate) is a tool to annotate each query using the best alignment for which a mapping is known.
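
Conceptually, this step joins each query's best alignment against an ID-mapping table (subject ID to functional terms, e.g. GO IDs). The toy sketch below illustrates that join only; Annotate's actual implementation and file formats differ:

```python
def annotate_hits(best_hits, id_mapping):
    """Attach functional terms to alignment hits.

    best_hits: dict mapping query id -> best-matching subject id.
    id_mapping: dict mapping subject id -> list of functional terms.
    Queries whose subject has no known mapping are skipped.
    """
    annotated = {}
    for query, subject in best_hits.items():
        terms = id_mapping.get(subject)
        if terms:
            annotated[query] = terms
    return annotated
```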

### MEGAHIT

<details markdown="1">
<summary>Output files</summary>

- `assembly/${sample}/`
  - `${sample}.contigs.fa.gz`: Contigs assembled for the sample.

</details>

- [MEGAHIT](https://github.com/voutcn/megahit) is an ultra-fast and memory-efficient (meta-)genome assembler.

### _DIAMOND database_

<details markdown="1">
<summary>Output files</summary>

- `diamond_db/`
  - `${database_name}.dmnd`: DIAMOND database for the reference FASTA file.

</details>

- This output is only present if you add the `--save_db` parameter.

- [DIAMOND](https://github.com/bbuchfink/diamond) is an accelerated BLAST compatible local sequence aligner.

### MultiQC

