Merge pull request #120 from rhpvorderman/release_0.6.0
Release 0.6.0
rhpvorderman authored Mar 29, 2024
2 parents 8014193 + e65d35b commit 72a7e65
Showing 41 changed files with 2,342 additions and 465 deletions.
2 changes: 2 additions & 0 deletions .github/release_checklist.md
@@ -1,5 +1,7 @@
Release checklist
- [ ] Check outstanding issues on JIRA and GitHub.
- [ ] Check [latest documentation](https://sequali.readthedocs.io/en/latest)
looks fine.
- [ ] Create a release branch.
- [ ] Change current development version in `CHANGELOG.rst` to stable version.
- [ ] Check memory leaks with `tox -e asan`
17 changes: 3 additions & 14 deletions .github/workflows/ci.yml
@@ -13,24 +13,14 @@ on:
- "*"

jobs:
lint:
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v2.3.4
- name: Set up Python 3.8
uses: actions/setup-python@v2.2.1
with:
python-version: 3.8
- name: Install tox
run: pip install tox
- name: Lint
run: tox -e lint

package-checks:
strategy:
matrix:
tox_env:
- twine_check
- docs
- lint
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v2.3.4
@@ -86,7 +76,6 @@ jobs:
# test-arch:
# if: startsWith(github.ref, 'refs/tags') || github.ref == 'refs/heads/develop' || github.ref == 'refs/heads/main'
# runs-on: "ubuntu-latest"
# needs: lint
# strategy:
# matrix:
# distro: [ "ubuntu20.04" ]
@@ -108,7 +97,7 @@ jobs:
deploy:
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags')
runs-on: ${{ matrix.os }}
needs: [lint, package-checks, test]
needs: [package-checks, test]
strategy:
matrix:
os:
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
src/sequali/_version.py

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
16 changes: 16 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,16 @@
version: 2
formats: [] # Do not build epub and pdf

python:
install:
- requirements: "docs/requirements-docs.txt"
- method: "pip"
path: "."

sphinx:
configuration: docs/conf.py

build:
os: "ubuntu-22.04"
tools:
python: "3"
15 changes: 15 additions & 0 deletions CHANGELOG.rst
@@ -7,6 +7,21 @@ Changelog
.. This document is user facing. Please word the changes in such a way
.. that users understand how the changes affect the new version.
version 0.6.0
-----------------
+ Add links to the documentation in the report.
+ Move documentation to readthedocs and add extensive module documentation.
+ Change the ``--deduplication-estimate-bits`` option to the more understandable
``--duplication-max-stored-fingerprints``.
+ Add a small table that lists how many reads are >=Q5, >=Q7 etc. in the
per sequence average quality report.
+ The progressbar can track progress through more file formats.
+ The deduplication fingerprint that is used is now configurable from the
command line.
+ The deduplication module starts by gathering all sequences rather than half
of the sequences. This allows all sequences to be considered using a big
enough hash table.
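
The help text for the replaced ``--deduplication-estimate-bits`` option (removed from the README in this same commit) gives two formulas: at most ``2 ** bits * 7 // 10`` sequences are stored and ``2 ** bits * 24`` memory is required. The sketch below only evaluates those formulas to show why a bit count is harder to reason about than a fingerprint count. The function names are hypothetical, the memory unit is assumed to be bytes, and nothing here is taken from Sequali's actual implementation.

```python
def max_stored_fingerprints(bits: int) -> int:
    """Maximum stored sequences, per the old help text: 2 ** bits * 7 // 10."""
    return 2 ** bits * 7 // 10


def hash_table_memory(bits: int) -> int:
    """Memory required, per the old help text: 2 ** bits * 24.

    Assumption: the unit is bytes; the help text does not state it.
    """
    return 2 ** bits * 24


if __name__ == "__main__":
    bits = 21  # the documented default of the old option
    print(max_stored_fingerprints(bits))  # 1468006
    print(hash_table_memory(bits))        # 50331648
```

With the old default of 21 bits this comes to 1,468,006 stored fingerprints and, if the unit is indeed bytes, 48 MiB of memory.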

version 0.5.1
-----------------
+ Fix a bug in the overrepresented sequence sampling where the fragments from
132 changes: 59 additions & 73 deletions README.rst
@@ -14,10 +14,25 @@
:target: https://github.com/rhpvorderman/sequali/blob/main/LICENSE
:alt:

.. image:: https://readthedocs.org/projects/sequali/badge/?version=latest
:target: https://sequali.readthedocs.io/en/latest/?badge=latest
:alt:

.. image:: https://codecov.io/gh/rhpvorderman/sequali/graph/badge.svg?token=MSR1A6BEGC
:target: https://codecov.io/gh/rhpvorderman/sequali
:alt:

.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.10854010.svg
:target: https://doi.org/10.5281/zenodo.10854010
:alt:

========
sequali
Sequali
========
Sequence quality metrics

.. introduction start
Sequence quality metrics for FASTQ and uBAM files.

Features:

@@ -36,11 +51,18 @@ Features:

Example reports:

+ `GM24385_1.fastq.gz <https://github.com/rhpvorderman/sequali/files/14617717/GM24385_1.fastq.gz.html.zip>`_;
+ `GM24385_1.fastq.gz <https://github.com/rhpvorderman/sequali/files/14725146/GM24385_1.fastq.gz.html.zip>`_;
HG002 (Genome In A Bottle) on ultra-long Nanopore Sequencing. `Sequence file download <https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG002_NA24385_son/UCSC_Ultralong_OxfordNanopore_Promethion/GM24385_1.fastq.gz>`_.

.. introduction end
For more information check `the documentation <https://sequali.readthedocs.io>`_.

Supported formats
=================

.. formats start
- FASTQ. Only the Sanger variation with a phred offset of 33 and the error rate
calculation of 10 ^ (-phred/10) is supported. All sequencers use this
format today.
@@ -55,9 +77,13 @@ Supported formats
- For uBAM data as delivered by dorado additional nanopore plots will be
provided.
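
The FASTQ entry above names a phred offset of 33 and an error rate of 10 ^ (-phred/10). A minimal Python sketch of that conversion follows. The function names are hypothetical, and deriving a mean quality by averaging error rates rather than phred scores is an assumption made for illustration, not a statement about how Sequali computes its reports.

```python
import math

PHRED_OFFSET = 33  # Sanger offset, as stated in the README


def error_rate(quality_char: str, offset: int = PHRED_OFFSET) -> float:
    """Convert one FASTQ quality character to an error probability."""
    phred = ord(quality_char) - offset
    return 10 ** (-phred / 10)


def mean_error_rate(qualities: str) -> float:
    """Average error probability over a quality string."""
    if not qualities:
        raise ValueError("quality string must not be empty")
    return sum(error_rate(c) for c in qualities) / len(qualities)


def mean_phred(qualities: str) -> float:
    """Mean quality derived from the mean error rate (illustrative assumption)."""
    return -10 * math.log10(mean_error_rate(qualities))


if __name__ == "__main__":
    print(error_rate("I"))   # 'I' encodes phred 40
    print(mean_phred("!I"))  # phred 0 and phred 40 together
```

Averaging in error-rate space means a single very poor base dominates the result: phred 0 and phred 40 together give a mean quality of about 3, not 20.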

.. formats end
Installation
============

.. installation start
Installation via pip is available with::

pip install sequali
@@ -66,83 +92,37 @@ Sequali is also distributed via bioconda. It can be installed with::

conda install -c conda-forge -c bioconda sequali

Usage
=====
.. installation end
Quickstart
==========

.. quickstart start
.. code-block::
usage: sequali [-h] [--json JSON] [--html HTML] [--outdir OUTDIR]
[--adapter-file ADAPTER_FILE]
[--overrepresentation-threshold-fraction FRACTION]
[--overrepresentation-min-threshold THRESHOLD]
[--overrepresentation-max-threshold THRESHOLD]
[--overrepresentation-max-unique-fragments N]
[--overrepresentation-fragment-length LENGTH]
[--overrepresentation-sample-every DIVISOR]
[--deduplication-estimate-bits BITS] [-t THREADS] [--version]
INPUT
Create a quality metrics report for sequencing data.
positional arguments:
INPUT Input FASTQ or uBAM file. The format is autodetected
and compressed formats are supported.
options:
-h, --help show this help message and exit
--json JSON JSON output file. default: '<input>.json'.
--html HTML HTML output file. default: '<input>.html'.
--outdir OUTDIR, --dir OUTDIR
Output directory for the report files. default:
current working directory.
--adapter-file ADAPTER_FILE
File with adapters to search for. See default file for
formatting. Default: src/sequali/adapters/adapter_list.tsv.
--overrepresentation-threshold-fraction FRACTION
At what fraction a sequence is determined to be
overrepresented. The threshold is calculated as
fraction times the number of sampled sequences.
Default: 0.001 (1 in 1,000).
--overrepresentation-min-threshold THRESHOLD
The minimum amount of occurrences for a sequence to be
considered overrepresented, regardless of the bound
set by the threshold fraction. Useful for smaller
files. Default: 100.
--overrepresentation-max-threshold THRESHOLD
The amount of occurrences for a sequence to be
considered overrepresented, regardless of the bound
set by the threshold fraction. Useful for very large
files. Default: unlimited.
--overrepresentation-max-unique-fragments N
The maximum amount of unique fragments to store.
Larger amounts increase the sensitivity of finding
overrepresented sequences at the cost of increasing
memory usage. Default: 5,000,000.
--overrepresentation-fragment-length LENGTH
The length of the fragments to sample. The maximum is
31. Default: 21.
--overrepresentation-sample-every DIVISOR
How often a read should be sampled. More samples leads
to better precision, lower speed, and also towards
more bias towards the beginning of the file as the
fragment store gets filled up with more sequences from
the beginning. Default: 1 in 8.
--deduplication-estimate-bits BITS
Determines how many sequences are maximally stored to
estimate the deduplication rate. Maximum stored
sequences: 2 ** bits * 7 // 10. Memory required: 2 **
bits * 24. Default: 21.
-t THREADS, --threads THREADS
Number of threads to use. If greater than one sequali
will use an additional thread for gzip decompression.
Default: 2.
--version show program's version number and exit
sequali path/to/my.fastq.gz
This will create an HTML report ``my.fastq.gz.html`` and a JSON file
``my.fastq.gz.json`` in the current working directory.
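
The output naming can be sketched in Python as follows. This is an illustration only, based on the defaults named in the help text removed by this commit (``'<input>.html'``, ``'<input>.json'``, and the current working directory for ``--outdir``). The helper name ``default_report_paths`` is hypothetical and is not part of Sequali.

```python
from pathlib import Path
from typing import Tuple, Union


def default_report_paths(
    input_path: Union[str, Path], outdir: Union[str, Path] = "."
) -> Tuple[Path, Path]:
    """Return the (html, json) report paths expected for an input file.

    Only the file name of the input is kept; the reports land in outdir.
    """
    name = Path(input_path).name
    out = Path(outdir)
    return out / f"{name}.html", out / f"{name}.json"


if __name__ == "__main__":
    html, json_report = default_report_paths("path/to/my.fastq.gz")
    print(html)         # my.fastq.gz.html
    print(json_report)  # my.fastq.gz.json
```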

.. quickstart end
For all command line options check out the
`usage documentation <https://sequali.readthedocs.io/#usage>`_.

For more extensive information about the module options check the
`documentation on the module options
<https://sequali.readthedocs.io/#module-option-explanations>`_.
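
As one example of a module option, the help text removed from the README in this commit describes the overrepresentation threshold as the fraction times the number of sampled sequences, with a minimum (default 100) and an optional maximum (default unlimited) that apply regardless of the fraction. A minimal sketch of that rule is shown below. The function name is hypothetical, and the lack of rounding and the order in which the bounds are applied are assumptions; the real implementation may differ.

```python
from typing import Optional


def overrepresentation_threshold(
    sampled_sequences: int,
    fraction: float = 0.001,              # documented default: 1 in 1,000
    min_threshold: int = 100,             # documented default
    max_threshold: Optional[int] = None,  # documented default: unlimited
) -> float:
    """Occurrence count above which a sequence counts as overrepresented."""
    threshold = fraction * sampled_sequences
    # The minimum guards small files against a very low threshold.
    threshold = max(threshold, min_threshold)
    # The maximum caps the threshold for very large files.
    if max_threshold is not None:
        threshold = min(threshold, max_threshold)
    return threshold


if __name__ == "__main__":
    print(overrepresentation_threshold(10_000))     # small file: minimum applies
    print(overrepresentation_threshold(1_000_000))  # fraction applies
```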

Acknowledgements
================

.. acknowledgements start
+ `FastQC <https://www.bioinformatics.babraham.ac.uk/projects/fastqc/>`_ for
its excellent selection of relevant metrics. For this reason these metrics
are also gathered by sequali.
are also gathered by Sequali.
+ The matplotlib team for their excellent work on colormaps. Their work was
an inspiration for how to present the data and their RdBu colormap is used
to represent quality score data. Check their `writings on colormaps
@@ -152,11 +132,17 @@ Acknowledgements
scores <https://gigabaseorgigabyte.wordpress.com/2017/06/26/averaging-basecall-quality-scores-the-right-way/>`_.
+ Marcel Martin for providing very extensive feedback.

.. acknowledgements end
License
=======

.. license start
This project is licensed under the GNU Affero General Public License v3, mainly
to prevent commercial parties from using it without notifying the users that they
can run it themselves. If you want to include code from sequali in your
can run it themselves. If you want to include code from Sequali in your
open source project, but it is not compatible with the AGPL, please contact me
and we can discuss a separate license.

.. license end
8 changes: 8 additions & 0 deletions codecov.yml
@@ -0,0 +1,8 @@
coverage:
status:
project:
default:
target: 90 # let's try to hit high standards
patch:
default:
target: 90 # Tests should be written for new features