Implement qiime dada denoise_ccs
in q2-dada2 plugin for denoising Pacbio CCS long reads.
Test on qiime2-2021.4
Qiime2 v2022.2 has offically supported Pacbio CCS denoise.
- Overwrite the file __init__.py, plugin_setup.py, and _denoise.py in /path/to/your/conda/envs/qiime2-version/lib/python3.8/site-packages/q2_dada2/ with the file in this repo.
- put a new file run_dada_ccs.R in /path/to/your/conda/envs/qiime2-version/bin and use command
chmod u+x ./run_dada_ccs.R
to make sure you have execution authority. - use command
qiime dev refresh-cache
to refresh your QIIME2 environment.
Changing following files in /path/to/your/conda/envs/qiime2-version/lib/python3.8/site-packages/q2_dada2/
-
__init__.py : import a new function
denoise_ccs
from _denoise.py -
plugin_setup.py :
-
Register the new function
denoise_ccs
based ondenoise_pyro
-
Add following explicit parameters :
- front and adapter for primer removing
- max_mismatch and indels for primer matching
- min_len to filter the min length of CCS reads
-
Change the default value of following explicits parameters :
- trunc_len : default 0
- min_fold_parent_over_abundance from 1 to 3.5
- n_reads_learn from 250000 to 1000000 (same with
denoise_single
)
-
-
_denoise.py :
-
define the new function
denoise_ccs
based ondenoise_pyro
and_denoise_single
:-
add new parameters :
- front and adapter : string
- max_mismatch : interger, default 2
- indels : boolean, default False
- min_len : interger, default 20
-
change the default value of following parameters :
- trunc_len : default 0
- min_fold_parent_over_abundance from 1 to 3.5
- n_reads_learn from 250000 to 1000000 (same with
denoise_single
)
-
remove implicit parameters homopolymer_gap_penalty and band_size , re-define their default values in run_dada_ccs.R
-
change the command line
run_dada_ccs.R
to work with previous changes :- add new temporary directory nop_fp for PRIMER REMOVING step
- change the path of temporary directory filt_fp
- add new arguments in
run_dada_ccs.R
for the new parameters indenoise_ccs
-
return the stats log of primer removing step to function
_denoise_helper
-
-
change the function
_denoise_helper
for exposing the PRIMER REMOVING step stat in final denoise_stats.qza, including primer-removed and percentage of input primer-removed -
add new parameters to the
_valid_inputs
for paramters validation:- max_mismatch : _WHOLE_NUM
- min_len : _WHOLE_NUM
- front, adapter and indels : _SKIP
-
Then add a new Rscript run_dada_ccs.R based on run_dada_single.R in /path/to/your/conda/envs/qiime2-version/bin
-
assign the new arguments and reorder all the arguments :
- primer.removed.dir for the path of temporary directory in PRIMER REMOVING step , corresponding to nop_fp in
denoise_ccs
- primerF and primerR for primer sequences, corresponding to front and adapter in
denoise_ccs
- maxMismatch and indels for primer matching, corresponding to max_mismatch and indels in
denoise_ccs
- *minLen for min length CCS reads filtering, corresponding to min_len in
denoise_ccs
- primer.removed.dir for the path of temporary directory in PRIMER REMOVING step , corresponding to nop_fp in
-
add a new PRIMER REMOVING step before TRIM AND FILTER step :
-
modify the TRIM AND FILTER step :
- change the default value of implicit parameters rm.phix from TRUE to FALSE in function
filterAndTrim
- add paramters in function
filterAndTrim
:- add minLen = minLen
- minQ = 3
- change the default value of implicit parameters rm.phix from TRUE to FALSE in function
-
modify the LEARN EROOR RATES step :
- change the default value of following implicit parameters :
- errorEstimationFunction from dada2:::loessErrfun to dada2:::PacBioErrfun
- add an explicit default value of BAND_SIZE to 32
- remove the HOMOPOLYMER_GAP_PENALTY parameters so its default value change to NULL
- change the default value of following implicit parameters :
-
modify the REPORT READ FRACTIONS THROUGH PIPELINE step to add statistics of PRIMER REMOVING step in denoise_stats.qza
Enter qiime dev refresh-cache
after changes those files.
Note: Still lack of auto test of those new parameters so make sure your input is valid.
All parameters used in those files are referenced from Callahan et al., 2019 and https://github.com/benjjneb/LRASManuscript
PR to qiime/q2-dada2, waiting for merging.