Welcome to the RepNano repository (implementation from Hennion et al., Genome Biol 21, 125 (2020)). RepNano estimates BrdU content from raw Oxford Nanopore sequencing reads.
The software outputs two files:
- a .fa FASTA file with the read sequences, where each T has been replaced by T, X or B according to our transition matrix (TM) approach,
- a .fa_ratio_B file with the BrdU ratio for each base of the sequence, as computed by the neural network (CNN).
conda create --name tomboenv --override-channels -c defaults -c bioconda python=3.6 keras pandas numba tqdm joblib ont-tombo matplotlib
conda activate tomboenv
git clone https://github.com/organic-chemistry/RepNano.git
cd RepNano
python setup.py develop
For compressed fast5 files, it is now necessary to install the vbz plugin from https://github.com/nanoporetech/vbz_compression/releases. Once it is extracted, create the plugin directory and copy the plugin there:
mkdir -p place_where_conda_is_located/.conda/envs/tomboenv/lib/hdf5/plugin/
cp ont-vbz-hdf-plugin-1.0.1-Linux/usr/local/hdf5/lib/plugin/libvbz_hdf_plugin.so place_where_conda_is_located/.conda/envs/tomboenv/lib/hdf5/plugin/
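To confirm that the plugin ended up where the two commands above put it, a minimal check can be run with the tomboenv environment activated. This is only a sketch: `vbz_plugin_path` is a hypothetical helper, and it assumes that `sys.prefix` points at the environment root (which is the case when the environment's own Python is used).

```python
import sys
from pathlib import Path

PLUGIN_NAME = "libvbz_hdf_plugin.so"


def vbz_plugin_path(env_prefix):
    """Return the expected location of the VBZ plugin inside a conda environment."""
    return Path(env_prefix) / "lib" / "hdf5" / "plugin" / PLUGIN_NAME


if __name__ == "__main__":
    # sys.prefix is the root of the active environment (e.g. .../envs/tomboenv).
    path = vbz_plugin_path(sys.prefix)
    status = "found" if path.is_file() else "missing"
    print(f"{path}: {status}")
```

The check only tests that the file exists; it does not verify that HDF5 can actually load it.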
The typical pipeline consists of aligning Oxford Nanopore reads to a reference genome, then predicting BrdU content with the neural network and the transition matrix approaches.
Oxford Nanopore outputs 2 kinds of files:
- a fastq file containing all the sequences obtained by the Guppy basecaller,
- several fast5 files (raw currents) containing 4000 reads each. If the fast5 folder is compressed, first run:
tar -xvzf data_fast5.tgz
This will create a directory with the .fast5 files inside. Each .fast5 file is then processed separately (parallelizable).
The first step is to create an output.fast5 file that contains the currents aligned to the reference genome (this step uses the Tombo package from ONT):
python src/repnano/data/preprocess.py --hdf5 fast5_file.fast5 --fastq fastq_file.fastq --ref reference_genome.fa --output_name output.fast5 --njobs 6
RepNano is then called on this file:
python src/repnano/models/predict_simple.py output.fast5 --bigf --output=BrdU_calls/output_file.fa --overlap 10
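Since each .fast5 file is processed independently, the two commands above can be driven from a small script. The following is a sketch under stated assumptions, not part of RepNano: the function names, the `_aligned.fast5` and `.fa` output naming and the `workers` parameter are all hypothetical, and the script is assumed to run from the repository root.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def commands_for(fast5_file, fastq, reference, out_dir):
    """Build the preprocess and predict commands for one .fast5 file."""
    stem = Path(fast5_file).stem
    aligned = str(Path(out_dir) / f"{stem}_aligned.fast5")  # hypothetical naming
    calls = str(Path(out_dir) / f"{stem}.fa")               # hypothetical naming
    preprocess = ["python", "src/repnano/data/preprocess.py",
                  "--hdf5", str(fast5_file), "--fastq", str(fastq),
                  "--ref", str(reference), "--output_name", aligned,
                  "--njobs", "6"]
    predict = ["python", "src/repnano/models/predict_simple.py", aligned,
               "--bigf", f"--output={calls}", "--overlap", "10"]
    return [preprocess, predict]


def process_one(fast5_file, fastq, reference, out_dir):
    """Run both steps for one file; check=True stops at the first failing step."""
    for command in commands_for(fast5_file, fastq, reference, out_dir):
        subprocess.run(command, check=True)
    return fast5_file


def process_all(fast5_dir, fastq, reference, out_dir, workers=2):
    """Process every .fast5 file of a directory, a few files at a time."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    files = sorted(Path(fast5_dir).glob("*.fast5"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_one, f, fastq, reference, out_dir)
                   for f in files]
        return [future.result() for future in futures]
```

Threads are enough here because the work happens in the child processes. Note that each preprocess call already uses `--njobs 6`, so `workers` should be kept small to avoid oversubscribing the machine.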
RepNano generates two files:
- output_file.fa with the read sequences, where each T has been replaced by T, X or B according to our transition matrix (TM) approach
- output_file.fa_ratio_B with the BrdU ratio for each base of the sequence as computed by the neural network (CNN).
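As an illustration of how the TM output can be inspected, the sketch below counts the T, X and B symbols per read. It assumes that output_file.fa follows the usual FASTA layout (a `>` header line followed by sequence lines); the helper names are hypothetical, and the layout of the .fa_ratio_B file is not covered here.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def thymidine_calls(sequence):
    """Count the T, X and B symbols in one read sequence."""
    counts = Counter(sequence)
    return {symbol: counts[symbol] for symbol in "TXB"}
```

A possible use is `with open("BrdU_calls/output_file.fa") as handle:` followed by a loop over `read_fasta(handle)` that prints `thymidine_calls(sequence)` for each read.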
This pipeline requires additional installation steps, to be done only once:
the _preprocess.py file installed with Tombo has to be replaced by a modified version. To find the _preprocess.py file to replace, run:
conda config --show envs_dirs
It should output the directory where the conda environments are stored, for example miniconda3/envs/
In this example, the _preprocess.py file to replace is in miniconda3/envs/tomboenv/lib/python3.6/site-packages/tombo/
You should run:
cp modif_tombo/_preprocess.py miniconda3/envs/tomboenv/lib/python3.6/site-packages/tombo/
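As an alternative to reading the conda configuration, the Tombo installation directory can be looked up from Python itself, with the tomboenv environment activated. This is a sketch: `package_dir` is a hypothetical helper built on the standard `importlib.util.find_spec`, and it only prints the copy command instead of running it.

```python
import importlib.util
from pathlib import Path


def package_dir(package_name):
    """Return the directory of an importable package, or None if it is not installed."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin is the path to its __init__.py.
    return Path(spec.origin).parent


if __name__ == "__main__":
    target = package_dir("tombo")
    if target is None:
        print("tombo is not importable from this environment")
    else:
        print(f"cp modif_tombo/_preprocess.py {target}/")
```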
If the data is compressed, first run:
tar -xvzf data_fast5.tgz
This will create one or several directories ('fast5_directory') with the fast5 files inside.
Then process every fast5 folder separately (parallelizable).
The first step is to associate the sequence of each read from the fastq file to the corresponding fast5 file using Tombo (see Tombo documentation):
tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5_directory/ --fastq-filenames fastq_file.fastq --overwrite --processes 4
If the files are not processed:
- check if --overwrite is in the command line
- check that you properly did the installation (replacing the _preprocess.py file)
- check that you have write permission on the files
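The last check in the list can be automated. The sketch below lists the .fast5 files that the current user cannot write to; `unwritable_fast5` is a hypothetical helper, and searching the directory recursively is an assumption about how the files are laid out.

```python
import os
from pathlib import Path


def unwritable_fast5(directory):
    """List .fast5 files under `directory` that the current user cannot write to."""
    return sorted(
        path for path in Path(directory).rglob("*.fast5")
        if not os.access(path, os.W_OK)
    )


if __name__ == "__main__":
    for path in unwritable_fast5("fast5_directory/"):
        print(f"no write permission: {path}")
```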
Then use the Tombo resquiggle command to map the fastq sequences to the reference genome (here yeast: S288C_reference_sequence_R64-2-1_20150113.fa; alternatively, a .mmi index generated by minimap2 can be given) and to realign the raw currents to the reference sequence:
tombo resquiggle fast5_directory/ S288C_reference_sequence_R64-2-1_20150113.fa --processes 4 --num-most-common-errors 5 --dna
Finally, run RepNano to estimate BrdU content along the mapped reads:
python src/repnano/models/predict_simple.py --directory=fast5_directory/ --output=BrdU_calls/output_files.fa --overlap 10
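The three commands of this pipeline have to run in order for each fast5 directory. A minimal way to chain them, stopping at the first failure, is sketched below. The function names and the `processes` parameter are hypothetical; the command-line arguments are the ones shown above, and the script is assumed to run from the repository root with tomboenv activated.

```python
import subprocess


def directory_pipeline(fast5_dir, fastq, reference, output_fasta, processes=4):
    """Return the annotate, resquiggle and predict commands for one directory."""
    return [
        ["tombo", "preprocess", "annotate_raw_with_fastqs",
         "--fast5-basedir", fast5_dir, "--fastq-filenames", fastq,
         "--overwrite", "--processes", str(processes)],
        ["tombo", "resquiggle", fast5_dir, reference,
         "--processes", str(processes),
         "--num-most-common-errors", "5", "--dna"],
        ["python", "src/repnano/models/predict_simple.py",
         f"--directory={fast5_dir}", f"--output={output_fasta}",
         "--overlap", "10"],
    ]


def run_pipeline(commands):
    """Run the commands in order; check=True raises on the first failing step."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_pipeline(directory_pipeline("fast5_directory/", "fastq_file.fastq", "S288C_reference_sequence_R64-2-1_20150113.fa", "BrdU_calls/output_files.fa"))` reproduces the commands of this section.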
The previous steps can be used to detect BrdU in any experimental context. In contrast, this last part only serves to detect replication forks labelled in conditions resembling those used in Hennion et al.
Fork detection relies on the simplification module, which has to be installed first:
pip install simplification
To detect replication forks, as well as initiation and termination events, you have to run the following command, where the 'BrdU_calls' folder is the output of RepNano, the 'DetectionFOLDER' is the location of the detection output files and 'prefix' is a prefix in the output files (it can be a sample ID for instance).
python src/repnano/detection/ForkPrediction-CNN-TM.py BrdU_calls/ DetectionFOLDER prefix
A number of parameters are set up at the beginning of ForkPrediction-CNN-TM.py and can be modified to make the detection more or less stringent.
The detection results obtained in Hennion et al. can be found in the Detected_events folder (S288C yeast genome).
Finally, the R folder contains the R scripts used to perform the downstream analysis and to generate most of the figures of the article.
This software is licensed under the MIT license.
During the development of this software, parts of the DeepNano software were used (Boža, Vladimír, Broňa Brejová, and Tomáš Vinař. "DeepNano: deep recurrent neural networks for base calling in MinION nanopore reads." PloS one 12.6 (2017)).
show error message when analysing R10 samples
create conda package / facilitate installation
separate resquiggle from mapping