Improved detection resolution of PRRSV using nanopore direct RNA sequencing and bioinformatics tools
This repository provides relavant bioinformatics methods and codes for the manuscript: Improved detection resolution of PRRSV using nanopore direct RNA sequencing and bioinformatics tools.
The objective of this study is to provide a high-resoluation detecion method for PRRSV diagnosis, and to pave the way for the future goal of rapid, unbiased, on-site infectious diseases investigation using portable sequencer. The technology we used in this study was Oxford Nanopore direct RNA sequencing (https://nanoporetech.com/).
Raw sequencing data generated was deposited in NCBI SRA:
Basecalling of raw reads was performed using Albacore (https://nanoporetech.com/) to generate FASTQ files.
Total yield, total reads, read quality, and read length from whole genome sequencing were analyzed using NanoPlot (https://github.com/wdecoster/NanoPlot).
In order to evaluate the raw error rates of direct RNA sequencing, raw reads were mapped to the VR2332 reference sequence using minimap2-2.12 (https://github.com/lh3/minimap2), processed with SAMtools (https://github.com/samtools/samtools) to generate BAM files, and then evaluated by AlignQC (https://github.com/jason-weirather/AlignQC).
minimap2 -ax map-ont prrsv.fastq > minimap_prrsv.sam
samtools view -b minimap_prrsv.sam > minimap_prrsv.bam
samtools sort minimap_prrsv.bam -o minimap_prrsv_sorted.bam
samtools index minimap_prrsv_sorted.bam
qualimap_v2.2.1/qualimap bamqc -bam minimap_prrsv_sorted.bam -outdir qualimap_results
alignqc analyze minimap_prrsv_sorted.bam -r reference.fasta --no_annotation -o prrsv_alignqc.xhtml
Software and version: Racon-1.3.2
cat prrsv.fastq | paste - - - - | awk -F '\t' '{L=length($2);if(L>M) {M=L;R=$0;}} END {print R;}' | tr "\t" "\n" > largest.fastq
seqtk-master/seqtk seq -a largest.fastq > largest.fasta
minimap2 -ax map-ont largest.fasta prrsv.fastq > prrsv_mapped.sam
racon prrsv.fastq prrsv_mapped.sam largest.fasta > racon.fasta
Depth of coverage across the consensus genome was analyzed using Qualimap (http://qualimap.bioinfo.cipf.es/).
a custom PRRSV sequence database containing 951 PRRSV whole genome sequences was generated by downloading all PRRSV whole genome sequences available in GenBank (949 sequences including our VR2332 strain, download date: Nov 2018) with the addition of sequences from our SDEU (MN175678) and 1-7-4 (MN175677) lab strains. Alignment Search Tool (BLAST) with a significance filter of expect value (E) < 10 ^-50 to examine the PRRSV sequence reads.
makeblastdb -in downloaded_prrsv.fasta -title prrsvdatabase -dbtype nucl -out prrsvdatabase
blastn -db prrsvdatabase -query prrsv.fastq -evalue 1e-50 -word_size 11 -outfmt 7 > prrsv_blast
In order to detect PRRSV strains, PRRSV reads were first BLASTn analyzed to identify the top BLAST hit as determined by bit score.
Then, all PRRSV reads were mapped to the this top BLAST hit using minimap2 and mapped reads were extracted using SAMtools.
The unmapped reads were also extracted and were analyzed against the PRRSV database a second time to detect any other strain existing in the same sample. The top BLAST hit was recorded and the mapped and unmapped reads to the second top match were again separated.
blastn -db prrsvdatabase -query prrsv.fastq -evalue 1e-50 -word_size 11 -outfmt 7 > prrsv_blast1 # take the top blast hit as reference1
minimap2 -ax map-ont reference1.fasta prrsv.fastq > prrsv1.sam
samtools view -b prrsv1.sam > prrsv1.bam
samtools sort prrsv1.bam -o prrsv1_sorted.bam
samtools index prrsv1_sorted.bam
samtools view -S -F4 prrsv1_sorted.bam > prrsv1.mapped.sam
samtools view -S -f4 prrsv1_sorted.bam > prrsv1.unmapped.sam
cut -f1 VR2332.mapped.sam | sort | uniq > prrsv1_mapped.lst
seqtk-master/seqtk subseq prrsv.fastq prrsv1_mapped.lst > prrsv1_mapped.fastq
cut -f1 VR2332.unmapped.sam | sort | uniq > prrsv1_unmapped.lst
seqtk-master/seqtk subseq prrsv.fastq prrsv1_unmapped.lst > prrsv1_unmapped.fastq
blastn -db prrsvdatabase -query prrsv1_unmapped.fastq -evalue 1e-50 -word_size 11 -outfmt 7 > prrsv_blast1 # take the top blast hit as reference2
minimap2 -ax map-ont reference2.fasta prrsv.fastq > prrsv2.sam
...
This was repeated until no PRRSV strain was detected in the extracted unmapped reads.