Skip to content

Latest commit

 

History

History
86 lines (55 loc) · 4.46 KB

README.md

File metadata and controls

86 lines (55 loc) · 4.46 KB

Improved detection resolution of PRRSV using nanopore direct RNA sequencing and bioinformatics tools

This repository provides relavant bioinformatics methods and codes for the manuscript: Improved detection resolution of PRRSV using nanopore direct RNA sequencing and bioinformatics tools.

The objective of this study is to provide a high-resoluation detecion method for PRRSV diagnosis, and to pave the way for the future goal of rapid, unbiased, on-site infectious diseases investigation using portable sequencer. The technology we used in this study was Oxford Nanopore direct RNA sequencing (https://nanoporetech.com/).

Raw sequencing data generated was deposited in NCBI SRA:

PRRSV whole genome generation

Raw reads assessment

Basecalling of raw reads was performed using Albacore (https://nanoporetech.com/) to generate FASTQ files.

Total yield, total reads, read quality, and read length from whole genome sequencing were analyzed using NanoPlot (https://github.com/wdecoster/NanoPlot).

In order to evaluate the raw error rates of direct RNA sequencing, raw reads were mapped to the VR2332 reference sequence using minimap2-2.12 (https://github.com/lh3/minimap2), processed with SAMtools (https://github.com/samtools/samtools) to generate BAM files, and then evaluated by AlignQC (https://github.com/jason-weirather/AlignQC).

minimap2 -ax map-ont prrsv.fastq > minimap_prrsv.sam  

samtools view -b minimap_prrsv.sam > minimap_prrsv.bam
samtools sort minimap_prrsv.bam -o minimap_prrsv_sorted.bam
samtools index minimap_prrsv_sorted.bam

qualimap_v2.2.1/qualimap bamqc -bam minimap_prrsv_sorted.bam -outdir qualimap_results

alignqc analyze minimap_prrsv_sorted.bam -r reference.fasta --no_annotation -o prrsv_alignqc.xhtml

Whole genome generation

Software and version: Racon-1.3.2

cat prrsv.fastq | paste - - - - | awk -F '\t' '{L=length($2);if(L>M) {M=L;R=$0;}} END {print R;}' | tr "\t" "\n" > largest.fastq

seqtk-master/seqtk seq -a largest.fastq > largest.fasta

minimap2 -ax map-ont largest.fasta prrsv.fastq > prrsv_mapped.sam  

racon prrsv.fastq prrsv_mapped.sam largest.fasta > racon.fasta

Coverage analysis

Depth of coverage across the consensus genome was analyzed using Qualimap (http://qualimap.bioinfo.cipf.es/).

Viral strain detection

a custom PRRSV sequence database containing 951 PRRSV whole genome sequences was generated by downloading all PRRSV whole genome sequences available in GenBank (949 sequences including our VR2332 strain, download date: Nov 2018) with the addition of sequences from our SDEU (MN175678) and 1-7-4 (MN175677) lab strains. Alignment Search Tool (BLAST) with a significance filter of expect value (E) < 10 ^-50 to examine the PRRSV sequence reads.

makeblastdb -in downloaded_prrsv.fasta -title prrsvdatabase -dbtype nucl -out prrsvdatabase
blastn -db prrsvdatabase -query prrsv.fastq -evalue 1e-50 -word_size 11 -outfmt 7 > prrsv_blast

Differentiation of multiple viral strains

In order to detect PRRSV strains, PRRSV reads were first BLASTn analyzed to identify the top BLAST hit as determined by bit score.

Then, all PRRSV reads were mapped to the this top BLAST hit using minimap2 and mapped reads were extracted using SAMtools.

The unmapped reads were also extracted and were analyzed against the PRRSV database a second time to detect any other strain existing in the same sample. The top BLAST hit was recorded and the mapped and unmapped reads to the second top match were again separated.


blastn -db prrsvdatabase -query prrsv.fastq -evalue 1e-50 -word_size 11 -outfmt 7 > prrsv_blast1 # take the top blast hit as reference1

minimap2 -ax map-ont reference1.fasta prrsv.fastq > prrsv1.sam

samtools view -b prrsv1.sam > prrsv1.bam
samtools sort prrsv1.bam -o prrsv1_sorted.bam
samtools index prrsv1_sorted.bam
samtools view -S -F4 prrsv1_sorted.bam > prrsv1.mapped.sam 
samtools view -S -f4 prrsv1_sorted.bam > prrsv1.unmapped.sam

cut -f1 VR2332.mapped.sam | sort | uniq > prrsv1_mapped.lst
seqtk-master/seqtk subseq prrsv.fastq prrsv1_mapped.lst > prrsv1_mapped.fastq

cut -f1 VR2332.unmapped.sam | sort | uniq > prrsv1_unmapped.lst
seqtk-master/seqtk subseq prrsv.fastq prrsv1_unmapped.lst > prrsv1_unmapped.fastq

blastn -db prrsvdatabase -query prrsv1_unmapped.fastq -evalue 1e-50 -word_size 11 -outfmt 7 > prrsv_blast1 # take the top blast hit as reference2

minimap2 -ax map-ont reference2.fasta prrsv.fastq > prrsv2.sam
...

This was repeated until no PRRSV strain was detected in the extracted unmapped reads.