Releases: BIMSBbioinfo/crispr_DART
version 0.1.0
0.0.2
News for version 0.0.2
- Now support both insertions and deletions
- Integrated Rmarkdown reports with interactive plotly figures
- One report per amplicon
- Integrated Rmarkdown reports for pairwise comparison of samples to discover sites that show differential peaks of indel efficiencies in a case-control setting.
- Folders `bed` and `bedgraph` replaced with folder `indels`, which contains:
  - bedgraph files for insertions, deletions, and indels (combined insertions/deletions)
  - bed files for top insertions, deletions (sorted by read support per genotype)
  - indel stats at per-base resolution as a raw tsv file
  - indel stats at expected cut sites
- The `reports` folder now contains one report per amplicon. A sub-folder called `comparisons` is also produced if a `comparisons.tsv` file is provided, which lists the pairs of samples that the user wants to compare (see the `sample_data/comparisons.tsv` file for an example).
- The pipeline now uses R libraries (`GenomicAlignments`) to extract indel stats from BAM files rather than using `samtools mpileup`.
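To illustrate what extracting indels directly from alignments involves, here is a minimal Python sketch that walks a CIGAR string and reports insertions and deletions in reference coordinates. It is not the pipeline's code (the pipeline does this in R with `GenomicAlignments`); the function name `indels_from_cigar` and the example read are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(pos, cigar):
    """Return (kind, start, end) tuples in 1-based reference coordinates.

    pos is the 1-based leftmost mapping position of the read (SAM POS field).
    A deletion covers the reference bases start..end (inclusive).
    An insertion lies between the reference bases start and end = start + 1.
    """
    events = []
    ref = pos  # next reference position to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "D":
            events.append(("deletion", ref, ref + length - 1))
            ref += length
        elif op == "I":
            # Insertions consume read bases only, so ref does not advance.
            events.append(("insertion", ref - 1, ref))
        elif op in "MN=X":
            ref += length
        # S, H and P consume no reference bases.
    return events


# Hypothetical read: starts at position 100, carries one deletion and one insertion.
print(indels_from_cigar(100, "10M3D5M2I8M"))
```

Counting such events per position over all reads of a sample gives per-base indel statistics of the kind listed above.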
0.0.1
The pipeline currently supports single-end reads.
Inputs
- Sample sheet (.csv format) consisting of 4 fields
  - sample_name: unique name of the sequenced sample
  - amplicon: name of the targeted amplicon (seqname)
  - reads: name of the zipped fastq file of the sequenced sample
  - sgRNA_ids: Colon (:) separated list of the ids of guide RNAs used to target the amplicon
- Settings file (.yaml format)
  - For each amplicon (as identified in the sample sheet)
    - Fasta format sequence of the amplicon
    - Cutsites (text file with sgRNA id and position of expected cut site)
  - Location of the sample sheet
  - Folder containing the fastq files
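As a rough illustration of the sample sheet described above, the following Python sketch reads a CSV with the four listed fields and splits the colon-separated `sgRNA_ids` column. Only the column names come from the notes; the sample rows, file names, and the helper `read_sample_sheet` are made up for the example.

```python
import csv
import io

# Hypothetical sample sheet; only the four column names come from the release notes.
SAMPLE_SHEET = """sample_name,amplicon,reads,sgRNA_ids
sampleA,amplicon1,sampleA.fastq.gz,sg1:sg2
sampleB,amplicon1,sampleB.fastq.gz,sg1
"""

REQUIRED_FIELDS = ["sample_name", "amplicon", "reads", "sgRNA_ids"]


def read_sample_sheet(handle):
    """Parse a sample sheet and split the colon-separated sgRNA ids."""
    reader = csv.DictReader(handle)
    missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    samples = []
    seen = set()
    for row in reader:
        name = row["sample_name"]
        if name in seen:  # sample names are expected to be unique
            raise ValueError(f"duplicate sample_name: {name}")
        seen.add(name)
        row["sgRNA_ids"] = [g for g in row["sgRNA_ids"].split(":") if g]
        samples.append(row)
    return samples


samples = read_sample_sheet(io.StringIO(SAMPLE_SHEET))
for s in samples:
    print(s["sample_name"], s["amplicon"], s["sgRNA_ids"])
```

Checking for duplicate sample names up front is a design choice of this sketch, motivated by the requirement that `sample_name` be unique.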
Rules
- Quality improvement and control (trimmomatic + fastqc + multiqc)
- Mapping the reads against the amplicon sequence (using bbmap to discover short + long indels)
- Extraction of variants (samtools mpileup + custom R scripts)
Output
- bbmap_indexes: Contains the indices generated for each amplicon sequence
- trimmed_reads: Reads trimmed for quality/adapter sequences using trimmomatic
- aln: Contains the BAM files and `samtools stats` output for each sample's alignment (a sub-folder is created for each amplicon)
- fastqc: FASTQC output for each sample (a sub-folder is created for each amplicon)
- logs: Log files for each task
- multiqc: MULTIQC output
- bed: BED files for the locations of detected deletions, a TSV file of deletion counts, and a PDF file with plots of the deletion length distribution.
- bedgraph: BEDGRAPH files for the deletion score profile of each sample (i.e. the percentage of reads overlapping a deletion at single-base resolution), and a TSV file for the cutting efficiency at sgRNA cut sites (percentage of reads with deletions at the specified cut sites)
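The deletion score and cutting efficiency described above can be sketched in a few lines of Python. This is an illustration of the arithmetic under simplifying assumptions, not the pipeline's implementation: reads are given as aligned spans, deletions as intervals, and the efficiency is read off at the exact cut-site position (whether the pipeline uses a window around the cut site is not stated here). The function names and the toy data are hypothetical.

```python
def deletion_profile(amplicon_length, read_spans, deletions):
    """Percentage of covering reads that carry a deletion at each base.

    read_spans: (start, end) aligned span of each read, 1-based inclusive,
                including any bases the read deletes.
    deletions:  (start, end) deleted intervals, 1-based inclusive.
    Returns a list with one percentage per base of the amplicon.
    """
    coverage = [0] * (amplicon_length + 1)  # index 0 is unused
    deleted = [0] * (amplicon_length + 1)
    for start, end in read_spans:
        for i in range(start, end + 1):
            coverage[i] += 1
    for start, end in deletions:
        for i in range(start, end + 1):
            deleted[i] += 1
    return [100.0 * d / c if c else 0.0
            for d, c in zip(deleted[1:], coverage[1:])]


def cutting_efficiency(profile, cut_site):
    """Deletion percentage at a 1-based cut-site position."""
    return profile[cut_site - 1]


# Toy data: four reads span a 10 bp amplicon, two of them carry a deletion.
profile = deletion_profile(10, [(1, 10)] * 4, [(4, 6), (5, 7)])
print(profile)
print(cutting_efficiency(profile, 5))
```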