
Releases: BIMSBbioinfo/crispr_DART

version 0.1.0

31 Jul 08:52
a2ae2fd

This release corresponds to the version described in the preprint by Froehlich & Uyar et al. on bioRxiv.

0.0.2

22 May 12:35

News for version 0.0.2

  • Now supports both insertions and deletions
  • Integrated Rmarkdown reports with interactive plotly figures
    • One report per amplicon
  • Integrated Rmarkdown reports for pairwise comparison of samples to discover sites that show differential peaks of indel efficiencies in a case-control setting.
  • The bed and bedgraph folders have been replaced with a single indels folder, which contains:
    • bedgraph files for insertions, deletions, and indels (combined insertions/deletions)
    • BED files for the top insertions and deletions (sorted by read support per genotype)
    • raw indel statistics at per-base resolution (TSV file)
    • indel stats at expected cut sites
  • The reports folder now contains one report per amplicon. If a comparisons.tsv file listing the pairs of samples to compare is provided, a sub-folder called comparisons is also produced (see sample_data/comparisons.tsv for an example).
  • The pipeline now uses R libraries (GenomicAlignments) to extract indel statistics from BAM files instead of samtools mpileup.
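The pipeline performs indel extraction in R with GenomicAlignments. The Python sketch below only illustrates the underlying idea and is not the pipeline's code. It walks each read's CIGAR string to locate deletions and insertions on the amplicon, then counts how many reads support each distinct indel, which is one way to obtain a ranking by read support. The function names and the (type, start, length) representation are hypothetical.

```python
import re
from collections import Counter

# One CIGAR operation, e.g. "50M", "3D", "2I".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def extract_indels(pos, cigar):
    """Return (type, start, length) for each indel in one alignment.

    pos is the 1-based leftmost reference position of the read (as in SAM).
    Deletions: start is the first deleted reference base.
    Insertions: start is the reference base immediately after the insert
    (hypothetical convention chosen for this sketch).
    """
    indels = []
    ref = pos  # next reference position to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "D":
            indels.append(("D", ref, n))
            ref += n  # deletions consume reference bases
        elif op == "I":
            indels.append(("I", ref, n))  # insertions consume no reference
        elif op in "M=XN":
            ref += n
        # S, H and P consume no reference bases
    return indels


def rank_indels(alignments):
    """Count read support per distinct indel, most supported first."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(extract_indels(pos, cigar))
    return counts.most_common()


if __name__ == "__main__":
    reads = [(1, "10M3D20M"), (1, "10M3D20M"), (5, "6M2I15M")]
    for indel, support in rank_indels(reads):
        print(indel, support)
```

Soft-clipped bases (S) are skipped without advancing the reference position, so a read such as `2S6M2I15M` starting at position 5 places its insertion before reference base 11.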

0.0.1

23 Apr 09:47

The pipeline currently supports single-end reads.

Inputs

  1. Sample sheet (.csv format) consisting of 4 fields:
    • sample_name: unique name of the sequenced sample
    • amplicon: name of the targeted amplicon (seqname)
    • reads: name of the zipped fastq file of the sequenced sample
    • sgRNA_ids: colon (:) separated list of the IDs of the guide RNAs used to target the amplicon
  2. Settings file (.yaml format) specifying:
    • For each amplicon (as identified in the sample sheet):
      • FASTA-format sequence of the amplicon
      • Cut sites (text file with the sgRNA ID and the position of each expected cut site)
    • Location of the sample sheet
    • Folder containing the fastq files
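As an illustration of the sample sheet layout, the following minimal Python sketch reads the four fields and splits sgRNA_ids on colons. It assumes a comma-delimited file with a header row using exactly these column names; the function name `read_sample_sheet` and the example values are hypothetical and not part of the pipeline.

```python
import csv
import io

REQUIRED_FIELDS = ["sample_name", "amplicon", "reads", "sgRNA_ids"]


def read_sample_sheet(handle):
    """Parse a sample sheet into dicts, splitting sgRNA_ids on ':'."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [f for f in REQUIRED_FIELDS if f not in header]
    if missing:
        raise ValueError(f"sample sheet is missing fields: {missing}")
    samples, seen = [], set()
    for row in reader:
        name = row["sample_name"]
        # sample_name must be unique across the sheet
        if name in seen:
            raise ValueError(f"duplicate sample_name: {name}")
        seen.add(name)
        # colon-separated guide RNA IDs become a list
        row["sgRNA_ids"] = [s for s in row["sgRNA_ids"].split(":") if s]
        samples.append(row)
    return samples


if __name__ == "__main__":
    # Hypothetical example sheet
    sheet = io.StringIO(
        "sample_name,amplicon,reads,sgRNA_ids\n"
        "S1,ampA,S1.fastq.gz,g1:g2\n"
    )
    print(read_sample_sheet(sheet))
```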

Rules

  • Quality improvement and control (trimmomatic + fastqc + multiqc)
  • Mapping the reads against the amplicon sequence (using bbmap to discover short + long indels)
  • Extraction of variants (samtools mpileup + custom R scripts)

Output

  • bbmap_indexes: Contains the indices generated for each amplicon sequence
  • trimmed_reads: Reads trimmed for quality/adapter sequences using trimmomatic
  • aln: contains the BAM files and samtools stats output for each sample's alignment (a sub-folder is created for each amplicon)
  • fastqc: FastQC output for each sample (a sub-folder is created for each amplicon)
  • logs: Log files for each task
  • multiqc: MultiQC output
  • bed: BED files for the locations of detected deletions, a TSV file of deletion counts, and a PDF file with plots of the distribution of deletion lengths.
  • bedgraph: BEDGRAPH files for the deletion score profile of each sample (i.e. the percentage of reads overlapping a deletion at single-base resolution), and a TSV file for the cutting efficiency at sgRNA cut sites (the percentage of reads with deletions at the specified cut sites)
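The deletion score and cutting efficiency can be illustrated with a short Python sketch; this is not the pipeline's implementation. It assumes each read is given as (alignment start, alignment end, list of (deletion start, deletion length)) in 1-based inclusive amplicon coordinates, that the per-base denominator is the number of reads covering that base, and that the efficiency of an sgRNA is simply the score at its cut-site base (the pipeline may use a different denominator or a window around the cut site). All function names are hypothetical.

```python
def deletion_profile(reads, amplicon_length):
    """Per-base percentage of covering reads with a deletion over that base.

    Returns a list where index i - 1 holds the score for amplicon position i.
    """
    covered = [0] * amplicon_length
    deleted = [0] * amplicon_length
    for aln_start, aln_end, deletions in reads:
        for i in range(aln_start, aln_end + 1):
            covered[i - 1] += 1
        # a set ensures each base is counted at most once per read
        del_bases = set()
        for del_start, del_len in deletions:
            del_bases.update(range(del_start, del_start + del_len))
        for i in del_bases:
            deleted[i - 1] += 1
    return [100.0 * d / c if c else 0.0 for d, c in zip(deleted, covered)]


def cut_site_efficiency(profile, cut_sites):
    """Deletion score at each expected cut site (sgRNA ID -> 1-based position)."""
    return {sgrna: profile[pos - 1] for sgrna, pos in cut_sites.items()}


def to_bedgraph(profile, seqname):
    """Format the profile as bedGraph lines (0-based, half-open intervals)."""
    return [f"{seqname}\t{i}\t{i + 1}\t{score:.2f}"
            for i, score in enumerate(profile)]


if __name__ == "__main__":
    # Hypothetical reads on a 10 bp amplicon
    reads = [(1, 10, [(4, 3)]), (1, 10, []), (1, 10, [(5, 2)]), (3, 10, [])]
    profile = deletion_profile(reads, 10)
    print(cut_site_efficiency(profile, {"g1": 5}))
    print("\n".join(to_bedgraph(profile, "ampA")))
```

Note the coordinate shift in `to_bedgraph`: the profile uses 1-based positions, while bedGraph intervals are 0-based and half-open, so position 5 is written as `4 5`.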