31 Jul 08:52

borauyar

version 0.1.0 Latest

This release corresponds to the version described in the preprint by Froehlich & Uyar et al. on bioRxiv.

22 May 12:35

borauyar

0.0.2

News for version 0.0.2

Now supports both insertions and deletions
Integrated Rmarkdown reports with interactive plotly figures
- One report per amplicon
Integrated Rmarkdown reports for pairwise comparisons of samples, to discover sites that show differential peaks of indel efficiency in a case-control setting.
The bed and bedgraph folders have been replaced with a single indels folder, which contains:
- bedgraph files for insertions, deletions, and indels (combined insertions/deletions)
- bed files for the top insertions and deletions (sorted by read support per genotype)
- indel stats at per-base resolution as a raw TSV file
- indel stats at expected cut sites
The reports folder now contains one report per amplicon. If a comparisons.tsv file listing the pairs of samples the user wants to compare is provided, a sub-folder called comparisons is also produced (see sample_data/comparisons.tsv for an example).
The pipeline now uses an R library (GenomicAlignments) to extract indel stats from BAM files instead of samtools mpileup.
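The release notes say indel stats are now extracted from BAM alignments with GenomicAlignments in R. As a language-neutral illustration of the underlying idea (not the pipeline's own code), the sketch below walks each alignment's CIGAR string to list insertion and deletion events, then ranks identical events by the number of supporting reads, similar in spirit to the "top insertions and deletions" bed files. The function names, the (type, position, length) representation and the convention of reporting an insertion at the reference base following it are all hypothetical choices.

```python
import re
from collections import Counter

# CIGAR ops: M/=/X consume read and reference; I/S consume only the read;
# D/N consume only the reference; H/P consume neither.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def extract_indels(ref_start, cigar):
    """Return ("I"|"D", ref_pos, length) tuples for one alignment.

    ref_start is the 1-based leftmost reference position (SAM POS field).
    Insertions are reported at the reference base right after the insertion.
    """
    indels = []
    pos = ref_start
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "D":
            indels.append(("D", pos, length))
            pos += length
        elif op == "I":
            indels.append(("I", pos, length))
        elif op in "M=XN":
            pos += length
        # S, H and P do not advance the reference position
    return indels


def top_indels(alignments, n=5):
    """Count identical indels across (ref_start, cigar) pairs; return the n best supported."""
    counts = Counter()
    for ref_start, cigar in alignments:
        counts.update(extract_indels(ref_start, cigar))
    return counts.most_common(n)


reads = [(1, "10M3D20M"), (1, "10M3D20M"), (5, "6M2I22M")]
print(top_indels(reads))
```

Grouping on the exact (type, position, length) tuple treats two deletions of the same length at different positions as distinct alleles, which matches the idea of ranking specific indels by read support.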

23 Apr 09:47

borauyar

0.0.1

The pipeline currently supports single-end reads.

Inputs

Sample sheet (.csv format) consisting of 4 fields

sample_name: unique name of the sequenced sample
amplicon: name of the targeted amplicon (seqname)
reads: name of the zipped fastq file of the sequenced sample
sgRNA_ids: colon (:)-separated list of the IDs of the guide RNAs used to target the amplicon
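A minimal sketch of reading such a sample sheet, assuming the CSV has a header row with exactly the four field names above; the function name and the example file names are hypothetical. It checks that the fields are present, that sample names are unique, and splits sgRNA_ids on ":".

```python
import csv
import io

REQUIRED_FIELDS = ("sample_name", "amplicon", "reads", "sgRNA_ids")


def read_sample_sheet(handle):
    """Parse the sample sheet; return one dict per sample with sgRNA_ids as a list."""
    reader = csv.DictReader(handle)
    missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample sheet is missing fields: {missing}")
    samples, seen = [], set()
    for row in reader:
        name = row["sample_name"]
        if name in seen:  # sample_name must be unique
            raise ValueError(f"duplicate sample_name: {name}")
        seen.add(name)
        samples.append({
            "sample_name": name,
            "amplicon": row["amplicon"],
            "reads": row["reads"],
            # colon-separated guide IDs, ignoring empty entries
            "sgRNA_ids": [g for g in row["sgRNA_ids"].split(":") if g],
        })
    return samples


example = io.StringIO(
    "sample_name,amplicon,reads,sgRNA_ids\n"
    "S1,ampA,S1.fastq.gz,sg1:sg2\n"
)
print(read_sample_sheet(example))
```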

Settings file (.yaml format)

For each amplicon (as identified in the sample sheet)
- Fasta format sequence of the amplicon
- Cutsites (text file with sgRNA id and position of expected cut site)
Location of the sample sheet
Folder containing the fastq files
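The cut-sites file pairs each sgRNA ID with the position of its expected cut site. The release notes do not specify the exact layout, so the sketch below assumes one tab-separated "sgRNA_id, position" pair per line with no header; both function names are hypothetical. The second helper shows how a sample's sgRNA_ids from the sample sheet could be resolved to positions, failing loudly on guides that have no cut site defined.

```python
import io


def read_cut_sites(handle):
    """Map sgRNA id -> expected cut-site position (assumed tab-separated, no header)."""
    cut_sites = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:  # skip blank lines
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(fields)}")
        sg_id, pos = fields
        cut_sites[sg_id] = int(pos)
    return cut_sites


def cut_sites_for(sample_sgrnas, cut_sites):
    """Look up cut sites for one sample's guides; raise on unknown IDs."""
    unknown = [g for g in sample_sgrnas if g not in cut_sites]
    if unknown:
        raise KeyError(f"no cut site defined for: {unknown}")
    return {g: cut_sites[g] for g in sample_sgrnas}


sites = read_cut_sites(io.StringIO("sg1\t120\nsg2\t245\n"))
print(cut_sites_for(["sg2"], sites))
```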

Rules

Quality improvement and control (trimmomatic + fastqc + multiqc)
Mapping the reads against the amplicon sequence (using bbmap to discover short + long indels)
Extraction of variants (samtools mpileup + custom R scripts)

Output

bbmap_indexes: Contains the indices generated for each amplicon sequence
trimmed_reads: Reads trimmed for quality/adapter sequences using trimmomatic
aln: contains the BAM files and samtools stats output for each sample's alignment (a sub-folder is created for each amplicon)
fastqc: FastQC output for each sample (a sub-folder is created for each amplicon)
logs: Log files for each task
multiqc: MultiQC output
bed: BED files for the locations of detected deletions, a TSV file of deletion counts, and a PDF file with plots of the distribution of deletions with respect to deletion length.
bedgraph: BEDGRAPH files with the deletion score profile of each sample (i.e. the percentage of reads overlapping a deletion at single-base resolution), and a TSV file with the cutting efficiency at each sgRNA cut site (the percentage of reads with deletions at the specified cut sites)
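The deletion score and cutting efficiency described above can be illustrated with a simplified Python sketch; the pipeline itself computes them with its own R scripts. Each read is represented (hypothetically) as its 1-based inclusive reference span plus a list of (start, length) deletions. As an assumption, the denominator at each base is the number of reads whose alignment spans that base, and the efficiency at a cut site is the profile value at the cut-site position.

```python
def deletion_profile(reads, amplicon_length):
    """Per-base percentage of reads carrying a deletion over that base.

    reads: (start, end, deletions) with 1-based inclusive reference
    coordinates (deleted bases included in the span); deletions is a list
    of (del_start, del_length). Bases with no coverage get 0.0.
    """
    coverage = [0] * amplicon_length
    deleted = [0] * amplicon_length
    for start, end, deletions in reads:
        for pos in range(start, end + 1):
            coverage[pos - 1] += 1
        for del_start, del_len in deletions:
            for pos in range(del_start, del_start + del_len):
                deleted[pos - 1] += 1
    return [100.0 * d / c if c else 0.0 for d, c in zip(deleted, coverage)]


def cut_site_efficiencies(profile, cut_sites):
    """Deletion percentage at each sgRNA's expected cut site (1-based positions)."""
    return {sg: profile[pos - 1] for sg, pos in cut_sites.items()}


reads = [(1, 10, [(4, 2)]), (1, 10, [])]  # one read with a 2-bp deletion at 4-5
profile = deletion_profile(reads, 12)
print(cut_site_efficiencies(profile, {"sg1": 5, "sg2": 8}))
```

Writing such a profile as bedgraph would only require emitting one interval per base (or per run of equal values) with the percentage as the score.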
