# The Pipeline

This data engineering pipeline processes single-cell Chromatin Immunoprecipitation sequencing (scChIP-seq) data, from raw paired-end reads (FASTQ) to a count matrix ready for analysis. It consists of the following steps:

- 0. Creation of the config file
- 1. Cell barcode mapping
- 2. Trimming
- 3. Genomic mapping
- 4. Assignment of cell barcodes to mapped reads
- 5. Removal of Reverse Transcription (RT) & Polymerase Chain Reaction (PCR) duplicates
- 6. Removal of reads based on window screening (if Read2 was unmapped)
- 7. Counting (generation of the count matrix)
- 8. Generation of coverage files (bigWig)
- 9. Reporting
- 10. Automatic downstream analysis in R
```
usage: schip_processing.sh All -f FORWARD -r REVERSE -o OUTPUT -c CONFIG [-d] [-h] [-v]

[Sub-Commands]
All        Execute the entire pipeline based on the CONFIG file
GetConf    [PreRun] Complete a configuration template based on the genome assembly and the design type
--version  Print version
---------------
All
-f|--forward R1_READ: forward fastq file
-r|--reverse R2_READ: reverse fastq file
-c|--conf CONFIG: configuration file for ChIP processing
-o|--output OUTPUT: output folder
-n|--name NAME: sample name
-s|--downstreamOutput DOWNSTREAM_DIR: output directory for the downstream R analysis; if given, the downstream analysis is run in this directory [optional]
-u|--override : override parameters defined in the config file, as semicolon-separated KEY=VALUE pairs (e.g. 'MIN_MAPQ=0;MIN_BAPQ=10') [optional]
[-d|--dryrun]: dry run mode
[-h|--help]: help
[-v|--version]: version
```
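The `-u|--override` value is a single string of `KEY=VALUE` pairs separated by `;`, so it must be quoted on the command line; otherwise the shell treats `;` as a command separator. To illustrate that format, here is a minimal sketch of a hypothetical `split_overrides` helper. It is not part of `schip_processing.sh` and is not how the pipeline itself parses the option; it prints one tab-separated key/value pair per line and rejects malformed entries, which can be used to sanity-check an override string before a long run.

```shell
# Hypothetical helper, NOT part of schip_processing.sh: split an --override
# string such as 'MIN_MAPQ=0;MIN_BAPQ=10' into KEY<TAB>VALUE lines.
# The body runs in a subshell ( ... ) so IFS and noglob changes do not leak.
split_overrides() (
  IFS=';'    # split only on semicolons
  set -f     # disable globbing so values such as '*' stay literal
  set -- $1  # unquoted on purpose: field splitting on ';'
  for pair in "$@"; do
    # Skip empty fields, e.g. from ';;'
    if [ -z "$pair" ]; then
      continue
    fi
    case $pair in
      # At least one character before the first '='
      ?*=*) printf '%s\t%s\n' "${pair%%=*}" "${pair#*=}" ;;
      *)
        echo "Invalid override '$pair' (expected KEY=VALUE)" >&2
        exit 1  # leaves the subshell, i.e. the function returns 1
        ;;
    esac
  done
)
```

For example, `split_overrides 'MIN_MAPQ=0;MIN_BAPQ=10'` prints two lines, `MIN_MAPQ` and `0`, then `MIN_BAPQ` and `10`, each pair separated by a tab, while `split_overrides 'MIN_MAPQ'` prints an error on stderr and returns 1. Only the first `=` separates key and value, so values may themselves contain `=`.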
```
GetConf
-T/--template : Pipeline config template
-C/--configFile : Config description file
-D/--designType : Design (bead) type, e.g. 'Hifibio' or 'LBC'
-G/--genomeAssembly : Genome assembly, e.g. 'hg38' or 'mm10'
-O/--outputConfig : Output config file
-O/--mark : Histone mark: either 'h3k27me3', 'h3k4me3' or 'unbound'
-B/--targetBed : Target BED file
```
GetConf fills in the configuration template according to your bead type (Hifibio or LBC, given as `--designType`), your genome assembly (hg38 or mm10) and your target BED file.
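A small pre-flight check can catch typos in these values before calling GetConf. The hypothetical `check_getconf_args` function below is a minimal sketch that accepts only the values listed in this README (Hifibio/LBC, hg38/mm10, h3k27me3/h3k4me3/unbound). It assumes those lists are complete; the supported combinations presumably come from `species_design_configs.csv`, so extend the patterns if your installation supports more.

```shell
# Hypothetical pre-flight check, NOT part of schip_processing.sh.
# Accepted values are assumed to be exactly those listed in this README.
# Usage: check_getconf_args DESIGN_TYPE ASSEMBLY MARK TARGET_BED
check_getconf_args() (
  status=0
  case $1 in
    Hifibio|LBC) ;;
    *) echo "Unknown design type '$1' (expected Hifibio or LBC)" >&2; status=1 ;;
  esac
  case $2 in
    hg38|mm10) ;;
    *) echo "Unknown genome assembly '$2' (expected hg38 or mm10)" >&2; status=1 ;;
  esac
  case $3 in
    h3k27me3|h3k4me3|unbound) ;;
    *) echo "Unknown mark '$3' (expected h3k27me3, h3k4me3 or unbound)" >&2; status=1 ;;
  esac
  if [ ! -f "$4" ]; then
    echo "Target BED file not found: '$4'" >&2
    status=1
  fi
  # Report every problem above, then fail once (exit leaves the subshell body)
  exit "$status"
)
```

Chaining it with `&&` before the GetConf call (e.g. `check_getconf_args LBC "${ASSEMBLY}" "${MARK}" "${TARGET_BED}" && ./schip_processing.sh GetConf` followed by the usual options) runs GetConf only when every argument passes. The function reports all failing arguments at once rather than stopping at the first.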
Example:

```shell
# Go to the pipeline directory
cd ~/GitLab/ChIP-seq_single-cell_LBC_PAIRED_END_3.4/

# 1. Create a configuration file for LBC beads, hg38 and H3K27me3
ASSEMBLY=hg38
OUTPUT_CONFIG=/data/tmp/pprompsy/results/CONFIG_LBC
MARK=h3k27me3
TARGET_BED=/data/users/pprompsy/Annotation/bed/hg38.G5k.bed
./schip_processing.sh GetConf --template CONFIG_TEMPLATE --configFile species_design_configs.csv \
  --designType LBC --genomeAssembly "${ASSEMBLY}" --outputConfig "${OUTPUT_CONFIG}" \
  --mark "${MARK}" --targetBed "${TARGET_BED}"

# 2. Run the whole pipeline on one paired-end sample, including the downstream R analysis
OUTPUT_DIR=/data/tmp/pprompsy/results/test
DOWNSTREAM_DIR=/data/tmp/pprompsy/results/
NAME=test
READ1=/data/users/pprompsy/tests/A1082C1.R1.fastq.gz
READ2=/data/users/pprompsy/tests/A1082C1.R2.fastq.gz
./schip_processing.sh All -f "${READ1}" -r "${READ2}" -c "${OUTPUT_CONFIG}" \
  -o "${OUTPUT_DIR}" --name "${NAME}" -s "${DOWNSTREAM_DIR}"
```
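To process several samples with the same configuration, All can be called once per sample. As a minimal sketch, assuming reads follow the `<SAMPLE>.R1.fastq.gz` / `<SAMPLE>.R2.fastq.gz` naming of the example files, the hypothetical `list_fastq_pairs` helper below (not part of the pipeline) pairs R1 and R2 files found in a directory and warns about samples whose R2 file is missing.

```shell
# Hypothetical helper, NOT part of schip_processing.sh.
# Assumes reads are named <SAMPLE>.R1.fastq.gz / <SAMPLE>.R2.fastq.gz.
# Prints one line per complete pair: NAME<TAB>R1<TAB>R2
list_fastq_pairs() (
  for read1 in "$1"/*.R1.fastq.gz; do
    # When nothing matches, the glob stays literal: skip it
    if [ ! -e "$read1" ]; then
      continue
    fi
    name=$(basename "$read1" .R1.fastq.gz)
    read2=${read1%.R1.fastq.gz}.R2.fastq.gz
    if [ -e "$read2" ]; then
      printf '%s\t%s\t%s\n' "$name" "$read1" "$read2"
    else
      echo "No R2 file for sample '$name', skipping it" >&2
    fi
  done
)
```

The list can then drive one All run per sample. Giving each sample its own output directory is a choice made for this sketch, not a documented requirement; add `-s` if the downstream analysis should also run.

```shell
FASTQ_DIR=/path/to/fastqs  # hypothetical input directory
list_fastq_pairs "${FASTQ_DIR}" |
  while IFS="$(printf '\t')" read -r NAME READ1 READ2; do
    # < /dev/null keeps the pipeline (or any tool it calls) from reading the sample list on stdin
    ./schip_processing.sh All -f "${READ1}" -r "${READ2}" -c "${OUTPUT_CONFIG}" \
      -o "${OUTPUT_DIR}/${NAME}" --name "${NAME}" < /dev/null
  done
```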
Authors: Pacôme Prompsy (pacome.prompsy@curie.fr), Nicolas Servant
Date: 20 September 2022