Home
**If your paired-end reads are >= 100 bases long and your sample was enriched for circular DNA, the recommended script is** "circle_finder-pipeline-bwa-mem-samblaster.sh"
**Note: If your sample was not enriched for circular DNA (for example, standard ATAC-seq or whole-genome sequencing) and the read length is >75 bases, use the same script:** "circle_finder-pipeline-bwa-mem-samblaster.sh"
**Note: Circle_finder cannot be used if your sample was not enriched for circular DNA before library preparation AND the read length is <75 bases.**
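Because the choice of script depends on read length, it can help to measure it directly from the FASTQ file first. The following is a minimal sketch, not part of Circle_finder: the function name `max_read_length` and the default of 10000 sampled reads are assumptions made for illustration, and it assumes an uncompressed FASTQ file with four lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Circle_finder): report the longest read
# among the first N records of an uncompressed, 4-line-per-record FASTQ file.
max_read_length() {
    local fastq="$1"
    local n_reads="${2:-10000}"   # assumed sample size, adjust as needed
    # In 4-line FASTQ, the sequence is the 2nd line of every record.
    awk -v n="$n_reads" '
        NR % 4 == 2 {
            if (length($0) > max) max = length($0)
            if (++seen >= n) exit    # stop after n reads; END still runs
        }
        END { print max + 0 }
    ' "$fastq"
}
```

For example, `max_read_length Index11_1.fq` prints the longest read length seen in the sampled records, which can then be compared against the thresholds in the notes above.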
#Usage: bash "path to script" "Number of processors" "/path-of-whole-genome-file/hg38.fa" "fastq file 1" "fastq file 2" "minNonOverlap between two split reads" "Sample name" "genome build"
#bash /path-of-script-directory/microDNA-pipeline-bwa-mem-samblaster.sh 16 /path-of-script-directory/hg38.fa 1E_S1_L1-L4_R1_001.fastq.75bp-R1.fastq 1E_S1_L1-L4_R2_001.fastq.75bp-R2.fastq 10 1E hg38
#Arg1 = Number of processors
#Arg2 = Genome or index file "/hdata1/MICRODNA-HG38/hg38.fa"
#Arg3 = fastq file 1 "1E_S1_L1-L4_R1_001.fastq"
#Arg4 = fastq file 2 "1E_S1_L1-L4_R2_001.fastq"
#Arg5 = minNonOverlap between two split reads "10"
#Arg6 = Sample name "1E"
#Arg7 = genome build "hg38"
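The seven positional arguments listed above are easy to misorder, so a small pre-flight check before launching the pipeline may save a failed run. This is only a sketch under stated assumptions: `check_pipeline_args` is a hypothetical function, not part of Circle_finder, and the rule that Arg1 and Arg5 must be integers is inferred from the example values (16 and 10).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Circle_finder).
# Validates the seven positional arguments described in the usage comments.
check_pipeline_args() {
    if [ "$#" -ne 7 ]; then
        echo "expected 7 arguments, got $#" >&2
        return 1
    fi
    local threads="$1" genome="$2" fastq1="$3" fastq2="$4" min_non_overlap="$5"
    # Arg1 and Arg5 are integers in the usage example (assumption).
    case "$threads" in
        ''|*[!0-9]*) echo "Arg1 must be an integer" >&2; return 1 ;;
    esac
    case "$min_non_overlap" in
        ''|*[!0-9]*) echo "Arg5 must be an integer" >&2; return 1 ;;
    esac
    # Arg2 to Arg4 are input files and must be readable.
    local f
    for f in "$genome" "$fastq1" "$fastq2"; do
        [ -r "$f" ] || { echo "cannot read: $f" >&2; return 1; }
    done
}
```

A call such as `check_pipeline_args 16 hg38.fa R1.fastq R2.fastq 10 1E hg38` returns success only if the argument count, the two integers, and the three input files all check out.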
#####################################################################################################
Welcome to the Circle_finder wiki!
This is a step-by-step guide to running Circle_finder (for samples that were enriched for circular DNA and whose paired-end read length is <75 bases).
git clone https://github.com/pk7zuva/Circle_finder.git
cd Circle_finder
In this directory you will find four types of files: 1) *.c 2) *.sh 3) *.txt and 4) C executables, which have no extension
Note: Although the C executables are provided, it is advisable to recompile them from source:
cc -o ADDRESS2PROFILEPAIREND address2profile.pairend.c
cc -o DIRECT.REPEAT.FINDER1 direct.repeat.finder1.c
cc -o JUNCTIONAL.TAG junctional.tag.c
cc -o LEFT.ALIGNMENT left.alignment.c
cc -o MIDNA_START_END_SCORE midna_start_end_score.c
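The five compile commands above can also be run in a single loop that stops at the first failure. This is a convenience sketch only: the function name `compile_helpers` and its `--dry-run` flag are hypothetical, while the output and source names are copied from the `cc` commands above.

```shell
#!/usr/bin/env bash
# Hypothetical convenience loop (not part of Circle_finder).
# Each entry is "OUTPUT_NAME:source_file", copied from the cc commands above.
compile_helpers() {
    local dry_run="${1:-}"
    local pair out src
    for pair in \
        "ADDRESS2PROFILEPAIREND:address2profile.pairend.c" \
        "DIRECT.REPEAT.FINDER1:direct.repeat.finder1.c" \
        "JUNCTIONAL.TAG:junctional.tag.c" \
        "LEFT.ALIGNMENT:left.alignment.c" \
        "MIDNA_START_END_SCORE:midna_start_end_score.c"; do
        out="${pair%%:*}"   # text before the first colon
        src="${pair#*:}"    # text after the first colon
        if [ "$dry_run" = "--dry-run" ]; then
            echo "cc -o $out $src"
        else
            cc -o "$out" "$src" || { echo "failed to compile $src" >&2; return 1; }
        fi
    done
}
```

Running `compile_helpers --dry-run` inside the Circle_finder directory prints the commands without executing them; running `compile_helpers` with no argument compiles each file in turn.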
Step 4: Download the whole-genome files and bowtie index files from the links given in the file "download-link-hg38-and-bowtie-index.txt"
cat download-link-hg38-and-bowtie-index.txt
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.fa
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.1.bt2
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.2.bt2
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.3.bt2
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.4.bt2
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.fa.amb
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.fa.ann
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.fa.bwt
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.fa.fai
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.fa.pac
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.fa.sa
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.rev.1.bt2
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.rev.2.bt2
Example download command: wget http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/hg38.fa
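Rather than running `wget` once per link, the link file can be split into individual URLs and fetched in a loop. This is a minimal sketch, assuming the file holds whitespace-separated URLs as shown in the `cat` output; `list_urls` and `download_all` are hypothetical helper names, not part of Circle_finder.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Circle_finder).

# Print one URL per line from a file of whitespace-separated links.
list_urls() {
    tr -s '[:space:]' '\n' < "$1" | grep -E '^https?://'
}

# Download every URL in the link file; -c lets wget resume partial files.
download_all() {
    local url
    list_urls "$1" | while IFS= read -r url; do
        wget -c "$url" || echo "download failed: $url" >&2
    done
}
```

For example, `download_all download-link-hg38-and-bowtie-index.txt` fetches each listed file into the current directory and reports any URL that fails.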
Step 5: Download the fastq files. The links to download these files are given in the file "fastq-file-download-link.txt"
cat fastq-file-download-link.txt
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/Index11_1.fq
http://genome.bioch.virginia.edu/CIRCLE_FINDER_MASTER/Index11_2.fq
bash /path-of-the-"Circle_finder"-directory/microDNA.InOne.sh /path-of-the-"Circle_finder"-directory/hg38 Index11_1.fq Index11_2.fq 24 C4-2 49 10000 /path-of-the-"Circle_finder"-directory &
head microDNA.JT.postmotif.fa
chr1 28761 29551 0 1 NOMOTIF
chr1 199385 199915 0 1 GTC
chr1 631932 632604 0 1 NOMOTIF
chr1 632019 632252 1 0 CA
chr1 632112 632242 0 1 T
chr1 889483 890225 4 0 C
chr1 897103 898784 2 0 C
chr1 980217 981339 0 1 G
chr1 982484 982697 1 0 NOMOTIF
chr1 983705 984358 0 2 C
Column 1 "Chromosome name"
Column 2 "Start position of circle"
Column 3 "End position of circle"
Column 4 "Number of reads mapping on the circle junction from the "+" strand"
Column 5 "Number of reads mapping on the circle junction from the "-" strand"
Column 6 "Microhomology (if any) at the junction of the circle"
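The column layout described above lends itself to a quick summary with `awk`. The sketch below is illustrative only: `summarize_circles` is a hypothetical function name, and treating `NOMOTIF` in column 6 as "no microhomology reported" is an assumption based on the sample output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Circle_finder): summarize a
# six-column circle file such as microDNA.JT.postmotif.fa.
summarize_circles() {
    awk '
        NF < 6 { next }                        # skip blank or short lines
        { total++ }
        $6 != "NOMOTIF" { with_motif++ }       # assumption: NOMOTIF means none
        $4 > 0 && $5 > 0 { both_strands++ }    # junction reads on "+" and "-"
        END {
            printf "circles=%d with_microhomology=%d both_strands=%d\n",
                   total, with_motif, both_strands
        }
    ' "$1"
}
```

Applied to the ten sample rows shown above, `summarize_circles` would print `circles=10 with_microhomology=7 both_strands=0`, since none of those rows has junction reads from both strands.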
Step 9: To extract only those circular DNAs supported by at least one junction read in each of the "+" and "-" orientations:
awk '$4>0 && $5>0' microDNA.JT.postmotif.fa | head
chr1 1069854 1070524 1 2 C
chr1 1069934 1071919 6 2 NOMOTIF
chr1 1070501 1070786 1 2 GAGTC
chr1 1428170 1428595 5 5 NOMOTIF
chr1 1459119 1460224 6 2 NOMOTIF
chr1 1459425 1462380 3 1 GGG
chr1 1495168 1495816 1 3 GG
chr1 1579383 1580962 1 1 GTA
chr1 1667878 1668245 9 6 C
chr1 1772882 1773318 2 3 A
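The same filter can be made stricter by requiring a minimum number of junction reads on each strand. This is a sketch, not part of Circle_finder: the function name `filter_by_junction_support` and its `min_reads` parameter are hypothetical, and a value of 1 reproduces the `$4>0 && $5>0` filter for integer read counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Circle_finder): keep circles with at
# least min_reads junction reads on BOTH the "+" (col 4) and "-" (col 5) strand.
filter_by_junction_support() {
    local infile="$1"
    local min_reads="${2:-1}"   # default matches the $4>0 && $5>0 filter
    awk -v min="$min_reads" '$4 >= min && $5 >= min' "$infile"
}
```

For example, `filter_by_junction_support microDNA.JT.postmotif.fa 2` keeps only circles with two or more junction reads in each orientation.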