-
Notifications
You must be signed in to change notification settings - Fork 0
/
commands.txt
71 lines (55 loc) · 1.81 KB
/
commands.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
## Commands
This section lists command(s) run by ercc workflow
* Running ercc
ercc workflow checks out spike-in signal (reads) and plots the QC graph(s).
Count reads in fastq file:
```
zcat FASTQ_R1 | paste - - - - | wc -l
```
Align reads to chimeric reference (modified reference also containing all spike-in sequences)
```
bwa mem -t THREADS -M REF_GENOME FASTQ_R1 FASTQ_R2 | samtools view -S - |
cut -f 3 | grep ERCC | sort | uniq -c | sed s/^\ *// > CNV_FILE
```
```
In this custom script we build the RPKM table
python <<CODE
import os
ercc = os.path.expandvars("~{erccData}")
counts = "~{erccCounts}"
total = ~{totalReads}
output_file = "~{basename(erccCounts, '.csv')}_rpkm.csv"
# 1. Open file, read and collect length data
ercc_data = open(ercc, 'r')
length_hash = {}
for nextLine in ercc_data.readlines():
nextFields = nextLine.split("\t")
length_hash[nextFields[0]] = int(nextFields[4])
ercc_data.close()
# 2. Calculate RPKMs, put them in a hash
count_data = open(counts, 'r')
rpkm_hash = {}
count_hash = {}
for nextLine in count_data.readlines():
nextLine = nextLine.strip("\n")
countChunk = nextLine.split(" ")
if len(countChunk) == 2 and countChunk[1] in length_hash.keys():
count_hash[countChunk[1]] = int(countChunk[0])
pm = int(total)/1000000.0
for e in sorted(length_hash.keys()):
rpkm = 0.0
if e in count_hash.keys():
rpkm = count_hash[e]/pm/(length_hash[e]/1000.0)
rpkm_hash[e] = "{0:.2f}".format(rpkm)
count_data.close()
# 3. print out RPKMS sorted by
f = open(output_file, "w+")
for e in sorted(rpkm_hash.keys()):
f.write('\t'.join([e, rpkm_hash[e]]) + '\n')
f.close()
CODE
```
Report Making using oputputs from the other steps:
```
Rscript IMAGING_SCRIPT RPKM_TABLE CONTROL_DATA ERCC_DATA PREFIX DoseResponse SAMPLES
```