-
Notifications
You must be signed in to change notification settings - Fork 0
/
full_analysis.Rmd
217 lines (141 loc) · 6.61 KB
/
full_analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
---
title: "mTEC development analysis"
author: "Kristen Wells"
date: "May 29, 2019"
output: html_document
---
## Contents
[Loading in the Data](#data)
[Aire Trace Processing](#aireTrace)
[Control Cell Processing](#controlCells)
[Make Figures](#figures)
This is a document that can be used to reproduce all analysis from the (eventual paper). While all steps are included, some steps take a long time and it is faster to just load in the provided R object.
For best reproducability, run this analysis using the Snakemake pipeline provided as package differences can make minor changes to clustering and dimensionality reduction.
If you would like to skip the data processing and just make figures from pre-made objects, go [here](#figures)
All analysis depends on loading the mTEC.10x.pipeline r package
```{r}
library(mTEC.10x.pipeline)
```
## Loading in the data
<a name = "data">
</a>
*In the Snakemake pipeline, this correponds to rule seurat_object and runs the script create_seurat.R*
All data is provided in the form of empty R Seurat objects that were made by running the command
```{r, eval = FALSE}
mtec_data <- Seurat::Read10X(data.dir = sequence_dir)
mtec <- CreateSeuratObject(raw.data = mtec_data, min.cells = 3, min.genes = 200,
project = project_name)
```
where "sequence_dir" is the path to the output of cellranger count and project_name = "RankL_ablation"
## Aire trace processing
<a name = "aireTrace">
</a>
*In the Snakemake pipeline, this correponds to rule final_analysis and runs the script analysis_driver.R found in the aireTrace/scripts directory*
This is the processing that was done to create the aireTrace Seurat object. If you would rather skip this processing, skip to [figures](#figures) section.
load in necessary packages
```{r}
library(Seurat)
library(dplyr)
library(scran)
library(DDRTree)
```
load in empty aireTrace object and gene lists
```{r}
# Change this to load from the mtec.10x.data package
load("/home/kwells4/mTEC_dev/mtec_snakemake/aireTrace/analysis_outs/seurat_aireTrace_empty.rda")
data_dir <- "/home/kwells4/mTEC_dev/data/"
load(paste0(data_dir, "TFs.rda"))
load(paste0(data_dir, "gene_to_ensembl.rda"))
```
Set plotting colors
```{r}
stage_color_df <- data.frame("Cortico_medullary" = "#CC6600",
"Ccl21a_high" = "#009933",
"Early_Aire" = "#0066CC",
"Aire_positive" = "#660099",
"Late_Aire" = "#FF0000",
"Tuft" = "#990000",
"unknown" = "#666666")
stage_color <- t(stage_color_df)[ , 1]
stage_levels <- c("Cortico_medullary", "Ccl21a_high", "Early_Aire",
"Aire_positive", "Late_Aire", "Tuft", "unknown")
```
Initial processing. This is run through the mTEC.10x.pipeline package, but combines the initial processing steps recommended by Seurat
```{r}
# Add mitochondiral percent to the meta data
mtec <- add_perc_mito(mtec)
# Plot quality plots
qc_plot(mtec)
# remove low quality cells, normalize and scale data, find variable genes, and perform PCA
mtec <- process_cells(mtec)
# Plot a PCA and determine dimensions to use by looking at heatmaps and elbow plots
PC_plots(mtec, jackstraw = FALSE, test_pcs = 1:20)
```
If running on a cluster and if you have the time, you can make a jackstraw plot to test the number of PCs to keep. Run using:
```{r, eval = FALSE}
PC_plots(mtec, jackstraw = TRUE, test_pcs = 1:20)
```
After determining the number of PCs to use from the PC_plots outputs, clustering and dimensional reduction can be performed using the appropriate number of PCs. Here, we used the first 12 PCs.
```{r}
# Decide PCs based on output of jackstraw plot
mtec <- group_cells(mtec, dims_use = 1:12)
```
Now that we have clusters, we can find marker genes of each cluster to be able to identify what cell types are contained in each cluster.
```{r}
mtec.markers <- FindAllMarkers(object = mtec, only.pos = TRUE, min.pct = 0.25,
thresh.use = 0.25)
# Only print the top 10 markers of each cluster
top10 <- mtec.markers %>% group_by(cluster) %>% top_n(10, avg_logFC)
print(top10)
DoHeatmap(object = mtec, genes.use = top10$gene, slim.col.label = TRUE, remove.key = TRUE)
```
Use the gene list to name clusters. We chose to merge clusters 0, 1, and 2 because of their highly similar gene expression patterns, especially levels of Ccl21a expression.
```{r}
stage_list <- c("0" = "Ccl21a_high", "1" = "Ccl21a_high", "2" = "Ccl21a_high",
"3" = "Late_Aire", "4" = "Cortico_medullary",
"5" = "Early_Aire", "6" = "Aire_positive",
"7" = "Tuft", "8" = "unknown")
mtec <- set_stage(mtec, stage_list)
mtec@meta.data$stage <- factor(mtec@meta.data$stage,
levels = stage_levels)
```
Now we can find markers of each stage of development and determine cell cycle phase
```{r, eval = FALSE}
# These steps take a long time. Only run if you have processing power.
mtec <- Seurat::StashIdent(mtec, save.name = "seurat_cluster")
mtec <- Seurat::SetAllIdent(mtec, id = "stage")
mtec@ident <- factor(mtec@ident, levels = stage_levels)
mtec <- significant_markers(mtec)
mtec <- run_cyclone(mtec, gene_to_ensembl)
```
Check the data visually
```{r}
load("/home/kwells4/mTEC_dev/mtec_snakemake/aireTrace/analysis_outs/seurat_aireTrace.rda")
tSNE_PCA(mtec, "seurat_cluster", PCA = TRUE)
tSNE_PCA(mtec, "cluster", PCA = TRUE, color = stage_color)
tSNE_PCA(mtec, "Aire")
tSNE_PCA(mtec, "Ccl21a")
tSNE_PCA(mtec, "Mki67")
tSNE_PCA(mtec, "Ascl1")
tSNE_PCA(mtec, "Hmgb2")
tSNE_PCA(mtec, "Dclk1")
tSNE_PCA(mtec, "Fezf2")
tSNE_PCA(mtec, "cycle_phase", color = c("black", "red", "purple"))
```
## Control cell processing
<a name = "controlCells">
</a>
### Combine control samples
*In the Snakemake pipeline, this correponds to rule combine_controls and runs the script create_combined_seurat.R found in the controls/scripts directory*
### Analyze control samples
*In the Snakemake pipeline, this correponds to rule combine_controls and runs the script analysis_driver.R found in the controls/scripts directory*
## Making the paper figures
<a name = "figures">
</a>
*In the Snakemake pipeline, this correponds to rule combine_controls and runs the script figures.R found in the scripts directory*
If you haven't followed the steps outlined above to pre-process the data and would just like to make figures using the final seurat objects, run this command
```{r}
# Change this to load from mTEC.10x.data
load("/home/kwells4/mTEC_dev/mtec_snakemake/aireTrace/analysis_outs/seurat_aireTrace.rda")
```
This loads in complete Seurat objects that were created using the Snakemake pipeline.