Skip to content

Latest commit

 

History

History
235 lines (180 loc) · 7.81 KB

vignette.md

File metadata and controls

235 lines (180 loc) · 7.81 KB

testisAtlas: Vignette (Tutorial)

Daniel Wells 2020-01-25

This tutorial guides you through how to use the testisAtlas package to explore the single cell RNAseq data from mice testis from our publication. If you just want the QC counts data to explore with your own functions you can find it on Zenodo.

Download the data (250MB)

download.file("https://zenodo.org/record/3233870/files/SDA_objects.zip", "SDA_objects.zip")
unzip("SDA_objects.zip", exdir="SDA_Objects")

Install (if not allready) & load

# remotes::install_github("myersgroup/testisAtlas")
library(testisAtlas)
## Loading testisAtlas
load2("SDA_Objects") # convenience function to load all rds objects in a folder
load_component_orderings()

Inspect the top genes for a given component (with metadata):

gene_list(5, n=5)
##    gene_symbol Human_Orthologue   Loading Infertility_Gene Testis_Enriched
## 1:      Gm7972             <NA> -0.559250               NA              NA
## 2:       Prdm9            PRDM9 -0.475946             TRUE            TRUE
## 3:      Zcwpw1           ZCWPW1 -0.426587            FALSE            TRUE
## 4:        Ddb2             DDB2 -0.425611            FALSE           FALSE
## 5:      Prss50           PRSS50 -0.396677            FALSE            TRUE
##    fold_enrichment_gtex fold_enrichment_fantom chromosome_name
## 1:                   NA                     NA               9
## 2:            8.1790000                  0.222              17
## 3:            9.1540688                  1.430               5
## 4:            0.2635425                  0.250               2
## 5:            2.3779343                  2.800               9
##    start_position
## 1:       75468616
## 2:       15543079
## 3:      137787798
## 4:       91211572
## 5:      110857967

Plot gene expression or cell scores in tSNE space:

print_tsne("Prdm9")

print_tsne(5) # component 5 cell scores

Use the predict argument to use SDA imputed expression values. Internally calls the sda_predict() function.

print_tsne("Prdm9", predict = T)

By default the axes are tSNE coordinates, but you can use any column in cell_data using the dim1/2 arguments, for example to use UMAP projection or to plot components by e.g. pseudotime:

print_tsne("Prdm9", dim1 = "Umap1", dim2 = "Umap2")

print_tsne("Zfy2", dim1 = "msci_ratio", dim2 = "V38", predict = T)

print_tsne("Ssxb1", dim1 = "PseudoTime", dim2 = "V42", predict = T) + scale_y_reverse() + scale_x_reverse() 
## Warning: Removed 3417 rows containing missing values (geom_point).

To plot multiple gene expressions on a single tSNE there’s a seperate function:

tricolour_tsne(c("Prdm9","Esx1","Piwil1"))

To get imputed expression with pseudotime/Tsne/experiment information use the melt_genes() function:

tmp <- melt_genes(c("Prdm9","Ssxb1"), predict = T)

ggplot(tmp, aes(-PseudoTime, Expression)) +
  geom_point(stroke=0, size=1, alpha=0.3) +
  facet_wrap(~Gene) +
  theme_minimal()
## Warning: Removed 6834 rows containing missing values (geom_point).

Visualise gene loadings in a manhatten style plot:

genome_loadings(SDAresults$loadings[[1]][5,], gene_locations = gene_annotations)
## Using mmusculus_gene_ensembl chromosome lengths

Which components have the highest loading for gene X:

highest_components(SDAresults, "Prdm9")

Plot component scores by experimental group:

plot_cell_scores("V38", point_size = 2)

Check (precomputed) Gene Ontology enrichments for a component:

head(GO_enrich[Component=="V30P"])
##            ID
## 1: GO:0016998
## 2: GO:0044036
## 3: GO:0071554
## 4: GO:0007338
## 5: GO:0009566
## 6: GO:0007342
##                                                                Description
## 1:                               cell wall macromolecule catabolic process
## 2:                               cell wall macromolecule metabolic process
## 3:                                    cell wall organization or biogenesis
## 4:                                                    single fertilization
## 5:                                                           fertilization
## 6: fusion of sperm to egg plasma membrane involved in single fertilization
##    GeneRatio   BgRatio       pvalue     p.adjust       qvalue
## 1:     8/226  10/14566 1.300596e-13 1.469673e-10 1.469673e-10
## 2:     8/226  11/14566 4.705388e-13 1.772363e-10 1.772363e-10
## 3:     8/226  11/14566 4.705388e-13 1.772363e-10 1.772363e-10
## 4:    16/226 108/14566 9.543034e-12 2.695907e-09 2.695907e-09
## 5:    17/226 147/14566 1.209353e-10 2.733138e-08 2.733138e-08
## 6:     6/226  12/14566 1.116134e-08 2.102052e-06 2.102052e-06
##                                                                                                          geneID
## 1:                                                  Spaca1/Lyzl6/Spaca3/Lyzl1/Lyzl4/Spaca7/Spaca4/1700016D06Rik
## 2:                                                  Spaca1/Lyzl6/Spaca3/Lyzl1/Lyzl4/Spaca7/Spaca4/1700016D06Rik
## 3:                                                  Spaca1/Lyzl6/Spaca3/Lyzl1/Lyzl4/Spaca7/Spaca4/1700016D06Rik
## 4:       Spata46/Lyzl6/Spaca3/Cast/Spam1/Lyzl4/Fam170b/Spaca7/Zp3r/Eqtn/Glipr1l1/Adam24/Hyal5/Izumo1/H3f3b/Zpbp
## 5: Spata46/Lyzl6/Spaca3/Cast/Spam1/Lyzl4/Fam170b/Spaca7/Zp3r/Eqtn/Glipr1l1/Tekt4/Adam24/Hyal5/Izumo1/H3f3b/Zpbp
## 6:                                                                        Spata46/Lyzl6/Spam1/Eqtn/Hyal5/Izumo1
##    Count Enrichment   GeneOdds       BgOdds Component
## 1:     8  51.561062 0.03669725 0.0006870019      V30P
## 2:     8  46.873693 0.03669725 0.0007557540      V30P
## 3:     8  46.873693 0.03669725 0.0007557540      V30P
## 4:    16   9.548345 0.07619048 0.0074699129      V30P
## 5:    17   7.453555 0.08133971 0.0101948818      V30P
## 6:     6  32.225664 0.02727273 0.0008245156      V30P
go_volcano_plot("V30P", top_n = 10)
## Warning: Removed 565 rows containing missing values (geom_point).

Test all components for enrichment of a custom gene list:

gene_list <- c("Prdm9","Zcwpw1","Meiob","Dmc1","Esx1")
tmp <- component_enrichment(gene_list)
tmp[order(p.value)][1:5]
##    component      p.value        OR hits               hit_names
## 1:        5N 2.197426e-06 150.97100    4 Prdm9,Zcwpw1,Dmc1,Meiob
## 2:        2P 1.672239e-04  56.54825    3       Zcwpw1,Meiob,Dmc1
## 3:       37N 1.672239e-04  56.54825    3       Zcwpw1,Dmc1,Meiob
## 4:       38N 1.672239e-04  56.54825    3      Zcwpw1,Meiob,Prdm9
## 5:       13P 6.383884e-03  25.08346    2              Meiob,Dmc1
##                      name rank    Type
## 1:         (pre)Leptotene   56 Meiotic
## 2:        B Spermatogonia   51 Meiotic
## 3:   Sertoli (Trim7 High)   18 Somatic
## 4: X activation (Hormad1)   60 Meiotic
## 5:      Early Pachytene 1   65 Meiotic
manhatten_plot(tmp)