Dissection of artifactual and confounding glial signatures by single cell sequencing of mouse and human brain (Nature Neuroscience)
Samuel E. Marsh1,*, Alec J. Walker, Tushar Kamath1, Lasse Dissing-Olesen, Timothy R. Hammond2, T. Yvanka de Soysa, Adam M.H. Young, Sarah Murphy, Abdulraouf Abdulraouf, Naeem Nadaf, Connor Dufort, Alicia C. Walker, Liliana E. Lucca, Velina Kozareva2, Charles Vanderburg, Soyon Hong, Harry Bulstrode, Peter J. Hutchinson, Daniel J. Gaffney, David A. Hafler, Robin J.M. Franklin, Evan Z. Macosko, & Beth Stevens.
1Performed analysis
2Assisted analysis
*Analysis lead (contact: samuel.marsh@childrens.harvard.edu)
*NOTE* If you do not have institutional access to the above article, please use the request link in the bibliography here to request a copy from the corresponding authors.
bioRxiv Preprint
An earlier version of this work appeared in preprint form on bioRxiv. Links to the preprint and to a zip archive of the GitHub repository as used for the preprint can be found below:
Link to the earlier preprint version of this manuscript here.
A copy of the prior repository, which contains the analyses from the preprint, can be downloaded in zip form here.
Included is the code necessary to replicate the Seurat or LIGER (or both) objects used for analysis and plotting.
- Each R file specifies the version of Seurat/LIGER used for analysis/object creation.
  - Some analyses were performed across multiple versions of Seurat (V2 > V3). In this scenario objects were updated to V3 using `UpdateSeuratObject`.
  - Scripts specify the point of upgrade to V3 in regard to analysis or object modification.
  - Seurat V2.3.4 source package can be downloaded here from the CRAN Archive and installed from local source.
  - To maintain consistency, Seurat V3.1.5 was downloaded from the CRAN Archive and installed from local source when switching between V2 and V3 was necessary.
- Where possible, the date before which an analysis was performed is specified. To replicate analyses performed on a specific date, the following actions are recommended or described in code:
  - Use of a contained environment using the packrat or renv packages, followed by date-specific version installation of CRAN packages using the versions package.
  - Archived source versions of specific packages may also be needed depending on the version of R and can be downloaded from the CRAN archives and installed from local source.
- LIGER analyses were performed using the in-development "online" branch, updating throughout analysis to accommodate bug fixes.
  - LIGER analyses also utilize multiple versions of Seurat, as specified in code, for some of the following situations:
    - Seurat V3 was used for data import, QC filtering (genes, UMIs, % mito), and the majority of plotting.
    - Seurat V2 was used during the LIGER analysis workflow to accommodate use of the now deprecated `clusterLouvainJaccard` function, which relied on the Seurat V2 object structure.
    - Conversion between Seurat and LIGER objects was performed using the built-in LIGER functions `seuratToLiger` and `ligerToSeurat`.
- scCustomize R package was used in a pre-release development form during analysis.
  - Some of the function names may be different in this repo compared to their public release form.
  - List of functions (and tutorials) for scCustomize can be found at website here.
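
The version-handling notes above can be sketched in R roughly as follows. This is a minimal illustration, not code taken from this repository: the helper names `install_from_local_source` and `upgrade_seurat_object` and all file paths are hypothetical, and the sketch assumes the source tarball has already been downloaded from the CRAN Archive and that the package dependencies and a build toolchain are available.

```r
# Hypothetical helpers; names and paths are illustrative and are not
# taken from the analysis scripts in this repository.

# Install an archived package version from a source tarball that was
# downloaded beforehand from the CRAN Archive.
install_from_local_source <- function(tarball) {
  if (!file.exists(tarball)) {
    stop("Source tarball not found: ", tarball)
  }
  install.packages(tarball, repos = NULL, type = "source")
}

# Load an object saved under Seurat V2, convert it to the V3 object
# structure, and save the converted copy. Requires Seurat V3 to be the
# installed version when this function is called.
upgrade_seurat_object <- function(rds_in, rds_out) {
  obj <- readRDS(rds_in)
  obj <- Seurat::UpdateSeuratObject(object = obj)
  saveRDS(obj, rds_out)
  invisible(obj)
}

# Example usage (placeholder paths, not run here):
# renv::init(bare = TRUE)  # optional: isolate the project library first
# install_from_local_source("path/to/Seurat_3.1.5.tar.gz")
# upgrade_seurat_object("path/to/object_v2.rds", "path/to/object_v3.rds")
```

Keeping the V2 and V3 copies in separate files makes it easier to return to the V2 object for steps that still depend on the V2 structure. The individual scripts remain the reference for where the upgrade actually happens.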
The data in this project can be broadly divided into 2 categories (7 sub-projects). Please see SI Tables 1 & 2 (SI Table 1: Mouse Experiments 1-4; SI Table 2: Human Experiments 5-7 & Human Literature Reanalysis) for a breakdown by sample, metadata, and more information.
For a brief overview with links to the raw data (fastqs) and processed data (Cell Ranger `count` gene expression matrices), see the table below.
Experiment | Species | Seq Used | Description | Raw/Count Data |
---|---|---|---|---|
Exp. 1 | Mouse | scRNA-seq (10X 3' V2) | scRNA-seq of microglia with 4 different dissociation protocols | GSE152183 |
Exp. 2 | Mouse | scRNA-seq (10X 3' V2) | scRNA-seq of all CNS cells with or without inhibitors | GSE152182 |
Exp. 3 | Mouse | scRNA-seq (10X 3' V2) | scRNA-seq of microglia (tail vein PBS injection) | GSE152210 |
Exp. 4 | Mouse | scRNA-seq (10X 3' V3.0 & V3.1) | scRNA-seq of microglia with or without inhibitors (10X Version Analysis) | GSE188441 |
Exp. 5 | Human | snRNA-seq (10X 3' V3.0) | snRNA-seq of post-mortem brain tissue | GSE157760 |
Exp. 6 | Human | snRNA-seq (10X 3' V3.0) | snRNA-seq of surgically resected brain tissue with or without freezing time delay | EGAD00001008541 |
Exp. 7 | Human | scRNA-seq (10X 5' V1) | scRNA-seq | phs002222.v2.p1 |
All processed data files represent the output from Cell Ranger `count`. Files provided are the "filtered_feature_bc_matrix" (i.e. only containing the barcodes that Cell Ranger called as cells during preprocessing). Information on Cell Ranger version and Genome/Annotation for each experiment can be found in SI Table 1 & 2 as well as individual repository meta data.
Experiments 1-4, 5 (NCBI GEO)
There are 3 processed data files per library:
- GSM*_Sample-Name_barcodes.tsv.gz: corresponds to the cell barcodes (i.e. column names).
- GSM*_Sample-Name_features.tsv.gz: corresponds to the gene identifiers (i.e. row names).
- GSM*_Sample-Name_matrix.mtx.gz: expression matrix in sparse format.
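
Because each GEO file carries a `GSM*_Sample-Name_` prefix, the three files for a library do not have the bare `barcodes.tsv.gz` / `features.tsv.gz` / `matrix.mtx.gz` names that Cell Ranger writes. One way to read them directly is sketched below. The helper `read_geo_library` is hypothetical (it is not part of this repository), and the sketch assumes that gene symbols sit in the second column of the features file, as in standard Cell Ranger output.

```r
library(Matrix)

# Hypothetical helper (not from this repository): read the three GEO files
# for one library into a genes x cells sparse count matrix.
# `prefix` is the shared file stem, i.e. "GSM*_Sample-Name" with the real
# accession and sample name filled in.
read_geo_library <- function(data_dir, prefix) {
  path <- function(suffix) file.path(data_dir, paste0(prefix, "_", suffix))

  # Cell barcodes become the column names
  barcodes <- readLines(gzfile(path("barcodes.tsv.gz")))

  # Gene identifiers become the row names
  features <- read.delim(gzfile(path("features.tsv.gz")),
                         header = FALSE, stringsAsFactors = FALSE)

  # Expression matrix in sparse MatrixMarket format
  counts <- as(readMM(gzfile(path("matrix.mtx.gz"))), "CsparseMatrix")

  stopifnot(nrow(counts) == nrow(features),
            ncol(counts) == length(barcodes))

  # Assumption: column 2 holds gene symbols; fall back to column 1 otherwise.
  gene_col <- min(2, ncol(features))
  rownames(counts) <- make.unique(features[[gene_col]])
  colnames(counts) <- barcodes
  counts
}
```

The returned matrix can then be passed to `CreateSeuratObject` (the `counts` argument in Seurat V3; Seurat V2 used `raw.data`). Renaming the files to the bare Cell Ranger names inside a per-library folder and using `Read10X` is an alternative.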
All raw data fastq/BAM files can be downloaded from SRA linked from NCBI GEO records, or from EGA/dbGaP records.
Reanalyzed data from the literature are summarized in the table below.
aFPKM data and raw fastq files are available via GEO. Raw count matrix was obtained via personal communication with authors.
bOnly a specific subset of samples were used in reanalysis. See reanalysis code for more information.
cData on Synapse are post-QC and were used for re-analysis. GEO records contain the unfiltered (all barcodes) HDF5 Cell Ranger output files and fastqs.
iReanalysis of Morabito et al. was also used for calculation of cell type proportions in Liddelow, Marsh, & Stevens et al., 2020 (Trends in Immunology).
Meta data for human data was assembled from published SI Tables, public data on Synapse, or restricted access data on Synapse.
- Compiled publicly available meta data variables for each human dataset can be found in SI Table 2.
- "DUC" in the table indicates data available from Synapse following submission and approval of a Data Use Certificate.
This study was supported by funding from Cure Alzheimer's Fund (B.S.). Special thanks to authors Tushar Kamath, Tim Hammond, Alec Walker, Lasse Dissing-Olesen, Velina Kozareva, and Evan Macosko, as well as other members of the Stevens and Macosko labs, for helpful discussions and assistance during the analysis phase of this project.
Data Acknowledgements:
The analysis and results published here from Zhou et al., 2020 in whole or in part are based on data obtained from the AMP-AD Knowledge Portal. Samples for this study were provided by the Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago. Data collection was supported through funding by NIA grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, U01AG32984, U01AG46152, the Illinois Department of Public Health, and the Translational Genomics Research Institute. Raw data used in analysis here are available from AMP-AD/Synapse database through links provided in table above. Additional ROSMAP data can be requested at https://www.radc.rush.edu.
The analysis and results published here for Morabito et al., 2020 are based on reanalysis of study data downloaded from Synapse as provided by Dr. Vivek Swarup, Institute for Memory Impairments and Neurological Disorders, University of California, Irvine. Data collection was supported through funding from UCI Startup funds and American Federation of Aging Research. Raw data used in analysis here are available from the Synapse database through the link provided in the table above.