Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney
*Yoshiharu Muto, *Parker C. Wilson, Nicolas Ledru, Haojia Wu, Henrik Dimke, Sushrut S. Waikar, Benjamin D. Humphreys
*These authors contributed equally
If you use any code or workflows in this repository please cite the following manuscript
Muto Y, Wilson PC, Ledru N, Wu H, Dimke H, Waikar SS, Humphreys BD.
Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney.
Nat Commun. 2021 Apr 13;12(1):2190.
doi: 10.1038/s41467-021-22368-w.
PMID: 33850129.
The code associated with this publication has been deposited in Zenodo
The bioRxiv preprint can be found here
Raw fastq files and cellranger v4.0 / cellranger-atac v1.2 gene and peak count matrices can be found in GEO or Human Cell Atlas at the links below. Note that there are two different sequencing configurations for the snATAC libraries. Samples 1-3 have an I1 index file, whereas samples 4-5 do not. The I1 index files are not needed to run cellranger-atac count on samples 4-5. Index sequences used for demultplexing are printed in the header lines of R1 and R2. R1 and R3 contain the paired-end reads and R2 contains the cell barcode information.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE151302
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE131882
https://data.humancellatlas.org/explore/projects/2af52a13-65cb-4973-b513-39be38f2df3f
h5ad files for the snRNA and snATAC libraries were processed and annotated for the development of the single cell analysis tool GLUE. The authors of GLUE have hosted the processed datasets for their manuscript on their lab website. You can find more information about how they processed the data here and you can download the files in their repository.
snATAC barcodes and snRNA barcodes used for the original analysis can be found in this github repository
Finalized R objects and h5ad files can be visualized interactively or downloaded from the cellxgene website. The snATAC object only includes the gene activity "RNA" assay and does not have a raw or normalized peak count matrix.
Welcome to our GitHub repository!
Here you will find analysis scripts for our manuscript where we integrate paired snRNAseq and snATACseq from 5 healthy adult kidney cortex samples. Please contact the corresponding author, Dr. Benjamin Humphreys, with questions or comments.
Thanks,
Parker and Yoshi
Visit the Wilson lab website:
www.parkerwilsonlab.com
Visit the Humphreys lab website:
www.humphreyslab.com
Check out our interactive datasets with Kidney Interactive mulTiomics (KIT):
http://humphreyslab.com/SingleCell/
Find us on Twitter:
@parkercwilson
@YoshiharuMuto
@HumphreysLab
Find us on Docker Hub:
p4rkerw@dockerhub
Sample analysis and processing workflow
-
Generate a custom pre-mRNA index for cellranger (snRNA_prep)
Libraries were generated from a nuclear dissociation and require a custom pre-mRNA reference to count introns. We used refdata-cellranger-GRCh38-3.0.0 which can be downloaded from the 10X genomics website: https://support.10xgenomics.com/ . The gtf file is processed with cellranger_rna_mkref.sh to create the GRCh38-1.2.0_premrna reference. -
Count each of the five snRNA libraries with cellranger and GRCh38-1.2.0 (snRNA_prep)
cellranger_rna_count.sh -
Aggregate the five snRNA libraries using the cellranger_rna_aggr.csv file (snRNA_prep)
cellranger_rna_aggr.sh -
Process the aggregated snRNA library and exclude doublets identified with DoubletFinder. This script will generate a processed snRNA library file rnaAggr_control.rds (snRNA_prep)
seurat_rna_process.sh -
Count each of the five snATAC libraries with cellranger-atac and refdata-cellranger-atac-GRCh38-1.2.0 (snATAC_prep)
cellranger_atac_count.sh -
Aggregate the five snATAC libraries using the cellranger_atac_aggr.csv file (snATAC_prep)
cellranger_atac_aggr.sh -
Processs the aggregated snATAC library and integrate with the snRNA library to remove doublets with label transfer. This script will generate a processed snATAC library file atacAggr_control.rds (snATAC_prep)
signac_atac_process.sh -
Find cell-specific differentially expressed genes in the snRNA dataset (analysis)
find_rna_deg.R -
Find cell-specific differentially accessible chromatin in the snATAC dataset (analysis)
find_atac_dar.R -
Compare snATAC peaks with dnase hypersensitive sites (analysis)
find_atac_overlap_dnase.R -
Find differentially expressed genes with nearby differentially accessible chromatin regions (analysis)
find_overlap_deg_dar.R -
Find cis-coaccessibility networks in the snATAC dataset with Cicero (analysis)
find_atac_ccans.R -
Annotate cis-coaccessibility networks and create circos plots to visualize links (analysis)
annotate_ccans_circos.R -
Annotate cis-coaccessibility networks to find promoter-enhancer links (analysis)
annotate_peaks_fantom.R -
Annotate cis-coaccessibility networks to find overlap with the GeneHancer database (analysis) get_atac_dar_genehancer_conns.R find_atac_overlap_ccans_genehancer.R
-
Find transcription factor motif activity in the snATAC dataset with chromVAR (analysis)
find_atac_chromVAR.R -
Correlate transcription factor motif activity in the snATAC dataset with gene expression in the snRNA dataset (analysis)
corr_chromVar_TF_exp.R -
Correlate cicero gene activity in the snATAC dataset with gene expression in the snRNA dataset (analysis)
corr_cicero_gene_exp.R -
Perform pseudotemporal ordering of the distal nephron with Monocle (analysis)
pseudotime_distal_nephron.R
Deconvolution:
Each dataset is first prepared as an ExpressionSet and then deconvolved with Bisque using the snRNA library. When necessary, transcript counts are prepared with Salmon and GRCh38.
-
Fan et al Human Diabetic Nephropathy (PMID:31578193, GSE142025)
salmon_count_fan.R
find_bulk_degs_fan.R
deconvolution_fan_BisqueRNA.R -
Liu et al Mouse Kidney IRI (PMID:24569379, GSE98622)
find_bulk_degs_liu.R
deconvolution_liu_BisqueRNA.R -
TCGA non-tumor Kidney
find_bulk_degs_tcga.R
deconvolution_tcga_BisqueRNA.R
Allele Specific Analysis: The majority of the allele-specific analysis workflow has been implemented as a separate GitHub repository called 🌶️SALSA . You can check it out here
These scripts can be run in publicly-available docker containers found here: p4rkerw@dockerhub
Each script has an example command to run the corresponding docker container
(Follow the steps in order)
- Genotype the snRNA or snATAC libraries using GATK (or obtain a vcf from another method)
- Annotate the genotyped vcf with GATK Funcotator to evaluate gnomAD MAF and variant context
- Process the variant annotation into a csv file
- Filter the genotyped vcf for variants that overlap coding transcripts and introns
- Apply the WASP pipeline to cellranger-aligned bam files and realign overlapping variants with STAR
- Get allele-specific counts with GATK ASEReadCounter
- Filter heterozygous SNV and perform allele-specific analysis with ASEP
Utility Scripts:
getContigLengths.py
make_ucsc_tracks.R
Figures:
Code for generating figures in the manuscript