Here we provide source code for a clustering and visualization method for single-cell RNA-seq data sets, that exploits path metrics. Using this method, distances between cells are measured in a data-driven way which is both density sensitive (decreasing distances across high density regions) and respects the underlying data geometry. By combining path metrics with multidimensional scaling, a low dimensional embedding of the data is obtained which respects both the global geometry of the data and preserves cluster structure.
- Adjusted Rand index Matlab function version 1.0.0.0
- Munkres Assignment Algorithm Matlab function
- Uniform Manifold Approximation and Projection (UMAP) Matlab function >= version 1.5.2
- Matlab version 2021a
-
To reproduce results of processed data sets:
- Use the link in Processed data sets folder to access download data sets.
- Run RunPathMetrics.m using Matlab version 2021a.
-
To produce the simulated beta cells data set:
- Download "GSM2230757_human1_umifm_counts.csv.gz" from here
- Run Simulationscript.R
-
To process the data sets used in results:
- Download the real data sets using the links mentioned on “processing and imputation of data.R”
- Run processing and imputation of data.R
- Scale data
- For Basic scaling: input in Matlab the csv file of the data set of interest produced in step 2, along with the true labels file. Then run Code/ProcessData.m after uncommenting the name of the data set and the line “Normalization = 'Basic';”.
- For Linnorm scaling: run Linnorm transformation.R for the data set of interest produced in a. Then input in Matlab the output csv file along with the true labels file. Then run Code/ProcessData.m after uncommenting the name of the data set and the line “Normalization = 'Linnorm';”. Now you can use the output to run Code and data sets/Code_for_github/RunPathMetrics.m.
- For SCT scaling: run sctranform_for_PM_after_SAVER_imputation.R for the data set of interest produced in a. Then input in Matlab the output csv file along with the true labels file. Then run Code/ProcessData.m after uncommenting the name of the data set and the line “Normalization = 'SCT';”. Now you can use the output to run Code/RunPathMetrics.m.
-
Other files description:
- Seurat_clustering_analysis.R : Apply Seurat clustering on data set produced by processing and imputation of data.R
- Seurat_clustering_analysis_default_res: Apply Seurat clustering with resolution 0.8 on data set produced by processing and imputation of data.R
- RunSNNClustering.R: Apply SNN clustering on data sets after Basic or SCT scaling (created as described in 3b.)
- clustering_evaluation.R: Function that produces ARI, Entropy of cluster accuracy and Entropy of cluster Purity for a clustering.
- numerical_to_word_labels.xlsx: For interpretation purposed we provide the correspondence of numerical labels and descriptive labels for the data sets used for the results.