Dynamo workflow

Xiaojie Qiu edited this page Feb 21, 2020 · 18 revisions

The Python snippets below highlight each of the main steps in dynamo. Note: this workflow is due to be updated (no later than Feb. 23, 2020); for now, you may want to start from one of the notebook tutorials at https://github.com/aristoteleo/dynamo-tutorials.

The first step in applying dynamo is to load a dataset of interest. Dynamo uses AnnData as its data container, so you can load loom, h5 or h5ad files generated by Kallisto + bustools, velocyto, scanpy or even 10x Cell Ranger (dynamo can also serve as a regular scRNA-seq analysis toolkit) via read_loom, read_h5 or read_h5ad. Dynamo also provides a function (read_NASC_seq) to load output files from the NASC-seq pipeline, which is applied to single-cell SLAM-seq experiments. If you want to learn protein velocity, attach the protein abundance data to the .obsm attribute of the AnnData object under the protein key.

# first step: Load data 
import dynamo as dyn 
# set matplotlib's rcParams. this setting tries to emulate ggplot style. 
dyn.configuration.set_figure_params('dynamo')  
# you can read your own data via read_loom, read_h5ad or read_h5
# (read_h5ad integrates with a scanpy workflow; read_NASC_seq loads results generated by the NASC-seq pipeline)
# filename and param are placeholders for your own file path and keyword arguments
adata = dyn.read_loom(filename, **param)
# here let us play with the Dentate Gyrus example dataset (27,998 genes and 18,213 cells)
adata = dyn.sample_data.DentateGyrus()  # there are many sample datasets available.

Next, you may want to check the fraction of spliced and unspliced, or new (metabolically labeled) and total, mRNAs in your data with show_fraction. You are then ready to run recipe_monocle, which preprocesses the data in a single shot. recipe_monocle uses a strategy similar to Monocle 3 to normalize all layers of the dataset (spliced and unspliced, new (metabolically labeled) and total mRNAs, or others). You can check the mean-dispersion plot to see where the automatically selected feature genes are located, and the variance explained plot to see how much variance each principal component explains.

# second step: preprocess
dyn.pl.show_fraction(adata)
dyn.pp.recipe_monocle(adata, n_top_genes=3000)
dyn.pl.featureGenes(adata)
dyn.pl.variance_explained(adata)
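
To make the idea behind the mean-dispersion plot concrete, here is a minimal sketch of dispersion-based feature selection. It is not recipe_monocle's actual implementation: the function name top_dispersed_genes, the variance/mean definition of dispersion and the toy counts are all assumptions chosen for illustration.

```python
from statistics import mean, pvariance


def top_dispersed_genes(counts, n_top):
    """Return the n_top gene names with the highest variance/mean ratio.

    counts maps a gene name to its list of per-cell counts (hypothetical layout).
    """
    dispersion = {}
    for gene, values in counts.items():
        mu = mean(values)
        if mu == 0:
            continue  # genes that are never detected carry no information
        dispersion[gene] = pvariance(values) / mu
    # Sort gene names by dispersion, largest first
    return sorted(dispersion, key=dispersion.get, reverse=True)[:n_top]


toy_counts = {
    "geneA": [0, 10, 0, 10],  # highly variable
    "geneB": [5, 5, 5, 5],    # constant across cells
    "geneC": [4, 6, 4, 6],    # mildly variable
    "geneD": [0, 0, 0, 0],    # not detected
}
selected = top_dispersed_genes(toy_counts, n_top=2)
print(selected)
```

Genes with the same mean but larger variance rank higher, which is the pattern the mean-dispersion plot is meant to reveal.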

Next, you will want to learn the velocity values across cells for all genes that pass some filters (by default, all the selected feature genes). dyn.tl.dynamics does all the hard work for you: it automatically checks the data you have and learns the velocity vectors accordingly. For example, if you have scSLAM-seq data (identified by the new and total layers, or the uu, ul, su and sl layers, etc.), dynamo will learn the transcription, splicing and degradation rates for each gene. We developed two modes of estimation: the steady_state assumption (from the seminal RNA velocity work) and the moment model (see the full derivation of this model).

# third step: learning dynamics
dyn.tl.dynamics(adata)
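
As a rough illustration of the steady_state idea for a single gene, the sketch below estimates a degradation rate as the slope of unspliced versus spliced counts and computes a velocity from it. It is a simplification and not what dyn.tl.dynamics does internally: it assumes the model ds/dt = u - gamma * s with the splicing rate fixed to 1, fits the slope through the origin on all cells rather than on extreme quantiles, and uses hypothetical function names.

```python
def steady_state_gamma(unspliced, spliced):
    """Least-squares slope through the origin of unspliced vs. spliced counts."""
    numerator = sum(u * s for u, s in zip(unspliced, spliced))
    denominator = sum(s * s for s in spliced)
    return numerator / denominator


def velocity(unspliced, spliced, gamma):
    """Per-cell velocity ds/dt = u - gamma * s (splicing rate assumed to be 1)."""
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Toy reference cells that sit exactly on the steady-state line u = 0.5 * s
spliced = [1.0, 2.0, 4.0, 8.0]
unspliced_steady = [0.5, 1.0, 2.0, 4.0]
gamma = steady_state_gamma(unspliced_steady, spliced)

# Cells above the line are up-regulating the gene, cells below are down-regulating it
unspliced_observed = [0.5, 1.5, 2.0, 3.0]
v = velocity(unspliced_observed, spliced, gamma)
print(gamma, v)
```

A positive velocity means more unspliced mRNA than the steady state predicts, so spliced mRNA is expected to increase.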

You may also want to use different dimension reduction approaches to reduce your scRNA-seq data to a low-dimensional embedding. By default, we use the trimap dimension reduction method, which uses triplet constraints to form a low-dimensional embedding of a set of points. trimap is arguably more scalable and better at preserving the global structure of the data than UMAP. Note that we also generalized diffusion maps so that dimension reduction via diffusion can also account for drift. A class of structure-learning-based dimension reduction methods from our group will also be supported shortly.

# fourth step: reduce dimension 
dyn.tl.reduceDimension(adata, velocity_key='velocity_S', reduction_method='trimap')
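
To give a feel for what a triplet constraint is, the sketch below measures how many "i is closer to j than to k" relations from the original space survive in an embedding. It only evaluates triplets and does not optimize an embedding, so it is not the trimap algorithm itself; the function names and toy coordinates are assumptions.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_agreement(high, low):
    """Fraction of triplets (i, j, k), with i closer to j than to k in `high`,
    that keep the same ordering in the embedding `low`."""
    kept = total = 0
    for i, j, k in permutations(range(len(high)), 3):
        if sq_dist(high[i], high[j]) < sq_dist(high[i], high[k]):
            total += 1
            if sq_dist(low[i], low[j]) < sq_dist(low[i], low[k]):
                kept += 1
    return kept / total


high = [(0, 0, 0), (1, 0, 0), (5, 5, 0), (6, 5, 1)]
good = [(0, 0), (1, 0), (5, 5), (6, 5)]  # simply drops the third coordinate
bad = [(0, 0), (6, 5), (1, 0), (5, 5)]   # scrambles which points are neighbours

good_score = triplet_agreement(high, good)
bad_score = triplet_agreement(high, bad)
print(good_score, bad_score)
```

An embedding that satisfies more triplets preserves more of the relative distance structure of the original data.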

You will want to see the velocity vectors on the low-dimensional embedding. To get there, first apply our improved transition matrix reconstruction method. In contrast to the "correlation kernel" from velocyto or scVelo, dynamo is powered by the Itô kernel, which considers not only the correlation between a cell's velocity vector and the vectors from that cell to its nearest neighbors, but also the corresponding distances. We expect this new kernel to enable visualization of more intricate vector flows or steady states in low dimension, and to improve the calculation of the stationary distribution or source states of the sampled cells.

# fifth step: project velocity 
dyn.tl.cell_velocities(adata, vkey='pca', basis='trimap', method='analytical')
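
For intuition about kernel-based projection, the following sketch implements a simplified cosine-similarity variant of a correlation kernel for one cell: neighbors that lie in the direction of the velocity get higher transition probability, and the projected velocity is the probability-weighted displacement in the embedding. This is not dynamo's Itô kernel (whose distance term is not reproduced here) and not the exact velocyto or scVelo code; project_velocity, sigma and the toy data are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def project_velocity(i, neighbors, X, V, Y, sigma=0.1):
    """Project the high-dimensional velocity V[i] of cell i onto the embedding Y."""
    # Displacements from cell i to each neighbor in the original space
    diffs = [[xj - xi for xj, xi in zip(X[j], X[i])] for j in neighbors]
    # Softmax-like weights: large when the displacement aligns with the velocity
    weights = [math.exp(cosine(d, V[i]) / sigma) for d in diffs]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Expected displacement in the embedding under those transition probabilities
    return [
        sum(p * (Y[j][k] - Y[i][k]) for p, j in zip(probs, neighbors))
        for k in range(len(Y[i]))
    ]


X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]  # expression space
V = [(2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]   # velocities
Y = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]                 # 2-D embedding
v_low = project_velocity(0, [1, 2], X, V, Y)
print(v_low)
```

Cell 0 moves toward cell 1 in expression space, so its projected velocity points toward cell 1 in the embedding as well.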

At this stage, you probably cannot wait to see what the velocity vectors look like in low-dimensional space. Similar to velocyto and scVelo, we provide three plotting utilities that visualize the cell-wise velocity vectors, the velocity vectors on a grid, or a streamline plot that integrates along the grid velocity vectors via the fourth-order Runge-Kutta algorithm.

# sixth step: visualize vector field
basis='trimap'
gene_list = ['Rgs20', 'Eya1']
dyn.pl.cell_wise_velocity(adata, genes=gene_list, basis=basis, n_columns=3) 
dyn.pl.grid_velocity(adata, genes=gene_list, basis=basis, n_columns=2, figsize=[8, 8], method='Gaussian')  
dyn.pl.grid_velocity(adata, genes=gene_list,  basis=basis, n_columns=1, figsize=[8, 8], color=['ClusterName'], method='Gaussian')  
dyn.pl.stremline_plot(adata, genes=gene_list, n_columns=2, figsize=[8, 8], density=3, method='Gaussian') 
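
The streamline idea can be sketched with a classical fourth-order Runge-Kutta integrator. The example below traces one streamline through an analytic rotation field rather than through grid velocities interpolated from data, so it only illustrates the integration scheme; rk4_step, streamline and rotation_field are hypothetical names and not part of dynamo.

```python
import math


def rk4_step(f, p, h):
    """Advance point p by one classical fourth-order Runge-Kutta step of size h."""
    k1 = f(p)
    k2 = f([pi + 0.5 * h * ki for pi, ki in zip(p, k1)])
    k3 = f([pi + 0.5 * h * ki for pi, ki in zip(p, k2)])
    k4 = f([pi + h * ki for pi, ki in zip(p, k3)])
    return [
        pi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for pi, a, b, c, d in zip(p, k1, k2, k3, k4)
    ]


def streamline(f, start, h, n_steps):
    """Return the list of points visited when integrating f from start."""
    path = [list(start)]
    for _ in range(n_steps):
        path.append(rk4_step(f, path[-1], h))
    return path


def rotation_field(p):
    """Toy vector field whose streamlines are circles around the origin."""
    x, y = p
    return [-y, x]


n_steps = 200
path = streamline(rotation_field, [1.0, 0.0], 2.0 * math.pi / n_steps, n_steps)
end = path[-1]
print(end)
```

After one full period the streamline returns very close to its starting point, which is a quick sanity check on the integrator.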

We don't want to stop here. Let us move on to the really exciting part of dynamo in the next section: learning the velocity vector field in the full transcriptomic space and mapping the potential landscape.

Dynamo aspires to learn a vector field function in the transcriptomic space. With the learned vector field, you can then predict cell fate in high dimension over arbitrary time scales from arbitrary initial cell states. You can experiment with this via the dyn.tl.VectorField and dyn.tl.fate functions.

# seventh step: learn vector field
dyn.tl.VectorField(adata) 
dyn.pl.grid_velocity(adata, genes=gene_list, basis=basis, n_columns=2, figsize=[8, 8], method='SparseVFC')  
dyn.pl.grid_velocity(adata, genes=gene_list,  basis=basis, n_columns=1, figsize=[8, 8], color=['ClusterName'], method='SparseVFC')  
dyn.pl.stremline_plot(adata, genes=gene_list, n_columns=2, figsize=[8, 8], density=3, method='SparseVFC') 
dyn.pl.line_integral_conv(adata)
dyn.tl.Fate(adata)
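
What fate prediction means can be shown with a toy example: given a vector field function, integrate forward from an initial state and see which stable state the trajectory reaches. The sketch below uses a hand-written two-attractor field and forward Euler integration; it stands in for a learned vector field and is not how dynamo predicts fate internally. All names are assumptions.

```python
def toy_vector_field(state):
    """Hypothetical 2-D field with stable states at (-1, 0) and (1, 0)."""
    x, y = state
    return (x - x ** 3, -y)


def predict_fate(f, initial_state, dt=0.01, n_steps=2000):
    """Integrate the field forward with explicit Euler steps; return the final state."""
    x, y = initial_state
    for _ in range(n_steps):
        dx, dy = f((x, y))
        x, y = x + dt * dx, y + dt * dy
    return (x, y)


# Two initial states on either side of the unstable point x = 0
fate_right = predict_fate(toy_vector_field, (0.2, 1.0))
fate_left = predict_fate(toy_vector_field, (-0.2, 1.0))
print(fate_right, fate_left)
```

Nearby initial states can end up in different stable states, which is why a vector field over the full state space is more informative than a single velocity snapshot.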

The potential landscape is an intuitive concept that is widely used across disciplines. It provides a global description of cell state stability. Once the vector field is learned, dynamo allows you to map the potential landscape and the least action paths, i.e. the highest-probability paths that convert any cell type (state) into any other cell type (state). You can experiment with this via the dyn.tl.Potential and dyn.tl.action functions.

# eighth step: map potential landscape
dyn.tl.action(adata) 
dyn.pl.Potential(adata)
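
For a one-dimensional gradient system dx/dt = f(x) = -dU/dx, the potential can be recovered by integrating -f, and its local minima are the stable states. The sketch below does this numerically with the trapezoid rule. It is a textbook special case under the assumption that the field is a pure gradient, which is generally not true for high-dimensional vector fields, and it is not dynamo's landscape mapping method; the function names are hypothetical.

```python
def force(x):
    """Toy 1-D vector field with stable states at x = -1 and x = 1."""
    return x - x ** 3


def potential_on_grid(f, x_min, x_max, n):
    """Integrate U(x) = -integral of f with the trapezoid rule; U(x_min) is set to 0."""
    h = (x_max - x_min) / (n - 1)
    xs = [x_min + i * h for i in range(n)]
    U = [0.0]
    for a, b in zip(xs, xs[1:]):
        U.append(U[-1] - 0.5 * h * (f(a) + f(b)))
    return xs, U


def local_minima(xs, U):
    """Grid points whose potential is lower than both neighbours."""
    return [
        xs[i]
        for i in range(1, len(U) - 1)
        if U[i] < U[i - 1] and U[i] < U[i + 1]
    ]


xs, U = potential_on_grid(force, -2.0, 2.0, 401)
wells = local_minima(xs, U)
print(wells)
```

The two wells of the potential coincide with the stable states of the field, and the barrier between them at x = 0 reflects how hard it is to convert one state into the other.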

Installation

Note that this is the first alpha version of Dynamo (as of July 9th, 2019). Dynamo is still under active development, and a stable version will be released when it is ready. Until then, please use Dynamo with caution. We welcome bug reports (via the GitHub issue tracker) and especially code contributions (via GitHub pull requests) from users to make Dynamo an accessible, useful and extendable tool. For discussion of different use cases, comments or suggestions related to our manuscript, and questions regarding the underlying mathematical formulation of dynamo, we provide a Google group. Dynamo developers can be reached at xqiu.sc@gmail.com. To install the newest version of dynamo, you can git clone our repo and then use:

git clone git@github.com:aristoteleo/dynamo-release.git
pip install dynamo-release/ --user 

Note that the --user flag installs the package to your home directory, in case you don't have root privileges.

Alternatively, you can install Dynamo from inside the dynamo-release folder:

git clone git@github.com:aristoteleo/dynamo-release.git
cd dynamo-release/ 
python setup.py install --user 

Or install directly from source on GitHub, using the following command:

pip install git+https://github.com/aristoteleo/dynamo-release

Citation

Xiaojie Qiu, Yan Zhang, Dian Yang, Shayan Hosseinzadeh, Li Wang, Ruoshi Yuan, Song Xu, Yian Ma, Joseph Replogle, Spyros Darmanis, Jianhua Xing, Jonathan S Weissman (2019): Mapping Vector Field of Single Cells. BioRxiv

biorxiv link: https://www.biorxiv.org/content/10.1101/696724v1

Theory behind dynamo

For the vector field reconstruction and potential landscape mapping, please refer to our preprint. We also released the complete derivation of the matrix form of the moment generating functions for parameter estimation in the full_derivation.pdf file in the dynamo-notebook GitHub repo.

The dynamo-notebook repo also provides tutorials on how to use dynamo to reconstruct vector fields and to calculate least action paths and the potential of cell states.
