Dynamo workflow
The Python snippets below highlight each of the main steps in dynamo. Note: this workflow will need to be updated (no later than Feb. 23, 2020); for now, you may want to pick up one of the notebook tutorials from https://github.com/aristoteleo/dynamo-tutorials to get started.
The first step in applying dynamo is to load the dataset of interest. Dynamo uses AnnData as its container, so you can load loom, h5, or h5ad files generated by Kallisto + bustools, velocyto, scanpy, or even 10x cellranger (because dynamo can also serve as a regular scRNA-seq analysis toolkit) via read_loom, read_h5, or read_h5ad. Dynamo also provides a function (read_NASC_seq) to load output files from the NASC-seq pipeline, which is applied to single-cell SLAM-seq experiments. If you want to learn protein velocity, attach the protein abundance data to the .obsm attribute of the AnnData object under the protein key.
# first step: Load data
import dynamo as dyn
# set matplotlib's rcParams. this setting tries to emulate ggplot style.
dyn.configuration.set_figure_params('dynamo')
# you can read your own data via read_loom, read_h5ad or read_h5
adata = dyn.read_loom(filename, **param) # or use read_h5ad to integrate with a scanpy workflow, or read_NASC_seq to load results generated by the NASC-seq pipeline
# here let us play with the Dentate Gyrus example dataset (27,998 genes and 18,213 cells)
adata = dyn.sample_data.DentateGyrus() # there are many sample datasets available.
Next, you may want to check the fraction of spliced and unspliced, or new (metabolically labeled) and total, mRNAs in your data with show_fraction. You are then ready to run recipe_monocle, which preprocesses the data in a single shot. recipe_monocle uses a strategy similar to that of Monocle 3 to normalize the data in all layers (spliced and unspliced, new (metabolically labeled) and total mRNAs, or others). You can check the mean-dispersion plot to see where the automatically selected feature genes are located, and the variance explained plot to glance at the variance explained by each principal component.
# second step: preprocess
dyn.pl.show_fraction(adata)
dyn.pp.recipe_monocle(adata, n_top_genes=3000)
dyn.pl.featureGenes(adata)
dyn.pl.variance_explained(adata)
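To make the variance explained plot concrete, here is a minimal sketch of how the per-component variance ratio can be computed with plain NumPy. It does not use dynamo; the helper name variance_explained_ratio and the toy cells-by-genes matrix are assumptions made for illustration only.

```python
import numpy as np

def variance_explained_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                    # center each gene (column)
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, in descending order
    var = s ** 2                               # proportional to the variance of each PC
    return var / var.sum()

rng = np.random.default_rng(0)
# toy "cells x genes" matrix: one strong axis of variation plus weak noise
latent = rng.normal(size=(200, 1))
loadings = rng.normal(size=(1, 10))
X = 5.0 * latent @ loadings + rng.normal(size=(200, 10))

ratio = variance_explained_ratio(X)
print(np.round(ratio[:3], 3))                  # the first component dominates
```

A steep drop after the first few components suggests that a small number of principal components is enough for the downstream steps.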
Next, you will want to learn the velocity values across cells for all genes that pass some filters (by default, all the selected feature genes). The dyn.tl.dynamics function does all the hard work for you. It automatically checks the data you have and learns the velocity vectors accordingly. For example, if you have scSLAM-seq data (identified by the new and total layers, or the uu, ul, su, and sl layers, etc.), dynamo will learn the transcription, splicing, and degradation rates for each gene. We developed two modes of estimation: the steady_state assumption (from the seminal RNA velocity work) or the moment model (see the full derivation of this model).
# third step: learning dynamics
dyn.tl.dynamics(adata)
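As a rough illustration of the steady_state idea for a single gene, the sketch below fits the degradation rate as the slope of unspliced versus spliced counts and then reads off the velocity as the deviation from that line. This is a simplified stand-in, not dynamo's estimator: it assumes a splicing rate of 1, fits an ordinary least-squares line through the origin using cells assumed to be at steady state, and uses made-up numbers. The helper name steady_state_gamma is hypothetical.

```python
import numpy as np

def steady_state_gamma(u, s):
    """Least-squares slope through the origin for u ~ gamma * s."""
    return float(np.dot(u, s) / np.dot(s, s))

# spliced counts of one gene across five cells (toy values)
s = np.array([0.0, 1.0, 2.0, 4.0, 8.0])

# cells assumed to sit exactly at steady state, where u = gamma * s
u_steady = 0.5 * s
gamma = steady_state_gamma(u_steady, s)

# observed unspliced counts; deviations from the steady-state line give the velocity
u_obs = np.array([0.0, 0.9, 1.0, 1.5, 4.0])
velocity = u_obs - gamma * s   # > 0: gene is being induced, < 0: gene is being repressed
print(gamma, velocity)
```

Cells above the fitted line have more unspliced mRNA than steady state predicts, so their spliced abundance is expected to rise; cells below the line are expected to fall.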
You may also want to use different dimension reduction approaches to embed your scRNA-seq data in a low-dimensional space. By default, we use the novel trimap dimension reduction method, which uses triplet constraints to form a low-dimensional embedding of a set of points. trimap is arguably more scalable and better at preserving the global structure of the data than UMAP. Note that we also generalized diffusion maps so that we can perform dimension reduction via diffusion while also considering drift. A class of structure-learning-based dimension reduction methods from us will also be supported shortly.
# fourth step: reduce dimension
dyn.tl.reduceDimension(adata, velocity_key='velocity_S',reduction_method='trimap')
You will likely want to see the velocity vectors on the low-dimensional embedding. To get there, first apply our improved transition matrix reconstruction method. In contrast to the "correlation kernel" from velocyto or scVelo, dynamo is powered by the Itô kernel, which considers not only the correlation between the vector from any cell to its nearest neighbors and its velocity vector but also the corresponding distances. We expect this new kernel to enable us to visualize more intricate vector flows or steady states in low dimensions. We also expect it to improve the calculation of the stationary distribution or source states of sampled cells.
# fifth step: project velocity
dyn.tl.cell_velocities(adata, vkey='pca', basis='trimap', method='analytical')
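For intuition about what a transition matrix reconstruction does, here is a minimal sketch of the simpler "correlation kernel" idea mentioned above, written with plain NumPy. It is not dynamo's Itô kernel (it ignores distances entirely), and the function names, the sigma parameter, and the toy coordinates are assumptions for illustration.

```python
import numpy as np

def cosine_transition_probs(x, v, i, neighbors, sigma=0.1):
    """Transition probabilities from cell i to each of its neighbors."""
    delta = x[neighbors] - x[i]                               # displacement to each neighbor
    norms = np.linalg.norm(delta, axis=1) * np.linalg.norm(v[i]) + 1e-12
    cos = delta @ v[i] / norms                                # cosine between displacement and velocity
    w = np.exp(cos / sigma)                                   # favor neighbors along the velocity
    return w / w.sum()

def project_velocity(emb, i, neighbors, p):
    """Expected unit displacement in the embedding, minus the uniform baseline."""
    unit = emb[neighbors] - emb[i]
    unit = unit / np.linalg.norm(unit, axis=1, keepdims=True)
    return (p[:, None] * unit).sum(axis=0) - unit.mean(axis=0)

# toy data: cell 0 at the origin with four neighbors; its velocity points along +x
x = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
v = np.zeros_like(x)
v[0] = [1.0, 0.0]
neighbors = np.array([1, 2, 3, 4])

p = cosine_transition_probs(x, v, 0, neighbors)
proj = project_velocity(x, 0, neighbors, p)   # the embedding equals x in this toy case
print(np.round(p, 4), np.round(proj, 4))
```

The neighbor lying in the direction of the velocity receives almost all of the probability mass, so the projected arrow points along +x. A distance-aware kernel would additionally weight neighbors by how far away they are.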
At this stage, you probably cannot wait to see what the velocity vectors look like in the low-dimensional space. Similar to velocyto and scVelo, we provide three plotting utilities that visualize the cell-wise velocity vectors, the velocity vectors on a grid, or a streamline plot that integrates along the grid velocity vectors via the fourth-order Runge-Kutta algorithm.
# sixth step: visualize vector field
basis='trimap'
gene_list = ['Rgs20', 'Eya1']
dyn.pl.cell_wise_velocity(adata, genes=gene_list, basis=basis, n_columns=3)
dyn.pl.grid_velocity(adata, genes=gene_list, basis=basis, n_columns=2, figsize=[8, 8], method='Gaussian')
dyn.pl.grid_velocity(adata, genes=gene_list, basis=basis, n_columns=1, figsize=[8, 8], color=['ClusterName'], method='Gaussian')
dyn.pl.stremline_plot(adata, genes=gene_list, n_columns=2, figsize=[8, 8], density=3, method='Gaussian')
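The streamline plot relies on numerically integrating a path through a velocity field. The following is a minimal sketch of classical fourth-order Runge-Kutta integration on an analytic toy field; it is independent of dynamo's plotting code, and rk4_streamline and rotation_field are hypothetical names chosen for this example.

```python
import numpy as np

def rk4_streamline(field, x0, dt, n_steps):
    """Trace a streamline of `field` from x0 using classical RK4 steps."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        k1 = field(x)
        k2 = field(x + 0.5 * dt * k1)
        k3 = field(x + 0.5 * dt * k2)
        k4 = field(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        path.append(x.copy())
    return np.array(path)

def rotation_field(x):
    """Toy velocity field whose streamlines are circles around the origin."""
    return np.array([-x[1], x[0]])

n_steps = 200
path = rk4_streamline(rotation_field, [1.0, 0.0], dt=2.0 * np.pi / n_steps, n_steps=n_steps)
print(path[0], path[-1])   # one full revolution ends close to the starting point
```

In a real streamline plot the field is not analytic; it would be interpolated from the grid velocity vectors before each evaluation.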
Of course, we don't want to stop here. Let us move on to the really exciting part of dynamo in the next section: learning the velocity vector field in the full transcriptomic space and mapping the potential landscape.
Dynamo aspires to learn a vector field function in the transcriptomic space. With the learned vector field, you can then predict cell fate in high dimensions over arbitrary time scales from arbitrary initial cell states. You can experiment with this via the dyn.tl.VectorField or dyn.tl.fate function.
# seventh step: learn vector field
dyn.tl.VectorField(adata)
dyn.pl.grid_velocity(adata, genes=gene_list, basis=basis, n_columns=2, figsize=[8, 8], method='SparseVFC')
dyn.pl.grid_velocity(adata, genes=gene_list, basis=basis, n_columns=1, figsize=[8, 8], color=['ClusterName'], method='SparseVFC')
dyn.pl.stremline_plot(adata, genes=gene_list, n_columns=2, figsize=[8, 8], density=3, method='SparseVFC')
dyn.pl.line_integral_conv(adata)
dyn.tl.Fate(adata)
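To illustrate what "learning a vector field function" means, the sketch below fits a smooth function to sampled velocity vectors so that it can be evaluated at states that were never observed. It uses plain Gaussian kernel ridge regression as a stand-in and is not the SparseVFC method referenced in the snippet above; the function names, the kernel width sigma, and the regularization strength lam are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Pairwise Gaussian kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_vector_field(X, V, sigma=0.5, lam=1e-6):
    """Return a function f(Q) approximating velocities V observed at states X."""
    K = gaussian_kernel(X, X, sigma)
    C = np.linalg.solve(K + lam * np.eye(len(X)), V)   # kernel ridge coefficients
    def f(Q):
        Q = np.atleast_2d(np.asarray(Q, dtype=float))
        return gaussian_kernel(Q, X, sigma) @ C
    return f

# toy data: states on a 5 x 5 grid, velocities pointing toward a stable state at the origin
g = np.linspace(-1.0, 1.0, 5)
X = np.array([[a, b] for a in g for b in g])
V = -X

vf = fit_vector_field(X, V)
pred = vf(X)                  # reproduces the training velocities closely
new = vf([0.25, -0.25])       # velocity at a state that was not sampled
print(np.round(new, 3))
```

Once such a function is available, predicting cell fate amounts to integrating it forward in time from a chosen initial state.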
The potential landscape is an intuitive concept that is widely used in various disciplines. It provides a global description of cell state stability. Once the vector field has been learned, dynamo allows you to map the potential landscape and the least action paths that convert any cell type (state) to any other cell type (state) with the highest probability. You can experiment with this via the dyn.tl.Potential and dyn.tl.action functions.
# eighth step: map potential landscape
dyn.tl.action(adata)
dyn.pl.Potential(adata)
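The link between a vector field and a potential landscape is easiest to see in one dimension, where the velocity is the negative slope of the potential. The sketch below recovers a double-well potential from a toy velocity function by numerical integration. It is only an illustration under that one-dimensional assumption: general high-dimensional, non-gradient vector fields need other methods, and least action paths are not covered here. The helper name potential_1d is hypothetical.

```python
import numpy as np

def potential_1d(x, v):
    """Cumulative trapezoid integral of -v, i.e. U such that v = -dU/dx."""
    dx = np.diff(x)
    increments = -0.5 * (v[1:] + v[:-1]) * dx
    return np.concatenate([[0.0], np.cumsum(increments)])

x = np.linspace(-2.0, 2.0, 401)
v = x - x ** 3                      # toy velocity: stable states at -1 and +1, unstable at 0

U = potential_1d(x, v)
i_zero = int(np.argmin(np.abs(x)))  # grid index closest to x = 0
U = U - U[i_zero]                   # fix the arbitrary constant so that U(0) = 0

i_min = int(np.argmin(U))
barrier = U[i_zero] - U[i_min]      # height of the hill between the two wells
print(x[i_min], barrier)
```

The two wells correspond to stable cell states, and the barrier height gives a sense of how hard it is to convert one state into the other.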
Note that this is our first alpha version of Dynamo (as of July 9th, 2019). Dynamo is still under active development, and a stable version will be released when it is ready. Until then, please use Dynamo with caution. We welcome bug reports (via the GitHub issue tracker) and especially code contributions (via GitHub pull requests) from users to make Dynamo an accessible, useful, and extendable tool. For discussion of different use cases, comments or suggestions related to our manuscript, and questions regarding the underlying mathematical formulation of dynamo, we provide a Google group. Dynamo developers can be reached at xqiu.sc@gmail.com. To install the newest version of dynamo, you can git clone our repo and then use:
git clone git@github.com:aristoteleo/dynamo-release.git
pip install dynamo-release/ --user
Note that the --user flag installs the package into your home directory, which is useful when you don't have root privileges.
Alternatively, you can install Dynamo from inside the dynamo-release folder:
git clone git@github.com:aristoteleo/dynamo-release.git
cd dynamo-release/
python setup.py install --user
You can also install directly from the GitHub source with pip:
pip install git+https://github.com/aristoteleo/dynamo-release
Xiaojie Qiu, Yan Zhang, Dian Yang, Shayan Hosseinzadeh, Li Wang, Ruoshi Yuan, Song Xu, Yian Ma, Joseph Replogle, Spyros Darmanis, Jianhua Xing, Jonathan S Weissman (2019): Mapping Vector Field of Single Cells. BioRxiv
biorxiv link: https://www.biorxiv.org/content/10.1101/696724v1
For the vector field reconstruction and potential landscape mapping, please refer to our preprint. We also released the complete derivation of the matrix form of the moment generating functions for parameter estimation in the full_derivation.pdf file in the dynamo-notebook GitHub repo.
The dynamo-notebook repo also provides tutorials on how to use dynamo to reconstruct the vector field and to calculate the least action paths and the potential of cell states.