Add method to calculate embeddings for variable by distance aggregation #807

LLehner · 2024-03-04T22:56:08Z

Description

Adds a method to the tools module (sq.tl) that calculates embeddings of variables from their counts aggregated by distance to an anchor point.

Example usage

import squidpy as sq

Load the example dataset:
adata = sq.datasets.seqfish()

Calculate distances of each observation to a specified anchor point (e.g. cell type or tissue location). Here we use cell type "Endothelium" in the annotation column "celltype_mapped_refined":
sq.tl.var_by_distance(adata, groups="Endothelium", cluster_key="celltype_mapped_refined")

The resulting distances are stored in adata.obsm["design_matrix"]. Now we can calculate the embeddings, which are returned as a new anndata object:
adata_new = sq.tl.var_embeddings(adata, group="Endothelium", design_matrix_key="design_matrix")

Note that by default the bin at distance 0, i.e. the counts that belong to the anchor point itself, is excluded. This can be changed by setting include_anchor=True in sq.tl.var_embeddings().

adata_new.X contains the aggregated var x distance_bin count matrix.
adata_new.obs contains the variables as a categorical matrix, which is required to highlight them in plots.
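
To make the shape of the output concrete, here is a minimal, dependency-free sketch of the aggregation idea. It is not the code in this PR: the helper name aggregate_by_distance, its arguments, and the toy data are all assumptions made for illustration, and it assumes distances have already been binned into integers with 0 marking the anchor itself.

```python
from collections import defaultdict


def aggregate_by_distance(counts, distances, n_vars, include_anchor=False):
    """Sum counts per variable within each distance bin (hypothetical helper).

    counts: one row per observation, each row holding one count per variable.
    distances: one pre-binned distance per observation (0 = the anchor itself).
    Returns (bins, matrix), where matrix[i][j] is the total count of
    variable i over all observations that fall into bins[j].
    """
    totals = defaultdict(lambda: [0] * n_vars)
    for row, dist in zip(counts, distances):
        # Mirror the default described above: skip the anchor bin unless asked.
        if dist == 0 and not include_anchor:
            continue
        for i, value in enumerate(row):
            totals[dist][i] += value
    bins = sorted(totals)
    # Transpose to a var x distance_bin layout.
    matrix = [[totals[b][i] for b in bins] for i in range(n_vars)]
    return bins, matrix


# Toy data: 4 observations, 2 variables (rows are observations).
counts = [[5, 0], [2, 1], [1, 3], [0, 4]]
distances = [0, 1, 1, 2]

bins, matrix = aggregate_by_distance(counts, distances, n_vars=2)
bins_all, matrix_all = aggregate_by_distance(
    counts, distances, n_vars=2, include_anchor=True
)
```

In this sketch each row of matrix is one variable's profile over distance bins, which is the role adata_new.X plays in the description above.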

TODO

Add a plotting function so this doesn't need to be done manually.
Allow flexible embedding calculations


codecov-commenter · 2024-03-04T23:04:27Z

Codecov Report

Attention: Patch coverage is 33.33333%, with 24 lines in your changes missing coverage. Please review.

Project coverage is 69.75%. Comparing base (df8e042) to head (8ee07ba).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #807      +/-   ##
==========================================
- Coverage   69.99%   69.75%   -0.24%     
==========================================
  Files          39       40       +1     
  Lines        5525     5561      +36     
  Branches     1029     1037       +8     
==========================================
+ Hits         3867     3879      +12     
- Misses       1363     1387      +24     
  Partials      295      295

Files	Coverage Δ
src/squidpy/tl/_var_embeddings.py	`33.33% <33.33%> (ø)`


giovp · 2024-04-22T07:42:03Z

Hi @LLehner, thank you for this. Would you mind elaborating a bit on when this would be used? Also, what if the embeddings are pre-calculated, or the user would like to use something other than UMAP: should that be an option? Finally, I think a test is required before we get this in. Thanks!


timtreis · 2024-04-22T17:21:33Z

Hey @giovp, this feature came out of a discussion with @maiiashulman. We ran into a situation in which the "literature-curated" signature for hypoxia was either 20 or 4000 genes, the latter obviously being useless. So we wondered which other genes might show the same spatially variable pattern as a function of distance to a certain cell type (e.g. epithelial). This is essentially a graphical method to see whether a given set of genes (e.g. the 20-gene signature) even varies in a similar pattern.

But I agree with your points; if we see that it's actually doing something useful, we should make it a bit more flexible.
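
As a rough numerical counterpart to the graphical check described above, the sketch below ranks genes by how closely their distance profile follows the mean profile of a signature, using Pearson correlation. This is an illustration only and not what the PR implements (the PR computes an embedding); the function names, gene names, and profile values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_by_profile_similarity(profiles, signature):
    """Rank non-signature genes by correlation with the signature's mean profile.

    profiles: dict mapping gene name -> counts per distance bin.
    signature: list of gene names that define the reference pattern.
    """
    n_bins = len(profiles[signature[0]])
    mean_profile = [
        sum(profiles[g][j] for g in signature) / len(signature)
        for j in range(n_bins)
    ]
    scores = {
        gene: pearson(profile, mean_profile)
        for gene, profile in profiles.items()
        if gene not in signature
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical aggregated counts over four distance bins.
profiles = {
    "sigA": [10, 6, 3, 1],
    "sigB": [8, 5, 2, 1],
    "geneX": [20, 13, 5, 2],  # decays with distance, like the signature
    "geneY": [1, 3, 6, 9],    # increases with distance
    "geneZ": [4, 4, 4, 4],    # flat
}
ranked = rank_by_profile_similarity(profiles, ["sigA", "sigB"])
```

Correlation ignores the overall expression level, so a gene with much higher counts but the same decay shape still ranks at the top; whether that is desirable depends on the question being asked.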


LLehner · 2024-08-08T11:23:31Z

@timtreis this function now returns an anndata object, which I think simplifies further processing compared to storing the new count matrix somewhere in .varm or .uns. If we want to make use of the dimensionality reduction and clustering methods already implemented in scanpy, the count matrix needs to be in .X, and for visualization we need the variable names stored as categories in .obs. Doing all of this in the same anndata would just make things cluttered.

Additionally, the question is whether a spatialdata object should be required as input instead of an anndata one, because then a new table could be added directly instead of having multiple disconnected tables.

The function call would change from:
adata_new = sq.tl.var_embeddings(adata, group="Endothelium", design_matrix_key="design_matrix")
to
sq.tl.var_embeddings(sdata, group="Endothelium", design_matrix_key="design_matrix")

Add method to calculate embeddings for variable by distance aggregation

5a52976

LLehner requested a review from timtreis March 4, 2024 22:56

pre-commit-ci bot and others added 4 commits March 4, 2024 22:57

[pre-commit.ci] auto fixes from pre-commit.com hooks

eb84518

for more information, see https://pre-commit.ci

Fix pre-commit

488da20

Fix pre-commit

8fce577

[pre-commit.ci] auto fixes from pre-commit.com hooks

0b72494

for more information, see https://pre-commit.ci

LLehner and others added 3 commits March 5, 2024 00:05

Update param name

edcca87

[pre-commit.ci] auto fixes from pre-commit.com hooks

4be2529

for more information, see https://pre-commit.ci

Merge branch 'var_by_distance_clustering' of https://github.com/scver…

f91c1af

…se/squidpy into var_by_distance_clustering

LLehner and others added 2 commits April 22, 2024 19:14

Remove duplicate code

cfe496c

[pre-commit.ci] auto fixes from pre-commit.com hooks

c4fca29

for more information, see https://pre-commit.ci

LLehner and others added 6 commits April 22, 2024 23:50

Improve performance, Update output

64e38df

Improve performance, Update output

3ab8467

[pre-commit.ci] auto fixes from pre-commit.com hooks

9eabd0d

for more information, see https://pre-commit.ci

Remove import

a40a8cf

Merge branch 'var_by_distance_clustering' of https://github.com/scver…

90108ad

…se/squidpy into var_by_distance_clustering

Remove import

09c72b0

LLehner marked this pull request as draft April 22, 2024 22:05

LLehner and others added 9 commits May 26, 2024 23:12

Update return

3396146

Merge branch 'var_by_distance_clustering' of https://github.com/scver…

a44f661

…se/squidpy into var_by_distance_clustering

Merge branch 'main' into var_by_distance_clustering

41a2ae4

Fix pre-commit

67bdd5c

Merge branch 'var_by_distance_clustering' of https://github.com/scver…

99b41b0

…se/squidpy into var_by_distance_clustering

[pre-commit.ci] auto fixes from pre-commit.com hooks

876c4ed

for more information, see https://pre-commit.ci

Fix pre-commit

8ee07ba

Fix pre-commit

d3cefff

Merge branch 'main' into var_by_distance_clustering

80e23fc

Merge branch 'main' into var_by_distance_clustering

f2b0e12

timtreis marked this pull request as ready for review July 9, 2024 21:02

timtreis added the squidpy2.0 (Everything related to a Squidpy 2.0 release) and feature (PR introduces a new feature) labels Jul 9, 2024

Merge branch 'main' into var_by_distance_clustering

2a863a4

LLehner marked this pull request as draft August 8, 2024 10:22

Fix indices; Update return type

5729676

LLehner and others added 5 commits August 26, 2024 19:03

Add spatialdata as input

7dfa933

Merge branch 'main' into var_by_distance_clustering

bf1dcff

Update docstring

d6e5ecd

Merge branch 'main' into var_by_distance_clustering

6e724f0

Merge branch 'main' into var_by_distance_clustering

6e28662

LLehner marked this pull request as ready for review October 10, 2024 13:06

Merge branch 'main' into var_by_distance_clustering

1b1c05a
