There is no output, not even a warning, when I compute gene-gene distances with the function cal_ot_mat_from_numpy. #5

Open
Fufu-Hu opened this issue Apr 19, 2024 · 8 comments
@Fufu-Hu
Fufu-Hu commented Apr 19, 2024

Hi!

I installed the gene_trajectory module with pip in a conda env. I can compute the gene-gene distances with the Seurat data from the GeneTrajectory tutorial, and the progress bar is shown on screen.
But when I compute them on my own Seurat data (36077 features across 482 samples), nothing appears on screen. The number of genes used to compute gene-gene distances is 481 and the number of meta-cells is 50.
I ran "gene.dist.mat <- cal_ot_mat_from_numpy(ot_cost = cg_output[["graph.dist"]], gene_expr = cg_output[["gene.expression"]], num_iter_max = 50000, show_progress_bar = TRUE)" in R for at least 8 hours with no output, not even a progress bar. Is there something I missed?

Hope to receive a reply~

@fra-pcmgf
Collaborator

Hi @Fufu-Hu,

I am not sure about what it could be.

  1. Can you check if anything is still running (e.g. using top or the Task Manager)?
  2. Can you let me know the size of the objects (e.g. dim(cg_output[["graph.dist"]]), dim(cg_output[["gene.expression"]]))? I don't think it should be that slow if the size is 481x50, but it may be if you are using the full matrix.
  3. Do you get any errors or notifications when you start the cal_ot_mat_from_numpy function?
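
The size check in point 2 can be sketched in Python as a quick sanity test on the two input shapes before starting a long run. This is only an illustration: `check_ot_inputs` is a hypothetical helper, not part of gene_trajectory, and it assumes the expression matrix is laid out as meta-cells x genes and the cost matrix as meta-cells x meta-cells.

```python
def check_ot_inputs(gene_expr_shape, ot_cost_shape):
    """Validate input shapes and return (n_metacells, n_genes).

    Hypothetical helper. Assumes gene_expr is (meta-cells x genes) and
    ot_cost is a square (meta-cells x meta-cells) matrix.
    """
    n_metacells, n_genes = gene_expr_shape
    if ot_cost_shape[0] != ot_cost_shape[1]:
        raise ValueError(f"cost matrix is not square: {ot_cost_shape}")
    if ot_cost_shape[0] != n_metacells:
        raise ValueError(
            f"cost matrix has {ot_cost_shape[0]} rows but the expression "
            f"matrix has {n_metacells} rows"
        )
    return n_metacells, n_genes


# Shapes one would expect for 50 meta-cells and 481 genes.
print(check_ot_inputs((50, 481), (50, 50)))
```

If the reported gene count is far larger than the number of selected genes, the full matrix was probably passed instead of the coarse-grained one.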

@panyuwen

I'm encountering a similar problem.

It has been >4000 CPU hours without a progress bar, in both Python and R. machine info: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz

The program seems to work on another machine with the same data; the progress bar appeared after ~30 CPU hours. machine info: AMD Opteron(tm) Processor 6344

@fra-pcmgf
Collaborator

Hi @panyuwen,

It's hard to know what is going wrong in one machine when it works on another.

  • Can you run the tutorial on the machine where it doesn't work?
  • What kind of machine is it (linux / mac / win)?
  • Can you let me know the size of the input objects?

@panyuwen

  • yes, I can run the human data tutorial on both machines. it takes about 10-20 CPU minutes from the beginning to the end of the gene.dist.mat step.
  • linux. centos7
  • about 50k cells x 10k genes, and default parameters.

@panyuwen

Using a subset of my original data (17k cells x 10k genes) with default parameters, it takes about 2500 CPU hours from the beginning to the end of the gene.dist.mat step. The progress bar appeared only during the final 6 minutes (so only 6 min were recorded on the bar).

machine info: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz ; centos7

@fra-pcmgf
Collaborator

@panyuwen

Do you also select the top genes and coarse-grain the cells?
The reference steps in the tutorial are

genes = select_top_genes(adata, layer='counts')
gene_expression_updated, graph_dist_updated = coarse_grain_adata(adata, graph_dist=cell_graph_dist, features=genes, dims=10)

If so, what are the dimensions of gene_expression_updated and graph_dist_updated?

@panyuwen

yes, I manually selected genes.

gene_expression_updated: (1000, 11352)
graph_dist_updated: (1000, 1000)

@fra-pcmgf
Collaborator

11352 genes is a large number, and calculating the earth mover's distance for every pair of genes is going to be very slow.
Try reducing to ~2000 genes with select_top_genes or a similar approach.
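
To give a rough sense of why the gene count matters so much, here is a minimal sketch that counts gene pairs. It assumes one distance is computed per unordered pair of genes, which is how a symmetric pairwise distance matrix is typically filled; whether the library does exactly this is an assumption, and `n_gene_pairs` is a hypothetical helper.

```python
def n_gene_pairs(n_genes):
    """Number of unordered gene pairs among n_genes genes."""
    return n_genes * (n_genes - 1) // 2


full = n_gene_pairs(11352)    # gene count reported in this thread
reduced = n_gene_pairs(2000)  # suggested gene count

print(full)                       # 64428276
print(reduced)                    # 1999000
print(round(full / reduced, 1))   # 32.2
```

Under that assumption, going from 11352 to 2000 genes cuts the number of distances to compute by a factor of about 32, before accounting for any per-pair cost differences.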
