lets umap run in parallel #3295
base: main
Changes from 2 commits
fbc2e49
9ce1770
812e630
ba6538d
d2ab85d
4eba4a8
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,6 +56,7 @@ def umap( | |
key_added: str | None = None, | ||
neighbors_key: str = "neighbors", | ||
copy: bool = False, | ||
parallel: bool = False, | ||
) -> AnnData | None: | ||
"""\ | ||
Embed the neighborhood graph using UMAP :cite:p:`McInnes2018`. | ||
|
@@ -146,6 +147,8 @@ def umap( | |
:attr:`~anndata.AnnData.obsp`\\ ``[.uns[neighbors_key]['connectivities_key']]`` for connectivities. | ||
copy | ||
Return a copy instead of writing to adata. | ||
parallel | ||
I don't know where I got that this should error, but then we should definitely warn users about this sort of thing. Reproducibility is very important.
On the other hand: sc.tl.umap(adata, parallel=True, random_state=42) works, so I think this needs a test + updated comment to reflect whatever is supposed to be going on here.
As does sc.tl.umap(adata, parallel=True, random_state=np.random.RandomState(42))
I added a test to see if it errors (it doesn't)
Yes, but we should also warn users about this random state business. And check that the warning is raised every time.
Ok! But then we should definitely warn in that case.
I added a warning
ok there is an issue with
ok the function is bugged on the umap side.
If you force
||
Whether to run the computation using numba parallel. Running in parallel is non-deterministic, and is not used if a random seed has been set, to ensure reproducibility. | ||
|
||
Returns | ||
------- | ||
|
@@ -232,6 +235,7 @@ def umap( | |
densmap_kwds={}, | ||
output_dens=False, | ||
verbose=settings.verbosity > 3, | ||
parallel=parallel, | ||
) | ||
elif method == "rapids": | ||
msg = ( | ||
|
I think we have the `n_jobs: int | None` convention for this (with `None` meaning `sc.settings.N_JOBS`), but `simplicial_set_embedding` just passes `parallel` on to numba.
We should think about how the two integrate before we add this.
Since this is numba parallel, we can only set it to use everything you got or nothing
Yes, that’s why we should talk about the parameter name. IIRC that would be the first parallelization parameter not called `n_jobs`.
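For discussion, here is a rough sketch of one way an `n_jobs`-style argument could be collapsed into the on/off `parallel` flag mentioned above. The helper name `resolve_parallel` and the `default_n_jobs` argument (a stand-in for the global setting) are hypothetical and not part of this PR or of scanpy; treating negative values as "use all cores" is an assumption borrowed from the joblib convention.

```python
from __future__ import annotations


def resolve_parallel(n_jobs: int | None, default_n_jobs: int = 1) -> bool:
    """Hypothetical helper: map an n_jobs-style value to a boolean flag.

    Assumes the backend can only be switched fully on or fully off,
    so any request for more than one job turns parallelism on.
    """
    if n_jobs is None:
        # None falls back to the global default (stand-in for the setting)
        n_jobs = default_n_jobs
    if n_jobs == 0:
        raise ValueError("n_jobs must not be 0")
    # Assumption: negative values mean "all cores" (joblib-style),
    # so everything except exactly 1 enables the parallel code path.
    return n_jobs != 1
```

With something like this, `n_jobs=1` would keep the serial path, while `n_jobs=4` and `n_jobs=-1` would both end up as `parallel=True`, which makes the loss of granularity explicit to whoever reads the call site.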