-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parallel: Two types of 'cluster':s - one a subclass of the other #161
Comments
First of all, thanks for helpfully updating above that As the documentation efforts are already on Bugzilla, I actually think that this is your best solution rather than a solution in code. First of all:
And as to what I am currently thinking: Suggestion 3 (replaces 2 above)I had originally missed that As soon as it emerged through the documentation effort that the Then all that would be required is something like: isAddressableCluster <- function(x) inherits(x[1], “cluster”) or isAddressableCluster <- function(x) identical(class(x[1]), class(x)) As this is so trivial, I think this can simply be defined by a consuming package like your Suggestion 4 (for comparison only)I think the incentives are powerful enough in (3) above, and the advantage is clear in requiring no change in Implement This would require the developer of a new back-end to explicitly add its own method to return TRUE, if it wants to assert that it supports subsetting. This is much safer from a behavioural standpoint, and is exactly what the This was actually also in response to
|
Background
The parallel package provides 'cluster' functions for processing tasks in concurrent R processes. Here is an example showing how to create a cluster of two parallel workers running in the background, then we assign
x
to 1 for the first worker and2
for the second. Finally, we calculatex^2
across the workers and return the results in a list;Issue
For certain types of parallel frameworks, it might not be able to address a specific worker, but expects that the whole set of workers at once. An example of this
iswas the mirai 0.13.2 package. mirai provides its own type ofcluster
;With mirai (<= 0.13.2), if we repeat the above example, we would get the same results for:
UPDATE 2024-05-04: mirai (>= 1.0.0) tries to protect against this; we now get an error when calling
clusterEvalQ(cl[1], ...)
, becausecl[1]
no longer inherits fromcluster
- it's only alist
.However, if we attempt to retrieve the results from the parallel workers in reverse order, we get:
Note how we got (1,4) and not (4,1). This is because the mirai cluster treats
cl
the same asrev(cl)
. This is also the reason for why we may get different results when we keep usingcl[1]
, e.g.I'm not sure if this meets the original design intentions of the parallel package. Regardless, there might be a valid argument for why subsetting
cluster
objects, or changing the order of the cluster nodes, should not be supported. For instance, the underlying framework might not support addressing parallel workers this way. This is probably the reason why mirai clusters work they way they work.Suggestion 1
If subsetting should not/cannot be supported, then we need a way to declare that so that we don't use the cluster in a way where it is assumed to work. For example, in the mirai case, we should not assume we can address individual workers via subsetting or re-ording. My suggestion would be to introduce a special class for this, e.g. 'nonAddressableCluster'. This way, we could have:
instead of today's:
This way, we could programmatically check if the
cluster
object supports subsetting. For example, we can do:This puts the burden on the user and developer who uses the parallel functions to be aware of this.
Suggestion 2
An alternative solution proposed @shikokuchuo (cf. shikokuchuo/mirai#83 (comment)) would be to remove
[.cluster
from the parallel package and have parallel backends that support subsetting explicitly implement[
for the specificcluster
class. For example, for:we would have a
[.SOCKcluster
rather than[.cluster
as today.This would be a much bigger change, because it risks breaking existing code and packages.
The text was updated successfully, but these errors were encountered: