consider remove/rework provider tasks_per_node options #3617

Open
benclifford opened this issue Sep 14, 2024 · 1 comment

Comments

@benclifford
Collaborator

this parameter has been hard-coded at 1 since the end of the IPyParallel executor:

all subsequent executors have wanted a single process per node, managing their "tasks per node" count separately (for example, in htex through max_workers_per_node and in Work Queue through per-task resource allocation)
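
As an illustration, a minimal config sketch (values invented) where the executor, not the provider, owns the per-node worker count:

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor

# Sketch: htex decides how many workers run on each node via its own
# max_workers_per_node knob; the provider's tasks_per_node stays
# hard-coded at 1 underneath.
config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex",
            max_workers_per_node=8,  # htex's own "tasks per node" count
        )
    ]
)
# (Work Queue instead takes per-task resource specifications when apps are
# invoked, so it also has no use for the provider's tasks_per_node.)
```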

separately, the number of core slots to request on a node is usually ignored because most people run on a whole node at once: slurm's exclusive parameter makes this an irrelevant value (though not for PBS -- #3616)
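
For example, a sketch of a whole-node slurm request (partition name and block size made up), where exclusive leaves nothing for a per-node slot count to decide:

```python
from parsl.providers import SlurmProvider
from parsl.launchers import SrunLauncher

# Sketch: with exclusive=True the batch job claims whole nodes, so a
# core-slot count derived from tasks_per_node has no effect on the request.
provider = SlurmProvider(
    partition="compute",   # hypothetical partition name
    nodes_per_block=2,
    exclusive=True,
    launcher=SrunLauncher(),
)
```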

and the hard-coded 1 is interpreted by the launcher layer to mean launch a single worker process per node.

The historical context is that IPyParallelExecutor wanted the launcher layer to launch one copy of its worker per core.

it doesn't make sense, in the modern one-worker-per-node scenario, to couple the provider layer's "allocate slots" use of this value with the launcher layer's "run this many copies" behaviour.
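
To make the coupling concrete, here is a deliberately simplified sketch (not Parsl's actual code) of the same value doing two different jobs:

```python
# provider layer: tasks_per_node sizes the batch allocation request
def make_batch_request(nodes: int, tasks_per_node: int) -> str:
    return f"#SBATCH --nodes={nodes} --ntasks-per-node={tasks_per_node}"

# launcher layer: the same value is reinterpreted as "run this many
# copies of the worker command"
def wrap_command(command: str, tasks_per_node: int, nodes: int) -> str:
    return f"srun --ntasks={tasks_per_node * nodes} {command}"
```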

Periodically I encounter users trying to adjust this in the source code for one reason or another and this coupling of launcher and provider gets in the way: #3616, for example, is about allocating many ranks on a node for a batch job but still only running one copy of the worker.

I propose that the interface look like this:

  • remove the provider layer's tasks_per_node parameter
  • add a parameter to relevant providers to set the tasks_per_node in job requests. This sits well alongside slurm's exclusive parameter as a "this is how much node I want" value.
  • make launchers only launch 1 task per node. the possibility exists for launching more than 1 task per node and that value could be preserved, but it increasingly looks like unused cruft. (a rough sketch of this split is below)
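
A rough sketch of what that split could look like, with invented names (ProposedSlurmProvider and tasks_per_node_request are not existing Parsl API, just illustration):

```python
from dataclasses import dataclass

@dataclass
class ProposedSlurmProvider:
    nodes_per_block: int = 1
    exclusive: bool = False
    # provider-only: how many task slots to request per node in the batch
    # job (e.g. --ntasks-per-node); never handed to the launcher
    tasks_per_node_request: int = 1

def proposed_launcher(command: str, nodes_per_block: int) -> str:
    # launchers drop the tasks_per_node argument entirely and always
    # start exactly one copy of the worker per node
    return f"srun --ntasks-per-node=1 --nodes={nodes_per_block} {command}"
```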

Comments and questions welcome

@benclifford
Collaborator Author

any fiddling with the launcher API here should probably align with #3532
