
How do you manage models in distributed Dask #416

Open
Padarn opened this issue Dec 31, 2021 · 2 comments

@Padarn

Padarn commented Dec 31, 2021

Hi, I really love using Dask as the backbone for this project, but I have a question:

If you use a GPU-enabled model for both feature extraction and feature matching, how will the Dask workers manage the GPU memory required for these tasks?

For example, with SuperPoint (https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/detector_descriptor/superpoint.py), it looks like this class will be initialized on all workers. Do you then run the risk of running out of GPU memory if your extractor and matcher models are quite large?

Thanks again for the exciting project

@johnwlambert
Collaborator

Hi @Padarn, thanks for your interest in our work.

Great question. Currently, when a GPU is available, we expect the user's hardware to meet minimum GPU RAM requirements: the GPU RAM must be sufficient to support inference with at least one model on one worker. For now, the user must anticipate the amount of GPU memory each worker will use when choosing the number of workers. In the future, we will automate this further.

However, all of these networks can also run on the CPU. We've specifically sought out and are using models with low RAM requirements (e.g., PatchmatchNet).
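For readers following along, below is a minimal sketch (not GTSFM's actual configuration) of one way to bound per-worker GPU usage with Dask's abstract worker resources. The resource name `"GPU"`, the `LocalCluster` settings, and the placeholder functions are all assumptions for illustration only.

```python
from dask.distributed import Client, LocalCluster

# Give each worker a single abstract "GPU" slot. The scheduler will then
# never place two tasks tagged with {"GPU": 1} on the same worker at once,
# even if the worker has multiple threads.
cluster = LocalCluster(n_workers=2, threads_per_worker=4, resources={"GPU": 1})
client = Client(cluster)

def extract_features(image):
    # Placeholder for a GPU detector/descriptor (e.g. SuperPoint) on one image.
    return f"features({image})"

def match_features(feats_i, feats_j):
    # Placeholder for a GPU matcher on a pair of feature sets.
    return f"matches({feats_i}, {feats_j})"

images = ["img0", "img1", "img2"]
pairs = [(0, 1), (1, 2)]

# Tag the GPU-heavy tasks with resources={"GPU": 1}; untagged (CPU-only)
# tasks remain free to run in the worker's other threads.
feats = [client.submit(extract_features, im, resources={"GPU": 1}) for im in images]
matches = [
    client.submit(match_features, feats[i], feats[j], resources={"GPU": 1})
    for i, j in pairs
]
print(client.gather(matches))
```

With one `"GPU"` unit per worker, an extraction task and a matching task can never run concurrently on the same worker, at the cost of serializing GPU work on that worker.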

@Padarn
Author

Padarn commented Jan 1, 2022

Hi @johnwlambert, thanks for your response.

I may be missing something about Dask, but how do you ensure that a single machine is not assigned tasks for more than one GPU model at the same time? Or do workers not execute tasks in parallel?

To clarify, the situation I am imagining is that a worker currently doing feature matching is assigned a feature extraction task (or more likely vice versa).
