If a Slurm node has 8 GPUs, there is typically a parameter you can set to request only a subset, e.g. 1 or 2. This is very useful for inference with small models, since partitioning them across 8 GPUs is usually unnecessary. While our current code supports requesting a subset, I believe we still reserve the whole node because of https://github.com/Kipok/NeMo-Skills/blob/main/nemo_skills/pipeline/utils.py#L469.
We need to test whether that's indeed the case (i.e. that we get the full node even when requesting only a fraction of its GPUs). If it is, we should try removing that parameter and check whether we can still launch parallel srun jobs on a single node and whether that solves the issue. This might require reading through the Slurm documentation and some experimentation.
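For reference, a minimal sketch of the kind of change being discussed, assuming the linked line unconditionally adds an exclusive-node reservation; the function and constant names below are hypothetical and only illustrate dropping the exclusive flag when a job asks for fewer GPUs than the node has:

```python
# Hypothetical sketch: build srun arguments so that jobs requesting fewer GPUs
# than a full node do NOT reserve the whole node. Names here are illustrative,
# not the actual NeMo-Skills API.

GPUS_PER_NODE = 8  # assumption: 8-GPU nodes, as described in the issue


def build_srun_args(num_gpus: int) -> list[str]:
    """Return srun flags for a job that needs `num_gpus` GPUs on one node."""
    args = [
        "srun",
        "--nodes=1",
        f"--gres=gpu:{num_gpus}",
    ]
    # Only take the node exclusively when the job actually needs all of its GPUs;
    # otherwise Slurm can pack several such jobs onto the same node.
    if num_gpus >= GPUS_PER_NODE:
        args.append("--exclusive")
    return args


if __name__ == "__main__":
    # A 1-GPU inference job should leave the remaining 7 GPUs schedulable.
    print(" ".join(build_srun_args(1)))
    # An 8-GPU job still reserves the full node.
    print(" ".join(build_srun_args(8)))
```

Whether two such sub-node jobs actually land on the same node could then be checked during the experiment, e.g. with `squeue` and `scontrol show node <nodename>`.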