🐛[BUG]: NCCL_ASYNC_ERROR_HANDLING deprecation warning #568

Open
simonbyrne opened this issue Jun 28, 2024 · 0 comments · May be fixed by #711
simonbyrne commented Jun 28, 2024

Version

0.6.0

On which installation method(s) does this occur?

Docker

Describe the issue

The NCCL_ASYNC_ERROR_HANDLING environment variable (set here) has been deprecated in favor of TORCH_NCCL_ASYNC_ERROR_HANDLING; see pytorch/pytorch#114077.

It looks like the docs haven't been updated though: https://pytorch.org/docs/main/notes/cuda.html#id5
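
For illustration, here is a minimal sketch of how a launcher could pick the variable name from the installed torch version. It assumes the rename first shipped in torch 2.2.0 (as the linked commit message suggests) and that "1" is the value you want; the helper names `nccl_async_env_name` and `set_nccl_async_error_handling` are hypothetical and are not part of this project or of PyTorch.

```python
import os
import re

NEW_NAME = "TORCH_NCCL_ASYNC_ERROR_HANDLING"
OLD_NAME = "NCCL_ASYNC_ERROR_HANDLING"


def nccl_async_env_name(torch_version: str) -> str:
    """Pick the env var name for a torch version string such as '2.1.2+cu121'.

    Assumption: the TORCH_-prefixed name is recognized from torch 2.2.0 onward.
    """
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        # Unparseable version (e.g. a custom build): assume a recent torch.
        return NEW_NAME
    major, minor = int(match.group(1)), int(match.group(2))
    return NEW_NAME if (major, minor) >= (2, 2) else OLD_NAME


def set_nccl_async_error_handling(torch_version, value="1", env=None):
    """Set the appropriate variable unless the user already set it.

    `value="1"` is an assumed default; `env` defaults to os.environ.
    Returns the variable name that was chosen.
    """
    env = os.environ if env is None else env
    name = nccl_async_env_name(torch_version)
    env.setdefault(name, value)
    return name
```

In practice this would be called with `torch.__version__` before the process group is initialized, so that newer torch versions no longer see the deprecated name.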

Minimum reproducible example

No response

Relevant log output

[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)

Environment details

No response

simonbyrne added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels Jun 28, 2024
simonbyrne added a commit to simonbyrne/modulus that referenced this issue Nov 15, 2024
It looks like the patch from pytorch/pytorch#114077 landed in torch 2.2.0.

Fixes NVIDIA#568.
simonbyrne linked a pull request Nov 15, 2024 that will close this issue
simonbyrne added a commit to simonbyrne/modulus that referenced this issue Nov 19, 2024
It looks like the patch from pytorch/pytorch#114077 landed in torch 2.2.0.

Fixes NVIDIA#568.