===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
| distributed init (rank 0, world 1): env://
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -7) local_rank: 0 (pid: 21866) of binary: /workspace/conda/envs/minigpt/bin/python
Traceback (most recent call last):
  File "/workspace/conda/envs/minigpt/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/workspace/conda/envs/minigpt/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/workspace/conda/envs/minigpt/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/workspace/conda/envs/minigpt/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/workspace/conda/envs/minigpt/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/workspace/conda/envs/minigpt/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
train.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2024-10-16_09:05:24
host : 887f31f30241
rank : 0 (local_rank: 0)
exitcode : -7 (pid: 21866)
error_file: <N/A>
traceback : Signal 7 (SIGBUS) received by PID 21866
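
The log above reports `exitcode : -7` and `Signal 7 (SIGBUS)` for the same process. For readers unfamiliar with the convention, Python reports a child killed by signal N as exit code -N. The snippet below is a minimal sketch of that mapping, not part of the original report; the helper name `describe_exit_code` is hypothetical, and signal numbers are platform-dependent (7 is SIGBUS on Linux).

```python
import signal


def describe_exit_code(exitcode: int) -> str:
    """Return a readable description of a child-process exit code.

    Hypothetical helper for illustration; it is not part of torchrun.
    """
    if exitcode < 0:
        # A negative exit code -N means the child was terminated by signal N.
        try:
            return f"killed by {signal.Signals(-exitcode).name}"
        except ValueError:
            # The number does not correspond to a known signal on this platform.
            return f"killed by unknown signal {-exitcode}"
    return f"exited with status {exitcode}"


if __name__ == "__main__":
    # Exit code taken from the log above; the signal name depends on the platform.
    print(describe_exit_code(-7))
```

This only names the signal that terminated the worker; it does not identify what triggered it in `train.py`.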