Using the config in tamper with my own dataset raises RuntimeError: CUDA error: an illegal memory access was encountered, terminate called after throwing an instance of 'c10::Error'
#13 · Closed · CongYep opened this issue on Nov 19, 2023 · 0 comments
I am using the config under tamper and running the command bash tools/dist_train.sh work_configs/tamper/tamper_convx_b_exp.py 2. After switching to my own datasets (NIST16, CASIA, etc.), training fails with RuntimeError: CUDA error: an illegal memory access was encountered, followed by terminate called after throwing an instance of 'c10::Error'. How can I fix this?
Traceback (most recent call last):
File "tools/train.py", line 181, in <module>
main()
File "tools/train.py", line 177, in main
meta=meta)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/apis/train.py", line 135, in train_segmentor
runner.run(data_loaders, cfg.workflow)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/base.py", line 138, in train_step
losses = self(**data_batch)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
output = old_func(*new_args, **new_kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/base.py", line 108, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/encoder_decoder.py", line 144, in forward_train
gt_semantic_seg)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/encoder_decoder.py", line 88, in _decode_head_forward_train
self.train_cfg)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/decode_heads/decode_head.py", line 207, in forward_train
losses = self.losses(seg_logits, gt_semantic_seg)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
output = old_func(*new_args, **new_kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/decode_heads/decode_head.py", line 259, in losses
ignore_index=self.ignore_index)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/losses/lovasz_loss.py", line 308, in forward
**kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/losses/lovasz_loss.py", line 219, in lovasz_softmax
flatten_probs(probs, labels, ignore_index),
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/losses/lovasz_loss.py", line 55, in flatten_probs
vprobs = probs[valid.nonzero().squeeze()]
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f9b166518b2 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f9b16a18952 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f9b1663cb7d in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5ff66a (0x7f9baadff66a in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5ff716 (0x7f9baadff716 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: /root/anaconda3/envs/open-mmlab/bin/python() [0x4cb472]
frame #6: /root/anaconda3/envs/open-mmlab/bin/python() [0x4a0a87]
frame #7: /root/anaconda3/envs/open-mmlab/bin/python() [0x4b5cfb]
frame #8: /root/anaconda3/envs/open-mmlab/bin/python() [0x4b5cfb]
frame #9: /root/anaconda3/envs/open-mmlab/bin/python() [0x4b0858]
frame #10: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b50]
frame #11: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #12: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #13: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #14: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #15: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #16: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #17: /root/anaconda3/envs/open-mmlab/bin/python() [0x4946f7]
frame #18: PyDict_SetItemString + 0x61 (0x499261 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #19: PyImport_Cleanup + 0x89 (0x56f719 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #20: Py_FinalizeEx + 0x67 (0x56b1a7 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #21: /root/anaconda3/envs/open-mmlab/bin/python() [0x53fc79]
frame #22: _Py_UnixMain + 0x3c (0x53fb3c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #23: <unknown function> + 0x29d90 (0x7f9bb37e5d90 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #24: __libc_start_main + 0x80 (0x7f9bb37e5e40 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #25: /root/anaconda3/envs/open-mmlab/bin/python() [0x53f9ee]
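The traceback ends in flatten_probs, where the loss indexes probs with positions taken from the label mask. One common cause of this kind of crash after swapping in a new dataset is a mask that does not match what the loss expects. This is not confirmed for this issue. Typical mismatches are masks saved with three channels instead of one, and masks whose pixel values lie outside the class range (for example 0/255 masks when the config expects 0/1). The sketch below is a minimal check under those assumptions. The function name check_mask is hypothetical, num_classes must match the config, and ignore_index=255 is assumed.

```python
import numpy as np


def check_mask(mask, num_classes, ignore_index=255):
    """Return a list of human-readable problems found in one label mask.

    Hypothetical helper: `num_classes` and `ignore_index` are assumed to
    match the values used by the decode head in the training config.
    """
    problems = []
    # The loss flattens labels to one value per pixel, so a mask with
    # extra channels yields more labels than there are predictions.
    if mask.ndim != 2:
        problems.append(f"expected a 2-D (H, W) mask, got shape {mask.shape}")
    # Every pixel value should be a class id in [0, num_classes) or ignore_index.
    values = [int(v) for v in np.unique(mask)]
    bad = [v for v in values if not (0 <= v < num_classes or v == ignore_index)]
    if bad:
        problems.append(f"label values outside [0, {num_classes}): {bad}")
    return problems


# Small synthetic examples (not data from the issue).
ok_mask = np.array([[0, 1], [1, 255]], dtype=np.uint8)
rgb_mask = np.zeros((2, 2, 3), dtype=np.uint8)
bad_mask = np.array([[0, 200], [1, 0]], dtype=np.uint8)

for name, m in [("ok", ok_mask), ("rgb", rgb_mask), ("bad", bad_mask)]:
    print(name, check_mask(m, num_classes=2))
```

Running a check like this over every annotation file before training would show whether the masks are the problem. Separately, CUDA reports errors asynchronously, so the Python line shown in the traceback may not be the real failure site. Setting the environment variable CUDA_LAUNCH_BLOCKING=1 for a single-GPU run makes kernel launches synchronous and usually gives a more accurate location.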