Using the config in tamper with my own dataset raises RuntimeError: CUDA error: an illegal memory access was encountered terminate called after throwing an instance of 'c10::Error' #13

Closed
CongYep opened this issue Nov 19, 2023 · 0 comments

Comments


CongYep commented Nov 19, 2023

I am using the config in tamper with the command bash tools/dist_train.sh work_configs/tamper/tamper_convx_b_exp.py 2. After switching to my own datasets (nist16, casia, etc.), training fails with RuntimeError: CUDA error: an illegal memory access was encountered, followed by terminate called after throwing an instance of 'c10::Error'. How can I fix this?

Traceback (most recent call last):
File "tools/train.py", line 181, in <module>
main()
File "tools/train.py", line 177, in main
meta=meta)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/apis/train.py", line 135, in train_segmentor
runner.run(data_loaders, cfg.workflow)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/base.py", line 138, in train_step
losses = self(**data_batch)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
output = old_func(*new_args, **new_kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/base.py", line 108, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/encoder_decoder.py", line 144, in forward_train
gt_semantic_seg)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/encoder_decoder.py", line 88, in _decode_head_forward_train
self.train_cfg)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/decode_heads/decode_head.py", line 207, in forward_train
losses = self.losses(seg_logits, gt_semantic_seg)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
output = old_func(*new_args, **new_kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/decode_heads/decode_head.py", line 259, in losses
ignore_index=self.ignore_index)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/losses/lovasz_loss.py", line 308, in forward
**kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/losses/lovasz_loss.py", line 219, in lovasz_softmax
flatten_probs(probs, labels, ignore_index),
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/losses/lovasz_loss.py", line 55, in flatten_probs
vprobs = probs[valid.nonzero().squeeze()]
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f9b166518b2 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f9b16a18952 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f9b1663cb7d in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5ff66a (0x7f9baadff66a in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5ff716 (0x7f9baadff716 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: /root/anaconda3/envs/open-mmlab/bin/python() [0x4cb472]
frame #6: /root/anaconda3/envs/open-mmlab/bin/python() [0x4a0a87]
frame #7: /root/anaconda3/envs/open-mmlab/bin/python() [0x4b5cfb]
frame #8: /root/anaconda3/envs/open-mmlab/bin/python() [0x4b5cfb]
frame #9: /root/anaconda3/envs/open-mmlab/bin/python() [0x4b0858]
frame #10: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b50]
frame #11: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #12: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #13: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #14: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #15: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #16: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #17: /root/anaconda3/envs/open-mmlab/bin/python() [0x4946f7]
frame #18: PyDict_SetItemString + 0x61 (0x499261 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #19: PyImport_Cleanup + 0x89 (0x56f719 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #20: Py_FinalizeEx + 0x67 (0x56b1a7 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #21: /root/anaconda3/envs/open-mmlab/bin/python() [0x53fc79]
frame #22: _Py_UnixMain + 0x3c (0x53fb3c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #23: <unknown function> + 0x29d90 (0x7f9bb37e5d90 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #24: __libc_start_main + 0x80 (0x7f9bb37e5e40 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #25: /root/anaconda3/envs/open-mmlab/bin/python() [0x53f9ee]
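
A possible cause worth ruling out, although the traceback alone does not confirm it: when a custom dataset is swapped in, the annotation masks may contain label values outside the range the model expects, which can surface as an illegal memory access inside the loss. The following is a minimal sketch of a check for that, not code from this repository. The function name find_invalid_labels is hypothetical, and num_classes=2 and ignore_index=255 are assumptions, since the config contents are not shown in this issue.

```python
from collections import Counter


def find_invalid_labels(mask_values, num_classes, ignore_index=255):
    """Count label values that are neither valid class ids nor ignore_index.

    mask_values: any iterable of integer pixel labels, for example a
    flattened annotation mask converted to a Python list.
    """
    counts = Counter(int(v) for v in mask_values)
    return {
        value: n
        for value, n in counts.items()
        if value != ignore_index and not (0 <= value < num_classes)
    }


# Hypothetical flattened mask: 0 and 1 are valid classes, 255 is ignored,
# while 128 and 7 would index past the end of a 2-class prediction.
mask = [0, 1, 1, 128, 255, 7, 128]
invalid = find_invalid_labels(mask, num_classes=2, ignore_index=255)
print(invalid)
```

An empty result means every pixel is either a valid class id or the ignore value. Because CUDA errors are reported asynchronously, the Python frame shown in the traceback may not be the operation that actually failed; rerunning with the environment variable CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous so the reported frame is more reliable.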

CongYep closed this as completed Nov 29, 2023