Using the Tianchi dataset and a config file from work_configs, I changed the dataset path and set the number of classes to 2, but training keeps failing with RuntimeError: CUDA error: device-side assert triggered / terminate called after throwing an instance of 'c10::Error' / what(): CUDA error: device-side assert triggered #5
Labels: question (further information is requested)
Have you run into this problem before?
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [14592,0,0], thread: [21,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [14592,0,0], thread: [22,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
(the same assertion repeats for block [14757,0,0], threads [47,0,0] through [55,0,0], and block [14432,0,0], threads [17,0,0] through [22,0,0])
Traceback (most recent call last):
File "tools/train.py", line 180, in
main()
File "tools/train.py", line 176, in main
meta=meta)
File "/root/work/tamper/mmseg/apis/train.py", line 135, in train_segmentor
runner.run(data_loaders, cfg.workflow)
File "/root/.local/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 134, in run
iter_runner(iter_loaders[i], **kwargs)
File "/root/.local/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 61, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/root/.local/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/root/work/tamper/mmseg/models/segmentors/base.py", line 138, in train_step
losses = self(**data_batch)
File "/root/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/.local/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
output = old_func(*new_args, **new_kwargs)
File "/root/work/tamper/mmseg/models/segmentors/base.py", line 108, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/root/work/tamper/mmseg/models/segmentors/encoder_decoder.py", line 144, in forward_train
gt_semantic_seg)
File "/root/work/tamper/mmseg/models/segmentors/encoder_decoder.py", line 88, in _decode_head_forward_train
self.train_cfg)
File "/root/work/tamper/mmseg/models/decode_heads/decode_head.py", line 203, in forward_train
losses = self.losses(seg_logits, gt_semantic_seg)
File "/root/.local/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
output = old_func(*new_args, *new_kwargs)
File "/root/work/tamper/mmseg/models/decode_heads/decode_head.py", line 240, in losses
seg_weight = self.sampler.sample(seg_logit, seg_label)
File "/root/work/tamper/mmseg/core/seg/sampler/ohem_pixel_sampler.py", line 56, in sample
sort_prob, sort_indices = seg_prob[valid_mask].sort()
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f6737b1f8b2 in /root/.local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f6737d71952 in /root/.local/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f6737b0ab7d in /root/.local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5ff43a (0x7f66f16a043a in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5ff4e6 (0x7f66f16a04e6 in /root/.local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #22: __libc_start_main + 0xe7 (0x7f674b287bf7 in /lib/x86_64-linux-gnu/libc.so.6)
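For reference, this scatter/gather "index out of bounds" assertion usually points at ground-truth label values that fall outside [0, num_classes) and are not the ignore index, so with num_classes=2 anything other than 0, 1, or 255 in the masks would trigger it. Below is a minimal sketch for checking the mask values; the mask directory, the PNG assumption, and the ignore value are placeholders of mine, not paths or settings from this repo:

import glob
import numpy as np
from PIL import Image

NUM_CLASSES = 2      # matches the modified config
IGNORE_INDEX = 255   # common mmseg ignore label; adjust if the config differs

bad = []
for path in glob.glob('data/tianchi/ann_dir/train/*.png'):  # placeholder path
    mask = np.array(Image.open(path))
    # every value must be a valid class id or the ignore index
    invalid = [int(v) for v in np.unique(mask)
               if v >= NUM_CLASSES and v != IGNORE_INDEX]
    if invalid:
        bad.append((path, invalid))

print(f'{len(bad)} masks contain out-of-range labels')
for path, invalid in bad[:10]:
    print(path, invalid)

Re-running training with the environment variable CUDA_LAUNCH_BLOCKING=1 should also make the device-side assert surface at the exact failing operation rather than at a later call such as the sort above.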