-
To my understanding, this error is caused by the code below: OpenCV can't read the image correctly.
But the path correctly points to a normal cropped image.
Environment: Ubuntu 16, CUDA 10.1, RTX 2080 Ti and TITAN Xp, 64 GB RAM, 6 GB swap
When the code gets here, it gets stuck. At the same time, nvidia-smi shows the processes on the 2080 Ti stopping one by one, until only 3 processes remain on the TITAN Xp. When the number of workers is set to 1, the DataLoader worker is killed, as in the feedback below.
About the dataset: About the training process: This is my first issue and I have tried to present the problems completely. Sorry for the inconvenience!
-
By the way, I can use a pretrained model from the model zoo to test on datasets like VOT2016.
-
About the training setting:
But another question occurred to me.
About the dataset:
-
It could be; have you checked if the path is correct? (One way to scan the cropped images for unreadable files is sketched after this reply.)
Have you checked #399?
It's hard to tell.
TODO
Please refer to the other issue.
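To illustrate the path check above: cv2.imread does not raise on a missing or corrupt file, it silently returns None, so a one-off scan of the cropped dataset can surface bad entries before training. A minimal sketch (the yt_bb/crop511 root is only an example taken from this thread; adjust it to your own layout):

```python
import os
import cv2

def find_unreadable_images(root_dir):
    """Walk a cropped-dataset directory and report images OpenCV cannot read."""
    bad = []
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.lower().endswith(('.jpg', '.jpeg', '.png')):
                continue
            path = os.path.join(dirpath, name)
            img = cv2.imread(path)
            if img is None:  # cv2.imread returns None instead of raising on failure
                bad.append(path)
    return bad

if __name__ == '__main__':
    # 'yt_bb/crop511' is just an example root from this thread; change it to your path.
    for p in find_unreadable_images('yt_bb/crop511'):
        print('unreadable:', p)
```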
-
I'm actually updating this repo to PyTorch 1.5/1.6, so it should be solved by then. For now I have no idea how to solve it.
It could help. Checking whether the image exists before loading it could be an excellent idea, but this may lead to some batches containing fewer images, and it may require some additional work to balance them.
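As a concrete illustration of that trade-off (the dataset class and collate function below are hypothetical, not code from this repo): if __getitem__ returns None for an unreadable image, the default collate will error out, and dropping the Nones yourself means some batches come out smaller than batch_size.

```python
import cv2
import torch
from torch.utils.data import Dataset, DataLoader

class SkipBrokenDataset(Dataset):
    """Hypothetical dataset that returns None for images OpenCV cannot read."""

    def __init__(self, image_paths):
        self.image_paths = image_paths

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        img = cv2.imread(self.image_paths[index])
        if img is None:          # missing or corrupt file
            return None          # caller must decide how to handle the gap
        return torch.from_numpy(img).permute(2, 0, 1).float()

def drop_none_collate(batch):
    """Drop failed samples; the resulting batch may be smaller than batch_size."""
    batch = [item for item in batch if item is not None]
    return torch.stack(batch) if batch else torch.empty(0)

# loader = DataLoader(SkipBrokenDataset(paths), batch_size=32,
#                     num_workers=4, collate_fn=drop_none_collate)
```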
-
Do you mean checking every img_path before cv.imread in __getitem__?
According to #399, I set num_threads = 1 in par_crop.py for COCO; the process is a little bit slow. Thanks for the reply.
-
Yes
I have no idea either, since I couldn't reproduce this error. So it's possible that some image path is incorrect.
Great, let us know how it works.
No worries, glad I can help.
-
Thanks for the quick reply. After I check the image_path, if there really is some problem, will simply deleting the broken paths from the .json help? That way paths leading to incorrect images can't be loaded. Or is there some pairing process I am missing? And I noticed that you said:
I don't quite understand; I thought that no matter how many images are in the dataset, the dataloader could work correctly.
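If the paths do turn out to be broken, one rough way to prune them from the annotation json might look like the sketch below (the file names and the assumption that the json is keyed by video folder are hypothetical; check the repo's actual format first, and note the caveat about pair sampling in the next reply).

```python
import json
import os

# Hypothetical layout: train.json keys are video folder names relative to crop511/.
CROP_ROOT = 'yt_bb/crop511'          # example root taken from this thread
SRC_JSON = 'yt_bb/train.json'        # hypothetical file name
DST_JSON = 'yt_bb/train.cleaned.json'

with open(SRC_JSON) as f:
    anno = json.load(f)

# Keep only entries whose cropped folder actually exists on disk.
cleaned = {video: tracks for video, tracks in anno.items()
           if os.path.isdir(os.path.join(CROP_ROOT, video))}

print(f'kept {len(cleaned)} / {len(anno)} videos')
with open(DST_JSON, 'w') as f:
    json.dump(cleaned, f)
```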
-
It could, but this may raise other issues, such as not being able to get the correct search image.
If you modify __getitem__ in the dataloader so that it skips an image that does not exist, the dataloader will not load an additional image to keep the batch size the same; it just returns whatever it reads.
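A common alternative pattern (a general sketch, not something this repo implements) is to resample a different index inside __getitem__ when a read fails, so every call still returns a valid sample and the batch size stays fixed. For a tracker this comes with exactly the caveat above: a naive resample can break the template/search pairing, so the replacement would need to be drawn so that it still yields a valid pair.

```python
import random
import cv2
from torch.utils.data import Dataset

class ResampleOnFailDataset(Dataset):
    """Hypothetical pattern: if an image fails to load, retry with another random
    index so every call returns a valid sample and batches stay full-sized."""

    def __init__(self, image_paths, max_retries=10):
        self.image_paths = image_paths
        self.max_retries = max_retries

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        for _ in range(self.max_retries):
            img = cv2.imread(self.image_paths[index])
            if img is not None:
                return img
            index = random.randrange(len(self.image_paths))  # pick a replacement sample
        raise RuntimeError('too many consecutive unreadable images')
```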
-
Do the training pairs from yt_bb come from one folder, like yt_bb/crop511/train0000/0/-0F2NokPzeQc? If so, will just deleting these folder paths from the json help? And there are some other questions. Really sorry for question after question; I'm the first person in my lab whose research direction is SOT, and there are not many people around who know about it.
-
It might help
Nope
It depends on whether you want to set this environment variable.
-
So, no matter how many datasets I use, all I need to change is DATASET.NAMES in siamrpn_alex_dwxcorr_16gpu/config.yaml?
As I said before, I find my training process getting slower and slower with every line logger.info shows.
I noticed that during training the memory usage is about 2000+ MiB / 11019 MiB for each GPU, but the Volatile GPU-Util is 0% most of the time. Is this status normal?
I wonder could this phenomenon cause my training process…
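On the 0% Volatile GPU-Util question: that usually means the GPUs are sitting idle waiting for data, which could also be related to the slowdown if disk or CPU-side preprocessing is the bottleneck. A generic way to check (not code from this repo) is to time how long each batch takes to come out of the DataLoader:

```python
import time

def profile_loader(loader, num_batches=50):
    """Rough check for a data-loading bottleneck: print how long each batch takes
    to arrive. Large or growing waits while GPU-Util sits at 0% suggest the GPUs
    are starving for data rather than doing any compute."""
    t_prev = time.time()
    for i, _batch in enumerate(loader):
        t_now = time.time()
        print(f'batch {i:03d}: waited {t_now - t_prev:.3f}s for data')
        if i + 1 >= num_batches:
            break
        # ... the training step would run here ...
        t_prev = time.time()
```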