loss does not converge #125

Open
DLC-jjj opened this issue Nov 5, 2022 · 25 comments

Comments

@DLC-jjj

DLC-jjj commented Nov 5, 2022

Hi, I have a problem training the PyTorch version. I made no changes to the project, used the original BIPEDv2 dataset for training, and kept the project's default parameters. After training for 17 epochs, the loss has barely changed. In the end the model can predict an edge map for an image, but the result is not very good. What could be the reason? Looking forward to your reply.
[image: 1667659366500]
[image: lena_std]

@xavysp
Owner

xavysp commented Nov 6, 2022

Hi, can I see the TensorBoard graph?

@LvGuangzu

Have you solved this problem? I have the same problem and am looking forward to your reply. If it is not solved yet, we can discuss it together.

@DLC-jjj
Author

DLC-jjj commented Nov 7, 2022

> Have you solved this problem? I have the same problem and am looking forward to your reply. If it is not solved yet, we can discuss it together.

I'm very sorry, I haven't solved it yet. I asked one of my classmates to try this project and they ran into the same problem, which is still unsolved.

@DLC-jjj
Author

DLC-jjj commented Nov 7, 2022

> Hi, can I see the TensorBoard graph?

Very sorry, I don't have TensorBoard installed. The loss fluctuates between 1.5 and 3.5 from the first epoch to the end, so there is nothing wrong with the loss curve. I asked one of my classmates to try this project and they encountered the same problem, and the other commenter in this thread encountered it too. I'm a little troubled.

@xavysp
Owner

xavysp commented Nov 8, 2022

Well, if you used DexiNed without changing the TensorBoard part, you may already have the data; you just need to view the graph. We need to see whether there is an improvement across epochs. The labels in edge detection are very sensitive: on some training samples the model may detect more edges than are in the ground truth (GT), so you will see different loss values, but that does not mean DexiNed is not training. You should check the average loss of each epoch. This happens in DL-based edge detectors.
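As a minimal sketch of the per-epoch check suggested above, assuming the training loop records one `(epoch, batch_loss)` pair per batch (the function name and the loss values below are hypothetical, not taken from the DexiNed code), the per-epoch mean can be computed like this:

```python
from collections import defaultdict


def average_loss_per_epoch(records):
    """Return {epoch: mean loss} from an iterable of (epoch, batch_loss) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for epoch, loss in records:
        totals[epoch] += loss
        counts[epoch] += 1
    return {epoch: totals[epoch] / counts[epoch] for epoch in sorted(totals)}


# Hypothetical per-batch losses: noisy inside each epoch, slightly lower on average later.
records = [(0, 3.5), (0, 1.5), (0, 3.1), (1, 3.4), (1, 1.4), (1, 2.4)]
epoch_means = average_loss_per_epoch(records)
for epoch, mean in epoch_means.items():
    print(f"epoch {epoch}: mean loss {mean:.2f}")
```

Individual batch losses can swing widely while the per-epoch mean still moves down, which is why the mean is the more useful number to compare between epochs.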

@xavysp
Owner

xavysp commented Nov 8, 2022

> > Hi, can I see the TensorBoard graph?
>
> Very sorry, I don't have TensorBoard installed. The loss fluctuates between 1.5 and 3.5 from the first epoch to the end, so there is nothing wrong with the loss curve. I asked one of my classmates to try this project and they encountered the same problem, and the other commenter in this thread encountered it too. I'm a little troubled.

How much training data do you have?

@LvGuangzu

I trained on the BIPED dataset from the project; the training set contains 200 images. My loss dropped to 0.9 in epoch 34 but rose to 2.1 in epoch 35, and overall the loss stayed around 2.

@LvGuangzu

[image: 8a048c930d2f7e963a480bdc6c5ca31]
Then I trained again, using the .pth file with loss 0.9 as my pretrained model, and found that the loss started to oscillate around 1.7.
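A single low value such as 0.9 followed by 2.1 is hard to interpret on its own. One common way to look at the trend is an exponential moving average of the logged losses. The sketch below assumes a plain list of per-epoch losses; the numbers are hypothetical and the `smooth` helper is not part of the project.

```python
def smooth(values, weight=0.9):
    """Exponential moving average with bias correction for the first steps."""
    smoothed = []
    running = 0.0
    for step, value in enumerate(values, start=1):
        running = weight * running + (1.0 - weight) * value
        # Divide by (1 - weight**step) so early values are not pulled toward zero.
        smoothed.append(running / (1.0 - weight ** step))
    return smoothed


epoch_losses = [2.1, 1.9, 2.2, 0.9, 2.1, 1.8]  # hypothetical per-epoch losses
trend = smooth(epoch_losses, weight=0.6)
for raw, avg in zip(epoch_losses, trend):
    print(f"raw {raw:.2f}  smoothed {avg:.2f}")
```

If the smoothed curve is flat over many epochs, the oscillation is around a plateau; if it slopes down, the model is still improving despite the noise.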

@LvGuangzu

I read that the experiment in your paper ran for 150K iterations. I don't know whether the problem is that I trained for too few. I trained for 100 epochs and found no improvement.

@DLC-jjj
Author

DLC-jjj commented Nov 8, 2022

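> > Hi, can I see the TensorBoard graph?
>
> > Very sorry, I don't have TensorBoard installed. The loss fluctuates between 1.5 and 3.5 from the first epoch to the end, so there is nothing wrong with the loss curve. The other commenter in this thread also encountered the same problem. I'm a little troubled.
>
> How much training data do you have?

Thank you for your reply. I am using the BIPEDv2 data, with 200 training images. My training loss looks similar to the loss curve in the previous reply and shows no tendency to converge.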

@xavysp
Owner

xavysp commented Nov 8, 2022

> I trained on the BIPED dataset from the project; the training set contains 200 images. My loss dropped to 0.9 in epoch 34 but rose to 2.1 in epoch 35, and overall the loss stayed around 2.

Did you change any hyperparameters? Could you check with my lightweight model?
LDC: Lightweight Dense CNN for Edge Detection
I cannot find any error; this is just to make sure that the problem is not the data.

@LvGuangzu

Yes, I changed some hyperparameters. I didn't change any on the first run, but the loss did not converge. I suspected that the learning rate was dropping too quickly, so I then modified the hyperparameters. First, I set is_testing=False; then I changed the learning rate to drop 10x every 20 epochs.
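A schedule that drops the learning rate 10x every 20 epochs can be sketched as below. This is only an illustration of the arithmetic; `step_decay_lr` and the base learning rate are hypothetical and not the project's actual code.

```python
def step_decay_lr(base_lr, epoch, step_size=20, gamma=0.1):
    """Learning rate after multiplying by `gamma` once per `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


base_lr = 1e-4  # hypothetical starting value
for epoch in (0, 19, 20, 39, 40):
    print(f"epoch {epoch:2d}: lr = {step_decay_lr(base_lr, epoch):.1e}")
```

In PyTorch, `torch.optim.lr_scheduler.StepLR` takes `step_size` and `gamma` arguments with the same meaning, so it may be a convenient way to express this schedule.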

@LvGuangzu

Ok, I'll try to reproduce the LDC code over the next two days, and if it's not difficult to reproduce, I'll get back to you soon.

@LvGuangzu

> Did you change any hyperparameters? Could you check with my lightweight model? LDC: Lightweight Dense CNN for Edge Detection. I cannot find any error; this is just to make sure that the problem is not the data.

Hello, I have reproduced the LDC model and modified some of the hyperparameters. All the changes are listed below.

  1. is_testing=True -> is_testing=False
  2. epochs = 25 -> epochs = 50
  3. adjust_lr = [6,12,18] -> adjust_lr = [12,24,36]
  4. The BIPED dataset is still used for training, with resume=False, so training starts from scratch.

The final result shows signs of convergence. Below is the loss curve I got after training for 50 epochs.
[image: pic1]
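To see what moving the `adjust_lr` epochs changes, the sketch below compares the two lists under one assumption: that each listed epoch is a point where the learning rate is multiplied by a fixed factor. The repository may apply `adjust_lr` differently (for example with explicit per-milestone values), so `milestone_lr`, `gamma`, and `base_lr` here are hypothetical.

```python
from bisect import bisect_right


def num_drops(epoch, milestones):
    """Number of milestones already reached at `epoch`."""
    return bisect_right(sorted(milestones), epoch)


def milestone_lr(base_lr, epoch, milestones, gamma=0.1):
    """Learning rate after one `gamma` reduction per reached milestone."""
    return base_lr * gamma ** num_drops(epoch, milestones)


original = [6, 12, 18]   # adjust_lr before the change
modified = [12, 24, 36]  # adjust_lr after the change
base_lr = 1e-3           # hypothetical starting value

for epoch in (0, 10, 20, 40):
    print(epoch,
          f"{milestone_lr(base_lr, epoch, original):.1e}",
          f"{milestone_lr(base_lr, epoch, modified):.1e}")
```

Under this assumption, at epoch 20 the original list has already applied three reductions while the modified list has applied one, so the modified run keeps a larger learning rate for longer.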

@LvGuangzu

> Did you change any hyperparameters? Could you check with my lightweight model? LDC: Lightweight Dense CNN for Edge Detection. I cannot find any error; this is just to make sure that the problem is not the data.

There is one thing I don't understand: as shown in the figure below, what is the role of the seed here? Will it affect the results if I delete it? Maybe I was careless and missed where this is explained; I hope you can help me answer it.
[image]

@xavysp
Owner

xavysp commented Nov 8, 2022

> Hello, I have reproduced the LDC model and modified some of the hyperparameters. All the changes are listed below.
>
>   1. is_testing=True -> is_testing=False
>   2. epochs = 25 -> epochs = 50
>   3. adjust_lr = [6,12,18] -> adjust_lr = [12,24,36]
>   4. The BIPED dataset is still used for training, with resume=False, so training starts from scratch.
>
> The final result shows signs of convergence. Below is the loss curve I got after training for 50 epochs.
> [image: pic1]

Can I see the results? For example, the edge map of the Lenna image.

@xavysp
Owner

xavysp commented Nov 8, 2022

> There is one thing I don't understand: as shown in the figure below, what is the role of the seed here? Will it affect the results if I delete it? Maybe I was careless and missed where this is explained; I hope you can help me answer it. [image]

It does not matter on a large scale; I was just trying to generalize the edge detection by changing the seed.
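For general context on what a seed does, here is a minimal sketch that is not taken from the DexiNed or LDC code (the function name and seed values are made up for illustration). Fixing the seed makes pseudo-random choices, such as the order in which samples are shuffled, repeat between runs; without it each run simply uses a different order.

```python
import random


def shuffled_order(num_samples, seed=None):
    """Return a shuffled list of sample indices; the same seed gives the same order."""
    rng = random.Random(seed)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


run_a = shuffled_order(200, seed=1021)  # hypothetical seed value
run_b = shuffled_order(200, seed=1021)  # same seed: identical order
run_c = shuffled_order(200, seed=7)     # different seed: a different order

print(run_a[:5], run_b[:5], run_c[:5])
```

A seed is mainly useful when two runs need to be compared under identical random choices, for example when checking whether a hyperparameter change, rather than a different shuffle, caused a difference in the loss.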

@LvGuangzu

> Can I see the results? For example, the edge map of the Lenna image.

Of course. I used the .pth file saved at epoch 26, because its loss of 3.56 is the lowest within the 50 epochs. The first image is the avg output and the second is the fuse output.
[image: avg_lena]
[image: fuse_lena]
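Picking the checkpoint with the lowest logged loss, as described above, can be sketched as follows. Only the value 3.56 at epoch 26 comes from this thread; the other losses and the `best_epoch` helper are hypothetical.

```python
def best_epoch(epoch_losses):
    """Return the epoch whose recorded loss is lowest."""
    return min(epoch_losses, key=epoch_losses.get)


# Epoch 26 (loss 3.56) is from the comment above; the other entries are made up.
epoch_losses = {24: 3.71, 25: 3.64, 26: 3.56, 27: 3.80}
chosen = best_epoch(epoch_losses)
print(f"use the checkpoint from epoch {chosen} (loss {epoch_losses[chosen]:.2f})")
```

Since the per-sample loss in this task is noisy, it may also be worth comparing the edge maps from a few neighbouring checkpoints by eye rather than relying on the loss value alone.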

@LvGuangzu

> It does not matter on a large scale; I was just trying to generalize the edge detection by changing the seed.

Can I take it that the seed here is basically useless? Can I delete it for later training and testing? Or when would I need to use a seed?

@xavysp
Owner

xavysp commented Nov 10, 2022

Yes, it is useless.

@xavysp
Owner

xavysp commented Nov 10, 2022

Here is Lenna from the fused module, then the averaged one. These results are from LDC.
[image: IMG-20221109-WA0018]
[image: IMG-20221109-WA0017]

@LvGuangzu

> Here is Lenna from the fused module, then the averaged one. These results are from LDC. [image: IMG-20221109-WA0018] [image: IMG-20221109-WA0017]

The result looks very good. May I ask what the final converged loss on the BIPED dataset was when you trained the LDC model? I use the catloss provided in the code.

@xavysp
Owner

xavysp commented Nov 14, 2022

Sorry, I don't have access to my former lab, so I cannot get it. But I will let you know whenever I have it.

@zwz-append

> > Hi, can I see the TensorBoard graph?
>
> > Very sorry, I don't have TensorBoard installed. The loss fluctuates between 1.5 and 3.5 from the first epoch to the end, so there is nothing wrong with the loss curve. The other commenter in this thread also encountered the same problem. I'm a little troubled.
>
> > How much training data do you have?
>
> Thank you for your reply. I am using the BIPEDv2 data, with 200 training images. My training loss looks similar to the loss curve in the previous reply and shows no tendency to converge.

> Hi, I have a problem training the PyTorch version. I made no changes to the project, used the original BIPEDv2 dataset for training, and kept the project's default parameters. After training for 17 epochs, the loss has barely changed. In the end the model can predict an edge map for an image, but the result is not very good. What could be the reason? Looking forward to your reply. [image: 1667659366500] [image: lena_std]

Hello, I have a question about dataset.py when using the BIPEDv2 dataset for training. I changed 'data_dir' in main.py, and changed data_types= ['aug'] to data_types= ['real'] on lines 322 and 330 of dataset.py. However, the system reports NotADirectoryError: [WinError 267] 目录名称无效。 (The directory name is invalid.): 'E:\SZUer\codes\CVweizhu\DexiNed-master\dataset-lists\BIPEDv2\BIPED\edges\imgs\train\rgbr\real\RGB_001.jpg'.
I tried testing lines 368-374 of dataset.py, but it did not succeed.
However, I can use the BIPEDv2 dataset for testing. I don't know how to make training work.
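The path in the error ends in a file name (RGB_001.jpg), and on Windows `NotADirectoryError: [WinError 267]` is what Python raises when something like `os.listdir` is called on a file. One possible reading, which is an assumption since the dataset.py lines are not shown here, is that the loader expects subfolders inside the data-type folder and the 'real' folder contains images directly. A minimal sketch of a listing helper that tolerates both layouts (the `list_images` name, the folder names, and the extensions are hypothetical):

```python
import os
import tempfile

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def list_images(root):
    """Collect image files under `root`, whether they sit directly in it or in subfolders."""
    paths = []
    for entry in sorted(os.listdir(root)):
        full = os.path.join(root, entry)
        if os.path.isdir(full):
            # Only descend into real directories, never into files.
            paths.extend(list_images(full))
        elif os.path.splitext(entry)[1].lower() in IMAGE_EXTS:
            paths.append(full)
    return paths


# Demo on a throwaway directory tree with one flat and one nested layout.
with tempfile.TemporaryDirectory() as tmp:
    flat = os.path.join(tmp, "real")
    nested = os.path.join(tmp, "aug", "p1")
    os.makedirs(flat)
    os.makedirs(nested)
    for folder in (flat, nested):
        open(os.path.join(folder, "RGB_001.jpg"), "wb").close()
    flat_names = [os.path.basename(p) for p in list_images(flat)]
    nested_names = [os.path.basename(p) for p in list_images(os.path.join(tmp, "aug"))]

print(flat_names, nested_names)
```

Checking `os.path.isdir` before listing an entry is the part that avoids the error; whether this matches what dataset.py needs depends on how the image and label lists are paired there.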

@LvGuangzu

> Sorry, I don't have access to my former lab, so I cannot get it. But I will let you know whenever I have it.

OK, thank you. I applied the data augmentation you used, so the current data volume is 288 * 200 images. Our laboratory only has one 3090 GPU, and it takes an hour and a half to train one epoch of the DexiNed model. So I would like to ask after how many epochs your training converged. I saw in the DexiNed paper that you ran 150k iterations in the experiments. Is that 150 epochs or 150,000 epochs?
[image]
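The relation between iterations and epochs can be worked out with a few lines of arithmetic. The sketch below takes the 288 * 200 images, the 1.5 hours per epoch, and the 150k figure from the comment above; the batch size of 8 is an assumption, and the calculation treats one iteration as one optimizer step on one batch, which may or may not be what the paper means.

```python
import math


def iterations_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


num_images = 288 * 200       # augmented training set size from the comment above
batch_size = 8               # assumption; use the value from your own config
target_iterations = 150_000  # "150k" from the comment above
hours_per_epoch = 1.5        # reported time per epoch on one 3090 GPU

per_epoch = iterations_per_epoch(num_images, batch_size)
epochs_needed = target_iterations / per_epoch
hours_needed = epochs_needed * hours_per_epoch

print(f"{per_epoch} iterations per epoch")
print(f"about {epochs_needed:.1f} epochs, roughly {hours_needed:.1f} hours")
```

Under these assumptions 150k iterations is on the order of 20 epochs over the augmented set, far fewer than 150 or 150,000 epochs; a different batch size changes the figure proportionally.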
