
Discussion: w is normalized in the forward pass, but backpropagation computes the gradient with respect to the un-normalized w when updating it. Does this cause convergence difficulties? #29

Open
xialuxi opened this issue Jun 5, 2019 · 2 comments

Comments

xialuxi (Owner) commented Jun 5, 2019

No description provided.

xialuxi (Owner, Author) commented Jun 10, 2019

1. I added backpropagation through the norm operation on w. Training became much slower (roughly 1/5 of the original speed), convergence was just as slow, and the loss tended to diverge.
2. Based on the experiments so far, normalizing w in the forward pass while skipping the norm in backpropagation actually seems to work a bit better (and saves a lot of computation).
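For reference, backpropagating through the normalization $\hat{w} = w / \lVert w \rVert_2$ projects the upstream gradient onto the tangent plane of the unit sphere and rescales it:

$$
\frac{\partial L}{\partial w} = \frac{1}{\lVert w \rVert_2}\left( \frac{\partial L}{\partial \hat{w}} - \Big\langle \frac{\partial L}{\partial \hat{w}},\, \hat{w} \Big\rangle \hat{w} \right)
$$

Because the result scales inversely with $\lVert w \rVert_2$, a small weight norm inflates the effective step size, which may be one source of the divergence reported above.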

wu-ruijie commented
I also implemented backpropagation through the norm operation on w and found it works worse than not doing it. Also, when updating w I tried using both the pre-normalization w and the post-normalization w, and after training for a while I saw no difference between the two.
PS: Could the author post the code for the backward pass of the w norm operation? I'd like to check whether my own implementation is wrong. Thanks a lot!
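Not the repository's actual code, but here is a minimal NumPy sketch of the backward pass given by the formula above, together with a finite-difference check (the function names and the per-column weight layout are assumptions for illustration):

```python
import numpy as np

def l2_normalize_forward(w, eps=1e-12):
    """Normalize each column of w to unit L2 norm (one column per class)."""
    norm = np.sqrt(np.sum(w * w, axis=0, keepdims=True)) + eps
    return w / norm, norm

def l2_normalize_backward(grad_w_hat, w, norm):
    """Backprop through w_hat = w / ||w||: remove the radial component,
    then rescale by 1 / ||w||, independently for each column."""
    w_hat = w / norm
    dot = np.sum(grad_w_hat * w_hat, axis=0, keepdims=True)  # <g, w_hat> per column
    return (grad_w_hat - dot * w_hat) / norm

# Finite-difference check of the analytic gradient.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3))
g = rng.standard_normal((4, 3))  # stand-in for the upstream gradient dL/dw_hat
w_hat, norm = l2_normalize_forward(w)
analytic = l2_normalize_backward(g, w, norm)

h = 1e-6
numeric = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        wp, wm = w.copy(), w.copy()
        wp[i, j] += h
        wm[i, j] -= h
        fp, _ = l2_normalize_forward(wp)
        fm, _ = l2_normalize_forward(wm)
        # L = sum(g * w_hat), so dL/dw_ij by central differences:
        numeric[i, j] = np.sum(g * (fp - fm)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))  # should be on the order of 1e-9 or smaller
```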
