No mixed precision support with GradientAccumulateOptimizer? #113
I took another look at the source code of the `LossScaleOptimizer`. By wrapping the optimizer with it, adding mixed-precision support seems feasible. I'm quite preoccupied with finalizing my PhD atm, but if you have time, a PR would be of great value, @dPys :]
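For context, a minimal sketch of what "wrapping the optimizer" being discussed could look like. The `optimizer` and `accum_steps` argument names of `GradientAccumulateOptimizer` are assumptions about the package API, not verified against it:

```python
import tensorflow as tf
from gradient_accumulator import GradientAccumulateOptimizer

# float16 compute with float32 variables.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Wrap a plain optimizer in Keras' LossScaleOptimizer first ...
inner = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.SGD(1e-3))

# ... then hand it to the accumulation wrapper.
# NOTE: the argument names `optimizer` and `accum_steps` are assumed here.
opt = GradientAccumulateOptimizer(optimizer=inner, accum_steps=4)
```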
Oh, after another look at this, I am not so sure we should apply the same loss scaling naively, as it may cause numerical instabilities. I have observed the same when doing regular batch training with mixed precision and loss scaling, so I would expect you to see it in this scenario as well. I am not sure what the best solution is. Note that, as discussed in this paper, it is possible to do gradient clipping instead to avoid exploding gradients. We have already implemented a technique for this in the model wrapper, and if anyone wants to test it as an alternative way to avoid gradient explosions, you could try adding the following to your code:
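The code block this comment originally pointed to did not survive extraction; below is a reconstruction sketch of enabling the adaptive-gradient-clipping path through the model wrapper. The flag name `use_agc` and the constructor signature are assumptions about the package API:

```python
import tensorflow as tf
from gradient_accumulator import GradientAccumulateModel

# Any functional Keras model works as the base.
inputs = tf.keras.Input(shape=(32,))
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)

# Wrap it so gradients are accumulated, with adaptive gradient clipping (AGC)
# enabled as an alternative to loss scaling for avoiding exploding gradients.
# NOTE: `use_agc=True` is the assumed name of the flag in the model wrapper.
model = GradientAccumulateModel(
    accum_steps=4,
    use_agc=True,
    inputs=inputs,
    outputs=outputs,
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
)
```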
@andreped - I have a working reimplementation of the optimizer (along with adaptive gradient clipping) that supports mixed precision with loss scaling, among several other key upgrades. It also purges eager ops in favor of pure TF and adds `@tf.function` decorators where applicable. I've done considerable benchmarking to reduce the excessive graph tracing as well. If you're open to sharing ownership of the package, I'd be happy to submit it as a PR?
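For readers unfamiliar with the retracing issue mentioned above, a generic illustration (not the PR code): pinning an `input_signature` keeps `tf.function` from retracing when tensor shapes change between calls.

```python
import tensorflow as tf

# Without an input_signature, every new static shape triggers a fresh trace;
# pinning the signature lets one concrete function serve all compatible shapes.
@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
def scale_gradients(grads):
    # Stand-in for the graph-side work an optimizer step would do.
    return grads / 4.0

print(scale_gradients(tf.ones([3])))   # traced once here
print(scale_gradients(tf.ones([10])))  # reuses the same concrete function
```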
Amazing, @dPys! Looking forward to it! 🎉 You will for sure be added as a coauthor of the package on Zenodo, but let's first get the PR in, reviewed, and merged :] I can review it tomorrow. I was hoping to make a new release, actually v1.0.0, so if all goes well you will be a coauthor on that release :] EDIT: Also, if you wish to discuss the PR and contribution further, feel free to reach out on my LinkedIn.
Sweet, sounds good. I've at least gotten the ball rolling: #131
Great, I made some initial remarks, which you need to address before you can run the full tests. Note that you can run most of the tests locally.
Love this package!
Curious -- is the lack of mixed-precision support for GradientAccumulateOptimizer intentional (e.g., perhaps due to observed issues with stability?), or is it something on the to-do list that just has yet to be implemented? If it's the latter, I've put together a rough draft of how this might look (assuming the input optimizer is either SGD or Adam and has already been wrapped in the LossScaleOptimizer). Curious to hear your thoughts!
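The rough draft itself was lost in the page extraction; here is a stand-in sketch of the idea as a custom training step (not the package's code): scale the loss before the backward pass, unscale the gradients before accumulating them, and let the `LossScaleOptimizer` apply the averaged result every `ACCUM_STEPS` steps. All names are illustrative.

```python
import tensorflow as tf

# Mixed precision: float16 compute, float32 variables.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

ACCUM_STEPS = 4
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
opt = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.SGD(1e-3))
loss_fn = tf.keras.losses.MeanSquaredError()

# One float32 accumulator per trainable variable.
accum = [tf.Variable(tf.zeros_like(v), trainable=False)
         for v in model.trainable_variables]


def train_step(x, y, local_step):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, tf.cast(pred, tf.float32))
        # Scale the loss so small float16 gradients do not underflow.
        scaled_loss = opt.get_scaled_loss(loss)

    scaled_grads = tape.gradient(scaled_loss, model.trainable_variables)
    # Unscale *before* accumulating so the running sums stay in a sane range.
    grads = opt.get_unscaled_gradients(scaled_grads)

    for acc, g in zip(accum, grads):
        acc.assign_add(tf.cast(g, acc.dtype))

    # Apply the averaged gradients every ACCUM_STEPS mini-batches, then reset.
    if (local_step + 1) % ACCUM_STEPS == 0:
        opt.apply_gradients(
            [(acc / ACCUM_STEPS, v)
             for acc, v in zip(accum, model.trainable_variables)]
        )
        for acc in accum:
            acc.assign(tf.zeros_like(acc))
    return loss


# Toy usage with random data.
x = tf.random.normal((16, 8))
y = tf.random.normal((16, 1))
for i in range(8):
    train_step(x, y, i)
```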