Allow for using other Learning Rate Schedulers and Optimizers #76
Comments
We could include an option to select the lr scheduler. That's easy since it's just swapping the pytorch lr scheduler and adapting the step call. If you have the code around, feel free to push a PR and we can see how to include it!
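For illustration, a minimal sketch of what that swap might look like (the `get_scheduler` helper and its argument names are hypothetical, not pie's actual API; the scheduler classes are the ones from `torch.optim.lr_scheduler`):

```python
from torch.optim.lr_scheduler import ReduceLROnPlateau, CosineAnnealingWarmRestarts

def get_scheduler(optimizer, name, patience=15, T_0=40):
    # Hypothetical helper: pick the scheduler from a config string.
    if name == "ReduceLROnPlateau":
        return ReduceLROnPlateau(optimizer, mode="max", patience=patience)
    elif name == "CosineAnnealing":
        return CosineAnnealingWarmRestarts(optimizer, T_0=T_0)
    raise ValueError(f"Unknown scheduler: {name}")

def scheduler_step(scheduler, dev_score=None):
    # The step call is the part that needs adapting: ReduceLROnPlateau
    # expects the monitored metric, the cosine schedulers do not.
    if isinstance(scheduler, ReduceLROnPlateau):
        scheduler.step(dev_score)
    else:
        scheduler.step()
```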
Hey @emanjavacas :)
Coming back with new experience regarding Ranger vs Adam. I have been playing with single-task models (which indeed improve when fine-tuned correctly), and Ranger clearly yields more stable results: the second-to-last and the second are the same config, only the optimizer changes (without fine-tuning the optimizer hyperparameters).
Hey!
I started reading about some other optimizers as things came through my news feed (stuff like this or that).
I ended up trying to implement it in pie, but wanted to see the results first. The tests were done as follows: same training set (~500k words), same learning rate, same test set (~63k tokens), CUDA, 10 runs per configuration. No hyperparameter optimization was done.
For optimizers, Ranger and Adam were tested; I did not try anything else.
For learning rate schedulers, ReduceLROnPlateau, CosineAnnealing, and Delayed(CosineAnnealing) were tested.
Patience overall is 15 steps without improvement. CosineAnnealing T0 is 40; the delay is 10.
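For concreteness, a sketch of what the Delayed(CosineAnnealing) setup could look like with these parameters (the `DelayedScheduler` wrapper here is just an illustration, not pie's implementation):

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

class DelayedScheduler:
    """Keep the learning rate untouched for `delay` epochs, then
    hand control over to the wrapped scheduler."""

    def __init__(self, scheduler, delay):
        self.scheduler = scheduler
        self.delay = delay
        self._epoch = 0

    def step(self):
        self._epoch += 1
        if self._epoch > self.delay:
            self.scheduler.step()

# Parameters from the experiment: T_0=40 for CosineAnnealing, delay of 10.
model = torch.nn.Linear(100, 10)  # placeholder model
optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = DelayedScheduler(CosineAnnealingWarmRestarts(optimizer, T_0=40), delay=10)
```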
Basically, Ranger does not outperform Adam (maybe it would with other parameters, who knows, as its betas differ from Adam's), but Delayed(CosineAnnealing) reaches the same results in 40% less time.
If you are okay with it, a PR will be under way; a rough sketch of the optimizer selection is below.
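The sketch (the helper and config key are hypothetical; Ranger is assumed to come from a third-party package such as torch-optimizer, since it is not part of core PyTorch):

```python
import torch

def get_optimizer(name, parameters, lr):
    # Hypothetical selector; only Adam ships with PyTorch itself.
    if name == "Adam":
        return torch.optim.Adam(parameters, lr=lr)
    elif name == "Ranger":
        import torch_optimizer  # pip install torch-optimizer (assumption)
        return torch_optimizer.Ranger(parameters, lr=lr)
    raise ValueError(f"Unknown optimizer: {name}")
```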
Results: