Allow for using other Learning Rate Schedulers and Optimizers #76
Comments
We could include an option to select the lr scheduler. That's easy since it's just swapping the pytorch lr scheduler and adapting the step call. If you have the code around, feel free to push a PR and we can see how to include it!
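For illustration, a minimal sketch of what that swap might look like (the `get_scheduler` helper and its argument names are hypothetical, not pie's actual API; the scheduler classes are the ones from `torch.optim.lr_scheduler`):

```python
from torch.optim.lr_scheduler import ReduceLROnPlateau, CosineAnnealingWarmRestarts

def get_scheduler(optimizer, name, patience=15, T_0=40):
    # Hypothetical helper: pick the scheduler from a config string.
    if name == "ReduceLROnPlateau":
        return ReduceLROnPlateau(optimizer, mode="max", patience=patience)
    elif name == "CosineAnnealing":
        return CosineAnnealingWarmRestarts(optimizer, T_0=T_0)
    raise ValueError(f"Unknown scheduler: {name}")

def scheduler_step(scheduler, dev_score=None):
    # The step call is the part that needs adapting: ReduceLROnPlateau
    # expects the monitored metric, the cosine schedulers do not.
    if isinstance(scheduler, ReduceLROnPlateau):
        scheduler.step(dev_score)
    else:
        scheduler.step()
```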
Hey @emanjavacas :)
Coming back with new experience regarding Ranger vs Adam. I have been playing with single-task models (which indeed improve when fine-tuned correctly), and Ranger clearly yields more stable results: the second-to-last and the second are the same config, only the optimizer changes (without fine-tuning the optimizer hyperparameters).
Hey!
I started reading about some other optimizers as things came through my news feed (stuff like this or that).
I ended up trying to implement it in pie, but wanted to see the results first. The tests were done as follows: same training set (~500k words), same learning rate, same test set (~63k tokens), CUDA, 10 runs per configuration. No hyperparameter optimization was done.
For optimizers, Ranger and Adam were tested; I did not try anything else.
For learning rate schedulers, ReduceLROnPlateau, CosineAnnealing, and Delayed(CosineAnnealing) were tested.
Patience overall is 15 steps without improvement. CosineAnnealing T0 is 40; the delay is 10.
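For concreteness, a sketch of what the Delayed(CosineAnnealing) setup could look like with these parameters (the `DelayedScheduler` wrapper here is just an illustration, not pie's implementation):

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

class DelayedScheduler:
    """Keep the learning rate untouched for `delay` epochs, then
    hand control over to the wrapped scheduler."""

    def __init__(self, scheduler, delay):
        self.scheduler = scheduler
        self.delay = delay
        self._epoch = 0

    def step(self):
        self._epoch += 1
        if self._epoch > self.delay:
            self.scheduler.step()

# Parameters from the experiment: T_0=40 for CosineAnnealing, delay of 10.
model = torch.nn.Linear(100, 10)  # placeholder model
optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = DelayedScheduler(CosineAnnealingWarmRestarts(optimizer, T_0=40), delay=10)
```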
Basically, Ranger does not outperform Adam (maybe it would with other parameters, who knows, as its betas differ from Adam's), but Delayed(CosineAnnealing) reaches the same results in 40% less time.
If you are okay with it, a PR will be under way; a rough sketch of the optimizer selection is below.
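The sketch (the helper and config key are hypothetical; Ranger is assumed to come from a third-party package such as torch-optimizer, since it is not part of core PyTorch):

```python
import torch

def get_optimizer(name, parameters, lr):
    # Hypothetical selector; only Adam ships with PyTorch itself.
    if name == "Adam":
        return torch.optim.Adam(parameters, lr=lr)
    elif name == "Ranger":
        import torch_optimizer  # pip install torch-optimizer (assumption)
        return torch_optimizer.Ranger(parameters, lr=lr)
    raise ValueError(f"Unknown optimizer: {name}")
```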
Results: