Feature: allow picking whatever pytorch lr_scheduler #79
Conversation
Before I try this version, there are a few comments, and it's missing an implementation of the DelayerScheduler.
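For reference, a minimal sketch of what such a delayed wrapper could look like. The class name, constructor signature, and the usage loop below are assumptions for illustration only, not the implementation this PR is expected to provide:

import torch
from torch.optim.lr_scheduler import CosineAnnealingLR


class DelayerScheduler:
    """Hypothetical wrapper: hold a scheduler back for `delay` epochs, then step it."""

    def __init__(self, scheduler, delay):
        self.scheduler = scheduler
        self.delay = delay
        self._seen = 0

    def step(self, *args, **kwargs):
        self._seen += 1
        # only forward the call once the delay has elapsed
        if self._seen > self.delay:
            self.scheduler.step(*args, **kwargs)


# usage sketch: cosine annealing that only kicks in after 10 epochs
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = DelayerScheduler(CosineAnnealingLR(optimizer, T_max=40), delay=10)
for epoch in range(50):
    optimizer.step()   # training step elided
    scheduler.step()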
pie/trainer.py (outdated)

        self.lr_scheduler.num_bad_epochs,
        self.lr_scheduler.patience,
        self.lr_scheduler.threshold)
    res = '<LrScheduler lr="{:g}"'.format(
From the stuff I added, we are missing the type of LrScheduler here.
        self.lr_scheduler.threshold)
    res = '<LrScheduler lr="{:g}"'.format(
        self.lr_scheduler.optimizer.param_groups[0]['lr'])
    for key in dir(self.lr_scheduler):
We are losing the manual filter on what we want to show here.
Sure, but otherwise you have to do it manually per class...
I know... But couldn't we curate the output for specific schedulers, and automatically show everything for the others?
as long as it doesn't introduce too much clutter, feel free!
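One way that curated-plus-automatic compromise could look, as a sketch only; CURATED_ATTRS and scheduler_repr are hypothetical names, not code from this PR:

# attributes worth printing for schedulers we curate; anything else falls back
# to an automatic dump of public, non-callable attributes
CURATED_ATTRS = {
    'ReduceLROnPlateau': ('factor', 'patience', 'threshold', 'num_bad_epochs'),
    'CosineAnnealingLR': ('T_max', 'eta_min', 'last_epoch'),
}


def scheduler_repr(lr_scheduler):
    name = type(lr_scheduler).__name__
    lr = lr_scheduler.optimizer.param_groups[0]['lr']
    res = '<LrScheduler type="{}" lr="{:g}"'.format(name, lr)
    keys = CURATED_ATTRS.get(name)
    if keys is None:
        # automatic fallback for schedulers we have not curated
        keys = [k for k in dir(lr_scheduler)
                if not k.startswith('_')
                and k != 'optimizer'
                and not callable(getattr(lr_scheduler, k))]
    for key in keys:
        res += ' {}="{}"'.format(key, getattr(lr_scheduler, key, None))
    return res + '/>'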
"lr_patience": 2, // patience for lr schedule | ||
|
||
"lr_scheduler": "ReduceLROnPlateau", | ||
"lr_scheduler_params": { |
I definitely like this part.
But, as I mentioned in my other PR, this basically breaks when instantiating something other than Adam.
Another option would be to have

{
    "lr_scheduler_params": {
        "ReduceLROnPlateau": {
            "mode": "max", ...
        }
    }
}

and this way provide defaults for the things we tested?
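A sketch of how that layout could be consumed, merging tested defaults with whatever the user supplies. build_scheduler and SCHEDULER_DEFAULTS are hypothetical names, and the default values below are placeholders, not tested settings:

import torch

# per-scheduler defaults (placeholder values); user-supplied params override them
SCHEDULER_DEFAULTS = {
    'ReduceLROnPlateau': {'mode': 'max', 'patience': 2, 'factor': 0.6},
    'CosineAnnealingLR': {'T_max': 40},
}


def build_scheduler(optimizer, name, user_params=None):
    params = dict(SCHEDULER_DEFAULTS.get(name, {}))
    params.update(user_params or {})
    cls = getattr(torch.optim.lr_scheduler, name)   # pick any pytorch scheduler by name
    return cls(optimizer, **params)


# e.g. build_scheduler(optimizer, 'ReduceLROnPlateau', {'patience': 7})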
But those parameters are not specific to the optimizer but to the scheduler, right? I have made some changes that should address this issue...
I am sorry, I meant scheduler. I am a bit tired right now ^^. So I corrected the comment up there.
From the last discussion, we concluded to leave out the stuff coming from third-party libraries (including the delayed stuff).
Delayer is not third-party, I coded it :) And it's only with Delayer that Cosine significantly outperforms Adam :)
Could you pinpoint what it is that the Delayer class is achieving? I see a lot of new code involved in printing stuff and so on. Perhaps there is an easier way to add what the Delayer class contributes.
There definitely is. May I PR your PR?
Yes, go ahead. Only I'd rather really keep it simple. For instance, if there is a class that delays the step, we don't need to add a bunch of new functions to adapt the printing code for that class.
Check the last commit, that might solve the issues.
Tried it with ReduceLROnPlateau, Delay(ReduceLR...), and Delay(Cosine).
It seems to work fine; the delay is applied correctly (lr_scheduler.last_epoch is incremented only after the delay is reached).
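For what it's worth, a self-contained way to eyeball that behaviour. The inline delay logic below is only a stand-in for the wrapper under review, not the PR's code:

import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

optimizer = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))], lr=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=40)
delay = 10

for epoch in range(15):
    optimizer.step()                    # training step elided
    if epoch >= delay:                  # delay reached: start stepping the scheduler
        scheduler.step()
    print(epoch, scheduler.last_epoch)  # stays at 0 for the first 10 epochs, then increments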
Great!
Any chance you could quickly replicate the results for the combination that was on par with ReduceLROnPlateau but was faster? (No rush here)
Running:

"lr_scheduler": "CosineAnnealingLR", // LR Scheduler to use: ReduceLROnPlateau,
"lr_scheduler_delay": 10,
"lr_scheduler_params": {
    "T_max": 40,
    "min_lr": 0.000001, // minimum learning rate
    "lr_patience": 7,
    "factor": 0.6
},
You don't want me to rerun the 2x10 runs, right?
It works: I ran it once on my personal GPU for 67 epochs (higher parts of the box in #76 (comment)), with a score within the box: 97.61. (The previous version of this comment was using the dev score, because I am tired and can't read.)
Here is a draft for the lr_scheduler params (related to #78) and how to avoid breaking backwards compatibility. Could you use it to test the results of the CosineAnnealing that you got?