Fix cosine LR scheduler for warmup #2312
Conversation
@sinahmr this does look like a valid thing to fix, just pondering how to approach it re: backwards compatibility of old hparam sets... similar to warmup prefix.
@sinahmr so I looked at this more closely, it's a bit messy. Your fix improved the behaviour you were looking for w/ warmup_prefix=True, but it made warmup_prefix=False worse. It also appears to make the cycles unworkable. The current state of things actually works fine IF you run for warmup_epochs + num_epochs. So, I was thinking the most sensible fix is to adjust get_cycle_length() to add warmup epochs/steps if warmup_prefix=True.
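To make that concrete, here is a minimal sketch (an illustration with an assumed dummy optimizer, not this PR's code or the actual train.py wiring) showing that the unmodified CosineLRScheduler does complete its decay when run for warmup_t + t_initial steps with warmup_prefix=True; the explicit `+ 100` below stands in for the warmup length that an adjusted get_cycle_length() could fold in:

```python
# Illustrative sketch only: dummy parameter/optimizer, values taken from the
# example in this PR's description (t_initial=500, warmup_t=100, warmup_prefix=True).
import torch
from timm.scheduler import CosineLRScheduler

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=1e-4)
scheduler = CosineLRScheduler(
    optimizer,
    t_initial=500,        # length of the cosine phase; warmup is excluded when warmup_prefix=True
    lr_min=1e-7,
    warmup_t=100,
    warmup_lr_init=1e-7,
    warmup_prefix=True,
)

# get_cycle_length() returns the cosine length (500 here); adding the warmup
# steps on top, as suggested above, gives the run length needed to finish.
total_epochs = scheduler.get_cycle_length() + 100
for epoch in range(total_epochs):
    scheduler.step(epoch)

print(optimizer.param_groups[0]["lr"])  # ends very close to lr_min
```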
@rwightman Thanks for taking the time. You're right, sorry about those mistakes. I updated the code and I think it should resolve those problems. Can you please have a look? Below are the plots for comparison:

[comparison plots]
@sinahmr I have an alternative PR that I feel addresses the issue adequately: as long as the # of epochs/steps the schedule is run for is extended by the warmup when warmup_prefix=True, the schedule will complete correctly without any additional alterations. See #2325. EDIT: I also feel that extending the schedule to finish vs. squishing the first cycle is a less significant change for backwards compat of hparams (it does not alter early training), but allows more time to pick off good checkpoints at the end (if the train hasn't petered out by then). Hence, no old result would be worse, only potentially better.
@rwightman I agree that your proposal is more backward friendly. The only concern is that it might confuse users that the model runs for 330 epochs even though they set a smaller number of training epochs (with the warmup added on top).
Updated the log in train, will merge the other PR shortly so closing this, thanks!
I noticed that when using the cosine scheduler with warmup (and `warmup_prefix = True`), the LR will not reach `lr_min`, which can be problematic the larger `warmup_t` is. For example, for `epochs, initial_lr, lr_min, warmup_lr, warmup_t = 500, 1e-4, 1e-7, 1e-7, 100`, we will have the following progression for LR:

[plot: current LR progression]

I propose to change the code in a way to generate the following:

[plot: proposed LR progression]

Hope I have changed the correct lines in the code.
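For reference, here is a minimal sketch (an assumed dummy optimizer and standalone loop, not the reporter's actual training setup) that reproduces the behaviour described above with timm's CosineLRScheduler and the example hyper-parameters:

```python
# Reproduction sketch with the example values above; the dummy parameter and
# optimizer are placeholders, not part of the original report.
import torch
from timm.scheduler import CosineLRScheduler

epochs, initial_lr, lr_min, warmup_lr, warmup_t = 500, 1e-4, 1e-7, 1e-7, 100

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=initial_lr)
scheduler = CosineLRScheduler(
    optimizer,
    t_initial=epochs,
    lr_min=lr_min,
    warmup_t=warmup_t,
    warmup_lr_init=warmup_lr,
    warmup_prefix=True,  # warmup is prepended, so the cosine phase is shifted by warmup_t
)

for epoch in range(epochs):
    scheduler.step(epoch)

# After `epochs` steps the cosine phase has only covered epochs - warmup_t of its
# t_initial steps, so the LR is still well above lr_min (about 1e-5 here).
print(optimizer.param_groups[0]["lr"])
```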