Fix cosine LR scheduler for warmup #2312
Conversation
@sinahmr this does look like a valid thing to fix, just pondering how to approach it re: backwards compatibility of old hparam sets... similar to warmup prefix.
@sinahmr so I looked at this more closely, it's a bit messy. Your fix improved the behaviour you were looking for w/ warmup_prefix=True, but it made warmup_prefix=False worse. It also appears to make the cycles unworkable. The current state of things actually works fine IF you run for warmup_epochs + num_epochs. So, I was thinking the most sensible fix is to adjust get_cycle_length() to add warmup epochs/steps if warmup_prefix=True.
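To make that concrete, here is a minimal sketch (an illustration with an assumed dummy optimizer, not this PR's code or the actual train.py wiring) showing that the unmodified CosineLRScheduler does complete its decay when run for warmup_t + t_initial steps with warmup_prefix=True; the explicit `+ 100` below stands in for the warmup length that an adjusted get_cycle_length() could fold in:

```python
# Illustrative sketch only: dummy parameter/optimizer, values taken from the
# example in this PR's description (t_initial=500, warmup_t=100, warmup_prefix=True).
import torch
from timm.scheduler import CosineLRScheduler

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=1e-4)
scheduler = CosineLRScheduler(
    optimizer,
    t_initial=500,        # length of the cosine phase; warmup is excluded when warmup_prefix=True
    lr_min=1e-7,
    warmup_t=100,
    warmup_lr_init=1e-7,
    warmup_prefix=True,
)

# get_cycle_length() returns the cosine length (500 here); adding the warmup
# steps on top, as suggested above, gives the run length needed to finish.
total_epochs = scheduler.get_cycle_length() + 100
for epoch in range(total_epochs):
    scheduler.step(epoch)

print(optimizer.param_groups[0]["lr"])  # ends very close to lr_min
```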
@rwightman Thanks for taking the time. You're right, sorry about those mistakes. I updated the code and I think it should resolve those problems. Can you please have a look? Below are the plots for comparison:

[comparison plots]
@sinahmr I have an alternative PR that I feel addresses the issue adequately: as long as the # of epochs/steps the schedule is run for is extended by the warmup when warmup_prefix=True, the schedule will complete correctly without any additional alterations. See #2325. EDIT: I also feel that extending the schedule to finish vs. squishing the first cycle is a less significant change for backwards compat of hparams (it does not alter early training), but allows more time to pick off good checkpoints at the end (if the train hasn't petered out by then). Hence, no old result would be worse, only potentially better.
@rwightman I agree that your proposal is more backward friendly. The only concern is that it might confuse users that the model runs for 330 epochs even though they set a smaller number of training epochs (with the warmup added on top).
Updated the log in train, will merge the other PR shortly so closing this, thanks!
I noticed that when using the cosine scheduler with warmup (and `warmup_prefix = True`), the LR will not reach `lr_min`, which can be problematic the larger `warmup_t` is. For example, for `epochs, initial_lr, lr_min, warmup_lr, warmup_t = 500, 1e-4, 1e-7, 1e-7, 100`, we will have the following progression for LR:

[plot: current LR progression]

I propose to change the code in a way to generate the following:

[plot: proposed LR progression]

Hope I have changed the correct lines in the code.
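For reference, here is a minimal sketch (an assumed dummy optimizer and standalone loop, not the reporter's actual training setup) that reproduces the behaviour described above with timm's CosineLRScheduler and the example hyper-parameters:

```python
# Reproduction sketch with the example values above; the dummy parameter and
# optimizer are placeholders, not part of the original report.
import torch
from timm.scheduler import CosineLRScheduler

epochs, initial_lr, lr_min, warmup_lr, warmup_t = 500, 1e-4, 1e-7, 1e-7, 100

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=initial_lr)
scheduler = CosineLRScheduler(
    optimizer,
    t_initial=epochs,
    lr_min=lr_min,
    warmup_t=warmup_t,
    warmup_lr_init=warmup_lr,
    warmup_prefix=True,  # warmup is prepended, so the cosine phase is shifted by warmup_t
)

for epoch in range(epochs):
    scheduler.step(epoch)

# After `epochs` steps the cosine phase has only covered epochs - warmup_t of its
# t_initial steps, so the LR is still well above lr_min (about 1e-5 here).
print(optimizer.param_groups[0]["lr"])
```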