Version
latest
On which installation method(s) does this occur?
Source
Describe the issue
The CorrDiff loss is scaled by hyper-parameters, so the logged training loss from one run cannot be compared to a run that uses different batch settings, which makes a hyper-parameter search impractical.
Example:
- if `batch_gpu_total` = 1, `loss_accum` = L; when `batch_gpu_total` = 2, `loss_accum` = L/2
- if `batch_size_gpu` = 1, `loss_accum` = L; when `batch_size_gpu` = 2, `loss_accum` = 2*L

Why not just normalize it by `batch_size_global`, as below?
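For reference, here is a short sketch of where that scaling comes from, based on my reading of the two snippets below (an interpretation, not an excerpt from the trainer). Assume `batch_gpu_total` B is split into R = `num_accumulation_rounds` rounds of `batch_size_gpu` b samples each, write s for `loss_scaling` and ℓ_i for the per-sample losses on one GPU, with mean ℓ̄:

```math
\text{loss\_accum}
= \sum_{r=1}^{R} \frac{1}{R}\cdot\frac{s}{B}\sum_{i \in \text{round } r} \ell_i
= \frac{s}{R\,B}\sum_{i=1}^{B} \ell_i
= \frac{s\,\bar{\ell}}{R}
= \frac{s\,\bar{\ell}\,b}{B}
```

So doubling `batch_gpu_total` at fixed `batch_size_gpu` halves the logged value, and doubling `batch_size_gpu` at fixed `batch_gpu_total` doubles it, even though the per-sample mean is unchanged. With the proposed normalization, each rank instead accumulates s times the sum of its per-sample losses divided by `batch_size_global`, so the all-reduced sum is simply s times the global per-sample mean, independent of the accumulation settings.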
Now Implementation

```python
for round_idx in range(num_accumulation_rounds):
    with ddp_sync(ddp, (round_idx == num_accumulation_rounds - 1)):
        ...
        loss = loss.sum().mul(loss_scaling / batch_gpu_total)
        loss_accum += loss / num_accumulation_rounds
        loss.backward()

loss_sum = torch.tensor([loss_accum], device=device)
if dist.world_size > 1:
    torch.distributed.all_reduce(loss_sum, op=torch.distributed.ReduceOp.SUM)
average_loss = loss_sum / dist.world_size
if dist.rank == 0:
    wb.log({"training loss": average_loss}, step=cur_nimg)
```
Proposed Modification

```python
for round_idx in range(num_accumulation_rounds):
    with ddp_sync(ddp, (round_idx == num_accumulation_rounds - 1)):
        ...
        loss = loss.sum().mul(loss_scaling / batch_size_global)  ### Modified
        loss_accum += loss  ### Modified
        loss.backward()

loss_sum = torch.tensor([loss_accum], device=device)
if dist.world_size > 1:
    torch.distributed.all_reduce(loss_sum, op=torch.distributed.ReduceOp.SUM)
average_loss = loss_sum / dist.world_size
if dist.rank == 0:
    wb.log({"training loss": average_loss}, step=cur_nimg)
```
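To make the comparison concrete, here is a minimal single-process sketch in plain Python (not the actual trainer code; the helper functions and sample values are made up, and `world_size` is assumed to be 1 so that `batch_size_global` equals `batch_gpu_total`). It reproduces the scaling from the example above under both normalizations:

```python
# Hypothetical stand-ins for the trainer's accumulation loop; loss_scaling = 1.
LOSS_SCALING = 1.0


def current_scheme(per_sample_losses, batch_size_gpu):
    """loss.sum() / batch_gpu_total, then divided by num_accumulation_rounds."""
    batch_gpu_total = len(per_sample_losses)
    num_accumulation_rounds = batch_gpu_total // batch_size_gpu
    loss_accum = 0.0
    for round_idx in range(num_accumulation_rounds):
        chunk = per_sample_losses[round_idx * batch_size_gpu:(round_idx + 1) * batch_size_gpu]
        loss = sum(chunk) * LOSS_SCALING / batch_gpu_total
        loss_accum += loss / num_accumulation_rounds
    return loss_accum


def proposed_scheme(per_sample_losses, batch_size_gpu):
    """loss.sum() / batch_size_global, accumulated without the extra division."""
    batch_size_global = len(per_sample_losses)  # single GPU in this sketch
    num_accumulation_rounds = len(per_sample_losses) // batch_size_gpu
    loss_accum = 0.0
    for round_idx in range(num_accumulation_rounds):
        chunk = per_sample_losses[round_idx * batch_size_gpu:(round_idx + 1) * batch_size_gpu]
        loss_accum += sum(chunk) * LOSS_SCALING / batch_size_global
    return loss_accum


losses = [1.0, 2.0, 3.0, 4.0]  # made-up per-sample losses, mean = 2.5

# Current scheme: the logged value depends on the batch configuration.
print(current_scheme(losses, batch_size_gpu=4))      # 2.5   (1 round)
print(current_scheme(losses, batch_size_gpu=1))      # 0.625 (= 2.5 / 4 rounds)
print(current_scheme([1.0], batch_size_gpu=1))       # 1.0   -> "L"
print(current_scheme([1.0, 1.0], batch_size_gpu=1))  # 0.5   -> "L/2"

# Proposed scheme: the logged value is the per-sample mean in every case.
print(proposed_scheme(losses, batch_size_gpu=4))     # 2.5
print(proposed_scheme(losses, batch_size_gpu=1))     # 2.5
```

With the proposed normalization the accumulated value is just the per-sample mean (times `loss_scaling`), so runs with different `batch_size_gpu` or `num_accumulation_rounds` settings log directly comparable losses.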
Minimum reproducible example
see README
Relevant log output
No response
Environment details
No response
Hi @chychen , thanks for reporting the issue. I agree with the proposed modification. Could you please open a PR?
@chychen did you have a chance to make a PR for this modification?
This is a 3-month-old issue; it seems the latest version has already solved it.