[Bug Report] Latest update to moreh_adamw has issue #14186
Comments
That is odd, to say the least. I would not expect that to happen either, since the program cache should be faster. It being slower would suggest runtime argument replacement is taking longer than recompiling from scratch, which makes little sense to me.
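For context, here is a minimal sketch of why a program-cache hit is expected to be cheap: the compiled program is reused and only the runtime arguments are overwritten. All names here are hypothetical stand-ins, not the actual tt-metal program cache API.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins, not the real tt-metal types.
struct Program { std::string compiled_for; };          // pretend compiled kernel blob
struct RuntimeArgs { std::vector<uint32_t> values; };  // per-launch arguments

Program compile_program(const std::string& op_config) {
    // Slow path: in the real stack this is a full kernel compile.
    return Program{op_config};
}

void launch(const Program&, const RuntimeArgs&) {
    // Fast path: enqueue the already-compiled program with fresh arguments.
}

std::unordered_map<std::size_t, Program> program_cache;

// Cached dispatch: compile once per distinct config, then only swap runtime args.
void run_op(const std::string& op_config, const RuntimeArgs& args) {
    const std::size_t key = std::hash<std::string>{}(op_config);
    auto it = program_cache.find(key);
    if (it == program_cache.end()) {
        it = program_cache.emplace(key, compile_program(op_config)).first;  // cache miss
    }
    launch(it->second, args);  // cache hit: argument replacement should dominate, cheaply
}
```

Under that model, "cached is not faster than non-cached" is surprising, because the cached path skips compilation entirely.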
Hi Roman,
@mrshaw01 we mean training loss.
Hi everyone,
But if we talk about the cache: somehow I don't see a significant perf difference between cached and non-cached :)
Before the cache fix, we get a loss equal to 3.0 at iteration 8.
We've got it. o2buzzle will debug this issue.
@rfurko-tt That would suggest an issue with
@o2buzzle that was our idea, because it kind of trained, but as if with a 100x smaller lr.
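For reference, in the textbook AdamW formulation (not the moreh_adamw kernel itself) both bias-correction terms depend on the step counter, so a step that is stuck at its first value distorts the effective update scale and can look like a badly scaled learning rate. A plain single-parameter sketch:

```cpp
#include <cmath>

// Textbook single-parameter AdamW step, for illustration only.
// The point: both bias corrections depend on `step`, so a step counter that is
// never advanced (e.g. not refreshed on the cached-program path) changes the
// magnitude of every update.
struct AdamWState { float m = 0.f, v = 0.f; };

float adamw_update(float param, float grad, AdamWState& s, int step,
                   float lr = 1e-3f, float beta1 = 0.9f, float beta2 = 0.999f,
                   float eps = 1e-8f, float weight_decay = 1e-2f) {
    s.m = beta1 * s.m + (1.f - beta1) * grad;
    s.v = beta2 * s.v + (1.f - beta2) * grad * grad;
    // Bias corrections: wrong `step` => wrong scale for m_hat and v_hat.
    const float m_hat = s.m / (1.f - static_cast<float>(std::pow(beta1, step)));
    const float v_hat = s.v / (1.f - static_cast<float>(std::pow(beta2, step)));
    return param - lr * (m_hat / (std::sqrt(v_hat) + eps) + weight_decay * param);
}
```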
@dmakoviichuk-tt Would you mind providing some sample code that can reproduce the issue, for reference? As it is, I am not able to find anything that stands out as buggy, and the unit tests seem to agree with that.
#14243 here is the fix.
FYI, this dude (@mrshaw01) told me to add everything instead of just setting it to zero (something something code convention).
Both approaches should work well.
@mrshaw01 it would be nice to find out why it is buggy, in the future. Is it the hash or missed parameters?
The odd thing I find here is that the kernel itself is reporting (through
I originally studied and re-implemented the hashing code from the example at
Also, Shaw didn't really review how the hash is handled per se; he just checked that it worked correctly with the provided unit tests (which, oddly, it did pass) and gave input on how I implemented it, but otherwise didn't really deal with that bit of the change directly. Errors in the hashing details there are mine and mine only.
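To make the "hash or missed parameters" question concrete, here is an illustrative sketch of how a per-op cache key might be combined and where the two failure modes differ. Names are hypothetical, not the tt-metal hashing utilities.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical config for a moreh_adamw-like op; `step` changes every iteration.
struct MorehAdamWConfig {
    float lr, beta1, beta2, eps, weight_decay;
    uint32_t step;
};

inline std::size_t hash_combine(std::size_t seed, std::size_t h) {
    return seed ^ (h + 0x9e3779b97f4a7c15ull + (seed << 6) + (seed >> 2));
}

// Option A: fold `step` into the hash -> a new program is compiled every iteration
// (results stay correct, but the cache never hits, so there is no speedup).
// Option B (intended): leave `step` out of the hash, but then the cached-program
// path MUST push the fresh `step` as a runtime argument; forgetting that is the
// "missed parameter" failure mode, where the kernel keeps seeing the old step.
std::size_t compute_cache_key(const MorehAdamWConfig& c) {
    std::size_t seed = 0;
    seed = hash_combine(seed, std::hash<float>{}(c.lr));
    seed = hash_combine(seed, std::hash<float>{}(c.beta1));
    seed = hash_combine(seed, std::hash<float>{}(c.beta2));
    seed = hash_combine(seed, std::hash<float>{}(c.eps));
    seed = hash_combine(seed, std::hash<float>{}(c.weight_decay));
    // Deliberately not hashed: c.step (it must be refreshed at launch time instead).
    return seed;
}
```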
* tenstorrent#14186: Fixed moreh_adam
* #0: fixed adam too
Describe the bug
After updating to the latest tt-metal main branch, we see that training loss goes down much more slowly. After turning off the program cache, training loss goes down as expected. We expect that the bug is located in the latest update to the moreh_adamw cache functions. One of the ideas is that we aren't properly passing step now.
Expected behavior
Speed of convergence should not be affected by program caching.
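As a hedged illustration of that expectation (hypothetical names, not the real moreh_adamw cached-program callback): on every cache hit, all per-call values, including step, need to be written back into the cached program's runtime arguments, so cached and non-cached runs converge identically.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of the contract implied above.
struct CachedProgram {
    std::vector<uint32_t> runtime_args;  // stand-in for per-core kernel arguments
};

// Called on every cache hit; if `step` is skipped here, the kernel silently reuses
// the step from the first (compiling) call and convergence degrades.
void override_runtime_args(CachedProgram& cached,
                           uint32_t param_addr, uint32_t grad_addr, uint32_t step) {
    cached.runtime_args = {param_addr, grad_addr, step};
}
```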
Please complete the following environment information: