[Bug Report] CPU and NPU produce different BFP8_B values from the same BF16 tensor. #14032
Comments
I have started investigating and should have an update by tomorrow.
I've managed to trace the data through the HW path it takes, and it looks like the issue occurs in the packer. I will investigate this further.
The issue comes from how the packer handles bfp8_b datums: it rounds once, then shifts with rounding again. I'm looking into the best course of action now. Can you please provide some context on why the values need to match exactly? Is this just a bug report, or is the accuracy needed somewhere specific? One potential fix would come at the expense of perf, due to not using the full packer bandwidth, so it would help to know the application.
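To see why rounding twice can disagree with rounding once, here is a small integer-mantissa sketch. It uses round-half-up purely for illustration (the actual packer rounding mode is not stated here): rounding an 8-bit significand to 7 bits first, then rounding again while shifting to the block's shared exponent, can land on a different value than a single rounding straight to the final width.

```python
def round_half_up(mant: int, drop: int) -> int:
    """Drop `drop` low bits of an unsigned mantissa with round-half-up."""
    if drop <= 0:
        return mant
    return (mant + (1 << (drop - 1))) >> drop

# An 8-bit bf16 significand (hidden bit made explicit).
m8 = 0b10111101  # 189

# Double rounding, as in the described HW path: round 8 -> 7 bits
# first, then round again during a 1-bit shift to the shared exponent.
double = round_half_up(round_half_up(m8, 1), 1)

# Single rounding straight to the final width.
single = round_half_up(m8, 2)

print(double, single)  # prints: 48 47
```

The two results differ by one ULP, which matches the kind of CPU/NPU mismatch reported in this issue.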
Yes, the CPU is producing the more accurate values in this case. OK, I will look into how that can be done.
@rdjogoTT any updates on this P0 bug?
I have pushed a branch. This raises the question of how to handle this tradeoff. The way to decide would be to try training with bfp8_b and see whether the accuracy is needed and the tradeoff is worth it.
It looks like I cannot apply the accurate conversion at the program-object level. Could you add an option that toggles this feature?
OK, I will add an option ASAP.
#14822 contains the new flag `bfp_pack_precise`. In the case of typecast, this is how it can be set to True:
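The original snippet did not survive the page scrape. The sketch below shows the general shape such a call might take; the exact config class, field names, and whether `ttnn.typecast` accepts a `compute_kernel_config` argument should be checked against PR #14822 rather than taken from this sketch (`input_tensor` is an assumed pre-existing device tensor).

```python
# Hedged sketch only -- verify names and signatures against PR #14822.
import ttnn

compute_config = ttnn.WormholeComputeKernelConfig(
    math_fidelity=ttnn.MathFidelity.HiFi4,
    fp32_dest_acc_en=False,
    bfp_pack_precise=True,  # new flag from #14822: accurate bfp8_b packing
)

# input_tensor: an assumed bf16 tensor already on device.
output = ttnn.typecast(
    input_tensor,
    ttnn.bfloat8_b,
    compute_kernel_config=compute_config,
)
```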
Thank you! Can I test the feature with various inputs?
The feature should enable fp16b or fp32 dest to be packed out accurately. Yes, please test it with various inputs.
What do you mean by enabling fp16b?
Sorry, I should have clarified: typecast works by unpacking the fp16b to Dest and using the packer to convert to bfp8_b, and similarly for the fp32-to-bfp8_b typecast. This bug was caused by that packer conversion, so the feature should enable accurate conversion now.
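For intuition about what the packer's bfp8_b output looks like, here is a host-side sketch of block-float quantization: every value in a block shares the exponent of the largest magnitude and keeps a sign plus a 7-bit mantissa. This mirrors the idea of bfp8_b, not the exact tt-metal bit layout or block size.

```python
import math

def block_to_bfp8_b(vals):
    """Illustrative shared-exponent quantization (not the exact HW format):
    each value becomes sign + 7-bit mantissa at the block's max exponent."""
    nonzero = [abs(v) for v in vals if v != 0.0]
    if not nonzero:
        return list(vals)
    shared_exp = math.frexp(max(nonzero))[1]  # exponent of largest magnitude
    scale = math.ldexp(1.0, shared_exp)       # 2 ** shared_exp
    out = []
    for v in vals:
        mant = round(abs(v) / scale * 128)    # quantize to 7 mantissa bits
        mant = min(mant, 127)                 # clamp to the 7-bit range
        out.append(math.copysign(mant * scale / 128, v))
    return out

print(block_to_bfp8_b([1.0, 0.5, 0.25]))  # exactly representable in the block
print(block_to_bfp8_b([1.0, 1.0 / 300]))  # small value flushes to 0.0
```

The second call shows the characteristic block-float failure mode: a value much smaller than the block maximum loses all its mantissa bits.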
Hi @namhyeong-kim, can you please share your tests with me when you write them? I would like to include them in my PR #14822 before merging to main.
OK, I will share them.
I pushed the tests to my branch that is rebased on your branch.
Could you check these failed tests?
I've found that the issue is again to do with the packer; I will post further updates soon.
The Tensix team has confirmed that we have had issues with these special values in the past as well. Unfortunately, nothing can be done from the HW/LLK side to remedy this. The LLK recommendation is to make the checks for special values aware that NaN in Bfp8_b may be converted to -Inf. If there is a case where you must know whether there is a NaN, check before converting to Bfp8_b.
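Following that recommendation, a caller can screen for NaNs on the host side before handing data to a Bfp8_b conversion, since afterwards a NaN may read back as -Inf and become indistinguishable from a real -Inf. `check_nan_before_bfp8_b` below is a hypothetical helper sketched for this thread, not part of any library:

```python
import math

def check_nan_before_bfp8_b(vals):
    """Hypothetical pre-conversion guard: raise if any value is NaN,
    because after Bfp8_b conversion a NaN may show up as -Inf."""
    bad = [i for i, v in enumerate(vals) if math.isnan(v)]
    if bad:
        raise ValueError(f"NaN at indices {bad}; handle before Bfp8_b conversion")
    return vals
```

With a guard like this, any -Inf seen after the conversion is known to be a genuine -Inf rather than a converted NaN.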
I understand the limitation. I will check before converting to bfp8_b, as you recommend.
@namhyeong-kim can we close this P0 issue?
@jvasilje I am working on adding the tests and merging the PR today.
@rdjogoTT Please advise when this is fully complete so we can close out this issue, thanks.
### Ticket
[Link to Github Issue](#14032)

### Problem description
bfp8 packing is inaccurate when pack_src_format is also bfp8, since this results in double rounding in the HW. First the gasket rounds to 7 bits, then rounding occurs again when the mantissas are shifted in order to have a common exponent.

### What's changed
Add a flag to the compute config called `bfp_pack_precise`, which toggles the pack_src_format to either fp16 or fp32 (depending on fp32_mode_en) in order to get more accurate output. This, however, will halve the packer bandwidth in the fp16 case and reduce it to one quarter in the fp32 case.
This has been resolved with PR #14822 and can now be closed.
Describe the bug
The CPU and NPU produce different BFP8_B values from the same BF16 tensor.
To Reproduce
Steps to reproduce the behavior:
Link to branch: `namhyeong/....`
Link to test file: `tests/ttnn/unit_tests/test_bfp8_bf16_conversion.py`
`test_typecast_bf16_to_bfp8_b` fails, while `test_typecast_bfp8_b_to_bf16` passes.
Expected behavior
The CPU and NPU should produce the same BFP8_B values.
Additional context
No additional context