
[Bug Report] CPU and NPU produce different BFP8_B values from the same BF16 tensor. #14032

Closed
namhyeong-kim opened this issue Oct 21, 2024 · 24 comments
Labels: blackhole, bug (Something isn't working), LLK, moreh (moreh contribution), P0, WH

@namhyeong-kim
Contributor

Describe the bug
The CPU and NPU produce different BFP8_B values from the same BF16 tensor.

To Reproduce
Steps to reproduce the behavior:

  1. go to branch namhyeong/.... Link to branch
  2. run tests/ttnn/unit_tests/test_bfp8_bf16_conversion.py Link to test file
    1. test_typecast_bf16_to_bfp8_b fails, while test_typecast_bfp8_b_to_bf16 passes.

Expected behavior
The CPU and NPU should produce the same BFP8_B values.
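For orientation, the comparison exercised by the reproduction steps above has roughly the shape sketched below. This is only a sketch, not the actual test file; the function name, tensor shape, and exact ttnn arguments are illustrative assumptions.

import torch
import ttnn

def compare_host_vs_device_bfp8_b(device, shape=(1, 1, 32, 32)):
    # Hypothetical helper, for illustration only.
    torch_input = torch.rand(shape, dtype=torch.bfloat16)

    # CPU path: let the host runtime pack the BF16 tensor into BFP8_B.
    host_bfp8 = ttnn.from_torch(torch_input, dtype=ttnn.bfloat8_b, layout=ttnn.TILE_LAYOUT)

    # NPU path: move the BF16 tensor to the device and typecast it there.
    device_bf16 = ttnn.from_torch(
        torch_input, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device
    )
    device_bfp8 = ttnn.typecast(device_bf16, ttnn.bfloat8_b)

    # Read both results back and compare element-wise; the bug is that they differ.
    return torch.equal(ttnn.to_torch(host_bfp8), ttnn.to_torch(device_bfp8))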

Screenshots
[Screenshot attached: 2024-10-21 6:28:49 PM]

Please complete the following environment information:

  • OS: Ubuntu 20.04

Additional context
No additional context

@namhyeong-kim namhyeong-kim added the bug and moreh labels Oct 21, 2024
@razorback3 razorback3 added the P1 label Oct 21, 2024
@rdjogoTT
Contributor

I have started investigating and should have an update by tomorrow.

@rdjogoTT
Contributor

I've managed to trace the data through the HW path it takes, and it looks like the issue occurs in the Packer. I will be investigating this further.

@rdjogoTT
Contributor

The issue comes from how the packer handles bfp8_b datums: it rounds the mantissa once, then shifts it and rounds again. I'm looking into the best course of action now.
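To illustrate the effect described above with a toy example: rounding the mantissa to an intermediate precision and then rounding again after the shift can land one LSB away from shifting first and rounding only once. The round-half-up mode below is an assumption for illustration; the actual gasket/packer rounding modes may differ.

import math

def round_half_up(x):
    return math.floor(x + 0.5)

# True mantissa value, expressed in units of the final BFP8_B mantissa LSB
# (i.e. after alignment to the block's shared exponent).
true_mantissa = 2.375

# Single rounding: align to the shared exponent first, then round once.
single = round_half_up(true_mantissa)                # -> 2

# Double rounding: round to an intermediate precision (one extra bit kept here),
# then shift and round again.
intermediate = round_half_up(true_mantissa * 2) / 2  # -> 2.5
double = round_half_up(intermediate)                 # -> 3

print(single, double)  # 2 3: the two paths disagree by one LSB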

Can you please provide some context on why the values need to match exactly? Is this just a bug report, or is the accuracy needed somewhere specific? One potential fix would come at the expense of perf, due to not using the full packer bandwidth, so it would help to know the application.

@namhyeong-kim
Contributor Author

  1. We will use BFP8_B for training large language models. I can't fully anticipate the model-level side effects caused by conversion inconsistencies between the CPU and NPU. Could you implement an option for precise conversion, even if it slightly reduces packer bandwidth?
  2. The CPU converts correctly, while the NPU converts approximately. Is that correct?

@rdjogoTT
Contributor

Yes, the CPU is producing the more accurate values in this case. OK, I will look into how that can be done.

@jvasilje
Collaborator

@rdjogoTT any updates on this P0 bug?

@rdjogoTT
Contributor

rdjogoTT commented Oct 30, 2024

I have pushed a branch origin/rd/bfp8b_pack with the workaround we found. It involves changing the pack src format for bfp8b to fp16b instead, which would cut packer bandwidth in half for WHB0 (but shouldn't affect BH).

This raises the question of how to handle this tradeoff. The way to decide would be to try training with bfp8b and see whether the accuracy is needed and the tradeoff is worth it.

@namhyeong-kim
Contributor Author

It looks like I cannot enable the accurate conversion at the program-object level. Could you add an option that toggles this feature?

@rdjogoTT
Contributor

rdjogoTT commented Nov 5, 2024

Ok, I will add an option ASAP.

@rdjogoTT
Contributor

rdjogoTT commented Nov 6, 2024

#14822 contains the new flag bfp8_pack_precise which allows you to toggle the feature. Setting it to true in the ComputeConfig will result in the test above passing. I will work to merge this change quickly.

In the case of typecast, this is how it can be set to true:

auto eltwise_unary_kernel_group_1_id = tt::tt_metal::CreateKernel(
        program,
        "tt_metal/kernels/compute/eltwise_sfpu.cpp",
        core_group_1,
        tt::tt_metal::ComputeConfig{
            .math_fidelity = MathFidelity::HiFi4,
            .fp32_dest_acc_en = args.fp32_dest_acc_en,
            .unpack_to_dest_mode = unpack_to_dest_mode,
            .bfp8_pack_precise = true, // <----------- add this line
            .math_approx_mode = math_approx_mode,
            .compile_args = compute_kernel_args_group_1,
            .defines = unary_defines});

@namhyeong-kim
Contributor Author

Thank you! Could I test the feature with various inputs?
The test test_typecast_bf16_to_bfp8_b only demonstrates the existence of the bug; it doesn't have sufficient coverage.

@rdjogoTT
Contributor

rdjogoTT commented Nov 7, 2024

The feature should enable fp16b or fp32 dest to be packed out accurately. Yes, please test it with various inputs.

@namhyeong-kim
Contributor Author

namhyeong-kim commented Nov 7, 2024

What do you mean by enabling fp16b?

@rdjogoTT
Contributor

rdjogoTT commented Nov 7, 2024

Sorry, I should have clarified: typecast works by unpacking the fp16b into Dest and using the packer to convert to bfp8b; the same applies to the fp32-to-bfp8b typecast. This bug was caused by that packer conversion, so this feature should enable accurate conversion now.

@rdjogoTT
Contributor

rdjogoTT commented Nov 7, 2024

Hi @namhyeong-kim, can you please share your tests with me when you make them? I would like to include them with my PR #14822 before merging to main.

@namhyeong-kim
Contributor Author

OK, I will share them.

@namhyeong-kim
Contributor Author

namhyeong-kim commented Nov 8, 2024

I pushed the tests to my branch, which is rebased on your branch.
Every test in test_typecast_bf16_to_bfp8_b_various_input passes.
But when insert_nan is true in test_typecast_bf16_to_bfp8_b_with_inf_nan, the test fails:
the CPU produces NaN but the NPU produces -inf.

idx=(tensor(0), tensor(1)) cpu=nan npu=-inf
idx=(tensor(0), tensor(4)) cpu=nan npu=-inf
idx=(tensor(0), tensor(18)) cpu=nan npu=-inf
idx=(tensor(0), tensor(29)) cpu=nan npu=-inf
idx=(tensor(0), tensor(35)) cpu=nan npu=-inf

Could you check these failed tests?

@rdjogoTT
Contributor

rdjogoTT commented Nov 8, 2024

I've found that the issue is again related to the packer; I will post further updates soon.

@rdjogoTT
Contributor

rdjogoTT commented Nov 8, 2024

The Tensix team has confirmed that we have had issues with these special values in the past as well. Unfortunately, nothing can be done from the HW/LLK side to remedy this.

The LLK recommendation is to make the checks for special values aware that a NaN in Bfp8_b may be converted to -Inf. If there is a case where you must know whether a NaN is present, maybe check before converting to Bfp8_b?
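A minimal sketch of that pre-check on the host side, assuming a torch tensor and the same ttnn host conversion as in the sketch near the top of the issue (the helper name is hypothetical):

import torch
import ttnn

def to_bfp8_b_with_nan_check(torch_input):
    # Detect NaN while the data is still BF16/FP32 on the host, because after
    # conversion to BFP8_B a NaN may come back as -inf instead.
    if torch.isnan(torch_input).any():
        raise ValueError("input contains NaN; handle it before converting to bfp8_b")
    return ttnn.from_torch(torch_input, dtype=ttnn.bfloat8_b, layout=ttnn.TILE_LAYOUT)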

@namhyeong-kim
Contributor Author

namhyeong-kim commented Nov 9, 2024

I understand the limitation. If NaN needs to be handled, I will check for it before converting to bfp8_b, as you recommend.
Lastly, please double-check that the tests I provided have enough coverage.

@jvasilje
Collaborator

@namhyeong-kim can we close this P0 issue?

@rdjogoTT
Contributor

@jvasilje I am working on adding the tests and merging the PR today.

@zzigler-tt

@rdjogoTT Please advise when this is fully complete so we can close out this issue, thanks.

rdjogoTT added a commit that referenced this issue Nov 13, 2024
### Ticket
[Link to Github Issue](#14032)

### Problem description
bfp8 packing is inaccurate when pack_src_format is also bfp8, since this
results in double rounding in the HW. First the gasket rounds to 7 bits,
then rounding occurs again when the mantissas are shifted in order to
share a common exponent.

### What's changed
Add a flag to the compute config called `bfp8_pack_precise` which toggles the
pack_src_format to either fp16 or fp32 (depending on fp32_mode_en) in
order to get more accurate output. This, however, will halve the packer
bandwidth in the case of fp16, and reduce it to one quarter in the case
of fp32.
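In pseudocode, the selection described above amounts to roughly the following (a sketch of the behaviour described in this commit message, not the actual C++ implementation; the format names mirror the commit text):

def pack_src_format_for_bfp8_output(bfp8_pack_precise: bool, fp32_mode_en: bool) -> str:
    if not bfp8_pack_precise:
        return "bfp8"   # full packer bandwidth, but double rounding (the bug above)
    # Precise mode: pack from a wider source format so only one rounding occurs.
    return "fp32" if fp32_mode_en else "fp16"   # 1/4 or 1/2 of the bandwidth, respectively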
@rdjogoTT
Copy link
Contributor

This has been resolved with PR #14822. It can now be closed.
