
TTNN pow - fail when input tensor BFLOAT8_B and exponent is scalar float #8593

Closed · Tracked by #13795
Labels: bug (Something isn't working), GS, op_cat: eltwise, WH

npetrovic-tenstorrent opened this issue May 17, 2024 · 4 comments

npetrovic-tenstorrent (Contributor) commented May 17, 2024

The ttnn.pow operation fails when the first argument is a BFLOAT8_B tensor and the second is a float scalar:

TT_FATAL @ ../tt_eager/tt_dnn/op_library/bcast/bcast_op.cpp:94: input_tensor_a.get_dtype() == input_tensor_b.get_dtype()
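
A minimal repro sketch of the failing combination (the shape, exponent value, and device setup below are illustrative assumptions, not taken from the unit test):

import torch
import ttnn

device = ttnn.open_device(device_id=0)

# any tile-sized input will do; the key point is the BFLOAT8_B dtype
x = torch.rand((1, 1, 32, 32), dtype=torch.bfloat16)
tt_x = ttnn.from_torch(x, dtype=ttnn.bfloat8_b, layout=ttnn.TILE_LAYOUT, device=device)

# float scalar exponent -> hits the TT_FATAL dtype check in bcast_op.cpp
out = ttnn.pow(tt_x, 2.5)

ttnn.close_device(device)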

To Reproduce
Steps to reproduce the behavior:
Check out the main branch and run the unit test test_eltwise_pow_float.py (or others) using this command pattern:

pytest tests/ttnn/python_api_testing/non_working_unit_tests/grayskull/test_eltwise_pow_float.py

Expected behavior
There are three test cases in the unit test tests/tt_eager/python_api_testing/non_working_unit_tests/grayskull/test_eltwise_pow_float.py. The two cases where the input tensor is of BFLOAT8_B type fail, and they are expected to fail with the error:

../tt_eager/tt_dnn/op_library/bcast/bcast_op.cpp:94: input_tensor_a.get_dtype() == input_tensor_b.get_dtype()

Getting additional info for the operation under test and its behavior
To get additional information and results for the different combinations of input shapes, types, layouts, and memory configs for which this operation was tested, you can also run the sweep test locally:

tests/ttnn/python_api_testing/sweep_tests/test_configs/ci_sweep_tests_broken/grayskull/ttnn_eltwise_pow_float_test.yaml

To do this you should:

  1. Follow the Getting Started page to set up the repo, environment variables, and python-env
  2. Activate the Python environment: source build/python_env/bin/activate
  3. Run the sweeps: pytest tests/ttnn/python_api_testing/sweep_tests/run_sweep_test.py --input-path tests/ttnn/python_api_testing/sweep_tests/test_configs/ci_sweep_tests_broken/grayskull/ttnn_eltwise_pow_float_test.yaml --input-method cli --cli-input results_pow_float_broken
  4. After the run completes, all sweep results should be available inside the specified output directory (in this case ./result-sweeps). There you will find a .csv that holds all executed sweeps, including the ones that failed and were recreated by the unit test, which you can locate by searching for the unique data_seed field.

umadevimcw commented Oct 16, 2024

@npetrovic-tenstorrent @eyonland I tested this on the recent main; it is passing on Grayskull and WHB0.

[image]

When I repeated the test n times, I observed inconsistencies: out of 100 tests, sometimes 3 tests fail, while other times only 1 fails. To analyze this, I’ve hardcoded specific input and scale values that cause the failures in the code below for testing purposes.

import torch
import ttnn
from loguru import logger

# NOTE: assert_with_pcc and ttnn_ops are the repo's own test helpers; adjust the
# import paths if they differ in your checkout.
from tests.ttnn.utils_for_testing import assert_with_pcc
from tests.ttnn.python_api_testing.sweep_tests import ttnn_ops


def run_pow_tests(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, device):
    torch.manual_seed(data_seed)

    x = torch.Tensor(size=input_shape[0]).uniform_(-100, 100).to(torch.bfloat16)
    y = 7.003021497060542  # random.uniform(0, 10)

    # hardcode the failing input value for analysis
    x.fill_(-59.50000)

    print("===================================================>>>>>>>>> scale  >>>>>>>>>>>>>>>>>>>>>>>...", y)  ########## print 0
    print("Torch result......", torch.pow(torch.tensor(-59.50000), y))  ########## print 1
    try:
        # get ref result
        ref_value = torch.pow(x, y)

        x = ttnn_ops.setup_ttnn_tensor(x, device, dlayout[0], in_mem_config, dtype[0])

        tt_result = ttnn.pow(x, y)
        tt_result = ttnn_ops.ttnn_tensor_to_torch(tt_result, output_mem_config)

    except Exception as e:
        logger.warning("Operation execution crashed")
        raise e

    assert len(tt_result.shape) == len(ref_value.shape)
    assert tt_result.shape == ref_value.shape
    torch.set_printoptions(sci_mode=False)
    print("reference torch result", ref_value)  ########## print 2
    print("TT result ", tt_result)  ########## print 3
    print("input.....", x)  ########## print 4
    assert_with_pcc(ref_value, tt_result, 0.99)

During my analysis, I noticed that the Torch results are inconsistent across different runs, while the TT results are as expected. Raising a negative base to this exponent produces a NaN in the TT results, but in the Torch reference values we get an unexpectedly large value. Please see the attached image for reference.

[image]

Note: In the image above, you can correlate the ########## print x markers in the code with the -------> print x markers in the image for clarity.

Even though ref_value = torch.pow(x, y) and torch.pow(torch.tensor(-59.50000), y) perform the same operation on the same values, the results differ. The TT result returns NaN for negative bases (which is the expected result), and this contributes to the drop in PCC.

This is observed on both WHB0 and GS.


umadevimcw commented Oct 16, 2024

@eyonland @npetrovic-tenstorrent

The reason for the undefined behaviour is the bfloat16 conversion that we are doing for data generation.

Please see the image below for reference.

[image]
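
A minimal sketch of the effect (my reading of the comment above; the exponent comes from the hardcoded test value, the rest is illustrative): casting the exponent to bfloat16 rounds it to exactly 7.0, which turns a NaN-producing non-integer power of a negative base into a finite integer power.

import torch

y = 7.003021497060542
# bfloat16 keeps only 8 mantissa bits, so the exponent rounds to exactly 7.0
print(torch.tensor(y, dtype=torch.bfloat16).item())  # 7.0
# negative base with a non-integer exponent -> NaN
print(torch.pow(torch.tensor(-59.5), y))             # tensor(nan)
# negative base with the rounded, integer exponent -> large finite value
print(torch.pow(torch.tensor(-59.5), 7.0))           # finite (about -2.6e12)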

umadevimcw commented

#13874 - Please find the PR here

umadevimcw commented

The PR has been merged to main, hence closing this issue.
