Need support for ttnn.max_pool2d to accept block and width sharded input. #12810

Closed
punithsekar opened this issue Sep 18, 2024 · 9 comments

@punithsekar
Contributor

Describe the bug
ttnn.max_pool2d supports only height-sharded input tensors. Support for block-sharded and width-sharded input is needed.

To Reproduce
Steps to reproduce the behavior:

  1. Check out the branch punith/maxpool_issue
  2. Run the command pytest tests/ttnn/integration_tests/yolov4/test_ttnn_neck.py

Expected behavior
ttnn.max_pool2d should accept block-sharded and width-sharded input layouts.

Screenshots

E       RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/operations/pool/maxpool/max_pool2d.cpp:58: shard_scheme == TensorMemoryLayout::HEIGHT_SHARDED
E       info:
E       Only height sharded tensors are supported.
E       backtrace:
E        --- /home/ubuntu/punith/tt-metal/ttnn/ttnn/_ttnn.so(+0x458ec9) [0x7f090cd77ec9]

Please complete the following environment information:

  • Device - WH-n150

Additional context
The input shape we pass to maxpool is [1, 10, 10, 512] (NHWC). Since the channel count is large compared with the spatial dimensions, the op should run with width or block sharding to improve performance.

Current values when we use height sharding:

pool_1 = ttnn.max_pool2d(
    input_tensor=output_tensor,
    batch_size=1,
    input_h=10,
    input_w=10,
    channels=512,
    kernel_size=[5, 5],
    stride=[1, 1],
    padding=[2, 2],
    dilation=[1, 1],
    device=device,
)
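
For reference, the output spatial size implied by these arguments can be checked with the usual floor-mode pooling formula. This is a minimal sketch in plain Python, assuming ttnn.max_pool2d follows the same convention as common frameworks; `pool_output_size` is a hypothetical helper, not part of ttnn.

```python
def pool_output_size(in_size, kernel, stride, padding, dilation):
    """Output length along one spatial axis of a pooling window (floor mode).

    Assumption: the op uses the conventional formula
    floor((in + 2*pad - dilation*(kernel - 1) - 1) / stride) + 1.
    """
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * padding - effective_kernel) // stride + 1


# Arguments taken from the call above: 10x10 input, 5x5 kernel,
# stride 1, padding 2, dilation 1.
out_h = pool_output_size(10, kernel=5, stride=1, padding=2, dilation=1)
out_w = pool_output_size(10, kernel=5, stride=1, padding=2, dilation=1)
print(out_h, out_w)  # 10 10
```

Under that assumption the output keeps the 10x10 spatial size, so the op produces 1 * 10 * 10 = 100 output rows of 512 channels each.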

Attributes:
{'memory_config_':'MemoryConfig(memory_layout=TensorMemoryLayout::HEIGHT_SHARDED;buffer_type=BufferType::L1;shard_spec=ShardSpec(grid={[(x=0;y=0) - (x=3;y=0)]};shape={25; 0};orientation=ShardOrientation::ROW_MAJOR;halo=0))'; 'output_dtype_': 'DataType::BFLOAT16'; 'sliding_window_config_': 'SlidingWindowConfig(batch_size=1; input_hw=(10;10); window_hw=(5;5); stride_hw=(1;1); pad_hw=(2;2); dilation_hw=(1;1); num_cores_nhw=4; core_range_set_={[(x=0;y=0) - (x=3;y=0)]})'}

Core_count: 4

Kernel duration: 1077197 ns
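
To illustrate why the sharding scheme matters for this shape, the sketch below computes the per-core shard shape when the NHWC tensor is viewed as a 2D [N*H*W, C] = [100, 512] matrix. It is plain Python, not ttnn code: `shard_shape` is a hypothetical helper, the width and block grids are made-up examples, and tile alignment and other hardware constraints are ignored. Only the 4-core height-sharded case comes from the attributes above.

```python
import math


def shard_shape(rows, cols, grid_y, grid_x, scheme):
    """Per-core shard shape (rows, cols) for a 2D [rows, cols] tensor.

    Assumptions (illustrative only):
      - "height": rows are split across all cores, each core keeps every column.
      - "width":  columns are split across all cores, each core keeps every row.
      - "block":  rows are split across grid_y, columns across grid_x.
    """
    num_cores = grid_y * grid_x
    if scheme == "height":
        return (math.ceil(rows / num_cores), cols)
    if scheme == "width":
        return (rows, math.ceil(cols / num_cores))
    if scheme == "block":
        return (math.ceil(rows / grid_y), math.ceil(cols / grid_x))
    raise ValueError(f"unknown sharding scheme: {scheme}")


rows, cols = 1 * 10 * 10, 512  # N*H*W, C

# 4 cores, as reported in the attributes above.
print(shard_shape(rows, cols, 1, 4, "height"))  # (25, 512)
# Hypothetical grids, chosen only for illustration.
print(shard_shape(rows, cols, 1, 8, "width"))   # (100, 64)
print(shard_shape(rows, cols, 4, 8, "block"))   # (25, 64)
```

With only 100 rows, height sharding limits how many cores can be used while each core still carries all 512 channels; splitting along the channel dimension spreads that work out.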

@punithsekar added the bug, op_cat: maxpool2D, yolov4, and mcw_cst labels Sep 18, 2024
@punithsekar changed the title from "ttnn.max_pool2d only support height_sharded input tensor" to "Need support for ttnn.max_pool2d to accept block and width sharded input." Sep 18, 2024
@punithsekar
Contributor Author

fyi @saichandax

@dvartaniansTT
Contributor

@mywoodstock is there a plan to support this towards yolov4 optimization efforts? cc: @mbahnasTT

@mywoodstock
Contributor

> @mywoodstock is there a plan to support this towards yolov4 optimization efforts? cc: @mbahnasTT

Yes, the PR is nearly ready to be merged

@mywoodstock
Contributor

This is now in main

@dvartaniansTT
Contributor

Thanks for the update, @mywoodstock! Great news! We will test this on yolov4, and once confirmed we can close this issue.
@punithsekar, please test this ASAP and let's see how it improves perf for yolov4.

@punithsekar
Contributor Author

punithsekar commented Oct 24, 2024

@mywoodstock @dvartaniansTT, I am able to pass block-sharded input to the maxpool, and it executes without any issue. However, the PCC of the maxpool output is very low (~0.055). I have created a separate issue, #14206, for it.
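
For context on the PCC figure, here is a minimal sketch of how such a value can be computed, assuming PCC refers to the Pearson correlation coefficient between the flattened reference output and the device output. The `pcc` function is a hypothetical stand-alone helper written in plain Python; it is not the project's own comparison utility, and the sample values are invented.

```python
import math


def pcc(golden, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(golden) != len(actual) or len(golden) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(golden)
    mean_g = sum(golden) / n
    mean_a = sum(actual) / n
    cov = sum((g - mean_g) * (a - mean_a) for g, a in zip(golden, actual))
    var_g = sum((g - mean_g) ** 2 for g in golden)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_g == 0 or var_a == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_g * var_a)


# Invented sample data: a reference output, a close match, and a scrambled one.
golden = [1.0, 2.0, 3.0, 4.0]
close = [1.1, 1.9, 3.2, 3.9]
scrambled = [4.0, 1.0, 3.0, 2.0]

print(pcc(golden, close))      # near 1: outputs agree
print(pcc(golden, scrambled))  # far from 1: outputs disagree
```

A value near 1 means the two outputs track each other closely, so a PCC of about 0.055 indicates that the block-sharded result is essentially uncorrelated with the reference.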

@dvartaniansTT
Contributor

@mywoodstock is this on your radar?
@punithsekar does this mean we are running at almost 0 pcc end to end now?

@mywoodstock
Contributor

@dvartaniansTT Yes, it's being worked on: #14249

@punithsekar
Contributor Author

punithsekar commented Oct 30, 2024

@dvartaniansTT, yes, we are getting almost 0 PCC. The bug is tracked in issue #14249, as Abhinav mentioned.
