
Pow operation needs support for Tensor exponent variant #13857

Open · Tracked by #13521 ...
KalaivaniMCW opened this issue Oct 16, 2024 · 13 comments

@KalaivaniMCW (Contributor) commented Oct 16, 2024

At present, ttnn.pow supports a Tensor input with a scalar exponent.
For PyTorch tracing #13373, we need support for ttnn.pow with a Tensor input and a Tensor exponent:
https://github.com/tenstorrent/pytorch2.0_ttnn/blob/main/docs/operations/aten.pow.Tensor_Tensor.md
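
For reference, a minimal sketch of the semantics being requested (the torch.pow call below is what the traced graph produces; the ttnn.pow lines are illustrative only, contrasting the current scalar-exponent form with the desired tensor-exponent form, and tt_x / tt_y are placeholder names):

```python
import torch

# aten.pow.Tensor_Tensor: elementwise x ** y with both operands as tensors.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 0.5, 3.0])
print(torch.pow(x, y))  # tensor([ 1.0000,  1.4142, 27.0000])

# ttnn today (scalar exponent only), illustrative:
#   out = ttnn.pow(tt_x, 2.0)
# What this issue asks for (tensor exponent):
#   out = ttnn.pow(tt_x, tt_y)
```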

Is this support possible?

cc: @eyonland

@eyonland (Contributor)

@rtawfik01, we need an LLK for power that supports having the exponents in a tile. I think we need #13857, where we get another destination register, perhaps?

@rtawfik01 (Contributor) commented Oct 21, 2024

fyi @ttmtrajkovic, this is similar to issue #13582, where an SFPU function needs to support working on 2 inputs.
@eyonland, please assign a priority for this; also, can you please make a master issue linking all of these issues together?
So far I see these 3 ops:

  1. binary pow operation: Pow operation needs support for Tensor exponent variant #13857
  2. Binary bitwise operation: Binary Bitwise ops #13582
  3. Binary shift operations: Binary Shift operators #10034

These all share the issue of needing the SFPU to work with binary inputs.

@KalaivaniMCW (Contributor, Author)

This work is critical for models being developed from PyTorch 2, hence the P0 status.

@ttmtrajkovic (Contributor)

@KalaivaniMCW, @eyonland,

Please list explicitly which ops this is blocking. I don't mind the P0 status, but we need to differentiate between levels of blocking - is this blocking an entire project or several ops?

@eyonland (Contributor)

We are blocked specifically on this op here: https://docs.tenstorrent.com/tt-metalium/latest/tt_metal/apis/kernel_apis/compute/power_tile.html
Notice that it takes the second argument as a scalar. We instead need another dst register argument: we need to raise the values of the first tensor to the power of the values in the second tensor.

@umadevimcw added the Op Generalization label (Generalization and relaxations of requirements in Ops) Oct 29, 2024
@KalaivaniMCW (Contributor, Author)

Hi @ttmtrajkovic,
We are blocked on the ttnn.pow op for PyTorch sweep tracing and op generality:
https://github.com/tenstorrent/pytorch2.0_ttnn/blob/main/docs/operations/aten.pow.Scalar.md
https://github.com/tenstorrent/pytorch2.0_ttnn/blob/main/docs/operations/aten.pow.Tensor_Tensor.md
At present the pass rate is 0% since we have no support for a Tensor exponent.

@jvasilje (Collaborator) commented Nov 4, 2024

hey @ttmtrajkovic, we are reviewing eltwise blockers for generality. It's looking like this will be the top blocker, and we likely want to escalate the need for this feature to get it unblocked. Just a heads up. We will update here again in a few hours.

@ttmtrajkovic assigned rdjogoTT and unassigned rdjogoTT and ttmtrajkovic Nov 6, 2024
@rdjogoTT (Contributor) commented Nov 7, 2024

I've started my assessment of this issue. We know that this depends on a new op class - Binary SFPU OPs - and that we will also need to work on a new algorithm for implementing eltwise pow with a tensor base and exponent. More updates to follow shortly.

@rdjogoTT (Contributor)

With the latest 2 commits on my branch rd/binary_sfpu_pow I've added the supporting LLK code for general binary SFPU OPs and implemented the LLKs for a few OPs: Add, Sub, and Mul. I've also included the preliminary work for Pow (PCC ~= 0.998 for non-negative operands; the negative case is not handled yet). In the second commit, I include the changes to tt-metal that I made to be able to test my code, including a modified Python test, program factory file, and compute kernel. I basically hijacked the regular eltwise binary OPs we have to run them on the SFPU instead.

The LLK APIs for the basic binary OPs (Add, Sub, Mul) are ready in tt_metal/include/compute_kernel_api/eltwise_binary_sfpu.h, but more work is needed before Pow is ready. To get this completed faster, I think someone from the tt-metal side should help implement the necessary changes to support this new OP class while I focus on the Pow algorithm; they can test functionality using the basic OPs like Add, or even with the current Pow. @eyonland @jvasilje, do either of you know who I can ask to take over the higher-level changes required for Binary SFPU OP support?

@eyonland (Contributor)

@KalaivaniMCW, @umadevimcw & @VirdhatchaniKN, let's plan this out.

@KalaivaniMCW (Contributor, Author) commented Nov 22, 2024

Hi @rdjogoTT, @eyonland,
The current eltwise binary does not use the SFPU (I guess it uses the FPU?), but it does have additional features like pre- and post-activation scaling for input/output.

Should we modify the current implementation in ttnn/operations/eltwise/binary, or create a new implementation that uses the new binary SFPU OPs - something like ttnn/operations/eltwise/binary_sfpu - and test it out and see how it goes with the models?

Do we modify ttnn/cpp/ttnn/operations/eltwise/binary/device/kernels/compute/eltwise_binary_kernel.cpp as done in rd/binary_sfpu_pow (and include the additional implementations), or write a new kernel for binary SFPU OPs?

@rdjogoTT (Contributor)

Yes, the current eltwise binary OPs all run on the FPU. I don't know the answer to your questions; I think those decisions need to be made from the tt-metal side depending on the requirements for this new OP class.

One additional question that may be relevant to deciding how to implement this: when a user calls one of the eltwise binary OPs like ttnn.add with the FP32 dataformat, how will we decide between the binary SFPU implementation, which can support full FP32 accuracy but is slower, and the FPU implementation, which loses precision due to converting to TF32?
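
As a rough illustration of the size of that precision gap, here is a sketch that truncates FP32 inputs to TF32's 10 mantissa bits before computing pow (the truncation helper and the value ranges are assumptions made for the sketch, not how the hardware path is actually implemented):

```python
import torch

def truncate_to_tf32(t: torch.Tensor) -> torch.Tensor:
    # Zero out the 13 low mantissa bits of FP32 (TF32 keeps 10 of FP32's 23).
    bits = t.view(torch.int32)
    return (bits & ~0x1FFF).view(torch.float32)

x = torch.rand(1024) * 10 + 0.1   # bases in [0.1, 10.1)
y = torch.rand(1024) * 4 - 2      # exponents in [-2, 2)

exact = torch.pow(x, y)
tf32_like = torch.pow(truncate_to_tf32(x), truncate_to_tf32(y))
rel_err = ((exact - tf32_like).abs() / exact.abs()).max()
print(f"max relative error with TF32-truncated inputs: {rel_err:.2e}")  # roughly 1e-3
```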

@rdjogoTT (Contributor) commented Nov 26, 2024

Pow is moving along well; I get a PCC of >0.99998 for x in [0, 10] and y in [-5, 5] for x^y. I am now looking into the special cases that occur when x < 0, which sometimes cause NaN for torch.pow(). @KalaivaniMCW @eyonland, do either of you know which spec I should follow for these special cases of pow, i.e. when dealing with a negative base, or a NaN or Inf base/exponent? Also, do you have an ETA for the tt-metal side changes?
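
For reference, torch.pow follows IEEE-style pow semantics in these corner cases, which is presumably the spec to match (a small sketch of the reference behaviour):

```python
import torch

# Negative base with an integer exponent is well-defined:
print(torch.pow(torch.tensor(-2.0), torch.tensor(2.0)))   # tensor(4.)
# Negative base with a non-integer exponent is NaN:
print(torch.pow(torch.tensor(-2.0), torch.tensor(0.5)))   # tensor(nan)
# Zero base with a negative exponent gives Inf:
print(torch.pow(torch.tensor(0.0), torch.tensor(-1.0)))   # tensor(inf)
# Inf base with a negative exponent underflows to zero:
print(torch.pow(torch.tensor(float('inf')), torch.tensor(-1.0)))  # tensor(0.)
# NaN propagates through a non-zero exponent:
print(torch.pow(torch.tensor(float('nan')), torch.tensor(2.0)))   # tensor(nan)
```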

Also, @KalaivaniMCW, can I ask that as part of your change you also add the changes needed for the SFPU binary Pow OP? I think it would be called as ttnn.pow(x, y) but with a tensor for y.
