
Use tag::any for int8 matmul weight desc to create pd #155

Open · wants to merge 1 commit into base: ideep_dev

Conversation

Xia-Weiwen

Summary
An issue was found where int8 matmul falls back to the ref:any kernel, which is very slow. It was observed with stock PyTorch + oneDNN 3.0.

It is because dst scales have an impact on pd creation. When prepacking the weight, dst scales are not set when creating the pd (int8 and fp32 share the same expected_weight_desc function), so we get a pd whose weight desc is in some layout A.
At runtime, however, dst scales are set and we pin the weight desc to layout A when creating the pd. oneDNN may find that layout A is unsuitable for this configuration, and it ends up falling back to the slow ref:any kernel.

Now we use tag::any for the weight desc when creating the pd at runtime, regardless of the layout of the prepacked weight, so the pd can choose a better weight layout. The prepacked weight is then reordered once more on the first run.

Previously:

  • Prepack:
    • Create pd with weight desc in tag::any and without info of src/dst scales/zero points.
    • Get expected weight desc in layout A.
    • Reorder weight to layout A.
  • Runtime (first run):
    • Create pd with weight desc in layout A and with info of src/dst scales/zero points.
    • Get expected weight desc still in layout A.
    • No reorder needed.
    • But the weight layout is not what this configuration expects, so the slow ref:any kernel is used.

Now:

  • Prepack (unchanged):
    • Create pd with weight desc in tag::any and without info of src/dst scales/zero points.
    • Get expected weight desc in layout A.
    • Reorder weight to layout A.
  • Runtime (first run):
    • Create pd with weight desc in tag::any and with info of src/dst scales/zero points.
    • Get expected weight desc in layout B.
    • Reorder weight from layout A to B if A and B are different.
    • Weight layout is as expected, so a proper, optimized kernel is used.

The weight is only reordered on the first run. After that, it is always in layout B, which is the expected layout.
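
For illustration, here is a minimal sketch of the runtime-side flow against the plain oneDNN 3.x C++ API (not the actual ideep code; the shapes, data types, and per-tensor scale masks are assumptions):

```cpp
#include <dnnl.hpp>
using namespace dnnl;

// Sketch: runtime pd creation with tag::any weights plus scale attributes,
// followed by a one-time reorder of the prepacked weight if needed.
// `packed_wei` holds the prepacked weight (layout A from the prepack step).
memory prepare_weight_for_runtime(engine &eng, stream &strm, memory packed_wei,
                                  const memory::desc &src_md,
                                  const memory::desc &dst_md) {
    // Runtime attributes: unlike at prepack time, src/dst scales are now known.
    primitive_attr attr;
    attr.set_scales_mask(DNNL_ARG_SRC, 0);      // per-tensor src scale (assumed)
    attr.set_scales_mask(DNNL_ARG_WEIGHTS, 0);  // per-tensor weight scale (assumed)
    attr.set_scales_mask(DNNL_ARG_DST, 0);      // per-tensor dst scale (assumed)

    // The fix: describe the weight with format_tag::any instead of pinning it
    // to the prepacked layout A, and let the pd pick the best layout (B).
    memory::desc wei_any_md(packed_wei.get_desc().get_dims(),
                            memory::data_type::s8, memory::format_tag::any);
    matmul::primitive_desc pd(eng, src_md, wei_any_md, dst_md, attr);

    // Reorder A -> B once, on the first run, only if the layouts differ.
    memory::desc expected_md = pd.weights_desc();
    if (expected_md != packed_wei.get_desc()) {
        memory reordered(expected_md, eng);
        reorder(packed_wei, reordered).execute(strm, packed_wei, reordered);
        strm.wait();
        return reordered;  // cache this; later runs reuse layout B directly
    }
    return packed_wei;
}
```

Note that the actual scale values are only supplied at execution time (e.g. via DNNL_ARG_ATTR_SCALES | DNNL_ARG_DST); only the masks participate in pd creation, which is why the prepack-time pd (no scale attributes) and the runtime pd (with scale attributes) can legitimately choose different weight layouts.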

Test plan

  • PyTorch UT.
  • INT8 OOB benchmark.

@jgong5 @XiaobingSuper @yanbing-j @leslie-fang-intel Please review. Thanks!

jgong5 commented Jan 10, 2023

> The weight is only reordered on the first run. After that, it is always in layout B, which is the expected layout.

We didn't change the weight at runtime before. Now we have to do that? The PyTorch code has to be adapted too, right?

Xia-Weiwen (Author)

> > The weight is only reordered on the first run. After that, it is always in layout B, which is the expected layout.
>
> We didn't change the weight at runtime before. Now we have to do that? The PyTorch code has to be adapted too, right?

We now check the weight desc and reorder the weight if needed after preparing the primitive: https://github.com/pytorch/pytorch/blob/3726d232191088e8e7a9c1a2ab3244cdd9250bf2/aten/src/ATen/native/quantized/cpu/qlinear.cpp#L851
So we do not have to change the PyTorch code.
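
For reference, the check boils down to a one-liner against ideep's tensor API (a paraphrase under assumed variable names, not the exact code at that line):

```cpp
// Paraphrase of the caller-side pattern (names here are illustrative).
// ideep's reorder_if_differ_in() is a no-op when the cached layout already
// matches, so the reorder cost is paid at most once, on the first run.
packed_weight = packed_weight.reorder_if_differ_in(matmul_pd.weights_desc());
```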
