
Use tag::any for int8 matmul weight desc to create pd #155

Open · wants to merge 1 commit into base: ideep_dev

Conversation

Xia-Weiwen

Summary
An issue was found where int8 matmul falls back to the ref:any kernel, which is very slow. It was observed with stock PyTorch + oneDNN 3.0.

It is because dst scales have an impact on pd creation. When prepacking the weight, dst scales are not set when creating the pd (int8 and fp32 share the same expected_weight_desc function), so we get a pd whose weight desc is in some layout A.
At runtime, however, dst scales are set and we pin the weight desc to layout A when creating the pd. oneDNN may find that layout A is unsuitable for this configuration, and it ends up falling back to the slow ref:any kernel.

Now we use tag::any for the weight desc when creating the pd at runtime, regardless of the layout of the prepacked weight, so the pd can choose a better weight layout. The prepacked weight is then reordered once more on the first run.

Previously:

  • Prepack:
    • Create pd with weight desc in tag::any and without info of src/dst scales/zero points.
    • Get expected weight desc in layout A.
    • Reorder weight to layout A.
  • Runtime (first run):
    • Create pd with weight desc in layout A and with info of src/dst scales/zero points.
    • Get expected weight desc still in layout A.
    • No reorder needed.
    • But the weight layout is not what this configuration expects, so the slow ref:any kernel is used.

Now:

  • Prepack (unchanged):
    • Create pd with weight desc in tag::any and without info of src/dst scales/zero points.
    • Get expected weight desc in layout A.
    • Reorder weight to layout A.
  • Runtime (first run):
    • Create pd with weight desc in tag::any and with info of src/dst scales/zero points.
    • Get expected weight desc in layout B.
    • Reorder weight from layout A to B if A and B are different.
    • Weight layout is as expected, so a proper, optimized kernel is used.

The weight is only reordered on the first run. After that, it is always in layout B, which is the expected layout.
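
For illustration, here is a minimal sketch of the runtime-side flow against the plain oneDNN 3.x C++ API (not the actual ideep code; the shapes, data types, and per-tensor scale masks are assumptions):

```cpp
#include <dnnl.hpp>
using namespace dnnl;

// Sketch: runtime pd creation with tag::any weights plus scale attributes,
// followed by a one-time reorder of the prepacked weight if needed.
// `packed_wei` holds the prepacked weight (layout A from the prepack step).
memory prepare_weight_for_runtime(engine &eng, stream &strm, memory packed_wei,
                                  const memory::desc &src_md,
                                  const memory::desc &dst_md) {
    // Runtime attributes: unlike at prepack time, src/dst scales are now known.
    primitive_attr attr;
    attr.set_scales_mask(DNNL_ARG_SRC, 0);      // per-tensor src scale (assumed)
    attr.set_scales_mask(DNNL_ARG_WEIGHTS, 0);  // per-tensor weight scale (assumed)
    attr.set_scales_mask(DNNL_ARG_DST, 0);      // per-tensor dst scale (assumed)

    // The fix: describe the weight with format_tag::any instead of pinning it
    // to the prepacked layout A, and let the pd pick the best layout (B).
    memory::desc wei_any_md(packed_wei.get_desc().get_dims(),
                            memory::data_type::s8, memory::format_tag::any);
    matmul::primitive_desc pd(eng, src_md, wei_any_md, dst_md, attr);

    // Reorder A -> B once, on the first run, only if the layouts differ.
    memory::desc expected_md = pd.weights_desc();
    if (expected_md != packed_wei.get_desc()) {
        memory reordered(expected_md, eng);
        reorder(packed_wei, reordered).execute(strm, packed_wei, reordered);
        strm.wait();
        return reordered;  // cache this; later runs reuse layout B directly
    }
    return packed_wei;
}
```

Note that the actual scale values are only supplied at execution time (e.g. via DNNL_ARG_ATTR_SCALES | DNNL_ARG_DST); only the masks participate in pd creation, which is why the prepack-time pd (no scale attributes) and the runtime pd (with scale attributes) can legitimately choose different weight layouts.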

Test plan

  • PyTorch UT.
  • INT8 OOB benchmark.

@jgong5 @XiaobingSuper @yanbing-j @leslie-fang-intel Please review. Thanks!

jgong5 commented Jan 10, 2023

> The weight is only reordered on the first run. After that, it is always in layout B, which is the expected layout.

We didn't change the weight at runtime before. Now we have to do that? The PyTorch code has to be adapted too, right?

Xia-Weiwen (Author)

> > The weight is only reordered on the first run. After that, it is always in layout B, which is the expected layout.
>
> We didn't change the weight at runtime before. Now we have to do that? The PyTorch code has to be adapted too, right?

We now check the weight desc and reorder the weight if needed after preparing the primitive: https://github.com/pytorch/pytorch/blob/3726d232191088e8e7a9c1a2ab3244cdd9250bf2/aten/src/ATen/native/quantized/cpu/qlinear.cpp#L851
So we do not have to change the PyTorch code.
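
For reference, the check boils down to a one-liner against ideep's tensor API (a paraphrase under assumed variable names, not the exact code at that line):

```cpp
// Paraphrase of the caller-side pattern (names here are illustrative).
// ideep's reorder_if_differ_in() is a no-op when the cached layout already
// matches, so the reorder cost is paid at most once, on the first run.
packed_weight = packed_weight.reorder_if_differ_in(matmul_pd.weights_desc());
```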
