Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEMM kernel complied by AM execute incorrectly in XS-GEM5 #183

Open
DCliuzhe opened this issue Oct 12, 2024 · 3 comments
Open

GEMM kernel complied by AM execute incorrectly in XS-GEM5 #183

DCliuzhe opened this issue Oct 12, 2024 · 3 comments

Comments

@DCliuzhe
Copy link

I have implemented a GEMM kerenl using RVV and complie it into a bare metal using AM. Before simulation, I deleted the function calls that were not aligned with the RTL and depended on the vector destination register fake data in the issue_queue.cc file. However, the output matrix elements in GEM5 simulation results are all 0, which are expected to 128. I also simulated it on the original GEM5, and the result was correct. My source code are as followed:

#include <riscv_vector.h>
#include <stdio.h>
#include <stdlib.h>
#include <klib.h>

void matmul(float *a, float *b, float *c, int M, int N, int K) {
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
        int k = 0;
        c[i * N + j] = 0;
        for(size_t vl; k < K; ){
            vl = __riscv_vsetvl_e32m1(K - k);  //动态获取向量长度
            vfloat32m1_t vec_a = __riscv_vle32_v_f32m1(&a[i * K + k], vl);  
            vfloat32m1_t vec_b = __riscv_vle32_v_f32m1(&b[j * K + k], vl);  //加载a向量和b向量
            vfloat32m1_t vec_s = __riscv_vfmul_vv_f32m1(vec_a, vec_b, vl); //做向量点乘
            vfloat32m1_t vsum = __riscv_vfredusum_vs_f32m1_f32m1(vec_s, __riscv_vfmv_s_f_f32m1(0.0f, vl), vl); //进行向量规约和
            float sum = __riscv_vfmv_f_s_f32m1_f32(vsum); //获得部分和结果
            printf("sum : %f \n", sum);
            c[i * N + j] += sum;
            k += vl;
        }
    }
  }
}

int main() {
    int M = 16;
    int N = 16;
    int K = 64;

    float *a = (float*)malloc(M * K * sizeof(float));
    float *b = (float*)malloc(N * K * sizeof(float));
    float *c = (float*)malloc(M * N * sizeof(float));

    for(int i = 0; i < M * K; i ++)
        a[i] = 1.0f;
    
    for(int i = 0; i < N * K; i ++)
        b[i] = 2.0f;
    
    
    matmul(a, b, c, M, N, K);

    for(int i = 0; i < M * N; i ++)
        printf("%f ", c[i]);
    
    return 0;
}

Through staged debugging, I determined that the problem occurred in the step of vector reduction. How can I fix it?

@DCliuzhe
Copy link
Author

DCliuzhe commented Oct 14, 2024

I opened difftest and tested it, and found that GEM5 made an error when executing the vfredusum instruction. GEM5 split it into two micro_ops for execution. The result was completely determined by the second micro_op, and the wrong result was obtained. I would like to ask why it was split into two micro_ops and the execution was wrong.

heap start = 8000a000
build/RISCV/cpu/base.cc:970: warn: Inst [sn:18228] pc: 0x800001b8, msg: [sn:18228 pc:0x800001b8] vfredusum_vs_micro v24, v25,vtmp0, res: 0000000000000000_0000000000000000
build/RISCV/cpu/base.cc:972: warn: May be diff at v24
 Ref  value: ffffffffffffffff_ffffffff41000000
 GEM5 value: 0000000000000000_0000000000000000
build/RISCV/cpu/base.hh:699: warn: In CPU0: NEMU PC: 0x800001b8, GEM5 PC: 0x800001b8, inst: vfredusum_vs_micro v24, v25,vtmp0
  $0: 0x0000000000000000   ra: 0x0000000080000268   sp: 0x0000000080009fa0   gp: 0x0000000000000000 
  tp: 0x0000000000000000   t0: 0x0000000000000000   t1: 0x000000008000a000   t2: 0x0000000000000000 
  s0: 0x0000000000000000   s1: 0x0000000000000010   a0: 0x000000008000c000   a1: 0x0000000000000040 
  a2: 0x000000008000a000   a3: 0x000000008000b000   a4: 0x0000000000000000   a5: 0x0000000000000004 
  a6: 0x0000000000000000   a7: 0x0000000000000000   s2: 0x000000008000c000   s3: 0x0000000000000010 
  s4: 0x0000000000000010   s5: 0x0000000000000000   s6: 0x0000000000000000   s7: 0x0000000000000000 
  s8: 0x0000000000000000   s9: 0x0000000000000000  s10: 0x0000000000000000  s11: 0x0000000000000000 
  t3: 0x000000008000b000   t4: 0x0000000000000000   t5: 0x000000008000c040   t6: 0x0000000000000040 
 ft0: 0xffffffff00000000  ft1: 0xffffffff00000000  ft2: 0xffffffff00000000  ft3: 0xffffffff00000000 
 ft4: 0xffffffff00000000  ft5: 0xffffffff00000000  ft6: 0xffffffff00000000  ft7: 0xffffffff00000000 
 fs0: 0xffffffff00000000  fs1: 0xffffffff00000000  fa0: 0xffffffff00000000  fa1: 0xffffffff00000000 
 fa2: 0xffffffff00000000  fa3: 0xffffffff00000000  fa4: 0xffffffff00000000  fa5: 0xffffffff00000000 
 fa6: 0xffffffff00000000  fa7: 0xffffffff00000000  fs2: 0xffffffff00000000  fs3: 0xffffffff00000000 
 fs4: 0xffffffff00000000  fs5: 0xffffffff00000000  fs6: 0xffffffff00000000  fs7: 0xffffffff00000000 
 fs8: 0xffffffff00000000  fs9: 0xffffffff00000000 fs10: 0xffffffff00000000 fs11: 0xffffffff00000000 
 ft8: 0xffffffff00000000  ft9: 0xffffffff00000000 ft10: 0xffffffff00000000 ft11: 0xffffffff00000000 
pc: 0x00000000800001bc mstatus: 0x8000000a00006600 mcause: 0x0000000000000000 mepc: 0x0000000000000000
                       sstatus: 0x8000000200006600 scause: 0x0000000000000000 sepc: 0x0000000000000000
satp: 0x0000000000000000
mip: 0x0000000000000000 mie: 0x0000000000000000 mscratch: 0x0000000000000000 sscratch: 0x0000000000000000
mideleg: 0x0000000000000000 medeleg: 0x0000000000000000
mtval: 0x0000000000000000 stval: 0x0000000000000000 mtvec: 0x0000000000000000 stvec: 0x0000000000000000
fcsr: 0x0000000000000000
privilege mode:3
pmp: 16 entries active, details:
 0: cfg:0x00 addr:0x0000000000000000| 1: cfg:0x00 addr:0x0000000000000000
 2: cfg:0x00 addr:0x0000000000000000| 3: cfg:0x00 addr:0x0000000000000000
 4: cfg:0x00 addr:0x0000000000000000| 5: cfg:0x00 addr:0x0000000000000000
 6: cfg:0x00 addr:0x0000000000000000| 7: cfg:0x00 addr:0x0000000000000000
 8: cfg:0x00 addr:0x0000000000000000| 9: cfg:0x00 addr:0x0000000000000000
10: cfg:0x00 addr:0x0000000000000000|11: cfg:0x00 addr:0x0000000000000000
12: cfg:0x00 addr:0x0000000000000000|13: cfg:0x00 addr:0x0000000000000000
14: cfg:0x00 addr:0x0000000000000000|15: cfg:0x00 addr:0x0000000000000000
v0 : 0x0000000000000000_0000000000000000  v1 : 0x0000000000000000_0000000000000000  
v2 : 0x0000000000000000_0000000000000000  v3 : 0x0000000000000000_0000000000000000  
v4 : 0x0000000000000000_0000000000000000  v5 : 0x0000000000000000_0000000000000000  
v6 : 0x0000000000000000_0000000000000000  v7 : 0x0000000000000000_0000000000000000  
v8 : 0x0000000000000000_0000000000000000  v9 : 0x0000000000000000_0000000000000000  
v10: 0x0000000000000000_0000000000000000  v11: 0x0000000000000000_0000000000000000  
v12: 0x0000000000000000_0000000000000000  v13: 0x0000000000000000_0000000000000000  
v14: 0x0000000000000000_0000000000000000  v15: 0x0000000000000000_0000000000000000  
v16: 0x0000000000000000_0000000000000000  v17: 0x0000000000000000_0000000000000000  
v18: 0x0000000000000000_0000000000000000  v19: 0x0000000000000000_0000000000000000  
v20: 0x0000000000000000_0000000000000000  v21: 0x0000000000000000_0000000000000000  
v22: 0x0000000000000000_0000000000000000  v23: 0x0000000000000000_0000000000000000  
v24: 0xffffffffffffffff_ffffffff41000000  v25: 0x0000000000000000_0000000000000000  
v26: 0x4000000040000000_4000000040000000  v27: 0x0000000000000000_0000000000000000  
v28: 0x0000000000000000_0000000000000000  v29: 0x0000000000000000_0000000000000000  
v30: 0x0000000000000000_0000000000000000  v31: 0x0000000000000000_0000000000000000  
vtype: 0x00000000000000d0 vstart: 0x0000000000000000 vxsat: 0x0000000000000000
vxrm: 0x0000000000000000 vl: 0x0000000000000004 vcsr: 0x0000000000000000
build/RISCV/cpu/base.cc:1334: warn: gem5-rRegsDisplay : 
  $0 :                0   ra :         80000268   sp :         80009fa0   gp :                0 
  tp :                0   t0 :                0   t1 :         8000a000   t2 :                0 
  s0 :                0   s1 :               10   a0 :         8000c000   a1 :               40 
  a2 :         8000a000   a3 :         8000b000   a4 :                0   a5 :                4 
  a6 :                0   a7 :                0   s2 :         8000c000   s3 :               10 
  s4 :               10   s5 :                0   s6 :                0   s7 :                0 
  s8 :                0   s9 :                0  s10 :                0  s11 :                0 
  t3 :         8000b000   t4 :                0   t5 :         8000c040   t6 :               40 
build/RISCV/cpu/base.cc:1347: warn: gem5-fRegsDisplay : 
 ft0 : ffffffff00000000  ft1 : ffffffff00000000  ft2 : ffffffff00000000  ft3 : ffffffff00000000 
 ft4 : ffffffff00000000  ft5 : ffffffff00000000  ft6 : ffffffff00000000  ft7 : ffffffff00000000 
 fs0 : ffffffff00000000  fs1 : ffffffff00000000  fa0 : ffffffff00000000  fa1 : ffffffff00000000 
 fa2 : ffffffff00000000  fa3 : ffffffff00000000  fa4 : ffffffff00000000  fa5 : ffffffff00000000 
 fa6 : ffffffff00000000  fa7 : ffffffff00000000  fs2 : ffffffff00000000  fs3 : ffffffff00000000 
 fs4 : ffffffff00000000  fs5 : ffffffff00000000  fs6 : ffffffff00000000  fs7 : ffffffff00000000 
 fs8 : ffffffff00000000  fs9 : ffffffff00000000 fs10 : ffffffff00000000 fs11 : ffffffff00000000 
 ft8 : ffffffff00000000  ft9 : ffffffff00000000 ft10 : ffffffff00000000 ft11 : ffffffff00000000 
build/RISCV/cpu/base.cc:1398: warn: gem5-CsrDisplay : 
pc :         800001b8      mstatus :        a00000000 mcause :                0 mepc    :                0
                           sstatus :        200000000 scause :                0 sepc    :                0
satp    :                0
mip     :                0 mie     :                0 mscratch:                0 sscratch:                0
mideleg :                0 medeleg :                0
mtval   :                0 stval   :                0 mtvec   :                0 stvec   :                0
privilege mode : 3
build/RISCV/cpu/base.cc:1423: warn: gem5-VectorDisplay : 
v00 : 0000000000000000_0000000000000000 v01 : 0000000000000000_0000000000000000
v02 : 0000000000000000_0000000000000000 v03 : 0000000000000000_0000000000000000
v04 : 0000000000000000_0000000000000000 v05 : 0000000000000000_0000000000000000
v06 : 0000000000000000_0000000000000000 v07 : 0000000000000000_0000000000000000
v08 : 0000000000000000_0000000000000000 v09 : 0000000000000000_0000000000000000
v10 : 0000000000000000_0000000000000000 v11 : 0000000000000000_0000000000000000
v12 : 0000000000000000_0000000000000000 v13 : 0000000000000000_0000000000000000
v14 : 0000000000000000_0000000000000000 v15 : 0000000000000000_0000000000000000
v16 : 0000000000000000_0000000000000000 v17 : 0000000000000000_0000000000000000
v18 : 0000000000000000_0000000000000000 v19 : 0000000000000000_0000000000000000
v20 : 0000000000000000_0000000000000000 v21 : 0000000000000000_0000000000000000
v22 : 0000000000000000_0000000000000000 v23 : 0000000000000000_0000000000000000
v24 : 0000000000000000_0000000000000000 v25 : 0000000000000000_0000000000000000
v26 : 4000000040000000_4000000040000000 v27 : 0000000000000000_0000000000000000
v28 : 0000000000000000_0000000000000000 v29 : 0000000000000000_0000000000000000
v30 : 0000000000000000_0000000000000000 v31 : 0000000000000000_0000000000000000
vtype   :               d0 vstart   :                0  vxsat   :                0
vxrm    :                0 vl       :                4  vcsr    :                0


build/RISCV/cpu/base.hh:702: warn: start dump last 20 committed msg
build/RISCV/cpu/base.hh:705: warn: V [sn:18198 pc:0x8000017c] c_li t4, 0, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18199 pc:0x8000017e] fsw fa3, 0(a0), paddr: 0x8000c000
build/RISCV/cpu/base.hh:705: warn: V [sn:18200 pc:0x80000182] bge zero, a1, 92
build/RISCV/cpu/base.hh:705: warn: V [sn:18212 pc:0x80000186] flw fa5, 0(a0), res: 0xffffffff00000000, paddr: 0x8000c000
build/RISCV/cpu/base.hh:705: warn: V [sn:18213 pc:0x8000018a] addiw a6, t4, 0, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18214 pc:0x8000018e] c_li a4, 0, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18215 pc:0x80000190] addw a2, a7, a4, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18216 pc:0x80000194] addw a3, a6, a4, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18217 pc:0x80000198] c_slli a2, 2, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18218 pc:0x8000019a] c_slli a3, 2, res: 0
build/RISCV/cpu/base.hh:705: warn: V [sn:18219 pc:0x8000019c] subw a5, a1, a4, res: 0x40
build/RISCV/cpu/base.hh:705: warn: V [sn:18220 pc:0x800001a0] c_add a2, t1, res: 0x8000a000
build/RISCV/cpu/base.hh:705: warn: V [sn:18221 pc:0x800001a2] c_add a3, t3, res: 0x8000b000
build/RISCV/cpu/base.hh:705: warn: V [sn:18222 pc:0x800001a4] vsetvli a5, a5, e32, m1, ta, ma, res: 0x4
build/RISCV/cpu/base.hh:705: warn: V [sn:18223 pc:0x800001a8] vle32_v_micro v24, 0(a2), zero, res: 3f8000003f800000_3f8000003f800000, paddr: 0x8000a000
build/RISCV/cpu/base.hh:705: warn: V [sn:18224 pc:0x800001ac] vle32_v_micro v26, 0(a3), zero, res: 4000000040000000_4000000040000000, paddr: 0x8000b000
build/RISCV/cpu/base.hh:705: warn: V [sn:18225 pc:0x800001b0] vmv_v_i_micro v25, v0, 0, res: 0000000000000000_0000000000000000
build/RISCV/cpu/base.hh:705: warn: V [sn:18226 pc:0x800001b4] vfmul_vv_micro v24, v24, v26, res: 4000000040000000_4000000040000000
build/RISCV/cpu/base.hh:705: warn: V [sn:18227 pc:0x800001b8] vfredusum_vs_micro v24, v24, res: 0000000000000000_0000000041200000
build/RISCV/cpu/base.hh:705: warn: V [sn:18228 pc:0x800001b8] vfredusum_vs_micro v24, v25,vtmp0, res: 0000000000000000_0000000000000000
build/RISCV/cpu/base.cc:1304: panic: Difftest failed!

@tastynoob
Copy link
Collaborator

It seem like vectorReduceFloatFormat has bug, because of add vectorOldVDElim
May you avoid using vector reduce inst type?

@DCliuzhe
Copy link
Author

Thanks for your reply, I will try not to use vectorReduce next time. Please mention it in your commit logs after you fix the bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants