Compiler generates incorrect code when doing loop unrolling. #6

Open
atgutier opened this issue Mar 6, 2015 · 0 comments

I have encountered a problem with the code generated when a loop is unrolled. For each iteration of the loop, the compiler precomputes the condition and stores it in the spill stack; each unrolled iteration then loads its condition and performs a conditional branch (cbr) on it. For some reason, in the code for the second iteration of the loop, and only that iteration, the compiler inverts the condition variable with a not operation. As a result, the data is not stored to memory when the condition says the store should execute, and a buffer overflow occurs when the store should not execute.

Unrolling the loop by hand produces the correct output, which matches the output of a previous version of HSAIL-HLC-Stable.

Here is an example of the loop and how it is unrolled.

CL code:
for (int i = 0; i < 16; i++) {
    if (16 * tid + i < length)
        array[16 * tid + i] = sum[0] + array2[16 * lid + i];
}
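
For anyone trying to reproduce the symptom on the host, here is a minimal sketch in plain C that models the loop above twice: once with the intended guard, and once with the guard inverted for a single iteration. The function names, the `int` element type, passing `sum[0]` as `sum0`, and mapping "second iteration" to `i == 1` are all assumptions made for illustration; this is not the compiler's output, and the model counts out-of-range stores instead of performing them.

```c
#include <assert.h>

#define UNROLL 16 /* trip count of the loop in the CL code */

/* Intended semantics: store only when the index is in range. */
static void store_guarded(int *array, const int *array2, int sum0,
                          int tid, int lid, int length)
{
    assert(tid >= 0 && lid >= 0 && length >= 0);
    for (int i = 0; i < UNROLL; i++) {
        if (UNROLL * tid + i < length)
            array[UNROLL * tid + i] = sum0 + array2[UNROLL * lid + i];
    }
}

/* Hypothetical model of the reported behavior: the guard is inverted
 * for i == 1 only (assumed to correspond to "iteration 2").
 * Returns how many out-of-range stores the inverted guard would have
 * issued; those stores are counted, not performed. */
static int store_miscompiled(int *array, const int *array2, int sum0,
                             int tid, int lid, int length)
{
    assert(tid >= 0 && lid >= 0 && length >= 0);
    int out_of_range = 0;
    for (int i = 0; i < UNROLL; i++) {
        int in_range = UNROLL * tid + i < length;
        int do_store = (i == 1) ? !in_range : in_range; /* inverted once */
        if (!do_store)
            continue;            /* in-range element silently skipped */
        if (!in_range) {
            out_of_range++;      /* would be the buffer overflow */
            continue;
        }
        array[UNROLL * tid + i] = sum0 + array2[UNROLL * lid + i];
    }
    return out_of_range;
}
```

Calling both functions on the same inputs and comparing the output buffers shows the two failure modes described above: an in-range element that is never written, and a store attempted past `length`.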

HSAIL disassembled code:

//iteration 1
@BB7_49:
// %if.end29
barrier;
ld_spill_align(4)_u32 $s1, [%__spillStack][24];
cvt_b1_u32 $c0, $s1;
cbr_b1 $c0, @BB7_51;
// BB#50: // %if.then39
cvt_s64_s32 $d3, $s21;
shl_u64 $d3, $d3, 2;
add_u64 $d19, $d1, $d3;
cvt_u32_u64 $s1, $d2;
ld_group_align(4)_u32 $s1, [$s1];
ld_group_align(4)_width(WAVESIZE)_u32 $s3, [%__hsa_replaced_Kernel_sum_0_0];
add_u32 $s1, $s1, $s3;
st_global_align(4)_u32 $s1, [$d19];

//iteration 2
@BB7_51:
// %for.inc50
ld_spill_align(4)_u32 $s1, [%__spillStack][28];
cvt_b1_u32 $c0, $s1;
not_b1 $c0, $c0; //incorrect inversion
cbr_b1 $c0, @BB7_53;
// BB#52: // %if.then39.1
cvt_s64_s32 $d3, $s22;
shl_u64 $d3, $d3, 2;
add_u64 $d19, $d1, $d3;
cvt_u32_u64 $s1, $d4;
ld_group_align(4)_u32 $s1, [$s1];
ld_group_align(4)_width(WAVESIZE)_u32 $s3, [%__hsa_replaced_Kernel_sum_0_0];
add_u32 $s1, $s1, $s3;
st_global_align(4)_u32 $s1, [$d19];

//iteration 3
@BB7_53:
// %for.inc50.1
ld_spill_align(4)_u32 $s1, [%__spillStack][32];
cvt_b1_u32 $c0, $s1;
cbr_b1 $c0, @BB7_55;
// BB#54: // %if.then39.2
cvt_s64_s32 $d3, $s23;
shl_u64 $d3, $d3, 2;
add_u64 $d19, $d1, $d3;
cvt_u32_u64 $s1, $d5;
ld_group_align(4)_u32 $s1, [$s1];
ld_group_align(4)_width(WAVESIZE)_u32 $s3, [%__hsa_replaced_Kernel_sum_0_0];
add_u32 $s1, $s1, $s3;
st_global_align(4)_u32 $s1, [$d19];

etc...
