I have encountered a problem with the code generated when a loop is unrolled. The compiler pre-computes the condition for each iteration and stores it in the spill stack; each unrolled iteration then loads its value and performs a conditional branch (cbr) on it. For some reason, in the code for the second iteration of the loop, and only that iteration, the compiler inverts the condition variable with a not operation. As a result, the data is not stored to memory when the condition holds, and a store past the end of the buffer (a buffer overflow) happens when it does not.
Unrolling the loop by hand produces the correct output, which matches the output of a previous version of HSAIL-HLC-Stable.
Here is an example of the loop and how it is unrolled.
CL code:
for (int i = 0; i < 16; i++) {
    if (16 * tid + i < length)
        array[16 * tid + i] = sum[0] + array2[16 * lid + i];
}
HSAIL disassembled code:
//iteration 1
@BB7_49:
// %if.end29
barrier;
ld_spill_align(4)_u32 $s1, [%__spillStack][24];
cvt_b1_u32 $c0, $s1;
cbr_b1 $c0, @BB7_51;
// BB#50: // %if.then39
cvt_s64_s32 $d3, $s21;
shl_u64 $d3, $d3, 2;
add_u64 $d19, $d1, $d3;
cvt_u32_u64 $s1, $d2;
ld_group_align(4)_u32 $s1, [$s1];
ld_group_align(4)_width(WAVESIZE)_u32 $s3, [%__hsa_replaced_Kernel_sum_0_0];
add_u32 $s1, $s1, $s3;
st_global_align(4)_u32 $s1, [$d19];
//iteration 2
@BB7_51:
// %for.inc50
ld_spill_align(4)_u32 $s1, [%__spillStack][28];
cvt_b1_u32 $c0, $s1;
not_b1 $c0, $c0; //incorrect inversion
cbr_b1 $c0, @BB7_53;
// BB#52: // %if.then39.1
cvt_s64_s32 $d3, $s22;
shl_u64 $d3, $d3, 2;
add_u64 $d19, $d1, $d3;
cvt_u32_u64 $s1, $d4;
ld_group_align(4)_u32 $s1, [$s1];
ld_group_align(4)_width(WAVESIZE)_u32 $s3, [%__hsa_replaced_Kernel_sum_0_0];
add_u32 $s1, $s1, $s3;
st_global_align(4)_u32 $s1, [$d19];
//iteration 3
@BB7_53:
// %for.inc50.1
ld_spill_align(4)_u32 $s1, [%__spillStack][32];
cvt_b1_u32 $c0, $s1;
cbr_b1 $c0, @BB7_55;
// BB#54: // %if.then39.2
cvt_s64_s32 $d3, $s23;
shl_u64 $d3, $d3, 2;
add_u64 $d19, $d1, $d3;
cvt_u32_u64 $s1, $d5;
ld_group_align(4)_u32 $s1, [$s1];
ld_group_align(4)_width(WAVESIZE)_u32 $s3, [%__hsa_replaced_Kernel_sum_0_0];
add_u32 $s1, $s1, $s3;
st_global_align(4)_u32 $s1, [$d19];
etc...
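To make the symptom concrete, here is a small host-side C model of what I believe is happening. It is only a sketch, not the compiler's actual logic: the function names are made up for illustration, and I am assuming from the disassembly that the spilled value is the negated bounds check (the cbr jumps over the store when it is true). The caller must allocate array with room for all 16 slots so that the modelled overflow stays inside the test buffer.

```c
#define UNROLL 16

/* Intended semantics of the source loop for one work-item. */
static void store_reference(unsigned *array, const unsigned *array2,
                            unsigned sum0, int tid, int lid, int length)
{
    for (int i = 0; i < UNROLL; i++) {
        if (16 * tid + i < length)
            array[16 * tid + i] = sum0 + array2[16 * lid + i];
    }
}

/* Model of the generated code. Returns the number of stores that landed
 * at an index >= length, i.e. the stores that would overflow the buffer. */
static int store_miscompiled(unsigned *array, const unsigned *array2,
                             unsigned sum0, int tid, int lid, int length)
{
    int overflow = 0;
    for (int i = 0; i < UNROLL; i++) {
        int idx = 16 * tid + i;
        /* Assumption: value reloaded from the spill stack, true = skip store. */
        int skip = !(idx < length);
        if (i == 1)
            skip = !skip;      /* the stray not_b1 in the second iteration */
        if (skip)
            continue;          /* cbr_b1 $c0, <next block> */
        if (idx >= length)
            overflow++;
        array[idx] = sum0 + array2[16 * lid + i];
    }
    return overflow;
}
```

With tid = 0 and length = 16 the model leaves array[1] unwritten while the reference writes it; with length = 1 the model writes array[1] even though it is out of range. Both match what I observe from the kernel.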