Add infrastructure to debug deadlocks of the AXI connections #234
Comments
I just ran into this issue with a stencil PE synthesized by Vivado HLS that deadlocks the whole system. The code in question looks like this:
void tlf(float *buf1, float *buf2, ...) {
    for (int i = 0; ...) {
        for (int j = 0; ...) {
            for (int k = 0; ...) {
                buf1[calcIdx(i,j,k)] = buf2[calcIdx(i,j,k)] + buf2[...] + buf3 + ...;
            }
        }
    }
}
Am I using the HLS wrong (i.e., is this a case of RTFM), or is it a bug in Vivado HLS that it generates AXI master bus requests that deadlock the system? If so, could you reference some information about how to work around this issue?
Hi @forflo, so far we have mainly observed this behavior with two HDL-implemented PEs. Usually, the problem is that one PE has an ongoing write request (e.g., an AXI4 burst write) that it cannot complete until it reads further elements. At the same time, another PE has an ongoing read request (e.g., a burst read) that it cannot complete before it can store some results. This situation then leads to a deadlock, as neither PE can resume operation and complete its ongoing request. For your case: Have you tried to (partially) buffer the inputs in some internal array (= on-chip memory) to see if the problem persists? How many PEs do you have in your design? You should also make sure not to read or write past the boundaries of your input/output arrays, as this can also cause the memory interface to stop operating.
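For reference, the buffering suggestion could look roughly like the sketch below. It is plain C, not the code from this issue: the 1D three-point stencil, the function name `stencil_buffered`, and the tile size `TILE` are all made up for illustration, and it assumes `in` and `out` do not overlap. Interface pragmas are left out, and whether Vivado HLS actually separates the read and write bursts for such code would have to be checked in the schedule.

```c
#include <stddef.h>

#define TILE 64  /* hypothetical tile size, chosen only for illustration */

/* Hypothetical 1D three-point stencil: out[i] = in[i-1] + in[i] + in[i+1]
 * for the interior elements; the two boundary elements are copied through.
 * Assumes in and out do not overlap. */
void stencil_buffered(const float *in, float *out, size_t n)
{
    float local_in[TILE + 2];  /* one tile plus one halo element per side */
    float local_out[TILE];

    if (n == 0) return;
    out[0] = in[0];
    if (n == 1) return;
    out[n - 1] = in[n - 1];

    for (size_t base = 1; base + 1 < n; base += TILE) {
        size_t len = (n - 1) - base;   /* interior elements left to process */
        if (len > TILE) len = TILE;

        /* Phase 1: only reads from external memory. */
        for (size_t t = 0; t < len + 2; ++t)
            local_in[t] = in[base - 1 + t];

        /* Phase 2: compute purely on the local buffers. */
        for (size_t t = 0; t < len; ++t)
            local_out[t] = local_in[t] + local_in[t + 1] + local_in[t + 2];

        /* Phase 3: only writes to external memory. */
        for (size_t t = 0; t < len; ++t)
            out[base + t] = local_out[t];
    }
}
```

The point of the three phases is that no write to external memory is started while a read of the same tile is still outstanding, so a write burst never has to wait for further reads.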
Hi @sommerlukas, thank you for your quick reply.
Just one, synthesized with Vivado HLS 2018.2 and used in a TaPaSCo composition (current master). I also tried Vivado HLS 2020.1, with the same behavior.
I know that this could have been the reason for the problem, so I first tried to find out-of-bounds accesses with In the Vivado HLS log file, I noticed that it infers an AXI burst for the writes. Due to the structure of the 3D stencil, the write accesses are sequential while the reads are not. So what you describe by
might actually be the behavior of my IP. Can it be an issue that TaPaSCo instructs the HLS to use a different port ( Is there a flag in TaPaSCo to bundle ports?
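One way to look for out-of-bounds accesses in a C simulation is to route every index computation through a checked helper. The sketch below is only an illustration: `grid_dims`, `in_bounds`, and `checked_idx` are hypothetical names, and the row-major layout is an assumption, since the real `calcIdx` is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grid dimensions; the real kernel's sizes are not shown. */
typedef struct { int nx, ny, nz; } grid_dims;

/* Returns 1 if (i, j, k) lies inside the grid, 0 otherwise. */
int in_bounds(grid_dims d, int i, int j, int k)
{
    return i >= 0 && i < d.nx &&
           j >= 0 && j < d.ny &&
           k >= 0 && k < d.nz;
}

/* Row-major linear index (an assumed layout). Aborts in C simulation
 * if the coordinates fall outside the grid. */
size_t checked_idx(grid_dims d, int i, int j, int k)
{
    assert(in_bounds(d, i, j, k));
    return ((size_t)i * (size_t)d.ny + (size_t)j) * (size_t)d.nz + (size_t)k;
}
```

For a stencil, the neighbor offsets at the grid border are the usual suspects, so running the testbench once with the checked helper in place of the plain index function should flag them quickly.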
Hi @forflo, I think for the kind of deadlock we mainly had in mind for this issue, one would need two write ports. From the code snippet above, there only seems to be one write port. At least up to the MIG, read and write are separate signals on the AXI bus, so they should not interfere with each other, although it's hard to tell for certain what the Xilinx MIG will make of that. You could try to see if this behavior is caused by merging the ports by removing the If you need to dig deeper, you could also give TaPaSCo's debug feature a try, as described here. By attaching an ILA to the master of the PE, you should be able to see the AXI transactions and maybe find the cause of the deadlock. Usage of the feature might be easier through a
Hi @forflo,
Thank you @wirthjohannes and @sommerlukas for your help. I will investigate this issue further and add some of my insights here.
Okay, a short summary of what I have found out so far: In multi-port designs, such as I modified the code and forced the three events writereq, write, and writeresp (in this order) to the end of the instruction chain (which did not change otherwise). The new schedule does not deadlock the hardware.
Hi @forflo, thanks for investigating this, very interesting insights! Just out of curiosity and maybe also as a future reference for other users: How did you enforce the write request to be scheduled at the end of the instruction sequence, via a
I first tried out Then I replaced array subscripts by calls to these two functions:
void my_write(volatile float *A, int i, float val) { A[i] = val; }
float my_read(volatile float *A, int i) { return A[i]; }
That is, I converted something like
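As an illustration for later readers, a conversion along these lines might look like the sketch below. The two wrappers are the ones quoted above; the loop body and the function name `sum_neighbours` are hypothetical and not the original kernel. All reads go through `my_read` into temporaries first, and the single `my_write` comes last in each iteration. Whether the HLS scheduler keeps that order on the AXI master is an assumption that should be verified in the schedule viewer.

```c
/* Wrappers as quoted in the comment above. */
void my_write(volatile float *A, int i, float val) { A[i] = val; }
float my_read(volatile float *A, int i) { return A[i]; }

/* Hypothetical loop body, only for illustration. A plain statement like
 *     dst[i] = src[i-1] + src[i] + src[i+1];
 * becomes explicit reads into temporaries followed by one write. */
void sum_neighbours(volatile float *dst, volatile float *src, int n)
{
    for (int i = 1; i + 1 < n; ++i) {
        float left  = my_read(src, i - 1);
        float mid   = my_read(src, i);
        float right = my_read(src, i + 1);
        /* The write is issued only after all reads of this iteration. */
        my_write(dst, i, left + mid + right);
    }
}
```

In plain C, the `volatile` qualifier keeps the compiler from merging or reordering these accesses relative to each other, which is presumably why the wrappers take `volatile` pointers.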
Right now it is very easy to deadlock the whole system if a PE, e.g., reads from the DDR and writes to the DDR but never deals with the result of either request. Such a situation is currently quite hard to debug. Most often, this scenario happens when the DMA engine is active at the same time.
I have two possible solutions: