-
Notifications
You must be signed in to change notification settings - Fork 722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmdLineTester_criu_jitPostRestore Test -Xnojit -Xnoaot hang #20663
Comments
Issue Number: 20663 |
https://openj9-jenkins.osuosl.org/job/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_0/116 - rh8-390-2 |
@JasonFengJ9 Please take a look |
50x grinder - https://openj9-jenkins.osuosl.org/job/Grinder/3998/ |
The grinder #20663 (comment) 40/50 passed in 4 machines, 10/10 failed at |
Can rh8-390-2 be reboot? FYI @AdamBrousseau |
Done |
3x grinder on the same machine - https://openj9-jenkins.osuosl.org/job/Grinder/3999/ - all failed |
It also occurs on rh8-390-1 as per the latest failure, which I'm sure I've seen before, but didn't spend much time looking for it. |
Reproduced the hang at a fyre machine,
Attach gdb to the hang java process and the native stacktrace are
FYI @dsouzai |
Hm, I spent a good chunk of time trying to reason about this and I'm not sure what the root cause could be. Thread 22 above isn't really doing anything wrong. However, Thread 2 I don't know why it's calling I'll have to repro the problem myself to get my hands on more context. |
Set up my own fyre machine and was able to repro the issue similar to Jason. I think I know what's going on. The hang, I believe, happens because the thread is stuck in a loop of trying to trigger a recompile (which does not succeed because of Normally what happens is, if the recompile request fails, the start PC gets patched to jump to the interpreter. In this case though, things are more complicated because of
This scenario can never happen in a normal run because if a similar situation happened because of say HCR, no one would be able to invoke the old To solve this, in I suppose depending on what we want the semantics of [1] openj9/runtime/compiler/control/rossa.cpp Lines 326 to 331 in 395aab5
|
https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.functional_ppc64le_linux_Nightly_testList_0/658 - ubu22-ppc64le-5
|
HCR guards only protect inlined code, not calls. When a class is redefined, we invalidate its JIT bodies: openj9/runtime/compiler/control/HookedByTheJit.cpp Lines 2582 to 2611 in 395aab5
There could be direct calls to these bodies from others Preexistence invalidation can do the same thing I have changes in the works to make class unloading do the same thing as well, when a surviving JIT body has unloaded methods inlined into it (for #16616)
Since a JIT body could be invalidated for other reasons, and there could be direct calls to it, I don't think this will work. Is it maybe possible to fix it in step 7? It seems to me that the problem is that the invalidated body fails to forward to the body that is supposed to have replaced it. If it did still forward, then we'd call the old body, which would jump to the new body, which would in turn jump to the interpreter |
I happened to be looking at this recently, and I thought I'd mention that under FSD there's a completely separate path branched from the conditional at
|
Why wouldn't this work? For context, this is related to Options Processing Post-Restore; if we see If post-restore some method makes a direct call to the old body AFTER
the forwarding helper of the old body will not be able to find the new body; it doesn't matter that the new body's startPC was appropriately patched to transition to the interpreter. This is due to the fact that calling So, in the case where I want to prevent execution to the extent possible in the Post-Restore Options Processing code when we see I'm not suggesting to do this anywhere else; it only makes sese to do so in this very specific case; location [1] for context. [1] openj9/runtime/compiler/control/OptionsPostRestore.cpp Lines 439 to 442 in 395aab5
|
I've seen this multiple times in the last few days, always on zlinux.
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/358 - rh8-390-2
cmdLineTester_criu_jitPostRestore_3
The text was updated successfully, but these errors were encountered: