JIT: Add global search option to 3-opt layout #110089
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
/azp run runtime-coreclr outerloop, Fuzzlyn, Antigen
Azure Pipelines successfully started running 3 pipeline(s).
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. I plan to disable this before merging. The diffs show this is expensive TP-wise, though not nearly as expensive as I initially found it to be (I suspect my first prototype was doing something naive with the cost model computation). I don't think we'll have much trouble enabling this for some production scenarios. Antigen and Fuzzlyn failures are unrelated. Thanks!
LGTM.
What sort of score improvements do you see from this? Maybe surface the score as a jit metric so you can easily compare score changes across many methods?
One notable source of diffs has to do with the movement of backward jumps. The greedy implementation takes a pseudo 4-opt approach to backward jumps by considering partitioning before the destination block, before the source block, and directly after the source block. The global implementation won't consider such partition shapes, and thus isn't considering plenty of backward jump moves that the greedy implementation does try. For the diffs I looked at, the greedy approach sometimes finds a less costly layout according to the model (though the layout itself looks far less canonical). This divergence in behavior between these approaches raises two concerns for me:
I think we should try to resolve this divergence in behavior around loop backedges, though I don't have an ideal solution yet. One thing we can try is to not consider backedges at all in the greedy variant, and introduce a new pass after 3-opt that uses the loop data structures we computed for the RPO traversal to move backedges around, though adding yet another phase (even if it replaces something like |
Follow-up to #103450, part of #107749. This is enabled for now just to get a CI run in; I intend to disable this by default since it's prohibitively expensive to run for most cases. In a follow-up, we can explore turning this on for sufficiently small regions, and/or when we think the current block layout is close to optimal.
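To make the cost of a global search concrete, here is a minimal Python sketch of an exhaustive 3-opt pass over a block layout. It is not the JIT's implementation: the cost model (sum of edge weights that do not fall through), the function names `layout_cost`, `best_three_opt_move`, and `global_three_opt`, and the edge-weight dictionary are all assumptions made for illustration. Each pass evaluates every triple of cut points, turning the layout `S1 S2 S3 S4` into `S1 S3 S2 S4`, which is why the work grows quickly with region size.

```python
from itertools import combinations


def layout_cost(layout, edges):
    """Assumed cost model: sum the weights of edges that do not fall through."""
    position = {block: idx for idx, block in enumerate(layout)}
    return sum(
        weight
        for (src, dst), weight in edges.items()
        if position[dst] != position[src] + 1
    )


def best_three_opt_move(layout, edges):
    """Try every cut-point triple (i, j, k) and return the cheapest improving layout.

    The layout is split into S1 = [0:i], S2 = [i:j], S3 = [j:k], S4 = [k:],
    and the candidate is S1 + S3 + S2 + S4. Starting i at 1 keeps the entry
    block first. Returns (None, current_cost) when no candidate is cheaper.
    """
    best_cost = layout_cost(layout, edges)
    best_layout = None
    for i, j, k in combinations(range(1, len(layout) + 1), 3):
        candidate = layout[:i] + layout[j:k] + layout[i:j] + layout[k:]
        cost = layout_cost(candidate, edges)
        if cost < best_cost:
            best_cost, best_layout = cost, candidate
    return best_layout, best_cost


def global_three_opt(layout, edges):
    """Repeat the exhaustive search until no swap lowers the cost."""
    layout = list(layout)
    while True:
        improved, _ = best_three_opt_move(layout, edges)
        if improved is None:
            return layout
        layout = improved


# Hypothetical flow graph: block 0 usually jumps to 2, which jumps back to 1.
edges = {(0, 2): 10, (2, 1): 10, (1, 3): 10, (0, 1): 1}
initial = [0, 1, 2, 3]
final = global_three_opt(initial, edges)
print(layout_cost(initial, edges), final, layout_cost(final, edges))
```

Because each pass enumerates all cut-point triples and recomputes the full cost for each candidate, this sketch is far more expensive than a greedy variant that only examines the hottest edges, which matches the motivation for restricting a global search to small regions.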