
Geometry API Rethink #102

Open

Yurlungur opened this issue Apr 12, 2022 · 4 comments

Labels: help wanted (Extra attention is needed), question (Further information is requested)

Comments

@Yurlungur
Collaborator

This issue is motivated by a discussion with @jdolence . A few related questions arose about geometry:

  1. In certain parts of the code, it would be nice to be able to fill geometry in a "pencil" in X1. In the Riemann solver, for example, this would enable vectorization.
  2. The cache currently orders the arrays as GEOM_ELEM,k,j,i. But given how we use the geometry package, k,j,i,GEOM_ELEM might be more performant.
  3. If caching can be made performant, should we ever use analytic geometries? Or should we always do everything on the grid? In the latter case, grid-based optimizations can be added.

I don't think we currently have any answers, so we decided to punt for now and move forward. But I wanted to have these questions written down so we can revisit them later.
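The layout question in (2) comes down to which index varies fastest in memory. A minimal sketch of the two flat-array index maps (sizes and names here are illustrative, not the actual Phoebus cache):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sizes: NGEOM geometry elements on an NK x NJ x NI block.
constexpr std::size_t NGEOM = 10, NK = 4, NJ = 4, NI = 16;

// Layout A: (GEOM_ELEM, k, j, i). The element index is slowest, so one
// element over the whole block is contiguous (good for element-at-a-time sweeps).
inline std::size_t idx_elem_first(std::size_t e, std::size_t k,
                                  std::size_t j, std::size_t i) {
  return ((e * NK + k) * NJ + j) * NI + i;
}

// Layout B: (k, j, i, GEOM_ELEM). The element index is fastest, so all
// elements at one grid point are contiguous (good for point-local access,
// e.g. contracting the full metric at a single cell).
inline std::size_t idx_elem_last(std::size_t k, std::size_t j,
                                 std::size_t i, std::size_t e) {
  return ((k * NJ + j) * NI + i) * NGEOM + e;
}
```

Which wins depends on the dominant access pattern: layout B makes a per-point metric read one cache line, while layout A keeps the i-pencil of a single element contiguous.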

@Yurlungur added the help wanted and question labels on Apr 12, 2022
@bprather
Collaborator

Hey, these sound like my kind of questions! Would love to be in the loop if there are further discussions here.

  1. Would this mean filling up a 'scratch' array before running a performant inner loop? I've had mixed luck trying this with different variables, but I have no idea what's going on with my caches at the hardware level; maybe I'm ordering the scratch spaces wrong.

  2. KHARMA uses the latter ordering, fwiw. It would be pretty easy to modify coordinates.{hpp,cpp} to reorder the underlying array and test this without reordering the calling convention.

  3. My limited tests of analytic geometry definitions have been uniformly bad. Not just that they're slower (which may just be my janky variant-based object code), but also that making them performant would mean giving up advantages worth more than a bit of extra speed, e.g. easy addition of new spacetimes/transformations and runtime-selected geometry. A cache has predictable, if not optimal, performance, which is still useful.
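For context on the "variant-based object code" idea in (3), here is a stripped-down illustration of runtime-selected geometry via std::variant. The lapse functions are toy stand-ins, not real metric code, and no resemblance to KHARMA's actual classes is implied:

```cpp
#include <cassert>
#include <cmath>
#include <variant>

// Two toy analytic "geometries", each exposing a lapse(r).
struct Minkowski {
  double lapse(double /*r*/) const { return 1.0; }
};
struct Schwarzschild {
  double M;
  double lapse(double r) const { return std::sqrt(1.0 - 2.0 * M / r); }
};

// The geometry is chosen at runtime (e.g. from a parameter file).
using Geometry = std::variant<Minkowski, Schwarzschild>;

double lapse(const Geometry &g, double r) {
  // std::visit dispatches on the active alternative at runtime; that
  // per-call branch is one reason a precomputed cache can beat this
  // pattern inside a hot loop.
  return std::visit([r](const auto &m) { return m.lapse(r); }, g);
}
```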

@Yurlungur
Collaborator Author

I'd definitely be interested in talking more! It might be worth having a telecon with interested parties at some point... although also at some point you'll just be here. ;)

> Would this mean filling up a 'scratch' array before running a performant inner loop? I've had mixed luck trying this with different variables but I have no idea what's going on with my caches at hardware level, maybe I'm ordering the scratch spaces wrong.

Yes, exactly. Similar to how Athena++ writes its loops over k,j and then an inner loop over a pencil. We've had mixed success with this too. On GPUs, doing things this way introduces a dependence on meshblock size, since you need enough work to saturate the device.
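The pencil pattern being described might look roughly like this (a sketch with illustrative names and sizes, not Phoebus code): outer loops over k,j, a scratch fill over the X1 pencil, then a tight inner loop that the compiler can vectorize.

```cpp
#include <vector>

// Hypothetical block dimensions.
constexpr int NK = 4, NJ = 4, NI = 64;

// Scale a conserved variable u by the metric determinant gdet, pencil by
// pencil. Both arrays are flat, sized NK*NJ*NI, with i fastest.
void scale_by_metric(const std::vector<double> &gdet,
                     std::vector<double> &u) {
  std::vector<double> scratch(NI);  // one pencil in X1
  for (int k = 0; k < NK; ++k) {
    for (int j = 0; j < NJ; ++j) {
      const int base = (k * NJ + j) * NI;
      // Fill the pencil: this is where a geometry "fill" call would go.
      for (int i = 0; i < NI; ++i) scratch[i] = gdet[base + i];
      // Contiguous, dependence-free inner loop: a good vectorization target.
      for (int i = 0; i < NI; ++i) u[base + i] *= scratch[i];
    }
  }
}
```

On a GPU the same structure maps each (k,j) pencil to a team of threads, which is where the meshblock-size dependence comes from: small blocks leave the device undersubscribed.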

> KHARMA uses the latter ordering, fwiw. It would be pretty easy to modify coordinates.{hpp,cpp} to reorder the underlying array and test this without reordering the calling convention.

Yes---I actually did a little experiment and found the latter to be about a factor of 2 faster... what's not clear is how much of a speedup that translates to in Phoebus. It would also require writing some variable-packing machinery in Phoebus, because I currently leverage MeshBlockPack and VariablePack in Parthenon. The Phoebus geometry cache uses Parthenon variables, so the geometry gets automatically generated and managed when you do AMR.

> My limited tests of analytic geometry definitions have been uniformly bad. Not just that they're slower (which may just be my janky variant-based object code), but also that performant implementations would give up advantages that are more useful than a bit of extra performance, e.g. easy addition of new spacetimes/transformations and runtime-selected geometry. A cache has predictable performance, if not optimal, which is still useful.

I'm pretty proud of the analytic machinery I've come up with... I think performance with it is pretty good, and it's easy to add new spacetimes/transformations thanks to a template-based hierarchy. But you're right that predictable performance may be more valuable than optimal performance---I think that's the real reason to move away from analytic coordinate systems.
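A toy sketch of what such a template-based (CRTP) hierarchy can look like, for readers following along. This is illustrative only, not the actual Phoebus machinery; the point is that dispatch is resolved at compile time while new spacetimes remain easy to add:

```cpp
#include <cmath>

// CRTP base: generic code calls lapse(), which statically forwards to the
// derived class. No vtable, so the call can fully inline.
template <class Derived>
struct Spacetime {
  double lapse(double r) const {
    return static_cast<const Derived &>(*this).lapse_impl(r);
  }
};

// Adding a new spacetime is just another derived struct.
struct Flat : Spacetime<Flat> {
  double lapse_impl(double /*r*/) const { return 1.0; }
};

struct SchwarzschildLike : Spacetime<SchwarzschildLike> {
  double M;
  double lapse_impl(double r) const { return std::sqrt(1.0 - 2.0 * M / r); }
};

// Generic code is written once against Spacetime<D>.
template <class D>
double lapse_at(const Spacetime<D> &g, double r) { return g.lapse(r); }
```

The tradeoff relative to the variant approach above is that the concrete geometry must be known at compile time, which is exactly the runtime-selection flexibility a cache (or variant) preserves.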

@jti-lanl
Collaborator

I am interested in listening to the discussion (at least), if this becomes a meeting or conference call.

I don't know if it's relevant, but I had an experience in early GPGPU days, where I discovered that I got significantly better performance doing redundant computations, rather than trying to cache intermediate values. That was a specific thing, though; I wouldn't assume it applies to your code.

@Yurlungur
Collaborator Author

Yeah, I think that may be the case, @jti-lanl. Currently I don't think the cached machinery is much faster than the analytic formulae.

3 participants