
Geometry API Rethink #102

Open

Yurlungur opened this issue Apr 12, 2022 · 4 comments

Labels: help wanted (Extra attention is needed), question (Further information is requested)

Comments

@Yurlungur
Collaborator

This issue is motivated by a discussion with @jdolence . A few related questions arose about geometry:

  1. In certain parts of the code, it would be nice to be able to fill geometry in a "pencil" in X1. In the Riemann solver, for example, this would enable vectorization.
  2. The cache currently orders the arrays as GEOM_ELEM,k,j,i. But given how we use the geometry package, k,j,i,GEOM_ELEM might be more performant.
  3. If caching can be made performant, should we ever use analytic geometries? Or should we always do everything on the grid? In the latter case, grid-based optimizations can be added.

I don't think we currently have any answers, so we decided to punt for now and move forward. But I wanted to have these questions written down so we can revisit them later.
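The layout question in (2) comes down to which index varies fastest in memory. A minimal sketch of the two flat-array index maps (sizes and names here are illustrative, not the actual Phoebus cache):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sizes: NGEOM geometry elements on an NK x NJ x NI block.
constexpr std::size_t NGEOM = 10, NK = 4, NJ = 4, NI = 16;

// Layout A: (GEOM_ELEM, k, j, i). The element index is slowest, so one
// element over the whole block is contiguous (good for element-at-a-time sweeps).
inline std::size_t idx_elem_first(std::size_t e, std::size_t k,
                                  std::size_t j, std::size_t i) {
  return ((e * NK + k) * NJ + j) * NI + i;
}

// Layout B: (k, j, i, GEOM_ELEM). The element index is fastest, so all
// elements at one grid point are contiguous (good for point-local access,
// e.g. contracting the full metric at a single cell).
inline std::size_t idx_elem_last(std::size_t k, std::size_t j,
                                 std::size_t i, std::size_t e) {
  return ((k * NJ + j) * NI + i) * NGEOM + e;
}
```

Which wins depends on the dominant access pattern: layout B makes a per-point metric read one cache line, while layout A keeps the i-pencil of a single element contiguous.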

@Yurlungur added the help wanted and question labels on Apr 12, 2022
@bprather
Collaborator

Hey, these sound like my kind of questions! Would love to be in the loop if there are further discussions here.

  1. Would this mean filling up a 'scratch' array before running a performant inner loop? I've had mixed luck trying this with different variables, but I have no idea what's going on with my caches at the hardware level; maybe I'm ordering the scratch spaces wrong.

  2. KHARMA uses the latter ordering, fwiw. It would be pretty easy to modify coordinates.{hpp,cpp} to reorder the underlying array and test this without reordering the calling convention.

  3. My limited tests of analytic geometry definitions have been uniformly bad. Not just that they're slower (which may just be my janky variant-based object code), but also that making them performant would mean giving up advantages worth more than a bit of extra speed, e.g. easy addition of new spacetimes/transformations and runtime-selected geometry. A cache has predictable, if not optimal, performance, which is still useful.
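For context on the "variant-based object code" idea in (3), here is a stripped-down illustration of runtime-selected geometry via std::variant. The lapse functions are toy stand-ins, not real metric code, and no resemblance to KHARMA's actual classes is implied:

```cpp
#include <cassert>
#include <cmath>
#include <variant>

// Two toy analytic "geometries", each exposing a lapse(r).
struct Minkowski {
  double lapse(double /*r*/) const { return 1.0; }
};
struct Schwarzschild {
  double M;
  double lapse(double r) const { return std::sqrt(1.0 - 2.0 * M / r); }
};

// The geometry is chosen at runtime (e.g. from a parameter file).
using Geometry = std::variant<Minkowski, Schwarzschild>;

double lapse(const Geometry &g, double r) {
  // std::visit dispatches on the active alternative at runtime; that
  // per-call branch is one reason a precomputed cache can beat this
  // pattern inside a hot loop.
  return std::visit([r](const auto &m) { return m.lapse(r); }, g);
}
```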

@Yurlungur
Collaborator Author

I'd definitely be interested in talking more! It might be worth having a telecon with interested parties at some point... although also at some point you'll just be here. ;)

> Would this mean filling up a 'scratch' array before running a performant inner loop? I've had mixed luck trying this with different variables but I have no idea what's going on with my caches at hardware level, maybe I'm ordering the scratch spaces wrong.

Yes, exactly. Similar to how Athena++ writes its loops over k,j and then an inner loop over a pencil. We've had mixed success with this too. On GPUs, doing things this way introduces a dependence on meshblock size, since you need enough work to saturate the device.
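The pencil pattern being described might look roughly like this (a sketch with illustrative names and sizes, not Phoebus code): outer loops over k,j, a scratch fill over the X1 pencil, then a tight inner loop that the compiler can vectorize.

```cpp
#include <vector>

// Hypothetical block dimensions.
constexpr int NK = 4, NJ = 4, NI = 64;

// Scale a conserved variable u by the metric determinant gdet, pencil by
// pencil. Both arrays are flat, sized NK*NJ*NI, with i fastest.
void scale_by_metric(const std::vector<double> &gdet,
                     std::vector<double> &u) {
  std::vector<double> scratch(NI);  // one pencil in X1
  for (int k = 0; k < NK; ++k) {
    for (int j = 0; j < NJ; ++j) {
      const int base = (k * NJ + j) * NI;
      // Fill the pencil: this is where a geometry "fill" call would go.
      for (int i = 0; i < NI; ++i) scratch[i] = gdet[base + i];
      // Contiguous, dependence-free inner loop: a good vectorization target.
      for (int i = 0; i < NI; ++i) u[base + i] *= scratch[i];
    }
  }
}
```

On a GPU the same structure maps each (k,j) pencil to a team of threads, which is where the meshblock-size dependence comes from: small blocks leave the device undersubscribed.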

> KHARMA uses the latter ordering, fwiw. It would be pretty easy to modify coordinates.{hpp,cpp} to reorder the underlying array and test this without reordering the calling convention.

Yes---I actually did a little experiment and found the latter to be about a factor of 2 faster... what's not clear is how much of a speedup that translates to in Phoebus. It would also require writing some variable-packing machinery in Phoebus, because I currently leverage MeshBlockPack and VariablePack in Parthenon. The Phoebus geometry cache uses Parthenon variables, so the geometry gets automatically generated and managed when you do AMR.

> My limited tests of analytic geometry definitions have been uniformly bad. Not just that they're slower (which may just be my janky variant-based object code), but also that performant implementations would give up advantages that are more useful than a bit of extra performance, e.g. easy addition of new spacetimes/transformations and runtime-selected geometry. A cache has predictable performance, if not optimal, which is still useful.

I'm pretty proud of the analytic machinery I've come up with... I think performance with it is pretty good, and it's easy to add new spacetimes/transformations thanks to a template-based hierarchy. But you're right that predictable performance may be more valuable than optimal performance---I think that's the real reason to move away from analytic coordinate systems.
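A toy sketch of what such a template-based (CRTP) hierarchy can look like, for readers following along. This is illustrative only, not the actual Phoebus machinery; the point is that dispatch is resolved at compile time while new spacetimes remain easy to add:

```cpp
#include <cmath>

// CRTP base: generic code calls lapse(), which statically forwards to the
// derived class. No vtable, so the call can fully inline.
template <class Derived>
struct Spacetime {
  double lapse(double r) const {
    return static_cast<const Derived &>(*this).lapse_impl(r);
  }
};

// Adding a new spacetime is just another derived struct.
struct Flat : Spacetime<Flat> {
  double lapse_impl(double /*r*/) const { return 1.0; }
};

struct SchwarzschildLike : Spacetime<SchwarzschildLike> {
  double M;
  double lapse_impl(double r) const { return std::sqrt(1.0 - 2.0 * M / r); }
};

// Generic code is written once against Spacetime<D>.
template <class D>
double lapse_at(const Spacetime<D> &g, double r) { return g.lapse(r); }
```

The tradeoff relative to the variant approach above is that the concrete geometry must be known at compile time, which is exactly the runtime-selection flexibility a cache (or variant) preserves.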

@jti-lanl
Collaborator

I am interested in listening to the discussion (at least), if this becomes a meeting or conference call.

I don't know if it's relevant, but I had an experience in early GPGPU days, where I discovered that I got significantly better performance doing redundant computations, rather than trying to cache intermediate values. That was a specific thing, though; I wouldn't assume it applies to your code.

@Yurlungur
Collaborator Author

Yeah, I think that may be the case, @jti-lanl. Currently I don't think the cached machinery is much faster than the analytic formulae.

3 participants