Transfer array from Singe #48

Merged
merged 1 commit into from
Aug 29, 2024

Conversation

BrendanKKrueger
Collaborator

PR Summary

An implementation of std::array that works on both the CPU and the GPU.

PR Checklist

  • Any changes to code are appropriately documented.
  • Code is formatted.
  • Install test passes.
  • Docs build.
  • If preparing for a new release, update the version in cmake.

@BrendanKKrueger BrendanKKrueger added the enhancement New feature or request label Aug 26, 2024
@BrendanKKrueger BrendanKKrueger self-assigned this Aug 26, 2024
Collaborator

@Yurlungur Yurlungur left a comment

Big picture, I think this is a great addition and would like to see it included. Given discussion with @mauneyc-LANL and @rbberger, I think we can merge without CI actually running, so long as you ran tests locally, and we can wait for the great test rewrite from @rbberger.

A few comments/questions/requested changes below. Feel free to push back on anything.

ports-of-call/array.hpp
ports-of-call/array.hpp
ports-of-call/array.hpp
ports-of-call/array.hpp (outdated)
test/test_array.hpp (outdated)
test/test_portsofcall.cpp (outdated)
@Yurlungur
Collaborator

Oh, also, the docs should at least mention that this exists.

@mauneyc-LANL
Collaborator

I was digging around a bit and found this

NVIDIA/libcudacxx#51 (comment)

TLDR: 1) --expt-relaxed-constexpr & -std=c++17 seem to provide sufficient functionality for std::array on device, 2) libcu++ rolls a std::array in cuda/std/array (as of 2023).

Would this make a hand-rolled std::array necessary?
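
For readers unfamiliar with the second option, here is a minimal sketch of how a code base might pick up libcu++'s array when the header is visible and fall back to std::array otherwise. The names `sketch::device_array` and `SKETCH_HAVE_CUDA_STD_ARRAY` are hypothetical and are not part of ports-of-call or of this PR; the sketch only assumes that the `cuda/std/array` header mentioned above provides `cuda::std::array`.

```cpp
#include <cstddef>

// Detect libcu++'s array header when the compiler supports __has_include
// (standard since C++17). SKETCH_HAVE_CUDA_STD_ARRAY is a hypothetical name.
#if defined(__has_include)
#if __has_include(<cuda/std/array>)
#include <cuda/std/array>
#define SKETCH_HAVE_CUDA_STD_ARRAY 1
#endif
#endif

#ifdef SKETCH_HAVE_CUDA_STD_ARRAY
namespace sketch {
// libcu++ array, intended to be usable in host and device code.
template <typename T, std::size_t N>
using device_array = cuda::std::array<T, N>;
} // namespace sketch
#else
#include <array>
namespace sketch {
// Fallback: the standard library array (host only, or device code that
// relies on relaxed constexpr).
template <typename T, std::size_t N>
using device_array = std::array<T, N>;
} // namespace sketch
#endif

namespace sketch {
// Small usage example: read an element through the const operator[].
constexpr int second_of_three() {
  const device_array<int, 3> a{1, 2, 3};
  return a[1];
}
} // namespace sketch
```

Whether this kind of alias is portable beyond CUDA toolchains is exactly the question raised in the replies that follow.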

@Yurlungur
Collaborator

👍 This is what we do in parthenon. But is that fully portable?

@BrendanKKrueger
Collaborator Author

Most of std::array works on GPUs with relaxed constexpr, but not all. In particular, std::array::fill and std::array::swap aren't constexpr until C++20, and I believe Singe code uses std::array::fill. Once we move to C++20 (hopefully much faster than the 14-to-17 transition), this code becomes redundant if we're comfortable relying on relaxed constexpr to get the STL onto the GPU.
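
To illustrate the point about `fill`, below is a minimal sketch of a hand-rolled array whose `fill` is both `constexpr` and marked for host and device use. It is not the code merged in this PR: the `SKETCH_HOST_DEVICE` macro and the `sketch::array` type are hypothetical stand-ins, and the sketch ignores the zero-length case and most of the std::array interface.

```cpp
#include <cstddef>

// Hypothetical stand-in for a portability macro; not the macro used in this PR.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

namespace sketch {

// Aggregate with a public C array, so brace initialization works as it does
// for std::array. N == 0 is not handled in this sketch.
template <typename T, std::size_t N>
struct array {
  T data_[N];

  SKETCH_HOST_DEVICE constexpr T &operator[](std::size_t i) { return data_[i]; }
  SKETCH_HOST_DEVICE constexpr const T &operator[](std::size_t i) const {
    return data_[i];
  }
  SKETCH_HOST_DEVICE constexpr std::size_t size() const { return N; }

  // Explicitly annotated and constexpr, so it does not depend on the
  // standard library's fill becoming constexpr in C++20.
  SKETCH_HOST_DEVICE constexpr void fill(const T &value) {
    for (std::size_t i = 0; i < N; ++i) data_[i] = value;
  }
};

// Compile-time use of fill: four elements, each set to 7.
constexpr int sum_filled() {
  array<int, 4> a{};
  a.fill(7);
  int total = 0;
  for (std::size_t i = 0; i < a.size(); ++i) total += a[i];
  return total;
}
static_assert(sum_filled() == 28, "fill must be usable at compile time");

} // namespace sketch
```

Because every member carries the host/device annotation directly, a type along these lines does not need `--expt-relaxed-constexpr` to be callable on the device, which is the trade-off being weighed in this thread.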

@Yurlungur
Collaborator

That's a satisfying explanation for me. 👍

@BrendanKKrueger
Collaborator Author

This is ready for another review.

@BrendanKKrueger BrendanKKrueger marked this pull request as ready for review August 29, 2024 16:58
Collaborator

@Yurlungur Yurlungur left a comment

Looks great! Only one concern in the tests; otherwise I think this is good to go.

test/test_array.cpp (outdated)
#include <iterator>
#include <tuple>

namespace PortsOfCall {
Collaborator Author

@Yurlungur, is this namespacing correct?

Collaborator

Yep 👍

@Yurlungur Yurlungur merged commit f3775d1 into main Aug 29, 2024
4 checks passed
@Yurlungur Yurlungur deleted the bkk_array branch August 29, 2024 19:53