NVHPC Support #693
Not actually blocked. Mostly seems to work, other than release RTC test suite failures. Debug is fine, which makes tracing the fault more interesting. Release RTC examples also work, so it's test-suite specific in some way? Vis works, but the vis repo needs CMake changes to address warnings (the same as the main repo, plus some extras).
Segfault notes
#include "flamegpu/flamegpu.h"
#include "gtest/gtest.h"

namespace flamegpu {
namespace tests {
namespace test_nvhpc {
FLAMEGPU_AGENT_FUNCTION(cudacxx_test_func, flamegpu::MessageNone, flamegpu::MessageNone) {
    return flamegpu::ALIVE;
}
const char* rtc_test_func = R"###(
FLAMEGPU_AGENT_FUNCTION(rtc_test_func, flamegpu::MessageNone, flamegpu::MessageNone) {
    return flamegpu::ALIVE;
}
)###";
TEST(testNVHPC, RTCElapsedTime) {
    ModelDescription m("m");
    AgentDescription &agent = m.newAgent("agent");
    // Using newRTCFunction and newFunction in the same compilation unit appears to cause the segfault within newRTCFunction.
    // Comment out either call to remove the segfault.
    agent.newFunction("cudacxx_test_func", cudacxx_test_func);
    AgentFunctionDescription &func = agent.newRTCFunction("rtc_test_func", rtc_test_func);
}
}  // namespace test_nvhpc
}  // namespace tests
}  // namespace flamegpu
After chucking a bunch of |
If using GCC 8's stdlib rather than GCC 9's, this builds OK in Release mode (nvc++ 21.7-0, Ubuntu 21.04).
Currently working on reproducing this with a simpler use case. Currently leaning towards one or more of the following:
Will continue to work on the MWE a little, but if it doesn't reproduce soon it'll just get dumped into a gist for future reference. Running the gcc and nvhpc builds through valgrind (with an appropriate CUDA suppressions list) would be good and generally worthwhile on the whole. I tried enabling |
NVHPC repackages the location of curand compared to standalone nvcc. Prior to nvhpc 22.3 this is not correctly reflected by the include path during compilation via CMake when using nvhpc-installed nvcc but gcc as the host compiler. We may be able to resolve this by requiring curand as a dependency in CMake; otherwise we might need to explicitly add an edge case to CMake to ensure this include path is set.
After some horrible CMake additions to explicitly add the non-symlinked math_libs include directory to the include path(s) if required, curand is now found when using nvhpc-installed nvcc with nvhpc as the host compiler. However, this then exposes an issue with include path ordering and the finding/use of cub and thrust. The cub/thrust version mismatch check identifies that they do not agree: locally, with CUDA 11.8 and nvhpc 22.11, which ships with cub/thrust 1.15, this conflicts with the explicitly added cub/thrust 1.17 we fetch. This will be due to include directory precedence. It might not be the case for all CMake/nvhpc combos, so I will force CI to investigate for me (once the outage ends?). Some commits are WIP, as CMake 3.18 needs to use a different method for symlink resolution compared to 3.19, which is not tested.
Some NVHPC builds via containers fail to configure CMake, erroring due to:
This is with the version of nvcc distributed with that version of nvhpc, which it apparently considers incompatible.
It uses GCC's stdlib, so requires the same linker arguments to access std::experimental::filesystem
Swig 4.0.2 does not appear to build from source with NVHPC/Clang by default
… exposes an issue with thrust.
Includes CMP0152 which in CMake >= 3.28 changes symlink resolution behaviour, relevant to nvhpc workarounds.
cudaMemset takes an int not a uint64, so 0xfffffff was triggering an implicit cast sign change.
…mem issue with 23.11
…n be added to suppressions
nvcc believes it is incompatible with the versions of nvhpc it was distributed with...
std::experimental::filesystem linker errors. stdc++fs is required for std::experimental::filesystem
pyflamegpu / swigCXX. The build scripts set some flags which are not known (-ansi).
-devel-cudaXX.YY- images for nvhpc versions.
21.7-devel-cuda11.4-ubuntu20.04
21.7-devel-cuda11.4-centos7
Closes #977.
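The cudaMemset commit above concerns an implicit conversion into the `int value` parameter of `cudaMemset(void* devPtr, int value, size_t count)`. The following is a minimal host-only sketch of that kind of sign change, not the code from this PR. `host_memset_like`, `to_memset_value`, `all_bytes_set` and `demo` are hypothetical names introduced for illustration, and the sketch assumes the constant involved was an all-ones 32-bit pattern that does not fit in a 32-bit `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side stand-in with the same parameter types as
// cudaMemset(void* devPtr, int value, size_t count), so this runs without a GPU.
// As with memset, only the low byte of `value` is written to each byte.
inline void host_memset_like(void* ptr, int value, std::size_t count) {
    std::memset(ptr, value, count);
}

// What an unsigned 32-bit constant becomes when converted to the int parameter.
// The explicit cast stands in for the implicit conversion at the call site.
inline int to_memset_value(std::uint32_t v) {
    return static_cast<int>(v);
}

// Set every byte of a 64-bit word by passing the byte value itself.
inline std::uint64_t all_bytes_set() {
    std::uint64_t word = 0;
    host_memset_like(&word, 0xff, sizeof(word));
    return word;
}

// Small self-check tying the pieces together.
inline void demo() {
    // On mainstream compilers with a 32-bit int (and by definition from C++20)
    // the all-ones 32-bit pattern converts to a negative int.
    assert(to_memset_value(0xffffffffu) < 0);
    // The byte-sized value gives the intended fill without a sign change.
    assert(all_bytes_set() == UINT64_MAX);
}
```

Calling `demo()` from a test or `main` exercises both paths. Because only the low byte is used, passing `0xff` produces the same fill as the wider constant while avoiding the conversion that a stricter compiler may flag.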