forked from llvm/llvm-project
[Github] Remove PULL_REQUEST_TEMPALTE since we allow PR's. #54
Merged
Conversation
tarunprabhu added a commit to tarunprabhu/kitsune that referenced this pull request on Aug 29, 2024
Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
pmccormick added a commit to pmccormick/kitsune that referenced this pull request on Sep 4, 2024
that now includes AMDGPU/HIP (compile and runtime updates), addresses a few build issues that have popped up for the team during testing of the branch, and a few other odds-and-ends for cleanup and general (small'ish) improvements. Also includes:
-- the matmult example from Jaeyoung.
-- tweaks for cmake configuration of the experiments make file infrastructure.
-- updates to make a closer match between the concepts and approach used by both the cuda and hip targets/runtimes.
-- some tweaks / changes to the experiments.
- Fixed some code generation details related to rocm 6.1.2 (e.g., xnack behaviors changed as did ecc details -- basically becoming part of the target binary vs. just an attribute). Apparently, this makes binary images incompatible for a given gpu config.
Dropped in a kitsune (runtime) specific clang tidy file to keep it from taking all of LLVM's core settings.
A bit more cleanup and dropped a clang tidy config file to avoid kitsune's runtime code style details from getting beat up by the (toplevel) LLVM details.
[Github] Remove PULL_REQUEST_TEMPALTE since we allow PR's. (lanl#54)
Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
[kitsune] Fix peculiar build-time behavior (lanl#53)
Because of the way the build system was set up, the targets and kitrt would be "installed" at build time i.e. when running ninja/make as opposed to ninja install/make install. This fixes that behavior and only installs during ninja build/ninja install. When building, there will be messages that suggest that those targets are actually being installed, but they are being "installed" to a subdirectory within the build directory.
Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
Revert "[kitsune] Fix list of statically linked libraries" (lanl#55)
Reverts lanl#51
dev/18.x linking error fixes (lanl#49)
clang/lib/Frontend has a call to a Tapir << operator, which means it has to be linked against TapirOpts.
There are unguarded calls to Value::dump in HipABI. Value::dump is disabled for release builds, so I've replaced them with LLVM_DEBUG calls. Note this means you don't have that output before an assertion failure in release builds. There are other options if that's important.
Co-authored-by: George Stelle
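The guard described in the commit message above can be illustrated without LLVM's headers. The following is only a sketch of the pattern, not the HipABI change itself: `KIT_DEBUG` and `FakeValue` are hypothetical names invented for this example, and the real `LLVM_DEBUG` macro additionally depends on a runtime debug flag, which this sketch does not model.

```cpp
#include <iostream>
#include <string>

// Hypothetical stand-in for LLVM_DEBUG: the wrapped statement is compiled
// only when NDEBUG is not defined, i.e. in builds with assertions enabled.
#ifndef NDEBUG
#define KIT_DEBUG(stmt) do { stmt; } while (false)
#else
#define KIT_DEBUG(stmt) do { } while (false)
#endif

// Hypothetical value type whose dump() exists only in debug builds,
// mirroring a dump method that is unavailable in release builds.
struct FakeValue {
  std::string name;
  int dumps = 0;
#ifndef NDEBUG
  void dump() {
    ++dumps;
    std::cerr << "value: " << name << '\n';
  }
#endif
};

// dump() is reached only through the guard, so this function compiles in
// both debug and release configurations.
int inspect(FakeValue &v) {
  KIT_DEBUG(v.dump());
  return v.dumps;
}
```

An unguarded `v.dump()` in `inspect` would fail to compile with `-DNDEBUG`, which is the kind of release-build breakage the commit describes. The trade-off noted in the commit also shows up here: in a release build the diagnostic output is simply gone.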
tarunprabhu added a commit to tarunprabhu/kitsune that referenced this pull request on Oct 1, 2024
Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
tarunprabhu added a commit to tarunprabhu/kitsune that referenced this pull request on Oct 1, 2024
Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
tarunprabhu added a commit to tarunprabhu/kitsune that referenced this pull request on Oct 2, 2024
Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
tarunprabhu added a commit to tarunprabhu/kitsune that referenced this pull request on Oct 21, 2024
LLVM 19.x. Credit for the work goes to the individuals listed in the commit messages below. commit bfd4fc3089ef5d3c0c51b112813e80ec24396294 Author: Tarun Prabhu <tarun@lanl.gov> Date: Thu Oct 3 12:52:03 2024 -0600 Merge with 19.x commit bc67a96ed1ed18eee5b88679fd54c40fe3a73073 Author: jsarrao <43554622+jsarrao@users.noreply.github.com> Date: Tue Sep 24 10:57:15 2024 -0700 [kitrt] Fixes for numpy extension module (#57) * [kitrt] Fixes for numpy extension module - Renamed kitrt.c to kitrt.cpp - Removed system allocators from extension module - Fixed typo in mem_realloc method name for both cuda and hip - Fixed signature for enable/disable mem handler * Unconditionally include kitrt.h. Fixed whitespace errors. commit ea97153b1b00a1ae3d974c8c6abb429882b1a96d Author: George Stelle <stelleg@gmail.com> Date: Thu Aug 29 10:21:26 2024 -0600 dev/18.x linking error fixes (#49) clang/lib/Frontend has a call to a Tapir << operator, which means it has to be linked against TapirOpts. There's unguarded checks to Value::dump in HipABI, which is disabled for release builds, so I've replaced them with LLVM_DEBUG calls. Note this means you don't have that output before assertion failure for release builds. Other options if that's important. commit ffdfdfe8a2ad1eb7e7666717790a5d62baca4976 Author: George Stelle <stelleg@gmail.com> Date: Thu Aug 29 10:20:22 2024 -0600 Revert "[kitsune] Fix list of statically linked libraries" (#55) Reverts lanl/kitsune#51 commit df767d40267d121829c3c715b119b1889e723a80 Author: Tarun Prabhu <tarunprabhu@gmail.com> Date: Thu Aug 29 08:02:06 2024 -0600 [kitsune] Fix peculiar build-time behavior (#53) Because of the way the build system was set up, the targets and kitrt would be "installed" at build time i.e. when running ninja/make as opposed to ninja install/make install. This fixes that behavior and only installs during ninja build/ninja install. 
When building, there will be messages that suggest that those targets are actually being installed, but they are being "installed" to a subdirectory within the build directory. Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com> commit 3a4350e4e98883d873825dd31b83c3ed25b07634 Author: Tarun Prabhu <tarunprabhu@gmail.com> Date: Thu Aug 29 08:00:27 2024 -0600 [Github] Remove PULL_REQUEST_TEMPALTE since we allow PR's. (#54) Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com> commit 1dbbc9236e7c95eedeebe1077a104d110cf89139 Author: Tarun Prabhu <tarunprabhu@gmail.com> Date: Thu Aug 29 08:00:01 2024 -0600 [NFC] Cleanup code (#50) Run clang-format on Kitsune-specific files. Remove trailing whitespace and excess newlines from elsewhere. Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com> commit cde8fc71ec2f425dfbaac8b0328766429b086e2e Author: Joseph Sarrao <josephsarrao@gmail.com> Date: Tue Aug 27 14:18:17 2024 -0600 [kitsune] Fix list of statically linked libraries commit 613ee9d2b9466b32b021658dcbf238b9c921f8c1 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Jul 24 13:58:13 2024 -0600 This is a squash of the commits below First cut at some improvments to the numpy module. Working on AMD/HIP fixes. More woes and struggles with HIP builds in the 18.x refactoring/overhaul... Oh the woes... Fixing a bad merge. More work on HIP integration into 18.x (bitcode search paths, build fixes, etc.). Added matmult example from Jaeyoung. WIP: Merge with LLVM 18.x WIP: Merge with LLVM 18.x and HIP target fixes. Oh the woes... Fixing a bad merge. More tweaks for a bit more stability with HIP enabled... Hopefully... Basic experiments working for full suite of cuda+hip+opencilk. Tweaks for cmake configure support in the experiments. Missed new intrinsics file. Fighting darwin specific woes. Working on HIP support for 18.x... 
- AMDGPU targets appear to require full feature strings now to be "appropriate" (e.g., sramecc and xnack settings must match what is reported on the command line via rocminfo).
- squashed a cmake bug
Just minor clean up.
Updates to get runtime concepts aligned between hip and cuda (hip now matching cuda design wrt stream creation).
Forgot to clean up some debugging details...
Some build tweaking for the experiments.
Missed a function signature update.
More benchmark/experiment tweaks.
runtime fixes for new stream model (hip now matching cuda)
more hip debugging...
more verbose support for hip runtime.
Fixing both runtime and code gen issues around hip stream changes.
debugging hip.
hip, hip, not hooray
Trying to get default threads per block value to work.
chasing performance issues.
working on more runtime debugging details.
Hopefully finished the initial/final hip support for 18.x...
- runtime updated to manage streams better (less overhead, matches cuda design)
- hipabi changed to reflect runtime interface changes.
A few more tweaks to the hip runtime details.
- some code tweaks (deviation from the cuda code) due to missing hip functionality.
- enabled auto-launch parameter settings as the default (likely far from perfect but better than hip provided heuristics on our examples/experiments).
shame, shame... missed an include <string> (some compilers happy, others not so much...)
Squashed commit for hip support in dev/18.x. High-level summary:
- Fixed some code generation details related to rocm 6.1.2 (e.g., xnack behaviors changed as did ecc details -- basically becoming part of the target binary vs. just an attribute). Apparently, this makes binary images incompatible for a given gpu config.
- hip runtime closer in functionality and design to cuda. This includes stream management (simplified) and launch parameter determination based on simple multi-proc load determination. Launch parameters still need a lot of work in concert with what the compiler can provide.
- Some random bug fixes related to the details above. - Tweaked some of the experiment details to match the new kitsune build configuration for dev/18.x (llvm 18.x). A bit more cleanup and dropped a clang tidy config file to avoid kitsune's runtime from all LLVM code base restrictions. commit 7d125dff632e43dda3916146f04775e30f92445d Author: Tarun Prabhu <tarun.prabhu@gmail.com> Date: Wed Feb 21 10:40:29 2024 -0700 This is a squash of the commits below. commit 4a9db43abe38ce7a840d3f8ad830a69148af243c Author: Tarun Prabhu <tarun.prabhu@gmail.com> Date: Thu Aug 1 17:19:51 2024 -0600 Fix issues introduced after merge with intersect commit f29421607d362fd431ed3fc029cf32afce15a049 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue May 7 09:22:58 2024 -0600 moved the intersect experiment to its own directory. commit b1aad99111f3145b8f6e1581ad0776501b31a4a6 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon May 6 16:30:46 2024 -0600 Tweak makefile to match details and remove hard-coded gpu target. 
commit 9c74e29e278039a9ec4724966e1861fc1c9f45c2 Author: Danny Shevitz <shevitz@lanl.gov> Date: Mon May 6 11:32:30 2024 -0600 cleaned up intersect commit 2fcd7517b0dd3aab1e1972edb60600b79c3f96da Author: Danny Shevitz <shevitz@lanl.gov> Date: Thu Apr 11 09:56:01 2024 -0600 prior to merge uncommented in intersect commit f02cc1a94be95a2521af8266616ab7398c228acd Author: Danny Shevitz <shevitz@lanl.gov> Date: Fri Mar 29 09:38:53 2024 -0600 trapping the multi-target cuda stream error commit a6e6f766db0c97336015585380eb0e634262c329 Author: Danny Shevitz <shevitz@lanl.gov> Date: Wed Mar 27 11:10:03 2024 -0600 intersect is sort of working commit efbe047fa1878bed3084af8d7afdaa05f1d57c41 Author: Danny Shevitz <shevitz@lanl.gov> Date: Wed Mar 6 09:33:09 2024 -0700 At the moment, no LTO on intersect commit f31f2d4c0b4a974a633da1818ca6703727c21592 Author: Danny Shevitz <shevitz@lanl.gov> Date: Tue Mar 5 10:41:39 2024 -0700 prior to pulling, trying to get intersect working with LTO commit 2d8f485d641ccf04b3d3120a076f0ea534c30caf Author: Danny Shevitz <shevitz@lanl.gov> Date: Wed Feb 21 13:30:36 2024 -0700 modified the kokkos makefile so it finds the patched kokkos and added support for intersect by changing the recognized has kokkos flag commit 3e498f87a96b1fe32408cc74d54d54980462b45b Author: Danny Shevitz <shevitz@lanl.gov> Date: Mon Feb 12 11:38:34 2024 -0700 working on make multi-target/intersect build commit c0f771099c52a81cd6a41a5e7701fb1e9aa0b6b5 Author: Danny Shevitz <shevitz@lanl.gov> Date: Thu Feb 1 11:08:32 2024 -0700 Revert "fixed a typo in the makefiles" This reverts commit 0404e2fc5cbcd5ed8bc20a194ed056ef6bd06521. 
commit 227d32de1adbd813d645466b1258bc69be5e5c93 Author: Danny Shevitz <shevitz@lanl.gov> Date: Thu Feb 1 10:58:43 2024 -0700 updated cuda.mk commit 592a8ec6c79b2ad4f519d24d00cb229dced06b34 Author: Danny Shevitz <shevitz@lanl.gov> Date: Wed Oct 18 13:39:31 2023 -0600 fixed a typo in the makefiles commit 3588c8d357a2bdff0284724f1c9eea65711cad1d Author: Danny Shevitz <shevitz@lanl.gov> Date: Thu Feb 1 09:13:32 2024 -0700 finished merge with 16.x commit 474d653e39d478ed58c66e2e73058861941be3fc Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Fri Apr 12 10:35:38 2024 -0600 Disabled sync region optiizations (merging) due to issues with multi-target code. While this could have performance implications in some situations it is the only way we can avoid errors with mixed threaded and GPU code. More bugs may be lurking. Also includes updates to the runtime to deal with exposing GPU streams to the calling stack frame for correctly handling continuations. commit 35d70b993a1960e7c0399821f8f4ada89f1c25aa Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Apr 4 16:30:40 2024 -0600 First attempt at fixing stream assignment from the runtime in a manner that GPU streams can be better captured (e.g., opencilk continuations) and GPU work can be launched and sync'ed by different host threads; this addresses a bug (flawed assumption) in the runtime when it comes to multi-target support and interoperability. commit 9e44d1189672d0c03c5744c873da51dba64b6588 Author: Patrick McCormick <> Date: Thu Mar 28 16:43:36 2024 -0600 fix bad context mistake -- relevant to multi-target thread-streams debugging... this is a temporary workaround and not a correctness guarantee for behaving well when opencilk and cuda targets are intermixed (it most certainly can also have performance implications). 
commit b74b28f5847b1d736870d7e6e50e33389518a363 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Mar 26 17:10:34 2024 -0600 extra verbose mode details on thread-stream creation. commit e8d59a79e712d638ae11513bb466a1674653dfe0 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Mar 26 17:04:16 2024 -0600 Quick thread-stream tweak (warning message update and context-based sync fallback). A few other odds and ends of cleanup. commit 6edc88e43175194bcf44b2846b92d2e20a8212ce Author: Tarun Prabhu <tarun.prabhu@gmail.com> Date: Thu Aug 1 16:44:25 2024 -0600 Undo change introduced by cherry picking commit. commit 0685a47c9d8a28dd96489203a227dc79b955392b Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Mar 4 14:39:32 2024 -0700 A bit more rt feedback about libdl and some testing with rpath stuff in cmake. commit 37dbed14e0c5d04c36e6fc0fdc00d6336308602c Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Feb 29 15:54:08 2024 -0700 Small fixes build logic (for no profiling) and nvidia cuda compute versions at runtime. commit 10f30e22788d3e9de75ce8129adab22139c8dd7f Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Feb 21 10:40:29 2024 -0700 LTO fixes (opencilk bitcode file and auto-link args for tapir opencilk targets). Removed pure-kokkos tests as part of the default target set from all the experiments. Misc. clean up w/ experiments (e.g., makefiles), added LTO test, etc. commit 7732266f87f3efa46ce7cdbac5bbeef7a9b9c878 Author: Tarun Prabhu <tarun@lanl.gov> Date: Fri Feb 16 14:54:28 2024 -0700 Merge with LLVM 18.x commit f33cffebb4c2a85983d23480884802b12034d583 Author: Tarun Prabhu <tarun@lanl.gov> Date: Thu Feb 8 15:57:22 2024 -0700 This is a squash of all Kitsune commits to date. All credit goes to the individuals listed in the commit messages below. 
commit 5377f48ce1adf0d5c7fe1e7c65f66c768b8be669 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Feb 14 13:32:36 2024 -0700 Tapir target tweaks, LTO touch-ups, etc. commit dbfc195996db5cc7deb5232cb791cf69f9acb179 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Feb 13 16:57:54 2024 -0700 Fixes for LTO... commit f9094d35d3ce797ef70e7572a9aab621c444f275 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Feb 12 14:31:23 2024 -0700 Chasing a bug in the LoopSpawning pass... commit 4ddb9d13f799ec072d4416d17e7a3779f50bcab4 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Feb 12 11:09:20 2024 -0700 Runtime tweaks for refactoring launch parameters (for cuda). Tweaks to multi-file (LTO) euler3d experiment. commit f8ff7c53d0ebd493231ed0161f3aa753b772e8d7 Author: Patrick McCormick <> Date: Wed Feb 7 16:32:38 2024 -0700 More launch explorations. commit d720cedc84e9e793836bab99e6b53991d4152288 Author: Patrick McCormick <> Date: Wed Feb 7 10:10:59 2024 -0700 Tweaks on launch heuristics. commit ab61a3c9ca80361077ea69512970dfb4c7f0b3e5 Author: Patrick McCormick <> Date: Tue Feb 6 13:20:20 2024 -0700 working on experiments for benchmarking. commit f510df149aee441f1b4b5eac5022e58fe79f152b Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Feb 6 13:31:53 2024 -0700 A bit more verobse output. commit 3eec9c0991178346a616f2ddde4a84bc34bda290 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Feb 6 13:12:41 2024 -0700 Tweaks for launch heuristics (hacks). commit 75895a7167f448945291cf4137a0881d15e10272 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Fri Feb 2 08:37:20 2024 -0700 More launch and compiler related tweaks and tests. Fix a mistake in the error reporting for the runtime's dylib handling... 
commit e8ee550c232c317eace239ad8b211233016af2de Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Feb 1 13:01:21 2024 -0700 Experimenting with launch details and some nvvm metadata. commit 87fb4e4c85e21d914b0c25d5b5d8ad82ee1c1ae2 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Jan 29 11:22:56 2024 -0700 Tweak to force environment variable to override occupancy-based launch parameter settings. commit 721f9f9fe0c922b589536187d17e118a30a82266 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Jan 29 09:23:19 2024 -0700 Tweaks for attribute support (launch parameters) and runtime auto-adjustment to launch parameters. commit 82c37acdc9e77067254a3c243bda9e354211ced6 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Jan 23 11:56:22 2024 -0700 Small touch-ups on build details in experiments. Still finding some issues with kokkos, latest cuda (13.x), and other details (e.g., host compiler). commit c65d80ca725fe7ea8fc168277eda148b77565463 Merge: 1241ae086c7c acc3dfb18799 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Jan 23 11:00:07 2024 -0700 Merge remote-tracking branch 'origin/multi' into dev/16.x commit acc3dfb187998cd8a53eca47c75340f9b22967ff Author: Tarun Prabhu <tarun@lanl.gov> Date: Thu May 25 10:35:36 2023 -0600 A squash of many commits covering a broad scope: 1. Address some bugs/details/features introduced with the 16.x merge. - includes some minor tweaks for 16.x testing but this needs more work. - clang's sema probably needs to be revisited and improved. 2. A significant overhaul of the runtime to support: - binding of calling threads to unique (gpu) streams - removal of a lot of crufty code that was no longer being used. 
- simplified kernel launch options/interface
- occupancy-based launch parameters (can cause performance regressions)
- better environment variable support for tweaking behaviors and more flexibility for experimentation, testing, and debugging.
3. In alignment with #2, portions of the transforms for CUDA and HIP have been cleaned up and simplified (in particular, kernel launch details are much cleaner now).
4. Some bug fixes for attempts at post-processing code w/out parallel constructs. New "experiment" introduced to catch this as a regression.
5. Some runtime building blocks for driving prefetch operations.
6. Some new experiments/test codes.
7. Fix for nested outlining -- assumed dead-code elimination pass cleanup but fails with separate host and gpu code transformation modules. Had to introduce dead-code removal prior to gpu module passes (otherwise, the verifier pass fails).
8. Runtime entry points for numpy allocation entry points (e.g., calloc, realloc, etc.). TODO: Potentially some room here for GPU-side operations to improve performance.
9. Attribute support (e.g., target) for Kokkos 'statements'.
10. General code cleanup -- removing warnings, unused code, etc.
11. New support for launch parameter exploration within the experiments code base.
12. Some work on -ffast-math crashes and issues. TODO: This code needs to be further developed (expanded support for double-precision, additional entry points, etc.). There are also some issues here in that what is specified on the command line can impact code on the host side but does not have a similar match on the GPU side of the code transformation. TODO: ABI and other issues need to be further explored.
13. Multiple targets within a code base are supported (e.g., run opencilk cpu threads and cuda-targeted forall loops).
14. Fixes around multi-thread entry points within the runtime components.
15. Testing and feature support for H100; sync'ing CUDA and PTX version info, etc.
commit 1241ae086c7c83a3319127661af076169a8a9ca5 Author: Patrick McCormick <> Date: Fri Dec 8 16:50:13 2023 -0700 Dealing with some crufty system libraries on Darwin... This will likely break on newer installs (e.g., Arch). commit c73442567dda8f1cdc1bce39d77cd1f7b5f4b12a Author: Patrick McCormick <> Date: Fri Dec 8 15:23:21 2023 -0700 Missed cleaning up some debug statements in last commit... TODO: -ffastmath stuff... commit 6d81192c1e0e28df58ee0e4b2d0978706b51aa70 Author: Patrick McCormick <> Date: Fri Dec 8 15:00:12 2023 -0700 Some testing on H100. commit a7b07c0f13d450db6aec24f5a15801ef5e43aa6b Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Fri Dec 8 14:58:00 2023 -0700 Cuda runtime tweaks for multi-target and multi-threads. Likely still extremely buggy under duress... commit a8bbeebd763b094e0e3d6aec94a341387e1ef969 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Fri Dec 8 12:47:46 2023 -0700 Quick memory allocation/free mutex for multi-device use cases. commit ea7a1b897287b775d3484b5518fdae3a8c360fca Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Dec 5 12:59:32 2023 -0700 More work on regressions, fast-math mode, hip performance, etc. commit 40365ba12a309e9bed09b572d3bd5cfeef5e3f5b Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Dec 5 13:02:21 2023 -0700 More work on regressions, fast-math mode, hip performance, etc. commit 9b75e1182b49417fe33d5deeae1985dae996d126 Author: Patrick McCormick <> Date: Tue Dec 5 08:52:30 2023 -0700 Working on some issues surrounding --ffast-math: 1. ABI conflicts between the host stage and our module offload generation (e.g., host side passes generate vectorized code that is not supported on GPU backend(s). 2. Host architecture-centric tweaks occur before our GPU transform. 
That leads to addressing host architecture specific details as part of the transform (e.g., aarch64 and x86_64 will generate different calls vs. sticking with llvm intrinsics). A combo of ABI issues and/or the fact we're too late in the pass pipeline to address this with the current design means more work lies ahead... commit 2668be237f4c59e7b33a3e27117324db876e486f Author: Patrick McCormick <> Date: Mon Nov 27 15:09:29 2023 -0700 work on hip performance details. commit 243ff1146491b06e29484ac48641d2b3fe48b03c Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Nov 28 12:52:52 2023 -0700 Testing streams and odd stalls (UVM?). This version seems to remove the stalls but also on a system with a newer kernel drop... CUDA only at this point. commit e7d0c0985a446b33e4a84efb1dee3f4067a5e8bc Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Nov 27 15:02:47 2023 -0700 Working on some runtime tweaks and clean up. Traced a new crash to the use of a ptxas whole-program optimization flag. commit 9512eb5fc842ca6582fe2eff885db824a5c6f728 Author: Patrick McCormick <> Date: Fri Nov 17 08:45:58 2023 -0700 More work to setup the tests for better HIP and CUDA target flexiblity; including some reduced complexity the command line arg details in the makefile(s) (e.g., strip mining flags for GPU targets moved into the config files vs. being necessary in the makefile setup). Added better (correct) AMDGPU target attribute selection based on multiple target options (prior version was too hard-coded for gfx90a). commit afdb9c2d28d738f95c51107acaccce5c766363f2 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Nov 16 12:59:11 2023 -0700 A bit more verbose and shared cuda and hip feature management (e.g., streaming modes). commit 47979332e0e1487b79b5334d77dfb2705db721f4 Author: Patrick McCormick <> Date: Thu Nov 16 12:58:38 2023 -0700 Bug fixes for new prefetch feature set. 
commit 6875ccffb64df68de22272c135379efd8b9a51e3 Author: Patrick McCormick <> Date: Thu Nov 16 11:05:27 2023 -0700 More work on HIP performance debugging...
commit 3ac2d66f2a6009dc286a32c7b8cecf61fa34e96b Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Nov 16 11:01:31 2023 -0700 First cut at CUDA prefetch streams support. Needs testing...
commit 7e3a8a24519f207fd488a821a992ee44bc7d62a1 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Nov 16 08:14:07 2023 -0700 Some refactoring for HIP details, bug chasing, etc.
commit 3f1e09e7d2210da432350fdfeb19b4a9c0df3454 Author: Patrick McCormick <> Date: Wed Nov 15 09:40:07 2023 -0700 Some hacking for trying to debug AMD HIP code gen/runtime issues. A few new environment variables to make chasing (our tails) easier...
- KITRT_THREADS_PER_BLOCK=1024 (default 256)
- KITRT_MAX_NUM_PREFETCH_STREAMS=2 (default 4: size of round-robin stream queue for concurrent prefetch calls)
- KITRT_DEVICE_ID=5 (default 0: change the default GPU selection)
- KITRT_MIN_WARPS_PER_EXEC_UNIT=1 (default 1: reducing resource usage per warp -- impacts register allocation, etc.)
The prefetch stream queue is enabled via the command line with "-mllvm -hipabi-streams".
commit d3f74a006f866895fe326c3d87824712944ebda4 Author: Patrick McCormick <> Date: Wed Nov 8 20:43:08 2023 -0700 Some cleanup and work to try and chase down HIP target runtime variability.
commit c2bb71e9dbee9404c7607cb2682936369bf2c657 Author: Patrick McCormick <> Date: Thu Nov 2 13:04:11 2023 -0600 chasing build issues/warnings/errors.
commit b4bafb426e7fe01ec25132a0df3f240086b3ac0e Author: Patrick McCormick <> Date: Thu Nov 2 09:04:25 2023 -0600 Chasing bugs...
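Commit 3f1e09e above lists four KITRT_* environment variables and their defaults. How kitrt actually parses them is not shown in this log, so the following is only a sketch of the general "environment override with a default" pattern. The variable names and default values come from the commit message; `envOrDefault`, `KitrtSettings`, and `loadSettings` are hypothetical names introduced for illustration.

```cpp
#include <cerrno>
#include <cstdlib>
#include <limits>

// Hypothetical helper (not part of kitrt): read an unsigned setting from the
// environment, falling back to a default when the variable is unset, empty,
// not a plain decimal number, or out of range.
unsigned envOrDefault(const char *name, unsigned fallback) {
  const char *raw = std::getenv(name);
  if (raw == nullptr || raw[0] < '0' || raw[0] > '9')
    return fallback;
  errno = 0;
  char *end = nullptr;
  unsigned long value = std::strtoul(raw, &end, 10);
  if (errno == ERANGE || *end != '\0' ||
      value > std::numeric_limits<unsigned>::max())
    return fallback;
  return static_cast<unsigned>(value);
}

// Hypothetical container for the four settings named in the commit message.
struct KitrtSettings {
  unsigned threadsPerBlock;
  unsigned maxPrefetchStreams;
  unsigned deviceId;
  unsigned minWarpsPerExecUnit;
};

// Defaults are the ones stated in the commit message: 256, 4, 0, and 1.
KitrtSettings loadSettings() {
  KitrtSettings s;
  s.threadsPerBlock = envOrDefault("KITRT_THREADS_PER_BLOCK", 256);
  s.maxPrefetchStreams = envOrDefault("KITRT_MAX_NUM_PREFETCH_STREAMS", 4);
  s.deviceId = envOrDefault("KITRT_DEVICE_ID", 0);
  s.minWarpsPerExecUnit = envOrDefault("KITRT_MIN_WARPS_PER_EXEC_UNIT", 1);
  return s;
}
```

With `KITRT_THREADS_PER_BLOCK=1024` exported and the other variables unset, `loadSettings()` in this sketch yields 1024 threads per block and the stated defaults for everything else. Falling back silently on malformed input is a design choice made for brevity here; a runtime might prefer to warn instead.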
commit 92997086a8d2fafcd26cead1b7e814565fc3d3e9 Author: Patrick McCormick <> Date: Tue Oct 31 16:23:02 2023 -0600 working on benchmarks commit 74cb34fc1975487810284cce3fd4eefc347b1a82 Author: Patrick McCormick <> Date: Wed Oct 25 14:31:56 2023 -0600 Exploring full kokkos builds w/ clang. commit 4dc3221636d7aa8ecacb5ed29c4d0873c7c76ae2 Author: Patrick McCormick <> Date: Tue Jun 27 14:13:50 2023 -0600 Some cleanup and small tweaks. commit 5136b2416af025d5e1fb38344d87e82ca9d4aa70 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Nov 8 20:27:27 2023 -0700 Attempt at a quick multi-stream prefetch feature. commit 8ece574c8f8aaf4676525bb7368aae5677b5a193 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Nov 2 11:48:12 2023 -0600 small tweaks to sort out some performance details. commit 9b006692706e3eec247b58ee091b8a12c6f1cc7c Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Nov 1 20:38:03 2023 -0600 Tweak in attempt to debug potential numa issues that are impacting consistent performance across multiple application runs. commit f5d53a62aa38a6ac3aeaad5556bdbf0d63b08748 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Oct 31 14:46:29 2023 -0600 A bit more cleanup and adding new tests specific to kitsune. commit ff078df1402067b57c3647a57b23820b1726a218 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Oct 31 09:47:10 2023 -0600 A bit more cleanup and adding some infrastructure for the multi-target test code (added makefile and a kokkos version). Not all the pieces are in place to fully test. commit d884674842afe05ece0a815318c7275c1b913c93 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Oct 31 08:53:36 2023 -0600 Clean up some code cruft -- no need to duplicate else branch cases. 
commit 88bc75baea41bc83078d3a7e7dff5f3e4820868e Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Oct 31 08:39:57 2023 -0600 Forgot to save a cleaned up comment... commit bd7941e589309dcfd311a3a0ec5e459128ad54dc Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Oct 30 20:01:09 2023 -0600 New code to handle tapir attributes on Kokkos "statements". Some new code for cuda memory management details (calloc, realloc, etc.). Along with some prep work for upcoming memory management and movement changes. commit 4f4585aa7f7c3d0c7029023e9504f0fd245b2914 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Oct 25 14:27:27 2023 -0600 Tweaks for numpy allocation entry points. commit ce84cb404073bed4be9f0af1eb4c1106f5300a8c Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Oct 5 09:31:20 2023 -0600 Small tweaks to remove some unnecessary code. commit 6a7d0af3c4abca1a15a6cd90122ad7b0c8831885 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Oct 4 17:10:39 2023 -0600 Allow cudaabi target to be selected via the enviornment (for JIT use cases). Small tweak to loop spawning code. commit 0b9c7f8d0da00cb368b8309092ea6e9b6bb9790c Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Oct 3 09:41:37 2023 -0600 Bug fix for the logic around invocation of post processing modules without parallel constructs. commit 5c56cb918429dc38aa225c96b51086c3afe824ac Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Oct 2 13:21:44 2023 -0600 Some minor runtime tweaks to try and capture a call path for auto initialization in cases where we might not have an easy path to global ctors. changes. Lines starting # with '#' will be ignored, and an empty message aborts the commit. # # On branch dev/16.x # Your branch is up to date with 'origin/dev/16.x'. 
# # Changes to be committed: # modified: kitsune/runtime/cuda/cuda.cpp # modified: llvm/lib/Transforms/Tapir/CudaABI.cpp # modified: llvm/lib/Transforms/Tapir/HipABI.cpp # commit 483af046930b981d521a9da7e4bcac2ed1b64721 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Fri Sep 15 12:23:29 2023 -0600 A better (ABI-independent) path for avoiding calls to postProcessModule() on code where parallelism was not transformed/discovered during loop spawning. commit e0af8c3c8f0533f6612c664a10b942486f4d314d Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Sep 12 09:33:40 2023 -0600 Bug fix for -ftapir gpu targets when no parallel loops are encountered in input module (overly strict assertion replaced with kernel module content check prior to starting postprocessing phase). Added a "no forall" chunk of code to the experiments. A bit of clean up over the various experiments to keep the overall output details identical. commit 258a71efb43e8e8e6e739d531d39c2f4362123ba Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Jul 10 12:16:38 2023 -0600 Some tweaks to the runtime for smarter data movement and cuda/hip "hints". commit 7d48bb1bd420eb1a6c7ed5604be2429bf2ab0cb6 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Jun 27 14:02:03 2023 -0600 Prep for new prefetch analysis functionality. commit 19df680b1b1bf95eee7f83e202ae24f0a8001cd1 Author: Patrick McCormick <> Date: Thu Jun 22 12:46:19 2023 -0600 bug fix due to kernel naming issue. commit d143626960ffd1e2d2c42e87f2fca7580ec4d9c4 Author: Patrick McCormick <> Date: Thu Jun 22 12:45:25 2023 -0600 Some clean up, testing, and a fix to bring our forall sema up-to-date w/ clang 16.x. commit 24e4135942d28ae056120a04ba01e1d15f1cdd47 Author: Patrick McCormick <> Date: Thu Jun 22 12:44:10 2023 -0600 Tweaks and changes for exploring new targets supported by 16.x... 
commit 8edad4c1b42c06d4fce288866c68451fc8942ed7 Author: Patrick McCormick <> Date: Thu Jun 22 12:43:09 2023 -0600 Some clean up, testing, and a fix to bring our forall sema up-to-date w/ clang 16.x. commit ed62938d3d75731505830649067bf79654a6badc Author: Patrick McCormick <> Date: Thu Jun 22 12:41:00 2023 -0600 Tweaks and changes for exploring new targets supported by 16.x... commit fb81e370b8bcbc73c2ed5cd070f04de0ef988ef7 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed May 24 08:59:33 2023 -0600 Fixes for kernel module optimization levels (failed at @ -O0). commit e4435c01888970e90140247d16e22a91d495c176 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Jun 22 11:36:27 2023 -0600 only build kitsune-supported experiments by default. removed some verbose feedback during compilation. commit 3ce803ea5690e076f25c5fe4ab8c51118b5d0ba1 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Jun 22 09:20:14 2023 -0600 bug fix due to kernel naming issue. commit 82dcb7bd4dfe397e440d75643de7c7c89fa60a3a Author: Patrick McCormick <> Date: Mon Jun 19 10:58:16 2023 -0600 Tweaks and changes for exploring new targets supported by 16.x... Clean up accidental comit of merge conflicts... commit 4ec1a4b86f6a3219bbaa81dc18ff18d635cb1a42 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Jun 21 12:59:31 2023 -0600 Some clean up, testing, and a fix to bring our forall sema up-to-date w/ clang 16.x. Clean up missed conflicts in source... ???? commit f958b8a25f432f720b62af50168667b39d25d562 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed May 24 08:59:33 2023 -0600 Fixes for kernel module optimization levels (failed at @ -O0). commit 1c08014a03d6ccd986cb3444ea5a43cdc812b9d6 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Jun 21 14:10:34 2023 -0600 start of some docs. 
commit 4d12699dfea25b6e80321912cd84706187445117 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Jun 21 12:59:31 2023 -0600 Some clean up, testing, and a fix to bring our forall sema up-to-date w/ clang 16.x. commit 2a916da2885cdc7cec060399847209c492c28458 Author: Patrick McCormick <> Date: Mon Jun 19 10:58:16 2023 -0600 Tweaks and changes for exploring new targets supported by 16.x... commit 33ae54c3bf4b6d576186b1984d702dd211bb5ff4 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Jun 21 12:59:31 2023 -0600 Some clean up, testing, and a fix to bring our forall sema up-to-date w/ clang 16.x. commit 967b4cc21c319fff43413367be52415e73ced84a Author: Tarun Prabhu <tarun@lanl.gov> Date: Thu May 25 10:35:36 2023 -0600 Fixes to get the AArch64 target to build after merge with 16.x. commit 6429e61f74e0f67975880834cc1972b603fb52f1 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed May 24 08:59:33 2023 -0600 Fixes for kernel module optimization levels (failed at @ -O0). commit cb6f5aecf6e827b008bb178845d4a68bd9bd8fe3 Author: Tarun Prabhu <tarun@lanl.gov> Date: Tue Apr 4 14:22:13 2023 -0600 Merge with 16.x This includes all the work by Pat McCormick <pat@lanl.gov> to add support for AMDGPU.s This also includes an overhaul of the kitsune/experiments directory and everything in it. There is still some work to be done - for instance, getting rid of the legacy pass manager for everything except backend code generation, but this should be in a functional state for now. 
commit 3264b97f507c0fcea3c7157311c7fd4128806903 Author: Tarun Prabhu <tarun.prabhu@gmail.com> Date: Mon Oct 17 12:03:11 2022 -0600 Merge with 15.x commit 9c15cbc84e6ad99a41cfd20da2f06e3d0a213368 Author: Alexis Perry-Holby <aperry@lanl.gov> Date: Fri Sep 23 10:01:40 2022 -0600 more minor build fixes commit 7d460b484d781ad943d5786bdaf4d1aafd0fce1c Author: Alexis Perry-Holby <aperry@lanl.gov> Date: Thu Sep 22 15:59:10 2022 -0600 minor build fixes - header file moved commit 4f5f1acf32d1e6f0385bd46a798ab67c2d400ae2 Author: George Stelle <stelleg@lanl.gov> Date: Thu Sep 22 14:21:51 2022 -0600 14.x fixes commit 7c39485fe83abf561a5ce76fdc76da27b9feac82 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Sep 15 08:39:16 2022 -0600 Avoiding some junk files. commit cabb5c7011247e1d681fee2d972ebde5a9968452 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Sep 14 15:42:05 2022 -0600 Typo fix. commit 8e90393e9e688be52811bd914fac7a6854696dfe Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Sep 14 15:41:41 2022 -0600 Typo fix. commit 42576683abb6d21d01936e0133c90115f2c3f117 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Sep 14 15:32:44 2022 -0600 Verbose mode addition to cmake. commit c4d8d446278924d2302dc589450c08af1833ff4a Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Sep 14 13:56:21 2022 -0600 Updated docs and added missed experiment for the memory access attributes. commit fe5acc203e1f617040278692983fd603ad748a98 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Sep 14 12:40:38 2022 -0600 Clean up some and merge in Alexis' memory access attributes. commit 1a971bb38f2ea9efa5301e84573cabd2d8f0701b Author: Patrick McCormick <> Date: Mon Aug 22 14:04:23 2022 -0600 Some code cleanup to get running on Darwin for testing. 
commit 713f9c53e732b24779fa2e6d2dc296df55885b66 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Aug 22 09:57:06 2022 -0600 Fix experiments to account for shuffling of headers in the kitsune runtime organization (still not happy with it but at least things appear to work now). Also fixed a commented out stream sync call in the runtime from the stack overflow debug-a-thon... commit 8564169120ab16f70170295069b81ac0561d5cd3 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Fri Aug 19 15:51:46 2022 -0600 Bug fix in runtime (too many modules!). commit c8364e5f9724f37c7a36792f234ecb56ea0f9816 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Fri Aug 19 15:34:30 2022 -0600 Fix build issues. commit b24f76c9001ab4ac78e2aa2134c800aa34e73d0a Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Fri Aug 19 14:55:08 2022 -0600 More cleanup and missed some files on the last commit. commit b018ae9432a0912c2c5869bedf37841f95070267 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Fri Aug 19 14:50:46 2022 -0600 Some continued reorg of the runtime structure. Bug fix for stack explosions... commit 8b3b526743971c8554c48a6d797984dd38d6d2ba Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Aug 17 12:04:35 2022 -0600 Hide some generated data files, exeutables, etc. commit b6549fbc71fc014fb6f2217d3d1d30b2073291f4 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Aug 17 11:54:53 2022 -0600 Restructure runtime source to provide one library (makes life easier in the compiler/clang code base too). Using NVIDIA's HPC SDK seems to trip up some aspects of CMake's cuda package (set CUDAToolkit_ROOT to address this). Continued work on trying to track down GeForce crashes. Some new code for tapir target attributes. 
commit b95256c02f553f508874f403bfbf4c2700c86ea8 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Aug 15 14:16:04 2022 -0600 more work on optimization passes, multi-target code, etc. testing across multiple systems trying to narrow down what appears to be a GeForce-only crash in kitsune-generated executables (with long running times). commit 1aea89636aa4a51fa7af7ecdc525a1d882ed264d Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Aug 10 14:44:21 2022 -0600 working through more attribute code details and also fixed a command line bug in the cuda target transform that would allow both optimizations and debugging to be enabled (this is not currently supported by ptxas). commit 562578c2b22afa24aa4aa703b814599bbd9039e8 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Aug 10 11:08:13 2022 -0600 Some minor tweaks to the runtime trying to find issues related to a large timestep count crash in the euler3d experiment. Tweaks to update tapir target attribute support to match some new clang features. commit aada2f3dd7bcaf1c62f172c25ba1181f06dd47a0 Author: Patrick McCormick <> Date: Mon Aug 15 11:47:04 2022 -0600 Testing for issues related to geforce crashes. commit 5a2e207206a4b2758a83cca1cea15228b393f715 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Aug 8 10:41:32 2022 -0600 working to track down cuda crash on large time counts (time steps). moving to a "friendly" spot for some debugging help... commit 661d514d3b63b3171ae5837d2840a41f5d90e485 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Fri Aug 5 13:18:31 2022 -0600 Missed the no-view euler3d code. 
commit 2abef21c66dca99ac68c0418f7e00cd057c5cb1c Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Fri Aug 5 13:05:52 2022 -0600 Fix for bad codegen when forall loop iteration variable type differs from runtime type (e.g., trip counts not the same type). This fix adds a cast when necessary... tweaks to buid a non-view based version of the euler3d experiment. commit 923b68e27fdd9a9df4cb655d78e83e1dada36488 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Aug 4 16:32:05 2022 -0600 Updates for some issues related to performance and new tests/experiments for digging futher into some UVM performance impacts. CUDA target transform changes: 1. Removed vectorization pass. 2. Ran additional inlining pass post PTX-prep transformations (as well as supporting passes). More experiments needed here. 3. Fixed some issues with ordering of PTX-level function renaming and some general code cleanup. Runtime: 1. Poking at some additional logic to support "auto" prefetch. 2. Playing with some runtime hints to the page system; so far they seem to have minimal (no?) impact. 3. Some general code cleanup. commit cab0a4c47f803dc6599fc349e768cbef396ce496 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Jul 28 08:54:00 2022 -0600 Tweaks for clang builds (CUDA and GCC 12.x don't play together so well as CUDA headers #define noinline -- quick workaround is to tweak the define in CUDA host header file). commit 1875b778c50f7216334140899997a2bb59522b4a Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Jul 27 15:20:57 2022 -0600 Serial version of euler3d and a test data set for running. commit 527244f423107d46f98dbe3f00987b7eccf330e4 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Jul 27 15:20:16 2022 -0600 Kokkos version of the euler3d experiment. 
commit 78bc380bbfea9c5cc19ab54210f33a9f56251eb5 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Jul 27 14:59:51 2022 -0600 Fixes for the euler3d code and addition of attributes for kitrt allocated memory buffers. commit 2ffea09d23373cc54f7a1b7046f70c8c1cd34eca Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Jul 26 08:44:46 2022 -0600 Fixes for tranforming function calls into libdevice names (bug when we had multiple occurences of the same function across kernels). Also fixed a issue where we were not transforming cloned decls for functions that map into the cuda's libdevice library (module). Code still needs some cleanup, better checking, and some more test drives. Pushing this up as it is likely the bugs above will trip up more complex use cases. commit 3c2257180e1088e2780de0a4853ddefdb3fc27cc Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Jul 21 19:05:21 2022 -0600 Quick fix for runtime compilation issue. Still some compiler debug messages in place. commit a310d8a6782b4f4320315a95e7c63bdbf4e1fc47 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Jul 18 16:43:46 2022 -0600 Some fixes, simplification of srad benchmark for measuring overall runtime (in prep for graph rt code gen and support). New Rodina benchmark (euler3d) in forall form. commit fc898895bc9a5b1d697ba3da93151ef1b45b11d2 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Jul 14 13:05:47 2022 -0600 Fixes for kokkos dual view code -- was missing some required sync points for correctness. commit df4969e51688d2a0dca2d9809c38e4d05ba01d9a Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Jul 14 10:58:03 2022 -0600 Fix for memcheck error and some cleaner code gen given we want to fix the grainsize to 1 for the cuda abi. 
commit 7d049aa91f511454bb3782333f2ef4b4d09859b1 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Jul 13 12:16:35 2022 -0600 Working on a bug. commit e3ade8a3038d165397d370bfbd57b64246d16ff1 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Jul 13 12:16:03 2022 -0600 Clean up some vscode stuff so it won't stop on other's settings. commit 2bda5f1f99b388f569429cac58db8250d23ad1c8 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Jul 12 10:41:23 2022 -0600 Still working on the srad benchmark -- trying to determine if there is a runtime prefetching impact. commit 7f08e2d80a16e0acca21ac532aa7cf86dd00f4cb Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Jul 11 16:36:14 2022 -0600 Working on the srad benchmark. commit db1a00cc91e55f4c52a5edfc47afae3c0865fa0f Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Jun 23 14:15:44 2022 -0600 More detailed timing reports for the srad example. commit a4bf6eefd524d24241d00d3da040d5092ff47759 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Jun 23 14:02:39 2022 -0600 Fixed a mistaken benchmark name in the readme, working on trying to add similar infrastructure for the srad example/experiment. commit 51fb636551864c0644b406361834268e747b6b12 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Jun 23 10:34:20 2022 -0600 Missed the new files. commit df62e2f80799748a0fa959d0f9ee3e0c2b559c1d Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Jun 23 10:32:34 2022 -0600 More updates to experiments/examples for generating and tracking results and a bit more documentation. 
commit a00c99a5e07f4e9fad4a46ce4338920a6e9eb811 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Jun 22 16:19:50 2022 -0600 Cleaned up some of the benchmarking bits with an eye towards CI and regression checks on the performance front. commit 077d6e9b59242e32a232677c8e73ebae3d434f6c Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Jun 22 08:36:21 2022 -0600 forall and kokkos versions of the Rodinia srad benchmark. TODO: The forall and serial version of the code have similar final results but Kokkos looks to be much futher off. Could be a missed sync between the host-device dual view data but have not tracked it down yet... commit 49d6f38c167db5a66c58c06862d999137a99bb38 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Apr 26 16:40:13 2022 -0600 Overhaul of the CudaABI tranform and portions of the Tapir infrastructure. This includes some bug fixes as well as an approach for post-processing the transformed code at the module level. Handles const global variables (creates device side and issues host-to-device memcpy of values -- something you can't do in CUDA/Kokkos without using __managed__). There are restrictions on this capability but checks are not yet in place to enforce those restrictions. Performance appears to be on target w/ previous version (i.e., up to 2-4x faster than Kokkos at best and on par with both CUDA and Kokkos for simpler code bases). This currently relies on UVM memory allocation and prefetch call code generation prior to kernel launches (default behavior now). commit 27ab5791380f0e147feadc26693946b9ff03da0d Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Fri Apr 22 09:52:56 2022 -0600 Comment out some debugging code. 
commit 3b13b65416c431163d3d478df71c896bc800a6ef Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Mar 28 08:58:50 2022 -0600 This is a squash of several updates for the CudaABI transform and the supporting kitsune Cuda runtime components. There are several new features and capabilities for driving the transform via the command line (-mllvm -cuabi...), support that should allow for support across modules (compilation units), code generation of prefetch via coordination between the compiler and runtime, various bug fixes, and a few experiments that include hand-coded and tests as the features of the transform advanced. TODO: More work needs to be done to update the docs and such to capture the full scope of the feature set but things seem stable enough to unleash the next phase of testing across the team. commit 2dec7b5144bab11ab1f510eef96957ea28c83513 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Mar 28 08:58:01 2022 -0600 Initial set of additions/updates for the new CUDA ABI transform. 
commit 1a808c84a8c690775c23f20b69fc73dc963ca049 Author: George Stelle <stelleg@gmail.com> Date: Tue Jun 21 16:52:54 2022 -0600 Updated GPU LTO extern example commit ccc264b0a35f927c7ced4876331e97bcccf62def Author: George Stelle <stelleg@gmail.com> Date: Tue Jun 21 16:11:38 2022 -0600 Added GPU abi to opt command line handling commit 3b67e8324f3fd7a8072dd2da2cb3426786bf360c Author: George Stelle <stelleg@gmail.com> Date: Tue Jun 21 16:09:21 2022 -0600 Working link time lowering to tapir targets commit fd9d42f050b40d3e1154de410f22f9151ed155f5 Author: George Stelle <stelleg@gmail.com> Date: Tue Jun 21 10:20:19 2022 -0600 Added externs example commit 199b4c07caf4a8065ef44013eeeee6e69840ceac Author: George Stelle <stelleg@gmail.com> Date: Wed Jun 15 06:54:51 2022 -0600 Added GPU abi to argument serialization commit dde26a27e37c93b2b0090133e5331f2136d037fe Author: George Stelle <stelleg@gmail.com> Date: Wed Jun 8 12:29:16 2022 -0600 Fixed libopencilk bitcode clang driver check commit 47176b0dc178cd871a99e533a384499267a50016 Author: George Stelle <stelleg@gmail.com> Date: Tue Mar 22 15:11:29 2022 -0600 Merge pull request #41 from shevitz/rf basic Kokkos parallel_for implementation commit f3f89ccabaab11a6a95e7a86be91eccc5fea8746 Author: George Stelle <stelleg@gmail.com> Date: Fri Mar 18 12:48:44 2022 -0600 [kitsune] 13.x fixes commit 176e85e645c1cfe39ce2135cee21754587d37b05 Author: George Stelle <stelleg@gmail.com> Date: Thu Feb 3 10:05:20 2022 -0700 Added metadata copying for GPU kernels commit a29ff78aadd554e0283f95ec2986b48db5fa4149 Author: George Stelle <stelleg@gmail.com> Date: Mon Jan 10 12:27:00 2022 -0700 Added always_inline attribute to gpu kernel callees commit 20448a029970b4409d66a2b5d6ec1958c462c572 Author: George Stelle <stelleg@gmail.com> Date: Mon Jan 10 11:27:44 2022 -0700 Add stripmine-loops check to be able to disable stripmining commit 2c05bd12e3c994a6ebb45c15392c1dca8ed59558 Author: Danny Shevitz <shevitz@lanl.gov> Date: Fri Jan 7 14:12:37 2022 -0700 
changed the <KITSUNE> tags to descriptive comments. commit ca0aa787b8e711e8f5b5fb08d735aed56b4a064f Author: Danny Shevitz <shevitz@lanl.gov> Date: Thu Jan 6 14:55:19 2022 -0700 refactored forall range loops to use the same helper functions as forall commit c43e55777110718c6dfa48a31c6b3d6af8405dba Author: Danny Shevitz <shevitz@lanl.gov> Date: Wed Jan 5 17:48:39 2022 -0700 changed forall EmitIVLoad to shallow copy form for structs commit eda8520c5118a700d219cd7f77def6bb09066cb3 Author: Danny Shevitz <shevitz@lanl.gov> Date: Tue Jan 4 11:02:43 2022 -0700 fixed the loop end bug in range based for loops commit 1e7a144da050c7c4df88624ad51e6826293cb986 Author: Danny Shevitz <shevitz@lanl.gov> Date: Thu Dec 2 12:30:32 2021 -0700 unfortunately found a race in forall range, so committing before checking out master commit 1ccbfa922f02f4c0a01d916b413afd532f78d4e1 Author: Danny Shevitz <shevitz@lanl.gov> Date: Mon Nov 22 17:17:03 2021 -0700 Removed the Continue JumpDest in favor of explict Condition and Increment JumpDest's commit 5c6dbab3558b221f126f1e6ee86afcf53eebdda4 Author: Danny Shevitz <shevitz@lanl.gov> Date: Wed Nov 10 11:56:21 2021 -0700 forall refactor is working, and the code is mostly cleanup up commit 5e99b03af7275aefb2769ed4a8f584705eb0ebcf Author: Danny Shevitz <shevitz@lanl.gov> Date: Mon Oct 11 08:52:43 2021 -0600 minor changes to pull pat's stuff commit 6897db57bc6076fed03d35ffda601d47e48e7714 Author: Danny Shevitz <shevitz@lanl.gov> Date: Thu Sep 16 11:54:05 2021 -0600 tried to refactor forall, but it's buggy and I need to checkout release/10.x to see what's going on commit a9d6d8ba9291b9eae0cbe0667d7cff42a46af62b Author: Danny Shevitz <shevitz@lanl.gov> Date: Wed Sep 8 15:26:28 2021 -0600 first commit on new refactor branch commit 9df375cca94e4b4c118fde0399ff47c3f81a73d8 Author: Danny Shevitz <shevitz@lanl.gov> Date: Mon Nov 22 10:45:51 2021 -0700 changed kitsune-dev.cmake commit db037f8d15eadec12446b9c425da6f88f856b712 Author: George Stelle 
<stelleg@lanl.gov> Date: Tue Dec 21 08:58:06 2021 -0700 Fixed GPU codegen commit 001043d4067715257f6d7d8e301c9c865373cb5b Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Nov 18 08:59:54 2021 -0700 Adding back in some tidbits that now appear to work correctly with the issues addressed in the previous commit/push. commit 66a3a2bd7b5b9c578de73bebd1e90f4e92177ef6 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Nov 17 13:54:55 2021 -0700 A few quick fixes for clang driver code for config file support that was lost in the merge with 11/12.x. A few related cmake tweaks that are potentially still a bit buggy but working better now with some addition for pulling from cilk v1.1 repos. commit f35828658a26be6e4f4ebb1071c97c92553bb20e Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Nov 16 11:34:01 2021 -0700 Fix cmake fetch stuff -- looks like we lost the configuration and include lines as part of the merge with upstream. commit 099cab25cafcc0cd7a6d78b474691b2ca7f88fc3 Author: George Stelle <stelleg@gmail.com> Date: Tue Nov 2 12:55:25 2021 -0600 Removed github additions commit 7577da9dc7d43bc6ac585428fad652e7ef33b78b Author: George Stelle <stelleg@gmail.com> Date: Tue Nov 2 12:38:16 2021 -0600 Added c++ condition to extern commit a137c5c2deb40060fc23d20e031faa89c2a06550 Author: George Stelle <stelleg@gmail.com> Date: Tue Nov 2 12:23:37 2021 -0600 Tapir abi 12.x merge fixes commit 65a36adfaa9f7a3e97b517dc93214abf3be7d1b6 Author: George Stelle <stelleg@lanl.gov> Date: Wed Oct 13 10:06:35 2021 -0600 Added bitwriter lib dependnecy for Tapir GPU backend commit 90ab9e61d257f5e3c2d1aaf90db0d520b5f99514 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Thu Oct 7 16:09:28 2021 -0600 Added cilktools. 
commit 6fe60d06e69e58b2696420b783663e29c30a75b5 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Sep 8 15:00:40 2021 -0600 Merged changes from George and moved new GPU library into location for the new build system. commit 6f987d1478b16e7bf088c8c86c53bbddb60736c3 Author: George Stelle <stelleg@lanl.gov> Date: Wed Sep 8 14:04:02 2021 -0600 Added initial gpu runtime commit 7257f5bae9191b7ffd8c38ce1c1aaa605389fd6f Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Sep 8 14:02:44 2021 -0600 prep to merge w/ an update from George. commit c04f21ebf5dd880870ec0dd097a9c11e11db279a Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Sep 8 13:50:47 2021 -0600 New unified runtime abi for GPUs. commit 3d682af1c0744fcaa1d2522d3228760111528fd2 Author: George Stelle <stelleg@gmail.com> Date: Fri Sep 3 12:47:47 2021 -0600 Codegen looks good commit 19227307eebdb27537e4b5fb95ad59ca906f67e3 Author: George Stelle <stelleg@gmail.com> Date: Wed Sep 1 11:12:53 2021 -0600 Handling kernel arguments commit e5ee38f76c163c4467c1f6c4ed31b272a93c5fc1 Author: George Stelle <stelleg@gmail.com> Date: Tue Aug 31 15:20:00 2021 -0600 Initial GPUABI commit commit 3014ee0448cf47963a83d5770a6e14764e1f664e Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Sep 7 10:47:28 2021 -0600 A few last-minute fixes before release: 1. -fkokkos mode forces -fno-exceptions to avoid code gen issues around parallelism code gen and internal exception mechanisms. 2. following #1, added a patch to #ifdef out exceptions (try and catch blocks) in Kokkos memory spaces. 3. fixed a bug in the realm ABI that only showed up for some examples -- would crash the compiler via an assertion on the types of a binary operator. 4. tweaked some of the kokkos examples to actually behave like a good kokkos program should... 
commit e63a4d5bcfe979e2832b7f970540671dc3c7517a Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Tue Sep 7 10:25:17 2021 -0600 Fixes for cuda abi. commit 668ceed82908c56ea6e390feafcdbf2998642b8d Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Sep 1 16:58:46 2021 -0600 Playing with kernel approaches. commit 25f3dbf5d2e83f3bf1a46806ccd413a800d655c3 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Wed Sep 1 16:12:31 2021 -0600 Some clean up to remove the cudakit target (towards unification across the toolchain between tapir, kitsune, opencilk, etc.). New cuba abi runtime target code to simplify code gen -- needs to merge design ideas with George's current multi-architecture approach. commit 8f21b471808e468a5a00356d8883bd647ac2b9c4 Author: Patrick McCormick <pat@darwin-fe3.lanl.gov> Date: Thu Aug 26 13:21:46 2021 -0600 Yet another attempt to address build issues that are clearly a race condition on build order that can vary between parallel job size as well as platform (e.g., faster systems are builds are more likely to fail). This hopefully stems from a dependency issue but could also be impacted by bugs in either ninja or cmake. Would suggest updating to use the latest ninja (1.10.2) and cmake -- both were used on the final pass of testing before this commit. commit 16eef860f853122f748c88ff00692deab00cdc93 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Aug 23 08:59:41 2021 -0600 Fixed bug for kitsune.h and install target. commit 1ba0104908dbb2b7c89ba69dec29dc43a4af9a3f Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Aug 16 12:30:52 2021 -0600 More work to sort through adding the Realm ABI target into the build system w/ realm as a FetchContent component. 
This means some dependency and target work within cmake, some ordering details about where the FetchContent occurs, and also some fixes to the code base that were lost for realm support (e.g., missing switch entries). There is some point where the overall cmake config becomes more difficult to reason about than the llvm+clang+... source code. ;-) A few additional fixes for the config file (.cfg) settings and supporting cmake pieces. The default config files are autogenerated to handle both the in-tree and install use cases. It is probably worth tailoring for your own use cases at some point but adding them to your user directory. TODO: there is currently a "collision" between the use of a cmake option and the ABI libraries to build within kitsune. At present you will have both enable and add the abi library to the configuration (see the cache file under kitsune/cmake/caches/kitsune-dev.cmake). commit fbde04b2b248b84d06bac8e624916f397dd3764b Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Aug 16 11:16:16 2021 -0600 Fixed merge to support LD_LIBRARY_PATH search for the opencilk bitcode file. A change from TB post-1.0. commit 171fd764cdf4743a24da18c47007e280e1412151 Author: Patrick McCormick <651611+pmccormick@users.noreply.github.com> Date: Mon Aug 16 10:05:52 2021 -0600 Tweaks to address some issues with the -fkokkos option and the addition of command line arguments to align with the full toolchain build. One issue is that link libraries that appe…
No description provided.