Releases · StanfordLegion/legion

26 Sep 16:57

elliottslaughter

legion-24.09.0

4a03402

Version 24.09.0 (September 27, 2024)

Legion
- Bug fixes for control replication and multi-node configurations
Regent
- Fixes for ROCm 6.0 code generation
Tools
- Legion Prof now uses subcommands (e.g., legion_prof view) to clarify which options apply to which actions
- Legion Prof now tracks backtraces at the points where blocking wait calls are performed by the application
- Legion Prof reports more detailed timing information for tasks
- Legion Prof calculates clock skew between nodes and reports it when relevant
- Commonly used features of Legion Prof are now enabled by default
- The old Python Legion Prof implementation is no longer supported
Realm
- Point fields x, y, z and w have been replaced by methods
- Support for launching CUDA tasks onto a CUDA stream asynchronously via cuCtxRecordEvent, without requiring the CUDA hijack
- Support for CUDA fabric sharing
- Support for host-to-host copies via CUDA DMA
- Support for querying number of NUMA nodes from the NumaModuleConfig
- Added reference counting for preimage operations
- Make std::atomic the default atomic implementation
- Remove REALM_CXX_STANDARD, and bump the minimal requirement to C++17
- Implemented an ABI stable wrapper for GASNetEX
- Additional unit tests including CircularQueue, ReplicatedHeap, find_fastest_path, DynamicTableAllocator, generate_gather_paths, TransferIteratorIndexSpace
- Dead code cleanups and bug fixes

28 Jun 16:22

elliottslaughter

legion-24.06.0

3f27977

Version 24.06.0 (June 28, 2024) – Nonidempotent Traces

Build
- Minimum required C++ standard is now 17
- Embedded GASNet build in CMake now automatically enables GPU memory kinds
Legion
- Support for nonidempotent traces (where the postconditions do not imply the preconditions of the trace)
- Deletions are now committed in program order, making it easier for users to reason about when their effects take place
- All tasks (and other operations) are now committed in order (a prerequisite for anticipated, but not yet implemented, precise exception support)
- Improvements to Legion's internal algorithm for virtual instances, fixing various correctness bugs in the implementation
- Improvements to the DefaultMapper handling of task layout constraints
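One way to picture the distinction drawn in the first item above is a toy model in which a trace's preconditions and postconditions are sets of facts about program state. In that model the postconditions imply the preconditions exactly when every precondition is also a postcondition. This is only a conceptual sketch: the class name, the fact strings, and the set-based representation are all assumptions made for illustration and do not reflect how Legion represents traces internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTrace:
    """Toy stand-in for a trace (hypothetical; not a Legion type)."""
    preconditions: frozenset
    postconditions: frozenset

    def is_idempotent(self) -> bool:
        # In this model, "postconditions imply preconditions" means every
        # precondition fact still holds once the trace has finished.
        return self.preconditions <= self.postconditions


# Ends in a state that still satisfies its own preconditions.
idempotent = ToyTrace(
    preconditions=frozenset({"A valid in instance 1"}),
    postconditions=frozenset({"A valid in instance 1", "B valid in instance 2"}),
)

# Leaves the data somewhere else, so its preconditions no longer hold.
nonidempotent = ToyTrace(
    preconditions=frozenset({"A valid in instance 1"}),
    postconditions=frozenset({"A valid in instance 2"}),
)

print(idempotent.is_idempotent())     # prints True
print(nonidempotent.is_idempotent())  # prints False
```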
Regent
- Improvements to make the compiler more deterministic
- Improvements to auto-detect CUDA
- Support for complex numbers in std/format
- Static control replication (SCR) and RDIR have been completely removed. All SCR and RDIR related flags (-fflow-*) have been removed, except for -fflow 0 which is permitted (but no longer does anything, and now issues a warning)
Tools
- Restore profiler's ability to render dependent partitioning channels
- Render mapper information on mapper calls in the profiler
- Render user-provided profiling information in the profiler
Realm
- UVM support for the HIP module
- Error code support for command line parser
- Support for querying MIG devices from NVML
- Add indirection channel query
- Additional unit tests and bug fixes

27 Mar 16:14

elliottslaughter

legion-24.03.0

c610715

Version 24.03.0 (March 27, 2024) – Control Replication

Legion is an implicitly parallel, distributed runtime system for heterogeneous supercomputers.

The most notable feature in this release is control replication, a feature that we have been working on for many years and that makes Legion dramatically more scalable in typical usage scenarios. In fact, most users have already been using control replication, so this is the first stable release of Legion that is practically usable for the vast majority of our users.

If you are not familiar with control replication, there is a wiki page that describes it, and of course the original paper.

As of this release, the old control_replication branch is no longer being updated, and it will be deleted at some point in the future. All updates from now on will go into the master branch, and it is our intention to avoid any long-standing feature branches in the future.

This release also finally removes some old Legion features that have been deprecated for nearly 10 years at this point. If you were somehow using those features, you will need to update to their replacements.

In addition, with this release, we are now packaging Legion Prof via crates.io. That means you can now install Legion Prof with:

cargo install --all-features --locked legion_prof@0.2403.0

(Note the version format is 0.YYMM.0. This is required because Rust uses semver while Legion uses calver.)
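The relationship between the two version schemes can be sketched as a small conversion. This is an illustration only: the helper name legion_prof_crate_version is hypothetical, and carrying the last component over unchanged is an assumption, since the note above only shows the 0.YYMM.0 form.

```python
def legion_prof_crate_version(legion_version: str) -> str:
    """Map a Legion calver string (YY.MM.patch) to the 0.YYMM.patch crate form.

    Hypothetical helper for illustration; not part of Legion or Legion Prof.
    """
    parts = legion_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected YY.MM.patch, got {legion_version!r}")
    yy, mm, patch = parts
    if len(yy) != 2 or len(mm) != 2:
        raise ValueError(f"expected two-digit year and month, got {legion_version!r}")
    # Assumption: the final component is carried over as-is.
    return f"0.{yy}{mm}.{patch}"


print(legion_prof_crate_version("24.03.0"))  # prints 0.2403.0
```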

Full release notes:

Build
- ROCm 6.0 is now supported, and support for ROCm 4.x has been removed
Legion
- Support for control replication has been merged
- Support for discarding region contents on task completion
- Long-deprecated APIs, such as the old HighLevel namespace, have been removed
Mappers
- Default mapper support for control replication
- Default and null mapper now use C++ override keyword
Regent
- Support for pure projection functors that capture arguments
- Static control replication (SCR) has been deprecated and will be removed in a future release
Tools
- The profiler now correctly recognizes the logger format version and throws an error if it does not match
- The profiler now reports when a profile was generated with debug mode (or another expensive setting) enabled
- Many profiler fixes for correctly rendering runtime and mapper calls
- Profiler now renders GPU device and host execution separately
- Optimizations to improve profiler memory usage and running time
- Rust profiler now requires at least Rust 1.74
Realm
- Support for registration of dynamically allocated buffers
- Support for handling poisoned events for reservation
- Refactor CUDA allocation and IPC paths
- Support for querying CUDA device information (GPU UUID and ID), process information (process ID, hostname, host ID), and timer calibration error from the profiler
- Remove address alignment from serializer and deserializer
- Support for creating network shared peers using IPC mailbox
- Support OMP thread binding and allow for multiple OMP parallel sections when enabling system OMP runtime
- Add Realm unit tests
- Fixes for Realm tests, sparsity map, MemoryQuery, dynamic framebuffer memory and memcpy channel

14 Dec 17:41

elliottslaughter

legion-23.12.0

8fea67e

Version 23.12.0 (December 14, 2023)

Regent
- Support for HIP multi-GPU per runtime
Realm
- Improve scalability of startup by replacing point-to-point communication with allgatherv for machine model announcements
- Support shared memory communication for system memory
- Provide sanity check for GPU tasks to detect any leak of CUDA streams
- Support for GPU transposes in CUDA-DMA
- Bug fixes for CUDA-DMA

28 Sep 23:38

elliottslaughter

legion-23.09.0

7304dfc

Version 23.09.0 (September 28, 2023)

Regent
- Elide future maps in index launches
- Improvements to Pygion interop
Realm
- Add a machine configuration API that allows applications to configure the machine model without using the command line
- Expose Realm managed CUDA/HIP stream to applications to launch GPU tasks without device-wise synchronization when hijack is disabled
- Change timers to use rdtsc
- Improve performance for getting highest priority task available in any task queue
- Implement framebuffer memory with cuMemMap
- Initial work for moving STL dependencies to header only

27 Jun 17:56

elliottslaughter

legion-23.06.0

7b5ff2f

Version 23.06.0 (June 28, 2023)

Build
- Fixes for CMake build on macOS
- Fixes for HIP build when arch is specified
Realm
- Support for better backtraces via libdw and libunwind
- Improve scalability and performance in task spawning by caching the triggering operation of an event if one is provided
- Fix a minor issue with affinity queries to properly clear the user-provided vector before populating it
- Add more accurate GPU memory bandwidth affinity calculations if NVML is available
- Refactor CPU core topology enumeration to serve systems without NUMA capabilities (like Jetson ARM systems)
- Improve scalability and performance of task spawning by moving event reuse freelists to be per-processor, reducing lock contention
- Add a microbenchmark for measuring task throughput more accurately
- Add a series of Realm API tutorials
- Replace CU_EVENT_DEFAULT with CU_EVENT_DISABLE_TIMING for better performance of CUDA events
- Support Kokkos interop for the HIP module
- Fixes for Realm tests on macOS
Tools
- Legion Prof now supports search in the new profiler UI
- Legion Prof now supports an HTTP client/server interface. Launch the server with --serve (on port 8080 by default) and attach a client to it with --attach http://127.0.0.1:8080
- Legion Prof now supports a new archival mode via the --archive flag. Generate an offline profile and view it either via --attach or by uploading it to a server and navigating to https://legion.stanford.edu/prof-viewer/?url=...
- Legion Prof modes (client/server/viewer) are now parallel by default, and perform heavy computations off the UI thread for better responsiveness
- Add support for rendering indirect copies (i.e., gather/scatter)
- Fix rendering of profiles over HTTP with old profiler UI
- Fix profiling of copies with different numbers of hops between instances

27 Mar 19:01

elliottslaughter

legion-23.03.0

12f6051

Version 23.03.0 (March 27, 2023)

Build
- Minimum supported CMake version is now 3.16. (Some optional features may continue to require even newer versions.)
- Minimum supported GCC version is now 8.
- Minimum supported CUDA version is now 10.
Legion
- Added support for padded layout constraints to provide scratch space in instances for tasks to use (see examples/padded_instances).
- Added support for tiled layout constraints, which make it possible to lay out instances by breaking down dimensions (see examples/tiling).
Realm
- An experimental UCX network backend has been added.
- Updated the Kokkos interop to support Kokkos 4.0.
Python
- Support loading Legion as a library from a stock Python interpreter.
Regent
- Fixes to avoid leaking futures.
- Improvements to Regent's predicate optimization.
Tools
- Legion Prof now supports a native viewer UI. Enable it with the viewer feature (e.g., cargo run --features=viewer) and use the flag --view.
- Legion Prof now has better support for rendering a subset of available nodes. Pass all log files (from all nodes) into Legion Prof and add the --subnodes flag to specify which ones to render. This ensures all copies in/out of those nodes will be shown correctly.

30 Dec 17:23

elliottslaughter

legion-22.12.0

9ed6f4d

Version 22.12.0 (December 30, 2022)

Regent
- Support for nested predication of if and while statements
Realm
- Support priorities for Copy operations
- Support building with multiple network backends enabled, and use -ll:networks (gasnetex/gasnet1/mpi/none) to pick which one to use during runtime
- Separate CUDA runtime from Realm by removing all references to CUDA runtime and relying only on driver API, which fixes an issue when mixing static and dynamic cudart across an application and improves Realm’s compatibility across driver versions
Tools
- Legion Prof now supports visualization of channels for indirect copies and of instances used by different operations, including tasks, copies, and fills

04 Oct 05:28

elliottslaughter

legion-22.09.0

5b6e013

Version 22.09.0 (September 30, 2022)

Python
- Support for running packages via legion_python -m
- Support for Jupyter Notebook in single-node execution
Regent
- Deprecated support for LLVM versions less than 11 in setup_env.py. These versions will be removed in the next release. LLVM 13 is recommended, except on ARM where LLVM 11 is currently required
- Added support for provenance for all launcher operations
- Debug info is no longer generated by default in order to optimize compile times. To re-enable it, run with -fdebuginfo 1
Legion
- Most Legion APIs now support passing a provenance string. This provenance information is passed through to tools like Legion Spy and Legion Prof so users can map what they are seeing back to their source code. In the future, provenance strings will also be used by all Legion error messages.
Realm
- Support for fills of arbitrary instances (via multi-hop paths where needed)
- Fixed crashes when using external instances and network-registered memory at the same time
- Removed all direct references to CUDA runtime library in CUDA module
- Caching of minimum-cost data transfer path for repeated copies
- Dependent partitioning support for image and preimage using structured (~affine) transforms in addition to existing unstructured (field-based) images/preimages
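As a rough illustration of the last item, the image and preimage of a set of points under a structured transform x -> a*x + b can be computed from the transform itself, with no field of pointers to read. The sketch below is a toy 1-D model: the function names are hypothetical, plain Python sets stand in for index spaces, and none of it is Realm's API.

```python
def affine_image(sources, a, b):
    """Points reached by applying x -> a*x + b to every source point."""
    return {a * x + b for x in sources}


def affine_preimage(domain, targets, a, b):
    """Points of `domain` that the transform maps into `targets`."""
    return {x for x in domain if a * x + b in targets}


# Image of {0, 1, 2} under x -> 2*x + 1.
print(sorted(affine_image({0, 1, 2}, a=2, b=1)))  # prints [1, 3, 5]

# Preimage of {1, 3, 5, 7} within the domain 0..5 under the same transform.
print(sorted(affine_preimage(range(6), {1, 3, 5, 7}, a=2, b=1)))  # prints [0, 1, 2, 3]
```

An unstructured (field-based) image would instead look up the target of each source point in a stored field; the structured form avoids that lookup because the mapping is given in closed form.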

30 Jun 23:15

elliottslaughter

legion-22.06.0

f721be9

Version 22.06.0 (June 29, 2022)

Regent
- Support for cross-products in index launches, as well as multi-level projection functors.
- Support for HIP on AMD GPUs has been added. All tasks marked with __demand(__cuda) are automatically eligible. Note that the name of the annotation may change in the future to something more general, but for now no change is being made. Some CUDA flags have migrated to more general names. See below.
- The flag -fcuda 1 is deprecated. Use -fgpu cuda instead.
- The flag -fcuda-offline is deprecated. Use -fgpu-offline instead.
- The flag -fcuda-arch is deprecated. Use -fgpu-arch instead.
- Enable HIP support with -fgpu hip and use the -fgpu-offline and -fgpu-arch flags as necessary/appropriate.
- Support for new flag -ffast-math 1 which enables fast-math optimizations on CPU and GPU. By default, CPU code has this disabled, and GPU code uses only the contract flag in LLVM to generate FMA instructions. For compute-intensive applications, additional performance can sometimes be unlocked by enabling the full suite of optimizations with -ffast-math 1, at the cost of numerical accuracy.
- Performance improvements for CUDA allow recent LLVM versions (e.g., 13) to match or exceed the performance of LLVM 3.8. Previously, performance regressions made LLVM 3.8 the most performant version for use with CUDA. The recommended LLVM version moving forward is 13, and setup_env.py has been updated to set this on all platforms.
- The versions of GASNet and Terra are now pinned by default in setup_env.py. You can choose versions explicitly with GASNET_VERSION (as before, though the previous default was unpinned) and --terra-branch, respectively.
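The flag renames listed above (-fcuda 1, -fcuda-offline, -fcuda-arch) can be applied mechanically to an existing argument list. The following is a minimal sketch under stated assumptions: the helper name migrate_regent_gpu_flags is hypothetical, only the three renames from these notes are handled, and any other use of -fcuda (for example with a value other than 1) is left untouched because the notes do not say how it should be translated.

```python
def migrate_regent_gpu_flags(args):
    """Rewrite deprecated -fcuda* flags to their -fgpu* replacements.

    Hypothetical helper for illustration; not part of Regent.
    """
    renamed = {
        "-fcuda-offline": "-fgpu-offline",
        "-fcuda-arch": "-fgpu-arch",
    }
    out = []
    i = 0
    while i < len(args):
        arg = args[i]
        # "-fcuda 1" spans two tokens and becomes "-fgpu cuda".
        if arg == "-fcuda" and i + 1 < len(args) and args[i + 1] == "1":
            out.extend(["-fgpu", "cuda"])
            i += 2
            continue
        # Simple renames keep whatever value follows the flag.
        out.append(renamed.get(arg, arg))
        i += 1
    return out


# "<arch>" is a placeholder value, not a real architecture name.
old_args = ["-fcuda", "1", "-fcuda-arch", "<arch>"]
print(migrate_regent_gpu_flags(old_args))
# prints ['-fgpu', 'cuda', '-fgpu-arch', '<arch>']
```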
Realm
- Allow use of system OpenMP runtime (instead of Realm-provided one) with -DLegion_OpenMP_SYSTEM_RUNTIME=ON. This allows inter-operation with libraries that have already been linked to the system runtime, but limits each process to a single OMP processor.
