Documentation on how GPUs work internally and communicate externally, mostly focused on compute applications.
Knowing the details of GPU internals is useful for:
- debugging: some problems can only be diagnosed with an understanding of the lower layers of abstraction
- performance: reasoning about performance requires knowing how the hardware is laid out
- curiosity: a better understanding of the hardware and abstraction layers gives more confidence in using the programming models built on top of them
- https://nvidia.github.io/open-gpu-doc/ Nvidia's open GPU documentation
- https://github.com/envytools/envytools tools and reverse-engineered documentation for Nvidia GPUs
- Nvidia's open-source Linux kernel modules: https://github.com/NVIDIA/open-gpu-kernel-modules
- Nvidia assembly (and low-level programming info)
- https://github.com/pakmarkthub/dragon DRAGON: direct resource access for GPUs over NVM (similar to mmap on CPUs)
- https://github.com/yalue/cuda_scheduling_examiner_mirror tools for examining the block-level scheduling behavior of CUDA kernels
- https://github.com/NVlabs/NVBit NVBit, a binary instrumentation framework for Nvidia GPUs
- https://rocmdocs.amd.com/en/latest/ ROCm documentation
- https://github.com/RadeonOpenCompute/ROCm ROCm source (see "Getting the ROCm Source Code" in the documentation's Installation Guide)
- AMD GPU assembly (and low-level programming info)
- Intel Gen assembly (and low-level programming info)
- https://github.com/mn416/QPULib library for programming the QPUs (Quad Processing Units) in the Raspberry Pi's VideoCore IV GPU
- https://github.com/doe300/VC4CL OpenCL implementation for the Raspberry Pi's VideoCore IV GPU
Software models for programming GPUs
- CUDA (Nvidia), HIP/ROCm (AMD), OpenCL, SYCL, and the compute stages of the graphics APIs (Vulkan, Direct3D, Metal)
How do transfers across the bus (usually PCIe) work?
- Nvidia has GPUDirect, which lets devices transfer data to and from GPU memory directly, bypassing the CPU and host memory: https://docs.nvidia.com/cuda/gpudirect-rdma/index.html (a peer-to-peer sketch follows this list)
- Does this change with other buses (CAPI, NVLink)?
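
GPUDirect RDMA targets third-party devices such as NICs and NVMe drives; the same-machine flavor, CUDA peer-to-peer, is visible directly from the runtime API. A minimal sketch, assuming a machine with two GPUs on a P2P-capable bus (error checking omitted):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  // Ask the driver whether GPU 0 can address GPU 1's memory directly.
  int canAccess = 0;
  cudaDeviceCanAccessPeer(&canAccess, 0, 1);
  if (!canAccess) {
    printf("no peer-to-peer path between GPU 0 and GPU 1\n");
    return 1;
  }

  size_t bytes = 1 << 20;
  float *src = nullptr, *dst = nullptr;

  cudaSetDevice(0);
  cudaDeviceEnablePeerAccess(1, 0);  // map GPU 1's memory into GPU 0's address space
  cudaMalloc(&src, bytes);

  cudaSetDevice(1);
  cudaMalloc(&dst, bytes);

  // With peer access enabled, this copy moves directly between the two
  // devices over PCIe (or NVLink where present), without staging in host memory.
  cudaMemcpyPeer(dst, 1, src, 0, bytes);
  cudaDeviceSynchronize();

  printf("copied %zu bytes from GPU 0 to GPU 1\n", bytes);
  return 0;
}
```
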
How does memory management work?
- How is the memory map maintained? There must be some sort of MMU to provide memory protection. How does it work? (a unified-memory sketch follows)
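
One place the GPU's MMU becomes observable from user code is demand-paged unified memory: on Nvidia's Pascal and later parts, pages of a managed allocation migrate between host and device memory in response to GPU page faults. A minimal sketch with the CUDA runtime:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(int *p, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) p[i] += 1;  // the first GPU touch faults the page over to the device
}

int main() {
  int n = 1 << 20;
  int *p = nullptr;
  // One allocation visible to both CPU and GPU; the driver and the GPU's MMU
  // migrate pages on demand rather than copying the whole buffer up front.
  cudaMallocManaged(&p, n * sizeof(int));
  for (int i = 0; i < n; ++i) p[i] = i;   // pages populated on the host
  touch<<<(n + 255) / 256, 256>>>(p, n);  // GPU page faults pull them to the device
  cudaDeviceSynchronize();
  printf("p[0] = %d\n", p[0]);            // CPU touch faults the page back
  cudaFree(p);
  return 0;
}
```
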
What is the lifecycle of a kernel in detail?
- It must be something like (a driver-API sketch follows this list):
  - Copy the kernel's code to GPU memory
  - Start executing the kernel (how?)
  - Signal that the kernel is finished
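
A sketch of those three steps made explicit with the CUDA driver API; the module file kernel.ptx and the kernel name step are assumptions for illustration (error checking omitted):

```cuda
#include <cuda.h>

int main() {
  CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
  cuInit(0);
  cuDeviceGet(&dev, 0);
  cuCtxCreate(&ctx, 0, dev);

  // 1. "Copy kernel to GPU memory": loading the module makes the driver
  //    JIT/relocate the code and place it where the GPU can execute it.
  cuModuleLoad(&mod, "kernel.ptx");       // assumed file, for illustration
  cuModuleGetFunction(&fn, mod, "step");  // assumed kernel name

  // 2. "Start executing": the launch is a command pushed into a stream
  //    (a queue); the GPU's front end dequeues it and dispatches blocks.
  int n = 1024;
  CUdeviceptr buf;
  cuMemAlloc(&buf, n * sizeof(float));
  void *args[] = { &buf, &n };
  cuLaunchKernel(fn, n / 256, 1, 1, 256, 1, 1, 0, /*stream=*/0, args, nullptr);

  // 3. "Signal that the kernel is finished": completion is observed by
  //    synchronizing on the context/stream, or by recording a CUevent.
  cuCtxSynchronize();

  cuMemFree(buf);
  cuModuleUnload(mod);
  cuCtxDestroy(ctx);
  return 0;
}
```
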
Examples of visualizing the execution behavior of kernels
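
A common starting point, and the kind of data tools like cuda_scheduling_examiner (linked above) collect, is to have each block record which SM it ran on and when, using the %smid special register and clock64(); dumping that table lets you plot block scheduling as a timeline. A minimal sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the ID of the SM the calling thread is running on.
__device__ unsigned smid() {
  unsigned id;
  asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
  return id;
}

__global__ void trace(unsigned *sm, long long *start) {
  if (threadIdx.x == 0) {
    sm[blockIdx.x] = smid();        // which SM executed this block
    start[blockIdx.x] = clock64();  // when, in device clock cycles
  }
}

int main() {
  const int blocks = 8;
  unsigned *sm; long long *t;
  cudaMallocManaged(&sm, blocks * sizeof(unsigned));
  cudaMallocManaged(&t, blocks * sizeof(long long));
  trace<<<blocks, 32>>>(sm, t);
  cudaDeviceSynchronize();
  for (int b = 0; b < blocks; ++b)
    printf("block %d: SM %u, start cycle %lld\n", b, sm[b], t[b]);
  return 0;
}
```
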