Documentation on how GPUs work internally and communicate externally, mostly focused on compute applications.
Knowing the details of GPU internals is useful for:
- debugging: some problems can only be diagnosed with an understanding of the lower layers of abstraction
- performance: reasoning about performance requires knowing how the hardware is laid out
- curiosity: a better understanding of the hardware and abstraction layers gives more confidence in using the programming models built on top of them
- https://nvidia.github.io/open-gpu-doc/ Nvidia's open GPU documentation
- https://github.com/envytools/envytools tools and reverse-engineered documentation for Nvidia GPUs
- Nvidia's open-source Linux kernel modules: https://github.com/NVIDIA/open-gpu-kernel-modules
- Nvidia assembly (and low-level programming info)
- https://github.com/pakmarkthub/dragon DRAGON: direct resource access for GPUs over NVM (similar to mmap on CPUs)
- https://github.com/yalue/cuda_scheduling_examiner_mirror tools for examining the block-level scheduling behavior of CUDA kernels
- https://github.com/NVlabs/NVBit NVBit, a binary instrumentation framework for Nvidia GPUs
- https://rocmdocs.amd.com/en/latest/ ROCm documentation
- https://github.com/RadeonOpenCompute/ROCm ROCm source (see "Getting the ROCm Source Code" in the documentation's Installation Guide)
- AMD GPU assembly (and low-level programming info)
- Intel Gen assembly (and low-level programming info)
- https://github.com/mn416/QPULib library for programming the QPUs (Quad Processing Units) in the Raspberry Pi's VideoCore IV GPU
- https://github.com/doe300/VC4CL OpenCL implementation for the Raspberry Pi's VideoCore IV GPU
Software models for programming GPUs
- CUDA (Nvidia), HIP/ROCm (AMD), OpenCL, SYCL, and the compute stages of the graphics APIs (Vulkan, Direct3D, Metal)
How do transfers across the bus (usually PCIe) work?
- Nvidia has GPUDirect, which lets devices transfer data to and from GPU memory directly, bypassing the CPU and host memory: https://docs.nvidia.com/cuda/gpudirect-rdma/index.html (a peer-to-peer sketch follows this list)
- Does this change with other buses (CAPI, NVLink)?
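
GPUDirect RDMA targets third-party devices such as NICs and NVMe drives; the same-machine flavor, CUDA peer-to-peer, is visible directly from the runtime API. A minimal sketch, assuming a machine with two GPUs on a P2P-capable bus (error checking omitted):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  // Ask the driver whether GPU 0 can address GPU 1's memory directly.
  int canAccess = 0;
  cudaDeviceCanAccessPeer(&canAccess, 0, 1);
  if (!canAccess) {
    printf("no peer-to-peer path between GPU 0 and GPU 1\n");
    return 1;
  }

  size_t bytes = 1 << 20;
  float *src = nullptr, *dst = nullptr;

  cudaSetDevice(0);
  cudaDeviceEnablePeerAccess(1, 0);  // map GPU 1's memory into GPU 0's address space
  cudaMalloc(&src, bytes);

  cudaSetDevice(1);
  cudaMalloc(&dst, bytes);

  // With peer access enabled, this copy moves directly between the two
  // devices over PCIe (or NVLink where present), without staging in host memory.
  cudaMemcpyPeer(dst, 1, src, 0, bytes);
  cudaDeviceSynchronize();

  printf("copied %zu bytes from GPU 0 to GPU 1\n", bytes);
  return 0;
}
```
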
How does memory management work?
- How is the memory map maintained? There must be some sort of MMU to provide memory protection. How does it work? (a unified-memory sketch follows)
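
One place the GPU's MMU becomes observable from user code is demand-paged unified memory: on Nvidia's Pascal and later parts, pages of a managed allocation migrate between host and device memory in response to GPU page faults. A minimal sketch with the CUDA runtime:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(int *p, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) p[i] += 1;  // the first GPU touch faults the page over to the device
}

int main() {
  int n = 1 << 20;
  int *p = nullptr;
  // One allocation visible to both CPU and GPU; the driver and the GPU's MMU
  // migrate pages on demand rather than copying the whole buffer up front.
  cudaMallocManaged(&p, n * sizeof(int));
  for (int i = 0; i < n; ++i) p[i] = i;   // pages populated on the host
  touch<<<(n + 255) / 256, 256>>>(p, n);  // GPU page faults pull them to the device
  cudaDeviceSynchronize();
  printf("p[0] = %d\n", p[0]);            // CPU touch faults the page back
  cudaFree(p);
  return 0;
}
```
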
What is the lifecycle of a kernel in detail?
- It must be something like (a driver-API sketch follows this list):
  - Copy the kernel's code to GPU memory
  - Start executing the kernel (how?)
  - Signal that the kernel is finished
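
A sketch of those three steps made explicit with the CUDA driver API; the module file kernel.ptx and the kernel name step are assumptions for illustration (error checking omitted):

```cuda
#include <cuda.h>

int main() {
  CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
  cuInit(0);
  cuDeviceGet(&dev, 0);
  cuCtxCreate(&ctx, 0, dev);

  // 1. "Copy kernel to GPU memory": loading the module makes the driver
  //    JIT/relocate the code and place it where the GPU can execute it.
  cuModuleLoad(&mod, "kernel.ptx");       // assumed file, for illustration
  cuModuleGetFunction(&fn, mod, "step");  // assumed kernel name

  // 2. "Start executing": the launch is a command pushed into a stream
  //    (a queue); the GPU's front end dequeues it and dispatches blocks.
  int n = 1024;
  CUdeviceptr buf;
  cuMemAlloc(&buf, n * sizeof(float));
  void *args[] = { &buf, &n };
  cuLaunchKernel(fn, n / 256, 1, 1, 256, 1, 1, 0, /*stream=*/0, args, nullptr);

  // 3. "Signal that the kernel is finished": completion is observed by
  //    synchronizing on the context/stream, or by recording a CUevent.
  cuCtxSynchronize();

  cuMemFree(buf);
  cuModuleUnload(mod);
  cuCtxDestroy(ctx);
  return 0;
}
```
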
Examples of visualizing the execution behavior of kernels
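
A common starting point, and the kind of data tools like cuda_scheduling_examiner (linked above) collect, is to have each block record which SM it ran on and when, using the %smid special register and clock64(); dumping that table lets you plot block scheduling as a timeline. A minimal sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the ID of the SM the calling thread is running on.
__device__ unsigned smid() {
  unsigned id;
  asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
  return id;
}

__global__ void trace(unsigned *sm, long long *start) {
  if (threadIdx.x == 0) {
    sm[blockIdx.x] = smid();        // which SM executed this block
    start[blockIdx.x] = clock64();  // when, in device clock cycles
  }
}

int main() {
  const int blocks = 8;
  unsigned *sm; long long *t;
  cudaMallocManaged(&sm, blocks * sizeof(unsigned));
  cudaMallocManaged(&t, blocks * sizeof(long long));
  trace<<<blocks, 32>>>(sm, t);
  cudaDeviceSynchronize();
  for (int b = 0; b < blocks; ++b)
    printf("block %d: SM %u, start cycle %lld\n", b, sm[b], t[b]);
  return 0;
}
```
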