Skip to content

Latest commit

 

History

History
161 lines (160 loc) · 16.2 KB

附录M_CUDA环境变量.md

File metadata and controls

161 lines (160 loc) · 16.2 KB

附录M_CUDA环境变量

下表列出了 CUDA 环境变量。 与多进程服务相关的环境变量记录在 GPU 部署和管理指南的多进程服务部分。

Table 18. CUDA Environment Variables
Variable Values Description
Device Enumeration and Properties
CUDA_VISIBLE_DEVICES A comma-separated sequence of GPU identifiers

MIG support: MIG-<GPU-UUID>/<GPU instance ID>/<compute instance ID>
GPU identifiers are given as integer indices or as UUID strings. GPU UUID strings should follow the same format as given by nvidia-smi, such as GPU-8932f937-d72c-4106-c12f-20bd9faed9f6. However, for convenience, abbreviated forms are allowed; simply specify enough digits from the beginning of the GPU UUID to uniquely identify that GPU in the target system. For example, CUDA_VISIBLE_DEVICES=GPU-8932f937 may be a valid way to refer to the above GPU UUID, assuming no other GPU in the system shares this prefix.

Only the devices whose index is present in the sequence are visible to CUDA applications and they are enumerated in the order of the sequence. If one of the indices is invalid, only the devices whose index precedes the invalid index are visible to CUDA applications. For example, setting CUDA_VISIBLE_DEVICES to 2,1 causes device 0 to be invisible and device 2 to be enumerated before device 1. Setting CUDA_VISIBLE_DEVICES to 0,2,-1,1 causes devices 0 and 2 to be visible and device 1 to be invisible.

MIG format starts with MIG keyword and GPU UUID should follow the same format as given by nvidia-smi. For example, MIG-GPU-8932f937-d72c-4106-c12f-20bd9faed9f6/1/2. Only single MIG instance enumeration is supported.
CUDA_MANAGED_FORCE_DEVICE_ALLOC 0 or 1 (default is 0) Forces the driver to place all managed allocations in device memory.
CUDA_DEVICE_ORDER FASTEST_FIRST, PCI_BUS_ID, (default is FASTEST_FIRST) FASTEST_FIRST causes CUDA to enumerate the available devices in fastest to slowest order using a simple heuristic. PCI_BUS_ID orders devices by PCI bus ID in ascending order.
Compilation
CUDA_CACHE_DISABLE 0 or 1 (default is 0) Disables caching (when set to 1) or enables caching (when set to 0) for just-in-time-compilation. When disabled, no binary code is added to or retrieved from the cache.
CUDA_CACHE_PATH filepath Specifies the folder where the just-in-time compiler caches binary codes; the default values are:
  • on Windows, %APPDATA%\NVIDIA\ComputeCache
  • on Linux, ~/.nv/ComputeCache
CUDA_CACHE_MAXSIZE integer (default is 268435456 (256 MiB) and maximum is 4294967296 (4 GiB)) Specifies the size in bytes of the cache used by the just-in-time compiler. Binary codes whose size exceeds the cache size are not cached. Older binary codes are evicted from the cache to make room for newer binary codes if needed.
CUDA_FORCE_PTX_JIT 0 or 1 (default is 0) When set to 1, forces the device driver to ignore any binary code embedded in an application (see Application Compatibility) and to just-in-time compile embedded PTX code instead. If a kernel does not have embedded PTX code, it will fail to load. This environment variable can be used to validate that PTX code is embedded in an application and that its just-in-time compilation works as expected to guarantee application forward compatibility with future architectures (see Just-in-Time Compilation).
CUDA_DISABLE_PTX_JIT 0 or 1 (default is 0) When set to 1, disables the just-in-time compilation of embedded PTX code and use the compatible binary code embedded in an application (see Application Compatibility). If a kernel does not have embedded binary code or the embedded binary was compiled for an incompatible architecture, then it will fail to load. This environment variable can be used to validate that an application has the compatible SASS code generated for each kernel.(see Binary Compatibility).
Execution
CUDA_LAUNCH_BLOCKING 0 or 1 (default is 0) Disables (when set to 1) or enables (when set to 0) asynchronous kernel launches.
CUDA_DEVICE_MAX_CONNECTIONS 1 to 32 (default is 8) Sets the number of compute and copy engine concurrent connections (work queues) from the host to each device of compute capability 3.5 and above.
CUDA_AUTO_BOOST 0 or 1 Overrides the autoboost behavior set by the --auto-boost-default option of nvidia-smi. If an application requests via this environment variable a behavior that is different from nvidia-smi's, its request is honored if there is no other application currently running on the same GPU that successfully requested a different behavior, otherwise it is ignored.
cuda-gdb (on Linux platform)
CUDA_DEVICE_WAITS_ON_EXCEPTION 0 or 1 (default is 0) When set to 1, a CUDA application will halt when a device exception occurs, allowing a debugger to be attached for further debugging.
MPS service (on Linux platform)
CUDA_DEVICE_DEFAULT_PERSISTING_L2_CACHE_PERCENTAGE_LIMIT Percentage value (between 0 - 100, default is 0) Devices of compute capability 8.x allow, a portion of L2 cache to be set-aside for persisting data accesses to global memory. When using CUDA MPS service, the set-aside size can only be controlled using this environment variable, before starting CUDA MPS control daemon. I.e., the environment variable should be set before running the command nvidia-cuda-mps-control -d.