Intel XPU System Management Interface is an in-band, node-level tool for local GPU management. It can be integrated into cluster management solutions and cluster schedulers, and GPU users can also run it directly to manage Intel GPUs on the local machine. It offers both a local command-line interface and a local library call interface. Its main features are:
- Provides basic GPU information, including GPU model, frequency, GPU memory capacity and firmware version
- Provides GPU telemetry, including GPU utilization, performance metrics, GPU memory bandwidth and temperature
- Reports GPU health status, including memory health and temperature health
- Runs GPU diagnostics through different levels of GPU test suites
- Updates GPU firmware
- Gets and changes GPU settings, including power limit, GPU frequency, standby mode and scheduler mode
- Supports Kubernetes (K8s) and can export GPU telemetry to Prometheus

Supported devices:
- Intel(R) Data Center Flex Series GPU
- Intel(R) Data Center Max Series GPU

Supported operating systems:
- Ubuntu 20.04.3/22.04
- RHEL 8.5/8.6
- CentOS 8/9 Stream
- CentOS 7.4/7.9
- SLES 15 SP3/SP4
- Debian 10.13

Please follow the XPU System Management Interface Installation Guide to install or uninstall Intel XPU System Management Interface.
By default, Intel XPU System Management Interface is installed into /usr/bin, /usr/lib and /usr/lib64. The command-line tool is /usr/bin/xpu-smi. Please refer to the XPU System Management Interface CLI User Guide for how to use the command-line tool.
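
As a quick post-install check, the following minimal Python sketch looks for the command-line tool at its default location and, if it is found, asks it to list the GPUs on the node. The helper names `find_xpu_smi` and `run_discovery` are hypothetical and exist only for this illustration. The sketch assumes the `discovery` subcommand is available in the installed version; the CLI User Guide remains the authoritative reference for subcommands and options.

```python
import os
import shutil
import subprocess

# Default install location named in this document.
XPU_SMI_DEFAULT = "/usr/bin/xpu-smi"


def find_xpu_smi(default_path=XPU_SMI_DEFAULT):
    """Return the path to an executable xpu-smi, or None if none is found.

    Hypothetical helper: checks the default path first, then falls back to PATH.
    """
    if os.path.isfile(default_path) and os.access(default_path, os.X_OK):
        return default_path
    return shutil.which("xpu-smi")


def run_discovery(binary):
    """Run `<binary> discovery` and return (exit code, captured stdout).

    Assumption: the installed tool supports the `discovery` subcommand.
    """
    result = subprocess.run(
        [binary, "discovery"],
        capture_output=True,
        text=True,
        check=False,  # report a failing exit code instead of raising
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    binary = find_xpu_smi()
    if binary is None:
        print("xpu-smi not found; see the Installation Guide.")
    else:
        code, output = run_discovery(binary)
        print(f"{binary} exited with code {code}")
        print(output)
```

Checking the default path before falling back to `PATH` keeps the check useful on systems where the tool was installed under a different prefix.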