华为昇腾（Atlas 800T A2）

我们基于 LMDeploy 的 PytorchEngine，增加了华为昇腾设备的支持。所以，在华为昇腾上使用 LDMeploy 的方法与在英伟达 GPU 上使用 PytorchEngine 后端的方法几乎相同。在阅读本教程之前，请先阅读原版的快速开始。

支持的模型列表在这里.

安装

我们强烈建议用户构建一个 Docker 镜像以简化环境设置。

克隆 lmdeploy 的源代码，Dockerfile 位于 docker 目录中。

git clone https://github.com/InternLM/lmdeploy.git
cd lmdeploy

环境准备

Docker 版本应不低于 18.03。并且需按照官方指南安装 Ascend Docker Runtime。

Caution

如果在后续容器内出现libascend_hal.so: cannot open shared object file错误，说明Ascend Docker Runtime没有被正确安装。

Drivers，Firmware 和 CANN

目标机器需安装华为驱动程序和固件版本至少为 23.0.3，请参考 CANN 驱动程序和固件安装和下载资源。

另外，docker/Dockerfile_aarch64_ascend没有提供CANN 安装包，用户需要自己从昇腾资源下载中心下载CANN(version 8.0.RC2.beta1)软件包。并将Ascend-cann-kernels-910b*.run，Ascend-cann-nnal_*.run和Ascend-cann-toolkit*.run 放在 lmdeploy 源码根目录下。

构建镜像

请在 lmdeploy源代码根目录下执行以下镜像构建命令，CANN 相关的安装包也放在此目录下。

DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest \
    -f docker/Dockerfile_aarch64_ascend .

上述Dockerfile_aarch64_ascend适用于鲲鹏CPU. 如果是Intel CPU的机器，请尝试这个dockerfile (未经过测试)

如果以下命令执行没有任何错误，这表明环境设置成功。

docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env

关于在昇腾设备上运行docker run命令的详情，请参考这篇文档。

离线批处理

Tip

图模式已经支持了Atlas 800T A2。用户可以设定eager_mode=False来开启图模式，或者设定eager_mode=True来关闭图模式。(启动图模式需要事先source /usr/local/Ascend/nnal/atb/set_env.sh)

LLM 推理

将device_type="ascend"加入PytorchEngineConfig的参数中。

from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig
if __name__ == "__main__":
    pipe = pipeline("internlm/internlm2_5-7b-chat",
                    backend_config=PytorchEngineConfig(tp=1, device_type="ascend", eager_mode=True))
    question = ["Shanghai is", "Please introduce China", "How are you?"]
    response = pipe(question)
    print(response)

VLM 推理

将device_type="ascend"加入PytorchEngineConfig的参数中。

from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image
if __name__ == "__main__":
    pipe = pipeline('OpenGVLab/InternVL2-2B',
                    backend_config=PytorchEngineConfig(tp=1, device_type='ascend', eager_mode=True))
    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
    response = pipe(('describe this image', image))
    print(response)

在线服务

Tip

图模式已经支持Atlas 800T A2。在线服务时，图模式默认开启，用户可以添加--eager-mode来关闭图模式。(启动图模式需要事先source /usr/local/Ascend/nnal/atb/set_env.sh)

LLM 模型服务

将--device ascend加入到服务启动命令中。

lmdeploy serve api_server --backend pytorch --device ascend --eager-mode internlm/internlm2_5-7b-chat

VLM 模型服务

将--device ascend加入到服务启动命令中。

lmdeploy serve api_server --backend pytorch --device ascend --eager-mode OpenGVLab/InternVL2-2B

使用命令行与LLM模型对话

将--device ascend加入到服务启动命令中。

lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ascend --eager-mode

也可以运行以下命令使启动容器后开启lmdeploy聊天

docker exec -it lmdeploy_ascend_demo \
    bash -i -c "lmdeploy chat --backend pytorch --device ascend --eager-mode internlm/internlm2_5-7b-chat"

量化

w4a16 AWQ

运行下面的代码可以在Atlas 800T A2上对权重进行W4A16量化。

lmdeploy lite auto_awq $HF_MODEL --work-dir $WORK_DIR --device npu

支持的模型列表请参考支持的模型。

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

get_started.md

get_started.md

华为昇腾（Atlas 800T A2）

安装

环境准备

Drivers，Firmware 和 CANN

构建镜像

离线批处理

LLM 推理

VLM 推理

在线服务

LLM 模型服务

VLM 模型服务

使用命令行与LLM模型对话

量化

w4a16 AWQ

Files

get_started.md

Latest commit

History

get_started.md

File metadata and controls

华为昇腾（Atlas 800T A2）

安装

环境准备

Drivers，Firmware 和 CANN

构建镜像

离线批处理

LLM 推理

VLM 推理

在线服务

LLM 模型服务

VLM 模型服务

使用命令行与LLM模型对话

量化

w4a16 AWQ