云服务器内容精选

华为云首页用户手册

主流开源大模型基于Standard适配PyTorch NPU推理指导（6.3.907）

AI开发平台MODELARTS-场景介绍:操作流程

操作流程图1 操作流程图表2 操作任务流程说明阶段任务说明准备工作准备资源本教程案例是基于ModelArts Standard运行，需要购买ModelArts专属资源池。准备权重准备对应模型的权重文件。准备代码准备AscendCloud-6.3.907-xxx.zip。准备镜像准备推理模型适用的容器镜像。准备Notebook 本案例在Notebook上部署推理服务进行调试，因此需要创建Notebook。部署推理服务在Notebook调试环境中部署推理服务介绍如何在Notebook中配置NPU环境，部署并启动推理服务，完成精度测试和性能测试。若需要部署量化模型，需在Notebook中进行模型权重转换后再部署推理服务。在推理生产环境中部署推理服务介绍如何创建AI应用，部署模型并启动推理服务，在线预测服务。

AI开发平台MODELARTS 主流开源大模型基于Standard适配PyTorch NPU推理指导（6.3.907）
AI开发平台MODELARTS-场景介绍:支持的模型列表和权重文件

支持的模型列表和权重文件本方案支持vLLM的v0.5.0版本。不同vLLM版本支持的模型列表有差异，具体如表1所示。表1 支持的模型列表和权重获取地址序号模型名称是否支持fp16/bf16推理是否支持W4A16量化是否支持W8A8量化是否支持 kv-cache-int8量化开源权重获取地址 1 llama-7b √ √ √ √ https://huggingface.co/huggyllama/llama-7b 2 llama-13b √ √ √ √ https://huggingface.co/huggyllama/llama-13b 3 llama-65b √ √ √ √ https://huggingface.co/huggyllama/llama-65b 4 llama2-7b √ √ √ √ https://huggingface.co/meta-llama/Llama-2-7b-chat-hf 5 llama2-13b √ √ √ √ https://huggingface.co/meta-llama/Llama-2-13b-chat-hf 6 llama2-70b √ √ √ √ https://huggingface.co/meta-llama/Llama-2-70b-hf https://huggingface.co/meta-llama/Llama-2-70b-chat-hf (推荐) 7 llama3-8b √ √ √ √ https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct 8 llama3-70b √ √ √ √ https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct 9 yi-6b √ √ √ √ https://huggingface.co/01-ai/Yi-6B-Chat 10 yi-9b √ √ √ √ https://huggingface.co/01-ai/Yi-9B 11 yi-34b √ √ √ √ https://huggingface.co/01-ai/Yi-34B-Chat 12 deepseek-llm-7b √ x x x https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat 13 deepseek-coder-instruct-33b √ x x x https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct 14 deepseek-llm-67b √ x x x https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat 15 qwen-7b √ √ √ x https://huggingface.co/Qwen/Qwen-7B-Chat 16 qwen-14b √ √ √ x https://huggingface.co/Qwen/Qwen-14B-Chat 17 qwen-72b √ √ √ x https://huggingface.co/Qwen/Qwen-72B-Chat 18 qwen1.5-0.5b √ √ √ x https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat 19 qwen1.5-7b √ √ √ x https://huggingface.co/Qwen/Qwen1.5-7B-Chat 20 qwen1.5-1.8b √ √ √ x https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat 21 qwen1.5-14b √ √ √ x https://huggingface.co/Qwen/Qwen1.5-14B-Chat 22 qwen1.5-32b √ √ √ x https://huggingface.co/Qwen/Qwen1.5-32B/tree/main 23 qwen1.5-72b √ √ √ x https://huggingface.co/Qwen/Qwen1.5-72B-Chat 24 qwen1.5-110b √ √ √ x https://huggingface.co/Qwen/Qwen1.5-110B-Chat 25 qwen2-0.5b √ √ √ x https://huggingface.co/Qwen/Qwen2-0.5B-Instruct 26 qwen2-1.5b √ √ √ x https://huggingface.co/Qwen/Qwen2-1.5B-Instruct 27 qwen2-7b √ √ x x https://huggingface.co/Qwen/Qwen2-7B-Instruct 28 qwen2-72b √ √ √ x https://huggingface.co/Qwen/Qwen2-72B-Instruct 29 baichuan2-7b √ x x x https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat 30 baichuan2-13b √ x x x https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat 31 gemmma-2b √ x x x https://huggingface.co/google/gemma-2b 32 gemmma-7b √ x x x https://huggingface.co/google/gemma-7b 33 chatglm2-6b √ x x x https://huggingface.co/THUDM/chatglm2-6b 34 chatglm3-6b √ x x x https://huggingface.co/THUDM/chatglm3-6b 35 glm-4-9b √ x x x https://huggingface.co/THUDM/glm-4-9b-chat 36 mistral-7b √ x x x https://huggingface.co/mistralai/Mistral-7B-v0.1 37 mixtral-8x7b √ x x x https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1 38 falcon2-11b √ x x x https://huggingface.co/tiiuae/falcon-11B/tree/main 39 qwen2-57b-a14b √ x x x https://huggingface.co/Qwen/Qwen2-57B-A14B-Instruct 40 llama3.1-8b √ x x x https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct 41 llama3.1-70b √ x x x https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct 说明：当前版本中yi-34b、qwen1.5-32b模型暂不支持单卡启动。

AI开发平台MODELARTS 主流开源大模型基于Standard适配PyTorch NPU推理指导（6.3.907）

主流开源大模型基于Standard适配PyTorch NPU推理指导（6.3.907）

意见反馈

0/200

提交取消

提交成功！非常感谢您的反馈，我们会继续努力做到更好反馈提交失败！请稍后重试！

云服务器内容精选

主流开源大模型基于Standard适配PyTorch NPU推理指导（6.3.907）

7*24

备案

专业服务

退订

建议反馈

售前咨询热线