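AI Development Platform ModelArts - Training Startup Script Description and Parameter Configuration: Recommended Model Parameters and NPU Card Settings
Recommended Model Parameters and NPU Card Settings
Table 1 lists the recommended training parameters and compute specifications for each model. In the Specifications and nodes column, "1*node & 4*Ascend" means a single node with 4 Ascend NPUs (cards); the other entries follow the same pattern.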
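With this notation, the total number of NPUs for a row is the node count multiplied by the NPUs per node. For example, "4*node & 8*Ascend" means 4 × 8 = 32 NPUs. The helper below is a minimal sketch, assuming the spec strings keep the `N*<node label> & M*Ascend` shape used in the table. `parse_spec` is a hypothetical name and is not part of ModelArts.

```python
import re

# Matches spec strings such as "1*node & 8*Ascend" (or the original "1*节点 & 8*Ascend").
# Group 1 is the node count; group 2 is the number of Ascend NPUs per node.
SPEC_PATTERN = re.compile(r"^\s*(\d+)\*\S+\s*&\s*(\d+)\*Ascend\s*$")


def parse_spec(spec: str) -> tuple[int, int, int]:
    """Return (nodes, npus_per_node, total_npus) for a spec string from the table."""
    match = SPEC_PATTERN.match(spec)
    if match is None:
        raise ValueError(f"unrecognized spec: {spec!r}")
    nodes, per_node = int(match.group(1)), int(match.group(2))
    return nodes, per_node, nodes * per_node


if __name__ == "__main__":
    for spec in ("1*node & 4*Ascend", "1*node & 8*Ascend", "4*node & 8*Ascend"):
        print(spec, "->", parse_spec(spec))
```

The node label is matched loosely (`\S+`), so the translated and original Chinese spellings both parse.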
Table 1 Recommended parameters and NPU card counts by model

| No. | Model family | Model | Sequence length (SEQ_LEN) | Parallel parameters | Specifications and nodes |
|---|---|---|---|---|---|
| 1 | llama2 | llama2-7b | 4096 | TP=1, PP=4 | 1*node & 8*Ascend |
| | | llama2-7b | 8192 | TP=2, PP=4 | 1*node & 8*Ascend |
| 2 | llama2 | llama2-13b | 4096 | TP=8, PP=1 | 1*node & 8*Ascend |
| | | llama2-13b | 8192 | TP=8, PP=1 | 1*node & 8*Ascend |
| 3 | llama2 | llama2-70b | 4096 | TP=8, PP=4 | 4*node & 8*Ascend |
| | | llama2-70b | 8192 | TP=8, PP=8 | 8*node & 8*Ascend |
| 4 | llama3 | llama3-8b | 4096 | TP=4, PP=1 | 1*node & 8*Ascend |
| | | llama3-8b | 8192 | TP=4, PP=1 | 1*node & 8*Ascend |
| 5 | llama3 | llama3-70b | 4096 | TP=8, PP=4 | 4*node & 8*Ascend |
| | | llama3-70b | 8192 | TP=8, PP=8 | 8*node & 8*Ascend |
| 6 | Qwen | qwen-7b | 4096 | TP=4, PP=1 | 1*node & 8*Ascend |
| | | qwen-7b | 8192 | TP=4, PP=1 | 1*node & 8*Ascend |
| 7 | Qwen | qwen-14b | 4096 | TP=8, PP=1 | 1*node & 8*Ascend |
| | | qwen-14b | 8192 | TP=8, PP=1 | 1*node & 8*Ascend |
| 8 | Qwen | qwen-72b | 4096 | TP=8, PP=4 | 4*node & 8*Ascend |
| | | qwen-72b | 8192 | TP=8, PP=8 | 8*node & 8*Ascend |
| 9 | Qwen1.5 | qwen1.5-7b | 4096 | TP=4, PP=1 | 1*node & 8*Ascend |
| | | qwen1.5-7b | 8192 | TP=4, PP=1 | 1*node & 8*Ascend |
| 10 | Qwen1.5 | qwen1.5-14b | 4096 | TP=8, PP=1 | 1*node & 8*Ascend |
| | | qwen1.5-14b | 8192 | TP=8, PP=1 | 1*node & 8*Ascend |
| 11 | Qwen1.5 | qwen1.5-32b | 4096 | TP=8, PP=2 | 2*node & 8*Ascend |
| | | qwen1.5-32b | 8192 | TP=8, PP=4 | 4*node & 8*Ascend |
| 12 | Qwen1.5 | qwen1.5-72b | 4096 | TP=8, PP=4 | 4*node & 8*Ascend |
| | | qwen1.5-72b | 8192 | TP=8, PP=8 | 8*node & 8*Ascend |
| 13 | Yi | yi-6b | 4096 | TP=1, PP=4 | 1*node & 8*Ascend |
| | | yi-6b | 8192 | TP=2, PP=4 | 1*node & 8*Ascend |
| 14 | Yi | yi-34b | 4096 | TP=4, PP=4 | 2*node & 8*Ascend |
| | | yi-34b | 8192 | TP=8, PP=4 | 4*node & 8*Ascend |
| 15 | ChatGLMv3 | glm3-6b | 4096 | TP=1, PP=4 | 1*node & 8*Ascend |
| | | glm3-6b | 8192 | TP=2, PP=4 | 1*node & 8*Ascend |
| 16 | Baichuan2 | baichuan2-13b | 4096 | TP=8, PP=1 | 1*node & 8*Ascend |
| | | baichuan2-13b | 8192 | TP=8, PP=1 | 1*node & 8*Ascend |
| 17 | Qwen2 | qwen2-0.5b | 4096 | TP=2, PP=1 | 1*node & 4*Ascend |
| | | qwen2-0.5b | 8192 | TP=2, PP=1 | 1*node & 4*Ascend |
| 18 | Qwen2 | qwen2-1.5b | 4096 | TP=2, PP=1 | 1*node & 4*Ascend |
| | | qwen2-1.5b | 8192 | TP=2, PP=1 | 1*node & 4*Ascend |
| 19 | Qwen2 | qwen2-7b | 4096 | TP=4, PP=1 | 1*node & 8*Ascend |
| | | qwen2-7b | 8192 | TP=4, PP=1 | 1*node & 8*Ascend |
| 20 | Qwen2 | qwen2-72b | 4096 | TP=8, PP=4 | 4*node & 8*Ascend |
| | | qwen2-72b | 8192 | TP=8, PP=8 | 8*node & 8*Ascend |
| 21 | GLMv4 | glm4-9b | 4096 | TP=2, PP=4 | 1*node & 8*Ascend |
| | | glm4-9b | 8192 | TP=2, PP=4 | 1*node & 8*Ascend |
| 22 | mistral | mistral-7b | 4096 | TP=1, PP=4 | 1*node & 8*Ascend |
| 23 | mixtral | mixtral-8x7b | 4096 | TP=2, PP=8 | 2*node & 8*Ascend |
| | | mixtral-8x7b | 8192 | TP=2, PP=8 | 2*node & 8*Ascend |
| 24 | llama3.1 | llama3.1-8b | 4096 | TP=4, PP=1 | 1*node & 4*Ascend |
| | | llama3.1-8b | 8192 | TP=4, PP=1 | 1*node & 4*Ascend |
| 25 | llama3.1 | llama3.1-70b | 4096 | TP=8, PP=4 | 4*node & 8*Ascend |
| | | llama3.1-70b | 8192 | TP=8, PP=8 | 8*node & 8*Ascend |

Note: TP is the tensor model parallel size and PP is the pipeline model parallel size.
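The table does not list the data-parallel (DP) size. In Megatron-style training, which the TP (tensor model parallel size) and PP (pipeline model parallel size) terms suggest, the total number of NPUs must equal TP × PP × DP, so DP follows from each row. For example, llama2-7b at SEQ_LEN=4096 runs on 1 × 8 = 8 NPUs with TP=1 and PP=4, giving DP = 8 / (1 × 4) = 2. The sketch below transcribes only a few rows (not the whole table), derives DP, rejects configurations where TP × PP does not divide the NPU count, and builds Megatron-LM style flags. Whether the ModelArts training startup script accepts these exact flags is an assumption. `Recommendation`, `RECOMMENDED`, and `megatron_style_args` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    tp: int  # tensor model parallel size
    pp: int  # pipeline model parallel size
    nodes: int
    npus_per_node: int

    @property
    def world_size(self) -> int:
        """Total NPUs: node count times NPUs per node."""
        return self.nodes * self.npus_per_node

    @property
    def dp(self) -> int:
        """Data-parallel size implied by world_size = TP * PP * DP (Megatron-style)."""
        model_parallel = self.tp * self.pp
        if self.world_size % model_parallel != 0:
            raise ValueError(
                f"world size {self.world_size} is not divisible by TP*PP={model_parallel}"
            )
        return self.world_size // model_parallel


# A few rows transcribed from Table 1, keyed by (model, SEQ_LEN).
RECOMMENDED = {
    ("llama2-7b", 4096): Recommendation(tp=1, pp=4, nodes=1, npus_per_node=8),
    ("llama2-70b", 8192): Recommendation(tp=8, pp=8, nodes=8, npus_per_node=8),
    ("qwen2-0.5b", 4096): Recommendation(tp=2, pp=1, nodes=1, npus_per_node=4),
    ("yi-34b", 4096): Recommendation(tp=4, pp=4, nodes=2, npus_per_node=8),
    ("mixtral-8x7b", 8192): Recommendation(tp=2, pp=8, nodes=2, npus_per_node=8),
}


def megatron_style_args(model: str, seq_len: int) -> list[str]:
    """Build Megatron-LM style flags for one table row (the flag mapping is an assumption)."""
    rec = RECOMMENDED[(model, seq_len)]
    return [
        "--tensor-model-parallel-size", str(rec.tp),
        "--pipeline-model-parallel-size", str(rec.pp),
        "--seq-length", str(seq_len),
    ]


if __name__ == "__main__":
    for (model, seq_len), rec in RECOMMENDED.items():
        print(f"{model} SEQ_LEN={seq_len}: {rec.world_size} NPUs, DP={rec.dp}")
    print(" ".join(megatron_style_args("llama2-7b", 4096)))
```

Rows such as llama2-70b at SEQ_LEN=8192 (TP=8, PP=8 on 64 NPUs) leave DP=1, meaning every NPU holds a distinct model shard; smaller models such as llama2-7b keep spare capacity for data parallelism.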