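AI Development Platform ModelArts - Training Launch Script Description and Parameter Configuration: Recommended Model Parameters and NPU Settings
Recommended Parameters and NPU Counts by Model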
TP: tensor model parallel size; PP: pipeline model parallel size.

| No. | Model family | Model | Training strategy | Sequence length (SEQ_LEN) | Parallel settings | Micro batch size (MBS) | Flavor and node count |
|---|---|---|---|---|---|---|---|
| 1 | llama2 | llama2-7b | pretrain/sft | 4096 | TP=1, PP=4 | 1 | 1 node × 8 Ascend |
| | | llama2-7b | lora | 4096 | TP=1, PP=4 | 2 | 1 node × 8 Ascend |
| | | llama2-7b | pretrain/sft | 8192 | TP=2, PP=4 | 1 | 1 node × 8 Ascend |
| | | llama2-7b | lora | 8192 | TP=2, PP=4 | 2 | 1 node × 8 Ascend |
| 2 | | llama2-13b | pretrain/sft | 4096 | TP=8, PP=1 | 4 | 1 node × 8 Ascend |
| | | llama2-13b | lora | 4096 | TP=8, PP=1 | 4 | 1 node × 8 Ascend |
| | | llama2-13b | pretrain/sft | 8192 | TP=8, PP=1 | 2 | 1 node × 8 Ascend |
| | | llama2-13b | lora | 8192 | TP=8, PP=1 | 2 | 1 node × 8 Ascend |
| 3 | | llama2-70b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4 nodes × 8 Ascend |
| | | llama2-70b | lora | 4096 | TP=8, PP=4 | 2 | 4 nodes × 8 Ascend |
| | | llama2-70b | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8 nodes × 8 Ascend |
| | | llama2-70b | lora | 8192 | TP=8, PP=4 | 1 | 4 nodes × 8 Ascend |
| 4 | llama3 | llama3-8b | pretrain/sft | 4096 | TP=4, PP=1 | 2 | 1 node × 8 Ascend |
| | | llama3-8b | lora | 4096 | TP=4, PP=1 | 4 | 1 node × 8 Ascend |
| | | llama3-8b | pretrain/sft | 8192 | TP=4, PP=1 | 1 | 1 node × 8 Ascend |
| | | llama3-8b | lora | 8192 | TP=4, PP=1 | 2 | 1 node × 8 Ascend |
| 5 | | llama3-70b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4 nodes × 8 Ascend |
| | | llama3-70b | lora | 4096 | TP=8, PP=4 | 2 | 4 nodes × 8 Ascend |
| | | llama3-70b | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8 nodes × 8 Ascend |
| | | llama3-70b | lora | 8192 | TP=8, PP=4 | 1 | 4 nodes × 8 Ascend |
| 6 | Qwen | qwen-7b | pretrain/sft | 4096 | TP=4, PP=1 | 2 | 1 node × 8 Ascend |
| | | qwen-7b | lora | 4096 | TP=4, PP=1 | 4 | 1 node × 8 Ascend |
| | | qwen-7b | pretrain/sft | 8192 | TP=4, PP=1 | 1 | 1 node × 8 Ascend |
| | | qwen-7b | lora | 8192 | TP=4, PP=1 | 2 | 1 node × 8 Ascend |
| 7 | | qwen-14b | pretrain/sft | 4096 | TP=4, PP=2 | 2 | 1 node × 8 Ascend |
| | | qwen-14b | lora | 4096 | TP=4, PP=1 | 2 | 1 node × 8 Ascend |
| | | qwen-14b | pretrain/sft | 8192 | TP=4, PP=2 | 1 | 1 node × 8 Ascend |
| | | qwen-14b | lora | 8192 | TP=4, PP=2 | 2 | 1 node × 8 Ascend |
| 8 | | qwen-72b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4 nodes × 8 Ascend |
| | | qwen-72b | lora | 4096 | TP=8, PP=4 | 2 | 4 nodes × 8 Ascend |
| | | qwen-72b | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8 nodes × 8 Ascend |
| | | qwen-72b | lora | 8192 | TP=8, PP=4 | 1 | 4 nodes × 8 Ascend |
| 9 | Qwen1.5 | qwen1.5-7b | pretrain/sft | 4096 | TP=1, PP=4 | 1 | 1 node × 8 Ascend |
| | | qwen1.5-7b | lora | 4096 | TP=1, PP=4 | 2 | 1 node × 8 Ascend |
| | | qwen1.5-7b | pretrain/sft | 8192 | TP=4, PP=1 | 1 | 1 node × 8 Ascend |
| | | qwen1.5-7b | lora | 8192 | TP=1, PP=4 | 1 | 1 node × 8 Ascend |
| 10 | | qwen1.5-14b | pretrain/sft | 4096 | TP=8, PP=1 | 4 | 1 node × 8 Ascend |
| | | qwen1.5-14b | lora | 4096 | TP=4, PP=1 | 4 | 1 node × 8 Ascend |
| | | qwen1.5-14b | pretrain/sft | 8192 | TP=8, PP=1 | 2 | 1 node × 8 Ascend |
| | | qwen1.5-14b | lora | 8192 | TP=8, PP=1 | 2 | 1 node × 8 Ascend |
| 11 | | qwen1.5-32b | pretrain/sft | 4096 | TP=8, PP=2 | 2 | 2 nodes × 8 Ascend |
| | | qwen1.5-32b | lora | 4096 | TP=8, PP=2 | 4 | 2 nodes × 8 Ascend |
| | | qwen1.5-32b | pretrain/sft | 8192 | TP=8, PP=2 | 1 | 2 nodes × 8 Ascend |
| | | qwen1.5-32b | lora | 8192 | TP=8, PP=2 | 2 | 2 nodes × 8 Ascend |
| 12 | | qwen1.5-72b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4 nodes × 8 Ascend |
| | | qwen1.5-72b | lora | 4096 | TP=8, PP=4 | 2 | 4 nodes × 8 Ascend |
| | | qwen1.5-72b | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8 nodes × 8 Ascend |
| | | qwen1.5-72b | lora | 8192 | TP=8, PP=4 | 1 | 4 nodes × 8 Ascend |
| 13 | Yi | yi-6b | pretrain/sft | 4096 | TP=1, PP=4 | 1 | 1 node × 8 Ascend |
| | | yi-6b | lora | 4096 | TP=1, PP=4 | 2 | 1 node × 8 Ascend |
| | | yi-6b | pretrain/sft | 8192 | TP=2, PP=2 | 1 | 1 node × 8 Ascend |
| | | yi-6b | lora | 8192 | TP=1, PP=4 | 1 | 1 node × 8 Ascend |
| 14 | | yi-34b | pretrain/sft | 4096 | TP=4, PP=4 | 1 | 2 nodes × 8 Ascend |
| | | yi-34b | lora | 4096 | TP=4, PP=4 | 2 | 2 nodes × 8 Ascend |
| | | yi-34b | pretrain/sft | 8192 | TP=8, PP=4 | 1 | 4 nodes × 8 Ascend |
| | | yi-34b | lora | 8192 | TP=8, PP=4 | 2 | 4 nodes × 8 Ascend |
| 15 | ChatGLMv3 | glm3-6b | pretrain/sft | 4096 | TP=1, PP=2 | 1 | 1 node × 4 Ascend |
| | | glm3-6b | lora | 4096 | TP=1, PP=2 | 2 | 1 node × 4 Ascend |
| | | glm3-6b | pretrain/sft | 8192 | TP=1, PP=4 | 1 | 1 node × 4 Ascend |
| | | glm3-6b | lora | 8192 | TP=1, PP=2 | 1 | 1 node × 4 Ascend |
| 16 | Baichuan2 | baichuan2-13b | pretrain/sft | 4096 | TP=8, PP=1 | 2 | 1 node × 8 Ascend |
| | | baichuan2-13b | lora | 4096 | TP=8, PP=1 | 4 | 1 node × 8 Ascend |
| | | baichuan2-13b | pretrain/sft | 8192 | TP=8, PP=2 | 1 | 1 node × 8 Ascend |
| | | baichuan2-13b | lora | 8192 | TP=8, PP=1 | 1 | 2 nodes × 8 Ascend |
| 17 | Qwen2 | qwen2-0.5b | pretrain/sft | 4096 | TP=1, PP=1 | 2 | 1 node × 4 Ascend |
| | | qwen2-0.5b | lora | 4096 | TP=1, PP=1 | 2 | 1 node × 4 Ascend |
| | | qwen2-0.5b | pretrain/sft | 8192 | TP=1, PP=1 | 1 | 1 node × 4 Ascend |
| | | qwen2-0.5b | lora | 8192 | TP=1, PP=1 | 1 | 1 node × 4 Ascend |
| 18 | | qwen2-1.5b | pretrain/sft | 4096 | TP=1, PP=1 | 2 | 1 node × 4 Ascend |
| | | qwen2-1.5b | lora | 4096 | TP=1, PP=1 | 2 | 1 node × 4 Ascend |
| | | qwen2-1.5b | pretrain/sft | 8192 | TP=1, PP=1 | 1 | 1 node × 4 Ascend |
| | | qwen2-1.5b | lora | 8192 | TP=1, PP=1 | 1 | 1 node × 4 Ascend |
| 19 | | qwen2-7b | pretrain/sft | 4096 | TP=4, PP=1 | 2 | 1 node × 8 Ascend |
| | | qwen2-7b | lora | 4096 | TP=4, PP=1 | 2 | 1 node × 8 Ascend |
| | | qwen2-7b | pretrain/sft | 8192 | TP=4, PP=2 | 1 | 1 node × 8 Ascend |
| | | qwen2-7b | lora | 8192 | TP=4, PP=2 | 2 | 1 node × 8 Ascend |
| 20 | | qwen2-72b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4 nodes × 8 Ascend |
| | | qwen2-72b | lora | 4096 | TP=8, PP=4 | 2 | 4 nodes × 8 Ascend |
| | | qwen2-72b | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8 nodes × 8 Ascend |
| | | qwen2-72b | lora | 8192 | TP=8, PP=8 | 1 | 8 nodes × 8 Ascend |
| 21 | GLMv4 | glm4-9b | pretrain/sft | 4096 | TP=1, PP=4 | 1 | 1 node × 8 Ascend |
| | | glm4-9b | lora | 4096 | TP=1, PP=2 | 1 | 1 node × 4 Ascend |
| | | glm4-9b | pretrain/sft | 8192 | TP=2, PP=2 | 1 | 1 node × 8 Ascend |
| | | glm4-9b | lora | 8192 | TP=2, PP=1 | 1 | 1 node × 4 Ascend |
| 22 | mistral | mistral-7b | pretrain/sft | 4096 | TP=1, PP=4 | 1 | 1 node × 8 Ascend |
| | | mistral-7b | lora | 4096 | TP=1, PP=4 | 2 | 1 node × 8 Ascend |
| 23 | mixtral | mixtral-8x7b | pretrain/sft | 4096 | TP=2, PP=8 | 1 | 2 nodes × 8 Ascend |
| | | mixtral-8x7b | pretrain/sft | 8192 | TP=2, PP=8 | 1 | 2 nodes × 8 Ascend |
| 24 | llama3.1 | llama3.1-8b | pretrain/sft | 4096 | TP=4, PP=1 | 2 | 1 node × 8 Ascend |
| | | llama3.1-8b | lora | 4096 | TP=4, PP=1 | 4 | 1 node × 8 Ascend |
| | | llama3.1-8b | pretrain/sft | 8192 | TP=4, PP=1 | 1 | 1 node × 8 Ascend |
| | | llama3.1-8b | lora | 8192 | TP=4, PP=1 | 2 | 1 node × 8 Ascend |
| 25 | | llama3.1-70b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4 nodes × 8 Ascend |
| | | llama3.1-70b | lora | 4096 | TP=8, PP=2 | 4 | 2 nodes × 8 Ascend |
| | | llama3.1-70b | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8 nodes × 8 Ascend |
| | | llama3.1-70b | lora | 8192 | TP=8, PP=2 | 2 | 2 nodes × 8 Ascend |
| 26 | Qwen2.5 | qwen2.5-0.5b | pretrain/sft | 4096 | TP=1, PP=1 | 1 | 1 node × 4 Ascend |
| | | qwen2.5-0.5b | lora | 4096 | TP=1, PP=1 | 2 | 1 node × 4 Ascend |
| | | qwen2.5-0.5b | pretrain/sft | 8192 | TP=1, PP=1 | 1 | 1 node × 4 Ascend |
| | | qwen2.5-0.5b | lora | 8192 | TP=1, PP=1 | 1 | 1 node × 4 Ascend |
| 27 | | qwen2.5-7b | pretrain/sft | 4096 | TP=4, PP=1 | 2 | 1 node × 8 Ascend |
| | | qwen2.5-7b | lora | 4096 | TP=4, PP=1 | 4 | 1 node × 8 Ascend |
| | | qwen2.5-7b | pretrain/sft | 8192 | TP=4, PP=2 | 1 | 1 node × 8 Ascend |
| | | qwen2.5-7b | lora | 8192 | TP=4, PP=2 | 2 | 1 node × 8 Ascend |
| 28 | | qwen2.5-14b | pretrain/sft | 4096 | TP=8, PP=1 | 4 | 1 node × 8 Ascend |
| | | qwen2.5-14b | lora | 4096 | TP=4, PP=1 | 4 | 1 node × 8 Ascend |
| | | qwen2.5-14b | pretrain/sft | 8192 | TP=8, PP=1 | 2 | 1 node × 8 Ascend |
| | | qwen2.5-14b | lora | 8192 | TP=8, PP=1 | 2 | 1 node × 8 Ascend |
| 29 | | qwen2.5-32b | pretrain/sft | 4096 | TP=8, PP=2 | 2 | 2 nodes × 8 Ascend |
| | | qwen2.5-32b | lora | 4096 | TP=8, PP=2 | 4 | 2 nodes × 8 Ascend |
| | | qwen2.5-32b | pretrain/sft | 8192 | TP=8, PP=2 | 1 | 2 nodes × 8 Ascend |
| | | qwen2.5-32b | lora | 8192 | TP=8, PP=2 | 2 | 2 nodes × 8 Ascend |
| 30 | | qwen2.5-72b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4 nodes × 8 Ascend |
| | | qwen2.5-72b | lora | 4096 | TP=8, PP=4 | 4 | 4 nodes × 8 Ascend |
| | | qwen2.5-72b | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8 nodes × 8 Ascend |
| | | qwen2.5-72b | lora | 8192 | TP=8, PP=4 | 2 | 4 nodes × 8 Ascend |
| 31 | llama3.2 | llama3.2-1b | pretrain/sft | 4096 | TP=1, PP=1 | 2 | 1 node × 4 Ascend |
| | | llama3.2-1b | lora | 4096 | TP=1, PP=1 | 2 | 1 node × 4 Ascend |
| | | llama3.2-1b | pretrain/sft | 8192 | TP=1, PP=1 | 1 | 1 node × 4 Ascend |
| | | llama3.2-1b | lora | 8192 | TP=1, PP=1 | 1 | 1 node × 4 Ascend |
| 32 | | llama3.2-3b | pretrain/sft | 4096 | TP=1, PP=2 | 2 | 1 node × 4 Ascend |
| | | llama3.2-3b | lora | 4096 | TP=1, PP=1 | 2 | 1 node × 4 Ascend |
| | | llama3.2-3b | pretrain/sft | 8192 | TP=1, PP=2 | 1 | 1 node × 4 Ascend |
| | | llama3.2-3b | lora | 8192 | TP=1, PP=1 | 1 | 1 node × 4 Ascend |
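The table fixes TP and PP but leaves the data parallel (DP) size implicit. Assuming the Megatron-style convention used by Megatron-based training frameworks, where the total number of NPUs equals TP × PP × DP, the Python sketch below derives DP for a few rows of the table and the number of tokens that all data-parallel replicas process in one micro-batch step (MBS × DP × SEQ_LEN). The `ParallelConfig` class and its field names are hypothetical illustrations, not part of the ModelArts launch scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    """One row of the table (hypothetical helper, not a ModelArts API)."""
    nodes: int          # number of nodes
    npus_per_node: int  # Ascend NPUs per node
    tp: int             # tensor model parallel size
    pp: int             # pipeline model parallel size
    mbs: int            # micro batch size
    seq_len: int        # SEQ_LEN

    @property
    def world_size(self) -> int:
        """Total number of NPUs in the job."""
        return self.nodes * self.npus_per_node

    @property
    def dp(self) -> int:
        """Data parallel size, assuming world_size = TP * PP * DP."""
        model_parallel = self.tp * self.pp
        if self.world_size % model_parallel != 0:
            raise ValueError(
                f"{self.world_size} NPUs cannot be split into "
                f"TP={self.tp} x PP={self.pp} groups"
            )
        return self.world_size // model_parallel

    def tokens_per_micro_step(self) -> int:
        """Tokens processed by all DP replicas in one micro-batch step."""
        return self.mbs * self.dp * self.seq_len


if __name__ == "__main__":
    rows = {
        "llama2-7b lora 4096": ParallelConfig(1, 8, tp=1, pp=4, mbs=2, seq_len=4096),
        "llama2-70b pretrain/sft 8192": ParallelConfig(8, 8, tp=8, pp=8, mbs=1, seq_len=8192),
        "qwen2-0.5b pretrain/sft 4096": ParallelConfig(1, 4, tp=1, pp=1, mbs=2, seq_len=4096),
    }
    for name, cfg in rows.items():
        print(f"{name}: NPUs={cfg.world_size}, DP={cfg.dp}, "
              f"tokens/micro-step={cfg.tokens_per_micro_step()}")
```

Under this assumption, llama2-7b with LoRA at SEQ_LEN 4096 (TP=1, PP=4 on 8 NPUs) keeps two data-parallel replicas, whereas llama2-70b at SEQ_LEN 8192 (TP=8, PP=8 on 64 NPUs) uses every NPU for model parallelism and leaves DP=1.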