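AI Development Platform ModelArts: NPU Card Counts and Gradient Accumulation Values by Model
NPU Card Counts and Gradient Accumulation Values by Model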
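Table 1 lists the recommended training parameters and compute specifications for each model. In the "Nodes and specifications" column, "1*node & 4*Ascend" means a single node with 4 Ascend NPUs; the other values follow the same pattern.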
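The node/NPU notation can be read mechanically: the first number is the node count, the second is the number of Ascend NPUs per node, and their product is the total NPU count (world size). The following is a minimal Python sketch; the function name `parse_spec` is hypothetical, and the regular expression accepts both the English "node" and the original Chinese "节点" spelling.

```python
import re

# Matches strings such as "1*node & 4*Ascend" or the original "1*节点 & 4*Ascend".
_SPEC_RE = re.compile(r"^\s*(\d+)\*(?:node|节点)\s*&\s*(\d+)\*Ascend\s*$")


def parse_spec(spec: str) -> tuple[int, int, int]:
    """Return (nodes, npus_per_node, total_npus) for a spec string from Table 1."""
    match = _SPEC_RE.match(spec)
    if match is None:
        raise ValueError(f"unrecognized spec: {spec!r}")
    nodes, per_node = int(match.group(1)), int(match.group(2))
    # Total NPUs = nodes x NPUs per node.
    return nodes, per_node, nodes * per_node


print(parse_spec("1*node & 4*Ascend"))  # (1, 4, 4)
print(parse_spec("4*node & 8*Ascend"))  # (4, 8, 32)
```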
| Model | Template | Parameters | Training strategy | Sequence length (cutoff_len) | Gradient accumulation | Optimization tool (DeepSpeed) | Nodes and specifications |
|---|---|---|---|---|---|---|---|
| llama2 | llama2 | 7B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-1 | 1*node & 1*Ascend |
| llama2 | llama2 | 7B | full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-2 | 1*node & 8*Ascend |
| llama2 | llama2 | 13B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-2 | 1*node & 1*Ascend |
| llama2 | llama2 | 13B | full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-3 | 1*node & 8*Ascend |
| llama2 | llama2 | 70B | lora | 4096 | gradient_accumulation_steps: 8 | ZeRO-3 | 2*node & 8*Ascend |
| llama2 | llama2 | 70B | lora | 8192 | gradient_accumulation_steps: 8 | ZeRO-3-Offload | 2*node & 8*Ascend |
| llama2 | llama2 | 70B | full | 4096/8192 | gradient_accumulation_steps: 4 | ZeRO-3-Offload | 4*node & 8*Ascend |
| llama3 | llama3 | 70B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-3 | 2*node & 8*Ascend |
| llama3 | llama3 | 70B | full | 4096/8192 | gradient_accumulation_steps: 4 | ZeRO-3-Offload | 4*node & 8*Ascend |
| llama3 | llama3 | 8B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-2 | 1*node & 1*Ascend |
| llama3 | llama3 | 8B | full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-2 | 1*node & 8*Ascend |
| llama3.1 | llama3 | 8B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-1 | 1*node & 1*Ascend |
| llama3.1 | llama3 | 8B | full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-2 | 1*node & 8*Ascend |
| llama3.1 | llama3 | 70B | lora | 4096 | gradient_accumulation_steps: 8 | ZeRO-3 | 2*node & 8*Ascend |
| llama3.1 | llama3 | 70B | lora | 8192 | gradient_accumulation_steps: 8 | ZeRO-3-Offload | 2*node & 8*Ascend |
| llama3.1 | llama3 | 70B | full | 4096/8192 | gradient_accumulation_steps: 4 | ZeRO-3-Offload | 4*node & 8*Ascend |
| Qwen2 | qwen | 72B | lora | 4096 | gradient_accumulation_steps: 8 | ZeRO-3 | 2*node & 8*Ascend |
| Qwen2 | qwen | 72B | lora | 8192 | gradient_accumulation_steps: 8 | ZeRO-3-Offload | 2*node & 8*Ascend |
| Qwen2 | qwen | 72B | full | 4096/8192 | gradient_accumulation_steps: 4 | ZeRO-3-Offload | 4*node & 8*Ascend |
| Qwen2 | qwen | 7B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-0 | 1*node & 1*Ascend |
| Qwen2 | qwen | 7B | full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-2 | 1*node & 8*Ascend |
| Qwen2 | qwen | 0.5/1.5B | lora/full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-0 | 1*node & 1*Ascend |
| Qwen2_vl | qwen2_vl | 2B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-0 | 1*node & 1*Ascend |
| Qwen2_vl | qwen2_vl | 2B | full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-0 | 1*node & 2*Ascend |
| Qwen2_vl | qwen2_vl | 7B | lora | 4096 | gradient_accumulation_steps: 8 | ZeRO-0 | 1*node & 1*Ascend |
| Qwen2_vl | qwen2_vl | 7B | lora | 8192 | gradient_accumulation_steps: 8 | ZeRO-1 | 1*node & 1*Ascend |
| Qwen2_vl | qwen2_vl | 7B | full | 4096 | gradient_accumulation_steps: 8 | ZeRO-2 | 1*node & 8*Ascend |
| Qwen2_vl | qwen2_vl | 7B | full | 8192 | gradient_accumulation_steps: 8 | ZeRO-2-Offload | 1*node & 8*Ascend |
| Qwen1.5 | qwen | 0.5/1.8B | lora/full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-0 | 1*node & 1*Ascend |
| Qwen1.5 | qwen | 4B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-1 | 1*node & 1*Ascend |
| Qwen1.5 | qwen | 4B | full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-1 | 1*node & 4*Ascend |
| Qwen1.5 | qwen | 7B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-1 | 1*node & 1*Ascend |
| Qwen1.5 | qwen | 7B | full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-2 | 1*node & 8*Ascend |
| Qwen1.5 | qwen | 14B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-3 | 1*node & 1*Ascend |
| Qwen1.5 | qwen | 14B | full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-3 | 1*node & 8*Ascend |
| Qwen1.5 | qwen | 32B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-3 | 1*node & 4*Ascend |
| Qwen1.5 | qwen | 32B | full | 4096 | gradient_accumulation_steps: 8 | ZeRO-3 | 2*node & 8*Ascend |
| Qwen1.5 | qwen | 32B | full | 8192 | gradient_accumulation_steps: 4 | ZeRO-3-Offload | 2*node & 8*Ascend |
| Qwen1.5 | qwen | 72B | lora | 4096 | gradient_accumulation_steps: 8 | ZeRO-3 | 2*node & 8*Ascend |
| Qwen1.5 | qwen | 72B | lora | 8192 | gradient_accumulation_steps: 8 | ZeRO-3-Offload | 2*node & 8*Ascend |
| Qwen1.5 | qwen | 72B | full | 4096/8192 | gradient_accumulation_steps: 4 | ZeRO-3-Offload | 4*node & 8*Ascend |
| falcon2 | falcon | 11B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-2 | 1*node & 1*Ascend |
| falcon2 | falcon | 11B | full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-2 | 1*node & 8*Ascend |
| GLM4 | glm4 | 9B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-2 | 1*node & 1*Ascend |
| GLM4 | glm4 | 9B | full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-3 | 1*node & 8*Ascend |
| Yi | yi | 6B | lora | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-1 | 1*node & 1*Ascend |
| Yi | yi | 6B | full | 4096/8192 | gradient_accumulation_steps: 8 | ZeRO-1 | 1*node & 4*Ascend |
| Yi | yi | 34B | full | 4096 | gradient_accumulation_steps: 8 | ZeRO-3 | 2*node & 8*Ascend |
| Yi | yi | 34B | lora | 4096 | gradient_accumulation_steps: 8 | ZeRO-3 | 1*node & 2*Ascend |
| Yi | yi | 34B | full | 8192 | gradient_accumulation_steps: 8 | ZeRO-3 | 4*node & 8*Ascend |
| Yi | yi | 34B | lora | 8192 | gradient_accumulation_steps: 8 | ZeRO-3 | 1*node & 4*Ascend |
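The gradient accumulation value and the NPU count together determine how many samples contribute to each optimizer step. Under pure data parallelism, which is how DeepSpeed ZeRO distributes work, the global batch size is the per-NPU micro-batch size × gradient_accumulation_steps × total NPUs. The sketch below applies this to a few llama2 rows copied from Table 1. The per-NPU micro-batch size of 1 is an assumption, since the table does not specify it, and `global_batch_size` is a hypothetical helper name.

```python
# A few llama2 rows copied from Table 1:
# (model, size, strategy, cutoff_len, gradient_accumulation_steps, nodes, npus_per_node)
ROWS = [
    ("llama2", "7B", "lora", "4096/8192", 8, 1, 1),
    ("llama2", "7B", "full", "4096/8192", 8, 1, 8),
    ("llama2", "70B", "lora", "4096", 8, 2, 8),
    ("llama2", "70B", "full", "4096/8192", 4, 4, 8),
]

# Assumption: the table does not give a per-NPU micro-batch size; 1 is used here.
MICRO_BATCH = 1


def global_batch_size(micro_batch: int, grad_accum: int, nodes: int, npus_per_node: int) -> int:
    """Samples per optimizer step when every NPU is a data-parallel rank."""
    return micro_batch * grad_accum * nodes * npus_per_node


for model, size, strategy, seq_len, ga, nodes, per_node in ROWS:
    gbs = global_batch_size(MICRO_BATCH, ga, nodes, per_node)
    print(f"{model} {size} {strategy} cutoff_len={seq_len}: global batch = {gbs}")
```

With a micro-batch of 1, the two 70B rows both reach a global batch of 128: the full-parameter row halves gradient_accumulation_steps (8 to 4) while doubling the NPU count (16 to 32).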
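The parameters above assume that the NPU FlashAttention fused operator is enabled. The values are for reference only; configure other acceleration frameworks, the ZeRO (Zero Redundancy Optimizer) stage, the number of NPU nodes, and other settings according to your actual requirements.
For details on using the optimization tools, see "How to choose the zero-stage and -offloads settings for best performance".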
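The "Optimization tool (DeepSpeed)" labels in Table 1 correspond to DeepSpeed ZeRO stages, with "-Offload" meaning that state is moved to host (CPU) memory. The following minimal sketch translates such a label into a standard DeepSpeed configuration dict using the documented keys `train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, and `zero_optimization` (`stage`, `offload_optimizer`, `offload_param`). The helper name `build_ds_config` and the micro-batch default of 1 are hypothetical. This page does not describe how ModelArts itself generates its DeepSpeed configuration.

```python
import json
import re

# Labels as written in Table 1: "ZeRO-0" ... "ZeRO-3", optionally suffixed with "-Offload".
_LABEL_RE = re.compile(r"^ZeRO-([0-3])(-Offload)?$")


def build_ds_config(label: str, grad_accum: int, micro_batch: int = 1) -> dict:
    """Translate a label such as "ZeRO-3-Offload" into a DeepSpeed config dict."""
    match = _LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized ZeRO label: {label!r}")
    stage = int(match.group(1))
    offload = match.group(2) is not None

    zero: dict = {"stage": stage}
    if offload:
        if stage == 0:
            raise ValueError("offloading requires ZeRO stage 1, 2, or 3")
        # Optimizer-state offload is available for stages 1-3.
        zero["offload_optimizer"] = {"device": "cpu"}
        if stage == 3:
            # Parameter offload is only supported with stage 3.
            zero["offload_param"] = {"device": "cpu"}

    # train_batch_size is omitted; DeepSpeed derives it from the micro-batch
    # size, gradient_accumulation_steps, and world size.
    return {
        "train_micro_batch_size_per_gpu": micro_batch,
        "gradient_accumulation_steps": grad_accum,
        "zero_optimization": zero,
    }


# Example: the llama2 70B full-parameter row (ZeRO-3-Offload, gradient_accumulation_steps: 4).
print(json.dumps(build_ds_config("ZeRO-3-Offload", grad_accum=4), indent=2))
```

For ZeRO-3-Offload, this sketch offloads both optimizer state and parameters. That saves the most NPU memory at the cost of extra host-device traffic. Dropping `offload_param` is a reasonable variant when host bandwidth becomes the bottleneck.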