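AI Development Platform ModelArts - Training Launch Script Description and Parameter Configuration: Recommended Model Parameters and NPU Count Settings
Recommended Model Parameters and NPU Count Settings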
| No. | Supported model | Supported model size | Text sequence length | Parallel settings (TP = tensor model parallel size, PP = pipeline model parallel size) | Flavor and node count |
|---|---|---|---|---|---|
| 1 | llama2 | llama2-7b | SEQ_LEN=4096 | TP=1, PP=4 | 1 × node & 4 × Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=4 | 1 × node & 8 × Ascend |
| 2 | llama2 | llama2-13b | SEQ_LEN=4096 | TP=8, PP=1 | 1 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=1 | 1 × node & 8 × Ascend |
| 3 | llama2 | llama2-70b | SEQ_LEN=4096 | TP=8, PP=4 | 4 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=8 | 8 × node & 8 × Ascend |
| 4 | llama3 | llama3-8b | SEQ_LEN=4096 | TP=4, PP=1 | 1 × node & 4 × Ascend |
| | | | SEQ_LEN=8192 | TP=4, PP=1 | 1 × node & 4 × Ascend |
| 5 | llama3 | llama3-70b | SEQ_LEN=4096 | TP=8, PP=4 | 4 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=8 | 8 × node & 8 × Ascend |
| 6 | Qwen | qwen-7b | SEQ_LEN=4096 | TP=4, PP=1 | 1 × node & 4 × Ascend |
| | | | SEQ_LEN=8192 | TP=4, PP=1 | 1 × node & 4 × Ascend |
| 7 | Qwen | qwen-14b | SEQ_LEN=4096 | TP=8, PP=1 | 1 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=1 | 1 × node & 8 × Ascend |
| 8 | Qwen | qwen-72b | SEQ_LEN=4096 | TP=8, PP=4 | 4 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=8 | 8 × node & 8 × Ascend |
| 9 | Qwen1.5 | qwen1.5-7b | SEQ_LEN=4096 | TP=4, PP=1 | 1 × node & 4 × Ascend |
| | | | SEQ_LEN=8192 | TP=4, PP=1 | 1 × node & 4 × Ascend |
| 10 | Qwen1.5 | qwen1.5-14b | SEQ_LEN=4096 | TP=8, PP=1 | 1 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=1 | 1 × node & 8 × Ascend |
| 11 | Qwen1.5 | qwen1.5-32b | SEQ_LEN=4096 | TP=8, PP=2 | 2 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=4 | 4 × node & 8 × Ascend |
| 12 | Qwen1.5 | qwen1.5-72b | SEQ_LEN=4096 | TP=8, PP=4 | 4 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=8 | 8 × node & 8 × Ascend |
| 13 | Yi | yi-6b | SEQ_LEN=4096 | TP=1, PP=4 | 1 × node & 4 × Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=4 | 1 × node & 8 × Ascend |
| 14 | Yi | yi-34b | SEQ_LEN=4096 | TP=4, PP=4 | 2 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=4 | 4 × node & 8 × Ascend |
| 15 | ChatGLMv3 | glm3-6b | SEQ_LEN=4096 | TP=1, PP=4 | 1 × node & 4 × Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=4 | 1 × node & 8 × Ascend |
| 16 | Baichuan2 | baichuan2-13b | SEQ_LEN=4096 | TP=8, PP=1 | 1 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=1 | 1 × node & 8 × Ascend |
| 17 | Qwen2 | qwen2-0.5b | SEQ_LEN=4096 | TP=2, PP=1 | 1 × node & 2 × Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=1 | 1 × node & 2 × Ascend |
| 18 | Qwen2 | qwen2-1.5b | SEQ_LEN=4096 | TP=2, PP=1 | 1 × node & 2 × Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=1 | 1 × node & 2 × Ascend |
| 19 | Qwen2 | qwen2-7b | SEQ_LEN=4096 | TP=4, PP=1 | 1 × node & 4 × Ascend |
| | | | SEQ_LEN=8192 | TP=4, PP=1 | 1 × node & 4 × Ascend |
| 20 | Qwen2 | qwen2-72b | SEQ_LEN=4096 | TP=8, PP=4 | 4 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=8 | 8 × node & 8 × Ascend |
| 21 | GLMv4 | glm4-9b | SEQ_LEN=4096 | TP=2, PP=4 | 1 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=4 | 1 × node & 8 × Ascend |
| 22 | mistral | mistral-7b | SEQ_LEN=4096 | TP=1, PP=4 | 1 × node & 8 × Ascend |
| 23 | mixtral | mixtral-8x7b | SEQ_LEN=4096 | TP=2, PP=8 | 2 × node & 8 × Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=8 | 2 × node & 8 × Ascend |
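
The rows above can be sanity-checked before a job is submitted. The following is a minimal sketch, not part of the ModelArts tooling. It rests on two assumptions: each hardware entry is read as the number of nodes times the number of Ascend NPUs per node, and the usual Megatron-style relation holds, where the total NPU count equals TP × PP × DP (DP being the data parallel size). The class `ParallelConfig` and the dictionary `RECOMMENDED` are hypothetical names introduced for illustration. Only a few rows are copied in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    """One table row: parallel sizes plus the hardware they are paired with."""
    tp: int             # tensor model parallel size
    pp: int             # pipeline model parallel size
    nodes: int          # number of nodes
    npus_per_node: int  # Ascend NPUs per node (assumed reading of the table)

    @property
    def world_size(self) -> int:
        """Total number of NPUs across all nodes."""
        return self.nodes * self.npus_per_node

    @property
    def dp(self) -> int:
        """Data parallel size left over after TP and PP are applied.

        Assumes world_size = TP * PP * DP; raises if the NPUs cannot be
        split evenly into model-parallel groups.
        """
        model_parallel = self.tp * self.pp
        if self.world_size % model_parallel != 0:
            raise ValueError(
                f"{self.world_size} NPUs is not a multiple of "
                f"TP*PP = {model_parallel}"
            )
        return self.world_size // model_parallel


# A few rows copied from the table; keys are (model, SEQ_LEN).
RECOMMENDED = {
    ("llama2-7b", 4096): ParallelConfig(tp=1, pp=4, nodes=1, npus_per_node=4),
    ("llama2-70b", 8192): ParallelConfig(tp=8, pp=8, nodes=8, npus_per_node=8),
    ("yi-34b", 4096): ParallelConfig(tp=4, pp=4, nodes=2, npus_per_node=8),
    ("mistral-7b", 4096): ParallelConfig(tp=1, pp=4, nodes=1, npus_per_node=8),
}

for (model, seq_len), cfg in RECOMMENDED.items():
    print(f"{model} SEQ_LEN={seq_len}: "
          f"{cfg.world_size} NPUs, TP={cfg.tp}, PP={cfg.pp}, DP={cfg.dp}")
```

Under these assumptions most rows use every NPU for model parallelism (DP = 1), while the mistral-7b row leaves room for two data parallel replicas. Changing the NPU count without adjusting TP or PP, for example running a TP=8 configuration on 4 NPUs, makes the `dp` property raise instead of silently producing an invalid layout.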