Featured Cloud Server Content

  • Recommended model parameters and NPU card counts

    Table 2 lists the recommended training parameters and compute specifications for each model. In the "Specification and node count" column, 1*node & 4*Ascend means a single node with 4 cards, and so on. TP is the tensor model parallel size and PP is the pipeline model parallel size.

    Table 2 Recommended parameters and NPU card counts for each model

    | No. | Model family | Model | Sequence length | Parallel parameters | Specification and node count |
    |-----|--------------|-------|-----------------|---------------------|------------------------------|
    | 1 | llama2 | llama2-7b | SEQ_LEN=4096 | TP=1, PP=4 | 1*node & 4*Ascend |
    | 1 | llama2 | llama2-7b | SEQ_LEN=8192 | TP=2, PP=4 | 1*node & 8*Ascend |
    | 2 | llama2 | llama2-13b | SEQ_LEN=4096 | TP=8, PP=1 | 1*node & 8*Ascend |
    | 2 | llama2 | llama2-13b | SEQ_LEN=8192 | TP=8, PP=1 | 1*node & 8*Ascend |
    | 3 | llama2 | llama2-70b | SEQ_LEN=4096 | TP=8, PP=4 | 4*node & 8*Ascend |
    | 3 | llama2 | llama2-70b | SEQ_LEN=8192 | TP=8, PP=8 | 8*node & 8*Ascend |
    | 4 | llama3 | llama3-8b | SEQ_LEN=4096 | TP=4, PP=1 | 1*node & 4*Ascend |
    | 4 | llama3 | llama3-8b | SEQ_LEN=8192 | TP=4, PP=1 | 1*node & 4*Ascend |
    | 5 | llama3 | llama3-70b | SEQ_LEN=4096 | TP=8, PP=4 | 4*node & 8*Ascend |
    | 5 | llama3 | llama3-70b | SEQ_LEN=8192 | TP=8, PP=8 | 8*node & 8*Ascend |
    | 6 | Qwen | qwen-7b | SEQ_LEN=4096 | TP=4, PP=1 | 1*node & 4*Ascend |
    | 6 | Qwen | qwen-7b | SEQ_LEN=8192 | TP=4, PP=1 | 1*node & 4*Ascend |
    | 7 | Qwen | qwen-14b | SEQ_LEN=4096 | TP=8, PP=1 | 1*node & 8*Ascend |
    | 7 | Qwen | qwen-14b | SEQ_LEN=8192 | TP=8, PP=1 | 1*node & 8*Ascend |
    | 8 | Qwen | qwen-72b | SEQ_LEN=4096 | TP=8, PP=4 | 4*node & 8*Ascend |
    | 8 | Qwen | qwen-72b | SEQ_LEN=8192 | TP=8, PP=8 | 8*node & 8*Ascend |
    | 9 | Qwen1.5 | qwen1.5-7b | SEQ_LEN=4096 | TP=4, PP=1 | 1*node & 4*Ascend |
    | 9 | Qwen1.5 | qwen1.5-7b | SEQ_LEN=8192 | TP=4, PP=1 | 1*node & 4*Ascend |
    | 10 | Qwen1.5 | qwen1.5-14b | SEQ_LEN=4096 | TP=8, PP=1 | 1*node & 8*Ascend |
    | 10 | Qwen1.5 | qwen1.5-14b | SEQ_LEN=8192 | TP=8, PP=1 | 1*node & 8*Ascend |
    | 11 | Qwen1.5 | qwen1.5-32b | SEQ_LEN=4096 | TP=8, PP=2 | 2*node & 8*Ascend |
    | 11 | Qwen1.5 | qwen1.5-32b | SEQ_LEN=8192 | TP=8, PP=4 | 4*node & 8*Ascend |
    | 12 | Qwen1.5 | qwen1.5-72b | SEQ_LEN=4096 | TP=8, PP=4 | 4*node & 8*Ascend |
    | 12 | Qwen1.5 | qwen1.5-72b | SEQ_LEN=8192 | TP=8, PP=8 | 8*node & 8*Ascend |
    | 13 | Yi | yi-6b | SEQ_LEN=4096 | TP=1, PP=4 | 1*node & 4*Ascend |
    | 13 | Yi | yi-6b | SEQ_LEN=8192 | TP=2, PP=4 | 1*node & 8*Ascend |
    | 14 | Yi | yi-34b | SEQ_LEN=4096 | TP=4, PP=4 | 2*node & 8*Ascend |
    | 14 | Yi | yi-34b | SEQ_LEN=8192 | TP=8, PP=4 | 4*node & 8*Ascend |
    | 15 | ChatGLMv3 | glm3-6b | SEQ_LEN=4096 | TP=1, PP=4 | 1*node & 4*Ascend |
    | 15 | ChatGLMv3 | glm3-6b | SEQ_LEN=8192 | TP=2, PP=4 | 1*node & 8*Ascend |
    | 16 | Baichuan2 | baichuan2-13b | SEQ_LEN=4096 | TP=8, PP=1 | 1*node & 8*Ascend |
    | 16 | Baichuan2 | baichuan2-13b | SEQ_LEN=8192 | TP=8, PP=1 | 1*node & 8*Ascend |
    | 17 | Qwen2 | qwen2-0.5b | SEQ_LEN=4096 | TP=2, PP=1 | 1*node & 2*Ascend |
    | 17 | Qwen2 | qwen2-0.5b | SEQ_LEN=8192 | TP=2, PP=1 | 1*node & 2*Ascend |
    | 18 | Qwen2 | qwen2-1.5b | SEQ_LEN=4096 | TP=2, PP=1 | 1*node & 2*Ascend |
    | 18 | Qwen2 | qwen2-1.5b | SEQ_LEN=8192 | TP=2, PP=1 | 1*node & 2*Ascend |
    | 19 | Qwen2 | qwen2-7b | SEQ_LEN=4096 | TP=4, PP=1 | 1*node & 4*Ascend |
    | 19 | Qwen2 | qwen2-7b | SEQ_LEN=8192 | TP=4, PP=1 | 1*node & 4*Ascend |
    | 20 | Qwen2 | qwen2-72b | SEQ_LEN=4096 | TP=8, PP=4 | 4*node & 8*Ascend |
    | 20 | Qwen2 | qwen2-72b | SEQ_LEN=8192 | TP=8, PP=8 | 8*node & 8*Ascend |
    | 21 | GLMv4 | glm4-9b | SEQ_LEN=4096 | TP=2, PP=4 | 1*node & 8*Ascend |
    | 21 | GLMv4 | glm4-9b | SEQ_LEN=8192 | TP=2, PP=4 | 1*node & 8*Ascend |
    | 22 | mistral | mistral-7b | SEQ_LEN=4096 | TP=1, PP=4 | 1*node & 8*Ascend |
    | 23 | mixtral | mixtral-8x7b | SEQ_LEN=4096 | TP=2, PP=8 | 2*node & 8*Ascend |
    | 23 | mixtral | mixtral-8x7b | SEQ_LEN=8192 | TP=2, PP=8 | 2*node & 8*Ascend |
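
    A row of Table 2 can be sanity-checked before a job is submitted. The sketch below is illustrative only and is not part of the product tooling: the helper names `parse_spec` and `data_parallel_size` are hypothetical, and it assumes the Megatron-style convention that the total card count must be divisible by TP × PP, with the quotient used as the data parallel size.

```python
import re

# Hypothetical helpers for illustration; not an official API.

def parse_spec(spec: str) -> int:
    """Return the total card count for a string such as '4*node & 8*Ascend'.

    Accepts either 'node' or the original Chinese label for the node field.
    """
    match = re.fullmatch(r"\s*(\d+)\*(?:node|节点)\s*&\s*(\d+)\*Ascend\s*", spec)
    if match is None:
        raise ValueError(f"unrecognized specification: {spec!r}")
    nodes, cards_per_node = int(match.group(1)), int(match.group(2))
    return nodes * cards_per_node


def data_parallel_size(tp: int, pp: int, spec: str) -> int:
    """Return total_cards / (TP * PP), assuming Megatron-style parallelism.

    Raises ValueError when the cards cannot be split evenly into
    tensor-parallel and pipeline-parallel groups.
    """
    total_cards = parse_spec(spec)
    model_parallel = tp * pp
    if total_cards % model_parallel != 0:
        raise ValueError(
            f"{total_cards} cards are not divisible by TP*PP={model_parallel}"
        )
    return total_cards // model_parallel


# llama2-70b, SEQ_LEN=4096: TP=8, PP=4 on 4 nodes with 8 cards each.
print(data_parallel_size(8, 4, "4*node & 8*Ascend"))  # 1
# mistral-7b, SEQ_LEN=4096: TP=1, PP=4 on 1 node with 8 cards.
print(data_parallel_size(1, 4, "1*node & 8*Ascend"))  # 2
```

    Under that assumption, most rows in the table yield a data parallel size of 1, meaning the cards are fully consumed by tensor and pipeline parallelism; the mistral-7b row is one where the leftover factor of 2 would go to data parallelism.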