AI Development Platform MODELARTS - LoRA Fine-Tuning Weight Merging and Conversion: Step2 Merge Multiple Weight Files into One File and Convert the Format

Time: 2024-04-30 18:09:30

Step2 Merge Multiple Weight Files into One File and Convert the Format

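This step takes the multiple weight files produced in Step1 (Merge the Weight Files Generated by LoRA Fine-Tuning Training), merges them into a single weight file, and converts that file to HuggingFace format.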

The convert_weights_to_huggingface.py script performs both the merge and the format conversion. The command and its parameters are described below.

Run the script from the code directory /home/ma-user/ws/AscendCloud-3rdLLM-6.3.902/llm_train/AscendSpeed/.

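python scripts/tools/ckpt_convert/llama/convert_weights_to_huggingface.py \
        --input-model-dir ${ASCENDSPEED_CKPT_PATH} \
        --output-model-dir ${MERGE_CKPT_PATH} \
        --src-tensor-model-parallel-size ${TENSOR_MODEL_PARALLEL_SIZE} \
        --src-pipeline-model-parallel-size ${PIPELINE_MODEL_PARALLEL_SIZE} \
        --type ${TYPE} \
        --org-huggingface-dir ${HUGGINGFACE_DIR} \
        --merge-mlp

Parameter description:

${ASCENDSPEED_CKPT_PATH}: Directory of the AscendSpeed-format weights generated by training. In a multi-node, multi-card scenario, the weight files from all nodes must first be gathered into this directory on any one node. The path must point to the directory that contains the mp_rank_xxxxxxx subdirectories, which is typically iter_xxxxx or release.
${MERGE_CKPT_PATH}: Path where the merged weights are written.
${TENSOR_MODEL_PARALLEL_SIZE}: Tensor parallel (TP) size of the original model. The value comes from the training configuration and must be entered manually here.
${PIPELINE_MODEL_PARALLEL_SIZE}: Pipeline parallel (PP) size of the original model. The value comes from the training configuration and must be entered manually here.
${TYPE}: Parameter size of the original model. Supported values are 7B, 13B, and 70B; set it according to the actual model.
${HUGGINGFACE_DIR}: Optional. Directory of the open-source HuggingFace weights. When specified, the configuration files in that directory are copied into the output directory of the converted weights.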
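
The parameter rules above can be captured in a small wrapper that assembles the command line and rejects values the description does not list. The following is a minimal sketch, not part of the AscendSpeed tooling: the helper name build_convert_command and its argument names are hypothetical, and only the script path and flags shown in the command above are used.

```python
from typing import List, Optional

# Script path and flags are taken from the command shown above.
SCRIPT = "scripts/tools/ckpt_convert/llama/convert_weights_to_huggingface.py"
# Model sizes listed in the parameter description.
SUPPORTED_TYPES = ("7B", "13B", "70B")


def build_convert_command(
    input_dir: str,
    output_dir: str,
    tp_size: int,
    pp_size: int,
    model_type: str,
    org_hf_dir: Optional[str] = None,
) -> List[str]:
    """Return the argv list for the conversion script (hypothetical helper)."""
    if model_type not in SUPPORTED_TYPES:
        raise ValueError(
            f"unsupported --type {model_type!r}; expected one of {SUPPORTED_TYPES}"
        )
    if tp_size < 1 or pp_size < 1:
        raise ValueError("TP and PP sizes must be positive integers")
    cmd = [
        "python", SCRIPT,
        "--input-model-dir", input_dir,
        "--output-model-dir", output_dir,
        "--src-tensor-model-parallel-size", str(tp_size),
        "--src-pipeline-model-parallel-size", str(pp_size),
        "--type", model_type,
    ]
    if org_hf_dir:
        # Optional flag: only passed when the config files should be copied.
        cmd += ["--org-huggingface-dir", org_hf_dir]
    cmd.append("--merge-mlp")
    return cmd
```

The returned list can be handed to subprocess.run(cmd, cwd=..., check=True), with cwd set to the AscendSpeed code directory named earlier, so that the relative script path resolves.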

The following is a concrete example of invoking the convert_weights_to_huggingface.py script, for reference.

python scripts/tools/ckpt_convert/llama/convert_weights_to_huggingface.py \
        --input-model-dir /home/ma-user/ws/AscendCloud-3rdLLM-6.3.902/llm_train/AscendSpeed/ckpt/ckpt-llama2-13b-lora/iter_xxxxxxx \
        --output-model-dir /home/ma-user/ws/weight/ckpt-llama2-13b-lora-hf \
        --src-tensor-model-parallel-size 8 \
        --src-pipeline-model-parallel-size 1 \
        --type 13B \
        --org-huggingface-dir /home/ma-user/ws/tokenizers/llama2-13b-hf \
        --merge-mlp
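
Before running a command like the one above, it can help to confirm that the input directory really is the level that holds the mp_rank_ subdirectories. The sketch below is a hypothetical pre-flight check, not part of the conversion script. It additionally assumes that training writes one mp_rank_ directory per TP x PP rank, which is the usual Megatron-style layout but is not stated in this document; drop that comparison if your checkpoints are laid out differently.

```python
from pathlib import Path
from typing import List


def find_rank_dirs(ckpt_dir: str) -> List[str]:
    """List the mp_rank_* subdirectories directly under ckpt_dir, sorted by name."""
    root = Path(ckpt_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {ckpt_dir}")
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and p.name.startswith("mp_rank_")
    )


def check_input_dir(ckpt_dir: str, tp_size: int, pp_size: int) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    ranks = find_rank_dirs(ckpt_dir)
    if not ranks:
        problems.append(
            "no mp_rank_* directory found; --input-model-dir should point at "
            "the iter_xxxxx or release directory"
        )
    elif len(ranks) != tp_size * pp_size:
        # ASSUMPTION: one mp_rank_* directory per (TP, PP) rank.
        problems.append(
            f"found {len(ranks)} mp_rank_* directories, expected "
            f"{tp_size * pp_size}; in multi-node runs, check that the files "
            "from every node were copied here"
        )
    return problems
```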

The merge and conversion are complete when the following messages appear in the log.

Merging tp pp weight from path: {as_dir} ......
Merging weight complete!!!
Converting weight to huggingface......
Converting weight to huggingface complete!!!
Saving weight to path: {hf_dir}
huggingface weight saved to: {hf_dir}/pytorch_model.bin
Generating model index config......
model index config saved in: {hf_dir}/pytorch_model.bin.index.json
Generating weight config file from: {org_hf_dir}
config file copy from "{org_hf_dir}" complete!!!
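
When the conversion runs unattended, the same check can be done on captured output. The sketch below looks for the completion messages from the log excerpt above, in order. The function name missing_markers is hypothetical, and the marker list deliberately omits the config-copy message, since that line is only expected when --org-huggingface-dir is passed.

```python
from typing import List

# Marker strings copied from the log excerpt above, in the order they appear.
COMPLETION_MARKERS = (
    "Merging weight complete!!!",
    "Converting weight to huggingface complete!!!",
    "huggingface weight saved to:",
    "model index config saved in:",
)


def missing_markers(log_text: str) -> List[str]:
    """Return the completion markers that do not appear, in order, in log_text."""
    missing = []
    pos = 0
    for marker in COMPLETION_MARKERS:
        idx = log_text.find(marker, pos)
        if idx == -1:
            missing.append(marker)
        else:
            # Continue searching after this marker so order is enforced.
            pos = idx + len(marker)
    return missing
```

An empty return value means every marker was found; a non-empty one shows the stage at which the run stopped.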

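After the conversion completes, the release folder in the output directory contains at least one bin file and one bin.index.json file. The size of the bin file should be similar to that of the original HuggingFace weights: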

{hf_dir}
├── pytorch_model.bin
├── pytorch_model.bin.index.json
# The following config files are generated only when the org_huggingface_dir parameter is specified.
├── config.json
├── generation_config.json
├── gitattributes.txt
├── LICENSE.txt
├── README.md
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
├── tokenizer.model
└── USE_POLICY.md
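
The directory listing above can be verified with a short script. This is a minimal sketch under stated assumptions: the function name check_output_dir is hypothetical, the caller supplies the total size in bytes of the original HuggingFace weight files, and the 10% default tolerance is an arbitrary stand-in for "similar" rather than a documented threshold.

```python
import os
from typing import List

# The two files the listing above shows are always produced.
REQUIRED_FILES = ("pytorch_model.bin", "pytorch_model.bin.index.json")


def check_output_dir(
    hf_dir: str,
    reference_bin_bytes: int = 0,
    tolerance: float = 0.1,
) -> List[str]:
    """Return a list of problems found in hf_dir; an empty list means none."""
    problems = []
    for name in REQUIRED_FILES:
        if not os.path.isfile(os.path.join(hf_dir, name)):
            problems.append(f"missing file: {name}")

    bin_path = os.path.join(hf_dir, "pytorch_model.bin")
    if reference_bin_bytes > 0 and os.path.isfile(bin_path):
        size = os.path.getsize(bin_path)
        # ASSUMPTION: "similar size" is read as within `tolerance` of the reference.
        if abs(size - reference_bin_bytes) > tolerance * reference_bin_bytes:
            problems.append(
                f"pytorch_model.bin is {size} bytes, "
                f"reference weights are {reference_bin_bytes} bytes"
            )
    return problems
```

If the original weights are split across several files, pass the sum of their sizes as reference_bin_bytes.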
