LLaMA-Factory#

Here we provide a script for supervised finetuning Qwen1.5 with LLaMA-Factory. This script for supervised finetuning (SFT) has the following features:

Support single-GPU and multi-GPU training;
Support full-parameter tuning, LoRA, Q-LoRA, Dora.

In the following, we introduce more details about the usage of the script.

Installation#

Before you start, make sure you have installed the following packages:

Follow the instructions of LLaMA-Factory, and build the environment.
Install these packages (Optional):

pip install deepspeed
pip install flash-attn --no-build-isolation

If you want to use FlashAttention-2, make sure your CUDA is 11.6 and above.

Data Preparation#

LLaMA-Factory provides several training datasets in data folder, you can use it directly. If you are using a custom dataset, please prepare your dataset as follow.

Organize your data in a json file and put your data in data folder. LLaMA-Factory supports dataset in alpaca or sharegpt format.

The dataset in alpaca format should follow the below format:

[
  {
    "instruction": "user instruction (required)",
    "input": "user input (optional)",
    "output": "model response (required)",
    "system": "system prompt (optional)",
    "history": [
      ["user instruction in the first round (optional)", "model response in the first round (optional)"],
      ["user instruction in the second round (optional)", "model response in the second round (optional)"]
    ]
  }
]

The dataset in sharegpt format should follow the below format:

[
  {
    "conversations": [
      {
        "from": "human",
        "value": "user instruction"
      },
      {
        "from": "gpt",
        "value": "model response"
      }
    ],
    "system": "system prompt (optional)",
    "tools": "tool description (optional)"
  }
]

Provide your dataset definition in data/dataset_info.json in the following format .

For alpaca format dataset, the columns in dataset_info.json should be:

"dataset_name": {
  "file_name": "dataset_name.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "system": "system",
    "history": "history"
  }
}

For sharegpt format dataset, the columns in dataset_info.json should be:

"dataset_name": {
    "file_name": "dataset_name.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "system": "system",
      "tools": "tools"
    },
    "tags": {
      "role_tag": "from",
      "content_tag": "value",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  }

Training#

Execute the following training command:

DISTRIBUTED_ARGS="
    --nproc_per_node $NPROC_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
  "

torchrun $DISTRIBUTED_ARGS src/train_bash.py \
    --deepspeed $DS_CONFIG_PATH \
    --stage sft \
    --do_train \
    --use_fast_tokenizer \
    --flash_attn \
    --model_name_or_path $MODEL_PATH \
    --dataset your_dataset \
    --template qwen \
    --finetuning_type lora \
    --lora_target q_proj,v_proj\
    --output_dir $OUTPUT_PATH \
    --overwrite_cache \
    --overwrite_output_dir \
    --warmup_steps 100 \
    --weight_decay 0.1 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --ddp_timeout 9000 \
    --learning_rate 5e-6 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --cutoff_len 4096 \
    --save_steps 1000 \
    --plot_loss \
    --num_train_epochs 3 \
    --bf16

and enjoy the training process. To make changes to your training, you can modify the arguments in the training command to adjust the hyperparameters. One argument to note is cutoff_len, which is the maximum length of the training data. Control this parameter to avoid OOM error.

Merge LoRA#

If you train your model with LoRA, you probably need to merge adapter parameters to the main branch. Run the following command to perform the merging of LoRA adapters.

CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
    --model_name_or_path path_to_base_model \
    --adapter_name_or_path path_to_adapter \
    --template default \
    --finetuning_type lora \
    --export_dir path_to_export \
    --export_size 2 \
    --export_legacy_format False

Conclusion#

The above content is the simplest way to use LLaMA-Factory to train Qwen. Feel free to dive into the details by checking the official repo!