ms-swift¶
Introduction to ms-swift SFT¶
ms-swift is the official framework from the ModelScope community for training and deploying large language models (LLMs) and multimodal large language models (MLLMs).
GitHub repository: ms-swift
The SFT script in ms-swift has the following features:
Flexible training options: single-GPU and multi-GPU support
Efficient tuning methods: full-parameter, LoRA, QLoRA, and DoRA
Broad model compatibility: supports various LLM and MLLM architectures
For detailed model compatibility, see: Supported Models
Environment Setup¶
Follow the ms-swift installation instructions to set up the environment.
Optional packages for advanced features:
pip install deepspeed # For multi-GPU training
pip install flash-attn --no-build-isolation
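Before launching a run that passes `--deepspeed` or `--attn_impl flash_attn`, it can help to confirm that the optional packages are importable. The following is a minimal sketch using only the Python standard library; the helper name `check_optional_packages` is hypothetical and not part of ms-swift.

```python
from importlib.util import find_spec

# Maps the pip package name to the module name it installs.
OPTIONAL_PACKAGES = {"deepspeed": "deepspeed", "flash-attn": "flash_attn"}


def check_optional_packages(packages=OPTIONAL_PACKAGES):
    """Return {pip_name: bool} telling whether each module can be found."""
    return {
        pip_name: find_spec(module) is not None
        for pip_name, module in packages.items()
    }


if __name__ == "__main__":
    for name, available in check_optional_packages().items():
        print(f"{name}: {'installed' if available else 'missing'}")
```

The check only looks the module up on the import path; it does not verify that the installed build matches your CUDA or PyTorch version.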
Data Preparation¶
ms-swift supports multiple dataset formats:
# Standard messages format
{"messages": [
{"role": "system", "content": "<system-prompt>"},
{"role": "user", "content": "<query1>"},
{"role": "assistant", "content": "<response1>"}
]}
# ShareGPT conversation format
{"system": "<system-prompt>", "conversation": [
{"human": "<query1>", "assistant": "<response1>"},
{"human": "<query2>", "assistant": "<response2>"}
]}
# Instruction tuning format
{"system": "<system-prompt>",
"instruction": "<task-instruction>",
"input": "<additional-context>",
"output": "<expected-response>"}
# Multimodal format (supports images, audio, video)
{"messages": [
{"role": "user", "content": "<image>Describe this image"},
{"role": "assistant", "content": "<description>"}
], "images": ["/path/to/image.jpg"]}
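ms-swift reads each of the formats above directly, so no conversion is required. To illustrate how the fields correspond, here is a small sketch that maps the ShareGPT-style and instruction-style records onto the standard messages format. The function names are hypothetical, and joining `instruction` and `input` with a newline is an assumption of this sketch rather than documented ms-swift behavior.

```python
def sharegpt_to_messages(record):
    """Convert a {"system", "conversation": [{"human", "assistant"}]} record."""
    messages = []
    if record.get("system"):
        messages.append({"role": "system", "content": record["system"]})
    for turn in record["conversation"]:
        messages.append({"role": "user", "content": turn["human"]})
        messages.append({"role": "assistant", "content": turn["assistant"]})
    return {"messages": messages}


def instruction_to_messages(record):
    """Convert a {"system", "instruction", "input", "output"} record."""
    query = record["instruction"]
    if record.get("input"):
        # Assumption: instruction and extra context are joined by a newline.
        query = f"{query}\n{record['input']}"
    messages = []
    if record.get("system"):
        messages.append({"role": "system", "content": record["system"]})
    messages.append({"role": "user", "content": query})
    messages.append({"role": "assistant", "content": record["output"]})
    return {"messages": messages}


if __name__ == "__main__":
    example = {
        "system": "<system-prompt>",
        "conversation": [{"human": "<query1>", "assistant": "<response1>"}],
    }
    print(sharegpt_to_messages(example))
```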
For complete dataset formatting guidelines, see: Custom Dataset Documentation
Pre-built datasets are available at: Supported Datasets
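When preparing a custom dataset as a JSONL file in the standard messages format, a quick sanity check before training can catch malformed lines early. The sketch below is illustrative only: the helper names and the rules it enforces (roles limited to those shown above, the last turn coming from the assistant, `<image>` tags matching the `images` list) are assumptions of this example, not ms-swift's own validation logic.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}  # roles used in the examples above


def validate_record(record):
    """Return a list of problems found in one messages-format record."""
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    image_tags = 0
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not an object")
            continue
        if msg.get("role") not in VALID_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        content = msg.get("content")
        if isinstance(content, str):
            image_tags += content.count("<image>")
        else:
            problems.append(f"message {i}: 'content' must be a string")
    last = messages[-1]
    if not (isinstance(last, dict) and last.get("role") == "assistant"):
        problems.append("last message should come from the assistant")
    images = record.get("images", [])
    if image_tags and image_tags != len(images):
        problems.append(f"{image_tags} <image> tag(s) but {len(images)} image path(s)")
    return problems


def validate_lines(lines):
    """Validate JSONL lines; return {line_number: problems} for bad lines only."""
    report = {}
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            report[number] = [f"invalid JSON: {exc.msg}"]
            continue
        problems = validate_record(record)
        if problems:
            report[number] = problems
    return report
```

To check a file, pass an open handle, for example `validate_lines(open("train.jsonl", encoding="utf-8"))`, where `train.jsonl` is a placeholder path.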
Training Examples¶
Single-GPU Training¶
LLM Example (Qwen2.5-7B-Instruct):
# GPU memory usage: 19GB
CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model Qwen/Qwen2.5-7B-Instruct \
--dataset 'AI-ModelScope/alpaca-gpt4-data-zh' \
--train_type lora \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16 \
--learning_rate 1e-4 \
--max_length 2048 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 5 \
--output_dir output \
--system 'You are a helpful assistant.' \
--warmup_ratio 0.05 \
--dataloader_num_workers 4 \
--attn_impl flash_attn
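The `--lora_rank 8` and `--lora_alpha 32` values above determine how strongly the adapter update is scaled and how many parameters each adapted layer adds. The sketch below works through that arithmetic, assuming the standard LoRA formulation in which the update is scaled by `lora_alpha / lora_rank`. The 4096 × 4096 layer shape is a hypothetical example, not the actual shape of any Qwen2.5 layer.

```python
def lora_scaling(lora_alpha, lora_rank):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / lora_rank


def lora_params_per_linear(in_features, out_features, lora_rank):
    """Trainable parameters added to one linear layer.

    A has shape (rank, in_features) and B has shape (out_features, rank).
    """
    return lora_rank * (in_features + out_features)


if __name__ == "__main__":
    print(lora_scaling(lora_alpha=32, lora_rank=8))  # 4.0
    # Hypothetical 4096 x 4096 projection, for illustration only.
    print(lora_params_per_linear(4096, 4096, lora_rank=8))  # 65536
```

For comparison, the full weight matrix of that hypothetical layer holds 4096 × 4096 = 16,777,216 parameters, so the rank-8 adapter trains well under 1% of them.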
MLLM Example (Qwen2.5-VL-7B-Instruct):
# GPU memory usage: 18GB
CUDA_VISIBLE_DEVICES=0 \
MAX_PIXELS=602112 \
swift sft \
--model Qwen/Qwen2.5-VL-7B-Instruct \
--dataset 'AI-ModelScope/LaTeX_OCR:human_handwrite' \
--train_type lora \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16 \
--learning_rate 1e-4 \
--max_length 2048 \
--eval_steps 200 \
--save_steps 200 \
--save_total_limit 5 \
--logging_steps 5 \
--output_dir output \
--warmup_ratio 0.05 \
--dataloader_num_workers 4
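`MAX_PIXELS` caps the resolution of each input image, which in turn bounds how many visual tokens one image can occupy inside `--max_length`. The sketch below estimates that bound, assuming each visual token covers a 28 × 28 pixel area, as described for the Qwen2-VL and Qwen2.5-VL image processors. Treat the constant as an assumption and check the model's processor configuration if exact numbers matter.

```python
# Assumption: one visual token corresponds to a 28 x 28 pixel region.
PIXELS_PER_VISUAL_TOKEN = 28 * 28


def max_visual_tokens(max_pixels):
    """Upper bound on visual tokens for a single image of at most max_pixels."""
    return max_pixels // PIXELS_PER_VISUAL_TOKEN


if __name__ == "__main__":
    print(max_visual_tokens(602112))  # 768
```

Under this assumption, a single image uses at most 768 of the 2048 tokens allowed by `--max_length 2048`, leaving the remainder for the text prompt and response.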
Multi-GPU Training¶
LLM Example with DeepSpeed:
# GPU memory usage: 18GB per GPU, 8 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
nohup swift sft \
--model Qwen/Qwen2.5-7B-Instruct \
--dataset 'AI-ModelScope/alpaca-gpt4-data-zh' \
--train_type lora \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--torch_dtype bfloat16 \
--deepspeed zero2 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16 \
--learning_rate 1e-4 \
--max_length 2048 \
--num_train_epochs 1 \
--output_dir output \
--attn_impl flash_attn
MLLM Example with DeepSpeed:
# GPU memory usage: 17GB per GPU, 8 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
MAX_PIXELS=602112 \
nohup swift sft \
--model Qwen/Qwen2.5-VL-7B-Instruct \
--dataset 'AI-ModelScope/LaTeX_OCR:human_handwrite' \
--train_type lora \
--deepspeed zero2 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 8 \
--learning_rate 2e-5 \
--max_length 4096 \
--num_train_epochs 2 \
--output_dir output \
--attn_impl flash_attn
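Moving from one GPU to eight changes the number of samples that contribute to each optimizer step, even when `--per_device_train_batch_size` stays at 1. The sketch below computes that global batch size for the configurations in this section, assuming plain data parallelism (one process per GPU, as with ZeRO-2). The function name is hypothetical.

```python
def global_batch_size(per_device_batch_size, gradient_accumulation_steps, num_gpus):
    """Samples consumed per optimizer step under data parallelism."""
    return per_device_batch_size * gradient_accumulation_steps * num_gpus


if __name__ == "__main__":
    # Single-GPU examples: 1 x 16 x 1
    print(global_batch_size(1, 16, 1))  # 16
    # Multi-GPU LLM example: 1 x 16 x 8
    print(global_batch_size(1, 16, 8))  # 128
    # Multi-GPU MLLM example: 1 x 8 x 8
    print(global_batch_size(1, 8, 8))  # 64
```

Because the global batch size grows with the GPU count, you may want to revisit `--gradient_accumulation_steps` or `--learning_rate` when scaling a single-GPU recipe to multiple GPUs.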
Model Export¶
Merge LoRA Adapters:
swift export \
--adapters output/checkpoint-xxx \
--merge_lora true
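Conceptually, merging folds the low-rank update back into the base weights so the exported model no longer needs a separate adapter at inference time. The following toy sketch shows the arithmetic for one weight matrix, assuming the standard LoRA formulation W' = W + (alpha / r) · B · A. It uses plain Python lists for illustration and is not how `swift export` is implemented.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, lora_alpha, lora_rank):
    """Return W + (lora_alpha / lora_rank) * (B @ A).

    weight: (out, in), lora_a: (rank, in), lora_b: (out, rank).
    """
    scale = lora_alpha / lora_rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # toy 2 x 2 base weight
    A = [[1.0, 2.0]]              # rank 1, in_features 2
    B = [[0.5], [0.0]]            # out_features 2, rank 1
    print(merge_lora(W, A, B, lora_alpha=2, lora_rank=1))
    # [[2.0, 2.0], [0.0, 1.0]]
```

Applying the merged matrix to an input gives the same result as applying the base weight and the scaled adapter separately, which is why the merged checkpoint can be served like an ordinary model.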
Push to ModelScope Hub:
swift export \
--adapters output/checkpoint-xxx \
--push_to_hub true \
--hub_model_id '<your-namespace>/<model-name>' \
--hub_token '<your-access-token>'
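To avoid pasting an access token into shell history or scripts, the export command can be assembled from an environment variable. The sketch below builds the same argument list as the command above; the helper name and the `MODELSCOPE_API_TOKEN` variable name are assumptions made for this example, not names defined by ms-swift.

```python
import os


def build_push_command(adapter_dir, hub_model_id, token_env="MODELSCOPE_API_TOKEN"):
    """Build the `swift export` argument list, reading the token from the environment."""
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"environment variable {token_env} is not set")
    return [
        "swift", "export",
        "--adapters", adapter_dir,
        "--push_to_hub", "true",
        "--hub_model_id", hub_model_id,
        "--hub_token", token,
    ]


if __name__ == "__main__":
    try:
        cmd = build_push_command("output/checkpoint-xxx", "<your-namespace>/<model-name>")
    except RuntimeError as err:
        print(err)
    else:
        # Print everything except the token itself.
        print(" ".join(cmd[:-1]), "<hidden>")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Note that the token is still visible in the process arguments while the command runs.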