Performance of Quantized Models
Note
Results for Qwen2.5 are yet to be updated.
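This section reports the performance of the Qwen2 quantized models (covering the GPTQ and AWQ quantization schemes) on the following datasets:

- MMLU (accuracy)
- C-Eval (accuracy)
- IFEval (strict prompt-level accuracy)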
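To make the IFEval metric concrete: prompt-level accuracy counts a prompt as correct only when every verifiable instruction in it is followed, and the strict variant judges the raw response without lenient normalization. The sketch below contrasts it with instruction-level accuracy on made-up outcomes. The names `results`, `prompt_level_accuracy`, and `instruction_level_accuracy` are illustrative assumptions, not part of the IFEval tooling.

```python
from typing import Dict, List


def prompt_level_accuracy(results: Dict[str, List[bool]]) -> float:
    """Fraction of prompts whose instructions were ALL followed."""
    if not results:
        raise ValueError("no prompts to score")
    passed = sum(1 for followed in results.values() if all(followed))
    return passed / len(results)


def instruction_level_accuracy(results: Dict[str, List[bool]]) -> float:
    """Fraction of individual instructions that were followed."""
    flags = [flag for followed in results.values() for flag in followed]
    if not flags:
        raise ValueError("no instructions to score")
    return sum(flags) / len(flags)


# Hypothetical per-instruction outcomes (True = instruction followed).
results = {
    "prompt_a": [True, True],   # both instructions followed -> prompt passes
    "prompt_b": [True, False],  # one miss -> the whole prompt fails
    "prompt_c": [True],
    "prompt_d": [False],
}

print(f"prompt-level:      {prompt_level_accuracy(results):.3f}")
print(f"instruction-level: {instruction_level_accuracy(results):.3f}")
```

Because a single missed instruction fails the whole prompt, the prompt-level figure can never exceed the instruction-level one.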
All models are evaluated with greedy decoding.
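Greedy decoding means that at every step the single highest-scoring next token is chosen, so the output is deterministic for a given prompt. The following is a minimal sketch of that loop using a toy scoring function in place of a real model. `greedy_decode` and `toy_logits` are hypothetical names introduced for illustration and do not reflect the evaluation harness behind the numbers here.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt: Sequence[int],
    eos_id: int,
    max_new_tokens: int,
) -> List[int]:
    """Repeatedly append the argmax token until EOS or the length limit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        # Argmax over the vocabulary; ties resolve to the lowest token id.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def toy_logits(tokens: Sequence[int]) -> List[float]:
    """Toy stand-in for a model over a 4-token vocabulary.

    It always favours (last token + 1) mod 4; assumes a non-empty prefix.
    """
    vocab_size = 4
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


output = greedy_decode(toy_logits, prompt=[0], eos_id=3, max_new_tokens=10)
print(output)
```

In Hugging Face Transformers, the analogous setting is `generate(..., do_sample=False)` with a single beam; whether the reported results were produced that way is not stated in this section.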
| Model | Quantization | Average | MMLU | C-Eval | IFEval |
|---|---|---|---|---|---|
| Qwen2-72B-Instruct | BF16 | 81.3 | 82.3 | 83.8 | 77.6 |
| | GPTQ-Int8 | 80.7 | 81.3 | 83.4 | 77.5 |
| | GPTQ-Int4 | 81.2 | 80.8 | 83.9 | 78.9 |
| | AWQ | 80.4 | 80.5 | 83.9 | 76.9 |
| Qwen2-7B-Instruct | BF16 | 66.9 | 70.5 | 77.2 | 53.1 |
| | GPTQ-Int8 | 66.2 | 69.1 | 76.7 | 52.9 |
| | GPTQ-Int4 | 64.1 | 67.8 | 75.2 | 49.4 |
| | AWQ | 64.1 | 67.4 | 73.6 | 51.4 |
| Qwen2-1.5B-Instruct | BF16 | 48.4 | 52.4 | 63.8 | 29.0 |
| | GPTQ-Int8 | 48.1 | 53.0 | 62.5 | 28.8 |
| | GPTQ-Int4 | 45.0 | 50.7 | 57.4 | 27.0 |
| | AWQ | 46.5 | 51.6 | 58.1 | 29.9 |
| Qwen2-0.5B-Instruct | BF16 | 34.4 | 37.9 | 45.2 | 20.0 |
| | GPTQ-Int8 | 32.6 | 35.6 | 43.9 | 18.1 |
| | GPTQ-Int4 | 29.7 | 33.0 | 39.2 | 16.8 |
| | AWQ | 31.1 | 34.4 | 42.1 | 16.7 |
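One way to read the table is to look at how far each quantized variant falls below its BF16 baseline. The sketch below computes that gap from the Average column; the values are copied from the table, while the `average` dictionary layout and the `drop_vs_bf16` helper are illustrative assumptions.

```python
from typing import Dict

# Average scores copied from the table above (percentage points).
average: Dict[str, Dict[str, float]] = {
    "Qwen2-72B-Instruct": {"BF16": 81.3, "GPTQ-Int8": 80.7, "GPTQ-Int4": 81.2, "AWQ": 80.4},
    "Qwen2-7B-Instruct": {"BF16": 66.9, "GPTQ-Int8": 66.2, "GPTQ-Int4": 64.1, "AWQ": 64.1},
    "Qwen2-1.5B-Instruct": {"BF16": 48.4, "GPTQ-Int8": 48.1, "GPTQ-Int4": 45.0, "AWQ": 46.5},
    "Qwen2-0.5B-Instruct": {"BF16": 34.4, "GPTQ-Int8": 32.6, "GPTQ-Int4": 29.7, "AWQ": 31.1},
}


def drop_vs_bf16(scores: Dict[str, float]) -> Dict[str, float]:
    """Points lost relative to BF16 for each quantized variant."""
    baseline = scores["BF16"]
    return {
        quant: round(baseline - score, 1)
        for quant, score in scores.items()
        if quant != "BF16"
    }


for model, scores in average.items():
    print(model, drop_vs_bf16(scores))
```

On these figures, the GPTQ-Int4 gap grows from 0.1 points at 72B to 4.7 points at 0.5B, which suggests that the smaller models are more sensitive to 4-bit quantization on these three benchmarks.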