MLX LM

Note

This page still needs to be updated for Qwen3.

mlx-lm helps you run LLMs locally on Apple Silicon. It is available on macOS. It already supports Qwen models, and we also provide checkpoints that you can use with it directly.

Prerequisites

First, install the mlx-lm package:

With pip:
```
pip install mlx-lm
```
With conda:
```
conda install -c conda-forge mlx-lm
```

Running with Qwen MLX Files

We provide model files for mlx-lm on Hugging Face. To find them, search for repositories with -MLX in the name.

Here is a code example that uses apply_chat_template to apply the chat template.

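```
from mlx_lm import load, generate

# Load the MLX checkpoint. <|im_end|> is set as the EOS token so that
# generation stops at the end of the assistant turn.
model, tokenizer = load('Qwen/Qwen2.5-7B-Instruct-MLX', tokenizer_config={"eos_token": "<|im_end|>"})

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the messages as a single prompt string (not token IDs) and
# append the marker that opens the assistant turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Generate up to 512 tokens; verbose=True prints the response as it is generated.
# The sampling keyword arguments follow the mlx-lm release this page was written
# for; newer releases may expect them to be passed differently.
response = generate(model, tokenizer, prompt=text, verbose=True, top_p=0.8, temp=0.7, repetition_penalty=1.05, max_tokens=512)
```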
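
To make the role of the chat template more concrete, the following is a minimal sketch of the ChatML-style string that the template produces for a simple system-plus-user conversation like the one above. `render_chatml` is a hypothetical helper written only for illustration; it is not part of mlx-lm or the tokenizer, and it ignores everything else the real template handles (tools, default system prompts, and so on). In practice, always go through `tokenizer.apply_chat_template`.

```python
# Illustrative only: a hand-written approximation of the ChatML layout.
# `render_chatml` is a hypothetical helper, not an mlx-lm or tokenizer API.

def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML-style string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
text = render_chatml(messages)
print(text)
```

Because every turn ends with `<|im_end|>`, the model emits that token when its reply is complete, which is why it is passed as the EOS token when loading the tokenizer.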

Make Your Own MLX Files

You can make MLX files with just one command:

```
mlx_lm.convert --hf-path Qwen/Qwen2.5-7B-Instruct --mlx-path mlx/Qwen2.5-7B-Instruct/ -q
```

The arguments are:

--hf-path: the model name on Hugging Face Hub, or a local path
--mlx-path: the path where the output model files are stored
-q: enable quantization
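
When converting several models, it can be convenient to assemble the command programmatically. The sketch below builds the same `mlx_lm.convert` invocation using only the three flags described here. `build_convert_command` is a hypothetical helper, and the commented-out step that actually runs the command assumes mlx-lm is installed and on the PATH.

```python
import shlex


def build_convert_command(hf_path, mlx_path, quantize=True):
    """Return the mlx_lm.convert argument list for one model (hypothetical helper)."""
    command = ["mlx_lm.convert", "--hf-path", hf_path, "--mlx-path", mlx_path]
    if quantize:
        command.append("-q")  # enable quantization
    return command


command = build_convert_command("Qwen/Qwen2.5-7B-Instruct", "mlx/Qwen2.5-7B-Instruct/")
print(shlex.join(command))

# To actually run the conversion (assumes mlx-lm is installed):
# import subprocess
# subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids quoting problems if a local path contains spaces.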