MLX-LM¶
mlx-lm helps you run LLMs locally on Apple Silicon. It is available on macOS. It supports Qwen models, and we also provide checkpoints that you can use with it directly.
Prerequisites¶
The easiest way to get started is to install the mlx-lm package:

with pip:

pip install mlx-lm

with conda:

conda install -c conda-forge mlx-lm
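To verify the installation, you can try the command-line generator that ships with mlx-lm. This is a minimal sketch: the model name is one of the Qwen MLX checkpoints described in the next section, and flag names may vary slightly between mlx-lm versions.

mlx_lm.generate --model Qwen/Qwen2.5-7B-Instruct-MLX --prompt "Give me a short introduction to large language model."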
Running with Qwen MLX Files¶
We provide model checkpoints converted for mlx-lm in our Hugging Face organization. To find what you need, search the repo names for the -MLX suffix.
The following code snippet uses apply_chat_template to show how to load the tokenizer and the model and how to generate text:
from mlx_lm import load, generate
model, tokenizer = load('Qwen/Qwen2.5-7B-Instruct-MLX', tokenizer_config={"eos_token": "<|im_end|>"})
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
response = generate(model, tokenizer, prompt=text, verbose=True, top_p=0.8, temp=0.7, repetition_penalty=1.05, max_tokens=512)
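generate returns the generated text as a string, so a multi-turn conversation can be continued by appending the reply and a new user turn, then re-applying the chat template. This is a rough sketch: the sampling parameters follow the snippet above and may differ across mlx-lm versions, and the follow-up question is only an example.

# Append the assistant's reply and a hypothetical follow-up user turn, then rebuild the prompt.
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "Can you make it shorter?"})
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
response = generate(model, tokenizer, prompt=text, verbose=True, top_p=0.8, temp=0.7, repetition_penalty=1.05, max_tokens=512)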
Make Your MLX Files¶
You can make MLX files with just one command:
mlx_lm.convert --hf-path Qwen/Qwen2.5-7B-Instruct --mlx-path mlx/Qwen2.5-7B-Instruct/ -q
where

--hf-path: the model name on Hugging Face Hub or the local path
--mlx-path: the path for output files
-q: enable quantization