OpenLLM
=======

OpenLLM allows developers to run Qwen2.5 models of different sizes as OpenAI-compatible APIs with a single command. It features a built-in chat UI, state-of-the-art inference backends, and a simplified workflow for creating enterprise-grade cloud deployments with Qwen2.5. Visit `the OpenLLM repository <https://github.com/bentoml/OpenLLM>`_ to learn more.

Installation
------------

Install OpenLLM using ``pip``:

.. code:: bash

   pip install openllm

Verify the installation and display the help information:

.. code:: bash

   openllm --help

Quickstart
----------

Before you run any Qwen2.5 model, make sure your model repository is up to date by syncing it with OpenLLM's latest official repository:

.. code:: bash

   openllm repo update

List the supported Qwen2.5 models:

.. code:: bash

   openllm model list --tag qwen2.5

The results also display the required GPU resources and supported platforms:

.. code:: bash

   model    version                repo     required GPU RAM    platforms
   -------  ---------------------  -------  ------------------  -----------
   qwen2.5  qwen2.5:0.5b           default  12G                 linux
            qwen2.5:1.5b           default  12G                 linux
            qwen2.5:3b             default  12G                 linux
            qwen2.5:7b             default  24G                 linux
            qwen2.5:14b            default  80G                 linux
            qwen2.5:14b-ggml-q4    default                      macos
            qwen2.5:14b-ggml-q8    default                      macos
            qwen2.5:32b            default  80G                 linux
            qwen2.5:32b-ggml-fp16  default                      macos
            qwen2.5:72b            default  80Gx2               linux
            qwen2.5:72b-ggml-q4    default                      macos

To start a server with one of the models, use ``openllm serve`` like this:

.. code:: bash

   openllm serve qwen2.5:7b

By default, the server starts at ``http://localhost:3000/``.

Interact with the model server
------------------------------

With the model server up and running, you can call its APIs in the following ways:

.. tab-set::

   .. tab-item:: CURL

      Send an HTTP request to its ``/generate`` endpoint via CURL:

      .. code-block:: bash

         curl -X 'POST' \
            'http://localhost:3000/api/generate' \
            -H 'accept: text/event-stream' \
            -H 'Content-Type: application/json' \
            -d '{
            "prompt": "Tell me something about large language models.",
            "model": "Qwen/Qwen2.5-7B-Instruct",
            "max_tokens": 2048,
            "stop": null
         }'

   .. tab-item:: Python client

      Call the OpenAI-compatible endpoints with frameworks and tools that support the OpenAI API protocol. Here is an example:

      .. code-block:: python

         from openai import OpenAI

         client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')

         # Use the following to list the available models
         # model_list = client.models.list()
         # print(model_list)

         chat_completion = client.chat.completions.create(
             model="Qwen/Qwen2.5-7B-Instruct",
             messages=[
                 {
                     "role": "user",
                     "content": "Tell me something about large language models."
                 }
             ],
             stream=True,
         )
         for chunk in chat_completion:
             print(chunk.choices[0].delta.content or "", end="")

   .. tab-item:: Chat UI

      OpenLLM provides a chat UI at the ``/chat`` endpoint of the LLM server at http://localhost:3000/chat.

      .. image:: ../../source/assets/qwen-openllm-ui-demo.png

Model repository
----------------

A model repository in OpenLLM represents a catalog of available LLMs. You can add your own repository to OpenLLM with custom Qwen2.5 variants for your specific needs. See the `OpenLLM documentation <https://github.com/bentoml/OpenLLM>`_ for details.
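
As a rough sketch, registering a custom repository usually amounts to pointing the ``openllm repo`` command at a Git repository that hosts your model definitions and then confirming that its models are listed. The repository name and URL below are placeholders, and the exact subcommand syntax can differ between OpenLLM versions, so check ``openllm repo --help`` for the options available in your installation.

.. code:: bash

   # Register a custom model repository (hypothetical name and URL)
   openllm repo add my-qwen-repo https://github.com/<your-org>/<your-model-repo>

   # Confirm the repository was added and see which models it provides
   openllm repo list
   openllm model list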