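Function Calling¶

Preface¶

Function calling with large language models is a broad and rapidly evolving topic. It matters especially for AI applications:

whether they are AI-native applications designed to work around the limitations of current AI technology,
or existing applications that seek to integrate AI technology to improve performance, user experience, or efficiency.

This guide covers how to use function calling with Qwen3 and how to put it to work for your goals, from inference usage for application development to the internals needed for deep customization. In this guide,

we first show how to use function calling with Qwen3;
we then cover the technical details of function calling with Qwen3, which mainly concern the use of templates.

Before we start, there is one thing we have not yet introduced, and that is...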

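What Is Function Calling?¶

Note

This concept may also be called "tool use". Although some regard "tools" as a generalization of "functions", at present the difference is purely technical and shows up as different input and output types in programming interfaces.

Large language models (LLMs) are powerful. Sometimes, however, their abilities alone are not enough.

On the one hand, LLMs have modeling limitations. They know nothing about information absent from their training data, including anything that happened after training ended. They also learn probabilistically, so they can be imprecise on tasks governed by a fixed set of rules, such as arithmetic.
On the other hand, it is not easy to use an LLM as a plug-and-play service that cooperates programmatically with other systems. LLM output tends to be open to interpretation and therefore ambiguous, whereas other software, applications, and systems communicate through predefined, fixed, and structured code and programming interfaces.

To this end, function calling establishes a common protocol that specifies how an LLM should interact with other entities. The main flow is as follows:

The application provides the LLM with a set of functions and instructions for using them.
Based on the user query, the LLM chooses to use, or not to use, one or more functions, or is forced to use them.
If the LLM chooses to use functions, it states how they should be called, following the function descriptions.
The application calls the functions as chosen and obtains the results. If further interaction is needed, the results are passed back to the LLM.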
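The four steps above can be sketched as a small loop. This is only an illustration of the protocol, not how any particular framework implements it: `fake_model`, the `tool_call` field, and the `add` function are hypothetical stand-ins invented for this sketch so that it runs without a real model.

```python
import json
from typing import Callable


def fake_model(messages: list[dict], tools: dict[str, Callable]) -> dict:
    """Hypothetical stand-in for an LLM: requests `add` once, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        # A tool result is available, so produce the final answer.
        return {"role": "assistant", "content": f"The result is {last['content']}."}
    # Steps 2 and 3: choose a function and state how it should be called.
    return {
        "role": "assistant",
        "content": "",
        "tool_call": {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})},
    }


def run(user_query: str, tools: dict[str, Callable], max_rounds: int = 4) -> str:
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_rounds):
        # Step 1: the functions are offered to the model on every turn.
        reply = fake_model(messages, tools)
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # no function requested: final answer
        # Step 4: the application calls the function and feeds the result back.
        result = tools[call["name"]](**json.loads(call["arguments"]))
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("model did not produce a final answer")


def add(a: int, b: int) -> int:
    return a + b


answer = run("What is 2 + 3?", {"add": add})
print(answer)
```

The `max_rounds` bound is a design choice of this sketch: it keeps the application in control if the model keeps requesting functions.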

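There are many ways for LLMs to understand and follow this protocol. As always, the key is prompt engineering or a template that the model has internalized. For Qwen3, we recommend Hermes-style tool calling to maximize function calling performance.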

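Inference with Function Calling¶

Because function calling is essentially implemented through prompt engineering, you could construct the input to Qwen3 models by hand. Frameworks that support function calling, however, can do all the heavy lifting for you.

Next, we introduce how to use function calling (via dedicated function calling templates) with

Qwen-Agent,
vLLM.

The Example Case¶

We again demonstrate inference through an example. We assume Python 3.11 as the programming language.

Scenario: Suppose we ask the model for the temperature at some location. Normally, the model would reply that it cannot provide real-time information. However, we have two tools that return the current temperature of a city and its temperature on a given date, and we would like the model to make use of them.

To set up this example, you can use the following code:

Preparation Code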

import json

def get_current_temperature(location: str, unit: str = "celsius"):
    """Get current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, and the unit in a dict
    """
    return {
        "temperature": 26.1,
        "location": location,
        "unit": unit,
    }


def get_temperature_date(location: str, date: str, unit: str = "celsius"):
    """Get temperature at a location and date.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        date: The date to get the temperature for, in the format "Year-Month-Day".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, the date and the unit in a dict
    """
    return {
        "temperature": 25.9,
        "location": location,
        "date": date,
        "unit": unit,
    }


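def get_function_by_name(name):
    if name == "get_current_temperature":
        return get_current_temperature
    if name == "get_temperature_date":
        return get_temperature_date
    # Fail loudly instead of silently returning None for an unknown name.
    raise ValueError(f"Unknown function: {name}")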

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_current_temperature",
            "description": "Get current temperature at a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_temperature_date",
            "description": "Get temperature at a location and date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "date": {
                        "type": "string",
                        "description": 'The date to get the temperature for, in the format "Year-Month-Day".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location", "date"],
            },
        },
    },
]
MESSAGES = [
    {"role": "user",  "content": "What's the temperature in San Francisco now? How about tomorrow? Current Date: 2024-09-30."},
]

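Tools should be described with JSON Schema, and messages should contain as much useful information as possible. In the code above, TOOLS holds the tool descriptions and MESSAGES holds the conversation.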
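Because each tool is described with JSON Schema, the same description can be used to sanity-check the arguments a model proposes before the function is called. The sketch below is a deliberately small check that covers only `required`, string types, and `enum`; the helper name `check_arguments` is hypothetical, and a full JSON Schema validator would be more thorough.

```python
def check_arguments(parameters: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    properties = parameters.get("properties", {})
    # Every required argument must be present.
    for name in parameters.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    # Every supplied argument must be declared and match its description.
    for name, value in arguments.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"argument {name} should be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"argument {name} must be one of {spec['enum']}")
    return problems


# Same shape as the "parameters" entry of get_current_temperature above.
PARAMETERS = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

ok = check_arguments(PARAMETERS, {"location": "San Francisco, California, United States"})
bad = check_arguments(PARAMETERS, {"unit": "kelvin"})
print(ok)
print(bad)
```

Returning a list of problems rather than raising makes it easy to send the complaints back to the model as a tool result.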

Qwen-Agent¶

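Qwen-Agent is a Python agent framework for developing AI applications. Although its intended use cases are higher-level than efficient inference, it does contain the canonical implementation of function calling for Qwen3. Built on top of an OpenAI-compatible API, it provides function calling for Qwen3 through templates in a way that is transparent to users.

Note that for reasoning models such as Qwen3, tool call templates based on stopwords, such as ReAct, are not recommended: the model may output the stopwords in its thought section, which can lead to unexpected behavior in tool calls.

Before starting, let's make sure the latest version of the library is installed: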

pip install -U qwen-agent

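Preparation¶

Qwen-Agent can wrap an OpenAI-compatible API that does not itself support function calling. You can serve such an API with most inference frameworks, or obtain one from cloud providers such as DashScope or Together.

Assuming there is an OpenAI-compatible API at http://localhost:8000/v1, Qwen-Agent provides the shortcut function get_chat_model to obtain a model inference class with function calling support: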

from qwen_agent.llm import get_chat_model

llm = get_chat_model({
    "model": "Qwen/Qwen3-8B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
    "generate_cfg": {
      "extra_body": {
        "chat_template_kwargs": {"enable_thinking": False}  # default to True
      }
    }
})

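In the code above, model_server is what other OpenAI-compatible API clients commonly call api_base. Providing api_key is recommended (though not as plaintext in the code) even if the API server does not check it, in which case you can set it to any value. You can pass model parameters through generate_cfg. Here we demonstrate how to switch between the thinking and non-thinking modes of Qwen3. Different APIs may offer different ways to control this.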
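One common way to keep the key out of the source code is to read it from an environment variable. The sketch below only builds the configuration dictionary; the variable name `QWEN_API_KEY` and the helper `build_llm_config` are assumptions made for this illustration, not names defined by Qwen-Agent.

```python
import os


def build_llm_config(model: str, server: str) -> dict:
    """Build a config dict of the shape passed to get_chat_model above."""
    # QWEN_API_KEY is a hypothetical variable name chosen for this sketch.
    # Fall back to a placeholder for servers that do not check the key.
    api_key = os.environ.get("QWEN_API_KEY", "EMPTY")
    return {
        "model": model,
        "model_server": server,
        "api_key": api_key,
        "generate_cfg": {
            "extra_body": {
                "chat_template_kwargs": {"enable_thinking": False},
            },
        },
    }


config = build_llm_config("Qwen/Qwen3-8B", "http://localhost:8000/v1")
print(config["model_server"])
```

The resulting dictionary could then be passed to get_chat_model in place of the literal one shown earlier.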

For model input, use the common message structure of system, user, and assistant history:

messages = MESSAGES[:]

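At present, Qwen-Agent works with "functions" rather than "tools". This requires a small change to our tool descriptions, namely extracting the function fields: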

functions = [tool["function"] for tool in TOOLS]

Tool Calls and Tool Results¶

To interact with the model, use the chat method:

for responses in llm.chat(
    messages=messages,
    functions=functions,
):
    pass
messages.extend(responses)

The chat method returns a generator of lists, each of which may contain multiple messages.

The results of no_think mode:

[
    {"role": "assistant", "content": "", "function_call": {"name": "get_current_temperature", "arguments": "{\"location\": \"San Francisco, California, United States\", \"unit\": \"celsius\"}"}},
    {"role": "assistant", "content": "", "function_call": {"name": "get_temperature_date", "arguments": "{\"location\": \"San Francisco, California, United States\", \"date\": \"2024-10-01\", \"unit\": \"celsius\"}"}},
]

The results of think mode:

[
    {"role": "assistant", "content": "", "reasoning_content": "Okay, the user is asking for the current temperature in San Francisco and the temperature for tomorrow. Let me check the available tools.\n\nFirst, there's the get_current_temperature function. It requires the location and optionally the unit. Since the user didn't specify the unit, I'll default to celsius. The location should be \"San Francisco, State, Country\". Wait, the example format is \"City, State, Country\", but San Francisco is a city in California, USA. So the location parameter would be \"San Francisco, California, United States\".\n\nThen, for tomorrow's temperature, the user mentioned the current date is 2024-09-30, so tomorrow would be 2024-10-01. The get_temperature_date function requires location, date, and unit. Again, using the same location and default unit. I need to format the date as \"Year-Month-Day\", which is 2024-10-01.\n\nWait, the current date given is 2024-09-30. If today is September 30, then tomorrow is October 1st. So the date parameter for the second function call should be \"2024-10-01\".\n\nI should make two separate function calls: one for the current temperature and another for tomorrow's date. Let me structure the JSON for both tool calls accordingly."},
    {"role": "assistant", "content": "", "function_call": {"name": "get_current_temperature", "arguments": "{\"location\": \"San Francisco, California, United States\", \"unit\": \"celsius\"}"}},
    {"role": "assistant", "content": "", "function_call": {"name": "get_temperature_date", "arguments": "{\"location\": \"San Francisco, California, United States\", \"date\": \"2024-10-01\", \"unit\": \"celsius\"}"}},
]

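As we can see, Qwen-Agent attempts to parse the model generation into a structured format that is easier to use. The details related to function calls are placed in the function_call field of the messages:

name: a string naming the function to call
arguments: a JSON-formatted string holding the arguments the function should be called with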

In the thinking mode, it will first generate a thought and then generate the tool call(s).
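To make the message structure concrete, the sketch below separates thoughts from function calls in a response list of the shape shown above and decodes each arguments string into a dict. The helper `split_responses` is hypothetical, and the sample thought is shortened for illustration.

```python
import json


def split_responses(responses: list[dict]) -> tuple[list[str], list[tuple[str, dict]]]:
    """Separate reasoning text from (function name, decoded arguments) pairs."""
    thoughts, calls = [], []
    for message in responses:
        if message.get("reasoning_content"):
            thoughts.append(message["reasoning_content"])
        if fn_call := message.get("function_call"):
            # "arguments" is a JSON-formatted string, so it must be decoded.
            calls.append((fn_call["name"], json.loads(fn_call["arguments"])))
    return thoughts, calls


# Shortened sample in the same shape as the think-mode output above.
sample = [
    {"role": "assistant", "content": "", "reasoning_content": "The user wants the current temperature."},
    {
        "role": "assistant",
        "content": "",
        "function_call": {
            "name": "get_current_temperature",
            "arguments": "{\"location\": \"San Francisco, California, United States\", \"unit\": \"celsius\"}",
        },
    },
]

thoughts, calls = split_responses(sample)
print(len(thoughts), calls[0][0], calls[0][1]["unit"])
```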

Next comes the key part, checking and applying the function calls:

for message in responses:
    if fn_call := message.get("function_call", None):
        fn_name: str = fn_call['name']
        fn_args: dict = json.loads(fn_call["arguments"])

        fn_res: str = json.dumps(get_function_by_name(fn_name)(**fn_args))

        messages.append({
            "role": "function",
            "name": fn_name,
            "content": fn_res,
        })

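To get the tool results:

Line 1: We iterate over the function calls in the order in which the model generated them.
Line 2: By checking the function_call field of a generated message, we can tell whether the model has decided that a function call is needed.
Lines 3-4: The relevant details, namely the function name and the arguments, are found there as name and arguments, respectively.
Line 6: With these details, we call the function and obtain the result. Here we assume a function named get_function_by_name that looks up the function by its name.
Lines 8-12: With the result in hand, we append it to the messages as content, with role set to "function".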

Now the messages are:

no_think mode:

[
    {"role": "user", "content": "What's the temperature in San Francisco now? How about tomorrow? Current Date: 2024-09-30."},
    {"role": "assistant", "content": "", "function_call": {"name": "get_current_temperature", "arguments": "{\"location\": \"San Francisco, California, United States\", \"unit\": \"celsius\"}"}},
    {"role": "assistant", "content": "", "function_call": {"name": "get_temperature_date", "arguments": "{\"location\": \"San Francisco, California, United States\", \"date\": \"2024-10-01\", \"unit\": \"celsius\"}"}},
    {"role": "function", "name": "get_current_temperature", "content": '{"temperature": 26.1, "location": "San Francisco, California, United States", "unit": "celsius"}'},
    {"role": "function", "name": "get_temperature_date", "content": '{"temperature": 25.9, "location": "San Francisco, California, United States", "date": "2024-10-01", "unit": "celsius"}'},
]

think mode:

[
    {"role": "user", "content": "What's the temperature in San Francisco now? How about tomorrow? Current Date: 2024-09-30."},
    {"role": "assistant", "content": "", "reasoning_content": "Okay, the user is asking for the current temperature in San Francisco and the temperature for tomorrow. Let me check the available tools.\n\nFirst, there's the get_current_temperature function. It requires the location and optionally the unit. Since the user didn't specify the unit, I'll default to celsius. The location should be \"San Francisco, State, Country\". Wait, the example format is \"City, State, Country\", but San Francisco is a city in California, USA. So the location parameter would be \"San Francisco, California, United States\".\n\nThen, for tomorrow's temperature, the user mentioned the current date is 2024-09-30, so tomorrow would be 2024-10-01. The get_temperature_date function requires location, date, and unit. Again, using the same location and default unit. I need to format the date as \"Year-Month-Day\", which is 2024-10-01.\n\nWait, the current date given is 2024-09-30. If today is September 30, then tomorrow is October 1st. So the date parameter for the second function call should be \"2024-10-01\".\n\nI should make two separate function calls: one for the current temperature and another for tomorrow's date. Let me structure the JSON for both tool calls accordingly."},
    {"role": "assistant", "content": "", "function_call": {"name": "get_current_temperature", "arguments": "{\"location\": \"San Francisco, California, United States\", \"unit\": \"celsius\"}"}},
    {"role": "assistant", "content": "", "function_call": {"name": "get_temperature_date", "arguments": "{\"location\": \"San Francisco, California, United States\", \"date\": \"2024-10-01\", \"unit\": \"celsius\"}"}},
    {"role": "function", "name": "get_current_temperature", "content": '{"temperature": 26.1, "location": "San Francisco, California, United States", "unit": "celsius"}'},
    {"role": "function", "name": "get_temperature_date", "content": '{"temperature": 25.9, "location": "San Francisco, California, United States", "date": "2024-10-01", "unit": "celsius"}'},
]
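The dispatch loop above assumes that the function name is known and that the arguments decode cleanly. A more defensive variant, sketched here under the assumption that errors should be reported back to the model as the function result, might look as follows. The helper `safe_call` and the `registry` dictionary are hypothetical names introduced for this sketch.

```python
import json
from typing import Callable


def safe_call(registry: dict[str, Callable], name: str, raw_arguments: str) -> str:
    """Run one function call and always return a JSON string for the model."""
    fn = registry.get(name)
    if fn is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"arguments are not valid JSON: {exc.msg}"})
    if not isinstance(arguments, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    try:
        return json.dumps(fn(**arguments))
    except TypeError as exc:
        # Raised for missing or unexpected keyword arguments; note that it
        # also catches TypeErrors raised inside the function itself.
        return json.dumps({"error": f"bad arguments: {exc}"})


registry = {"echo": lambda text: {"text": text}}

good = safe_call(registry, "echo", '{"text": "hi"}')
unknown = safe_call(registry, "nope", "{}")
print(good)
print(unknown)
```

Returning the error as content gives the model a chance to correct its call on the next turn instead of crashing the application.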

Final Response¶

Finally, run the model again to obtain the final result:

for responses in llm.chat(messages=messages, functions=functions):
    pass
messages.extend(responses)

The final response should look like this:

no_think mode:

[
    {"role": "assistant", "content": "The current temperature in San Francisco, CA, USA is **26.1°C**.  \n\nFor tomorrow (2024-10-01), the temperature is projected to be **25.9°C**.  \n\nThere is a slight decrease in temperature expected from today to tomorrow."}
]

think mode:

[
    {"role": "assistant", "content": "", "reasoning_content": "Okay, the user asked for the current temperature in San Francisco and tomorrow's temperature. I called the get_current_temperature function for now and get_temperature_date for tomorrow. The responses came back with 26.1°C today and 25.9°C tomorrow. Let me present this info clearly.\n\nFirst, confirm the location to make sure there's no confusion. The current temp is 26.1°C, so I'll state that. Then, tomorrow's date is 2024-10-01, which is October 1st, so I'll mention the date in a user-friendly way. The temp drops slightly to 25.9°C. I should note the unit is Celsius as per the default. Keep the answer concise but informative. Maybe add a brief note about the slight decrease. Make sure the dates are correctly formatted and the temperatures are accurate based on the data provided."}, 
    {"role": "assistant", "content": "The current temperature in San Francisco, CA, USA is **26.1°C**.  \n\nFor tomorrow (2024-10-01), the temperature is projected to be **25.9°C**.  \n\nThere is a slight decrease in temperature expected from today to tomorrow."}
]

vLLM¶

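vLLM is a fast and easy-to-use library for LLM inference and serving. It uses the tokenizer from transformers to format the input, so preparing the input should not cause any trouble. In addition, vLLM implements helper functions so that generated tool calls can be parsed automatically where supported.

This section assumes vllm >= v0.8.5.

For more information, see the vLLM documentation.

In this guide, we use the OpenAI-compatible API provided by vllm, accessed through the API client of the openai Python library.

Preparation¶

For Qwen3, the chat template in tokenizer_config.json already includes support for Hermes-style tool calling. We only need to start an OpenAI-compatible API served by vLLM: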

vllm serve Qwen/Qwen3-8B --enable-auto-tool-choice --tool-call-parser hermes --reasoning-parser deepseek_r1

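The inputs are the same as in the preparation code:

tools = TOOLS
messages = MESSAGES[:]  # copy, so that MESSAGES itself is not modified when results are appended

We first initialize the API client: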

from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model_name = "Qwen/Qwen3-8B"

Tool Calls and Tool Results¶

We can query the underlying API directly through the create chat completions endpoint. Here is an example using the non-thinking mode:

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    tools=tools,
    temperature=0.7,
    top_p=0.8,
    max_tokens=512,
    extra_body={
        "repetition_penalty": 1.05,
        "chat_template_kwargs": {"enable_thinking": False}  # default to True
    },
)

vLLM should be able to parse the tool calls for us, and the main field of the response (response.choices[0]) should look like this:

Choice(
    finish_reason='tool_calls', 
    index=0, 
    logprobs=None, 
    message=ChatCompletionMessage(
        content=None, 
        role='assistant', 
        function_call=None, 
        tool_calls=[
            ChatCompletionMessageToolCall(
                id='chatcmpl-tool-924d705adb044ff88e0ef3afdd155f15', 
                function=Function(arguments='{"location": "San Francisco, CA, USA"}', name='get_current_temperature'), 
                type='function',
            ), 
            ChatCompletionMessageToolCall(
                id='chatcmpl-tool-7e30313081944b11b6e5ebfd02e8e501', 
                function=Function(arguments='{"location": "San Francisco, CA, USA", "date": "2024-10-01"}', name='get_temperature_date'), 
                type='function',
            ),
        ],
    ), 
    stop_reason=None,
)

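Note that the function arguments here are JSON-formatted strings, which is consistent with Qwen-Agent.

There may be edge cases in which the model generates a tool call that is malformed and cannot be parsed. In production code, we need to try parsing it ourselves.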
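As a rough illustration of parsing tool calls ourselves, the sketch below extracts Hermes-style tool calls from raw generated text. It assumes each call is a JSON object with name and arguments keys wrapped in `<tool_call>` and `</tool_call>` tags; the helper `parse_tool_calls` is hypothetical and is not the parser that vLLM uses. Malformed blocks are collected separately rather than raising, so that the application can decide how to recover.

```python
import json
import re

# Assumed Hermes-style wrapper: <tool_call>{...json...}</tool_call>
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> tuple[list[dict], list[str]]:
    """Return (well-formed calls, raw bodies that could not be used)."""
    calls, rejected = [], []
    for body in TOOL_CALL_PATTERN.findall(text):
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            rejected.append(body)
            continue
        well_formed = (
            isinstance(payload, dict)
            and isinstance(payload.get("name"), str)
            and isinstance(payload.get("arguments"), dict)
        )
        if well_formed:
            calls.append({"name": payload["name"], "arguments": payload["arguments"]})
        else:
            rejected.append(body)
    return calls, rejected


# One well-formed call followed by one truncated, unparsable call.
raw_output = (
    '<tool_call>\n'
    '{"name": "get_current_temperature", "arguments": {"location": "San Francisco, CA, USA"}}\n'
    '</tool_call>\n'
    '<tool_call>\n'
    '{"name": "get_temperature_date", "arguments": {"location": \n'
    '</tool_call>'
)

calls, rejected = parse_tool_calls(raw_output)
print(calls)
print(len(rejected))
```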

We can then call the tools, obtain the results, and add them to the messages:

messages.append(response.choices[0].message.model_dump())

if tool_calls := messages[-1].get("tool_calls", None):
    for tool_call in tool_calls:
        call_id: str = tool_call["id"]
        if fn_call := tool_call.get("function"):
            fn_name: str = fn_call["name"]
            fn_args: dict = json.loads(fn_call["arguments"])
        
            fn_res: str = json.dumps(get_function_by_name(fn_name)(**fn_args))

            messages.append({
                "role": "tool",
                "content": fn_res,
                "tool_call_id": call_id,
            })

Note that the OpenAI API uses the tool_call_id field to link each tool result to the tool call it answers.

The messages are now as follows:

[
    {'role': 'user', 'content': "What's the temperature in San Francisco now? How about tomorrow? Current Date: 2024-09-30."},
    {'content': None, 'role': 'assistant', 'function_call': None, 'tool_calls': [
        {'id': 'chatcmpl-tool-924d705adb044ff88e0ef3afdd155f15', 'function': {'arguments': '{"location": "San Francisco, CA, USA"}', 'name': 'get_current_temperature'}, 'type': 'function'},
        {'id': 'chatcmpl-tool-7e30313081944b11b6e5ebfd02e8e501', 'function': {'arguments': '{"location": "San Francisco, CA, USA", "date": "2024-10-01"}', 'name': 'get_temperature_date'}, 'type': 'function'},
    ]},
    {'role': 'tool', 'content': '{"temperature": 26.1, "location": "San Francisco, CA, USA", "unit": "celsius"}', 'tool_call_id': 'chatcmpl-tool-924d705adb044ff88e0ef3afdd155f15'},
    {'role': 'tool', 'content': '{"temperature": 25.9, "location": "San Francisco, CA, USA", "date": "2024-10-01", "unit": "celsius"}', 'tool_call_id': 'chatcmpl-tool-7e30313081944b11b6e5ebfd02e8e501'},
]
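Since tool results are matched to tool calls through tool_call_id, it can be worth checking that every call has an answer before querying the model again. The sketch below does this over a message list of the shape shown above; the helper `unanswered_tool_calls` and the ids `call-1` and `call-2` are hypothetical.

```python
def unanswered_tool_calls(messages: list[dict]) -> list[str]:
    """Return the ids of tool calls that have no matching tool result."""
    requested: list[str] = []
    answered: set[str] = set()
    for message in messages:
        # Assistant messages may carry zero or more tool calls.
        for tool_call in message.get("tool_calls") or []:
            requested.append(tool_call["id"])
        # Tool messages answer exactly one call, named by tool_call_id.
        if message.get("role") == "tool":
            answered.add(message.get("tool_call_id"))
    return [call_id for call_id in requested if call_id not in answered]


history = [
    {"role": "user", "content": "What's the temperature in San Francisco now?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call-1", "type": "function", "function": {"name": "get_current_temperature", "arguments": "{}"}},
        {"id": "call-2", "type": "function", "function": {"name": "get_temperature_date", "arguments": "{}"}},
    ]},
    {"role": "tool", "content": "{}", "tool_call_id": "call-1"},
]

missing = unanswered_tool_calls(history)
print(missing)
```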

Final Response¶

Let's query the API again to give the model the tool results and get a reply:

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    tools=tools,
    temperature=0.7,
    top_p=0.8,
    max_tokens=512,
    extra_body={
        "repetition_penalty": 1.05,
    },
)

messages.append(response.choices[0].message.model_dump())

The final response (response.choices[0].message.content) should look like this:

The current temperature in San Francisco is approximately 26.1°C. For tomorrow, the forecasted temperature is around 25.9°C.

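Final Words¶

Whichever way you choose to use function calling with Qwen3, keep in mind that the limitations and advantages of prompt engineering apply:

There is no guarantee that model generation will always follow the protocol, even with proper prompts or templates. This holds especially for templates that are more complex and rely more on the model itself to think and stay on track, as opposed to simpler templates that rely on the template itself and on control or special tokens. The latter, of course, require some training. In production code, be prepared to take remedial or corrective measures when something goes wrong.
If generation falls short of expectations in some scenarios, you can refine the template by adding more instructions or constraints. The templates mentioned here are general enough, but they may not be the best, the most specific, or the most concise for your particular use case. The ultimate solution is fine-tuning with your own data.

Have fun prompting!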