xinference v2.4.0+vllm0.18.0 工具自动调用问题

### System Info / 系統信息

环境： xinference v2.4.0+vllm0.18.0 部署qwen3.5-Qwen3.5-35B-A3B-FP8
场景：agent挂多个工具，工具调用是串行的，系统提示词指定最先开始执行的工具，返回结果中指明下个工具，例如set_template、create_slide、write_slide、generate_slide。
现象：调用中断，不能完整执行上述流程。例如执行完create_slide，就不往下执行了。

注：上述逻辑在vllm0.18.0部署的模型上执行，可以完整、正常执行
CUDA_VISIBLE_DEVICES=2,3 \
PYTORCH_ALLOC_CONF=expandable_segments:True \
vllm serve /pdata/models/Qwen/Qwen3.5-35B-A3B-FP8 \
--served-model-name Qwen3.5-35B-A3B-FP8 \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.95 \
--max-num-batched-tokens 131072 \
--max-model-len 131072 \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--dtype auto \
--load-format safetensors \
--disable-log-stats \
--host 0.0.0.0 \
--port 8000 \
--disable-custom-all-reduce \
--attention-backend flash_attn \
--max-num-seqs 8 \
--enforce-eager

### Running Xinference with Docker? / 是否使用 Docker 运行 Xinfernece？

- [ ] docker / docker
- [ ] pip install / 通过 pip install 安装
- [ ] installation from source / 从源码安装

### Version info / 版本信息

v2.4.0

### The command used to start Xinference / 用以启动 xinference 的命令

xinference launch --model-name qwen3.5 --model-type LLM --n-gpu 2 --replica 1 --reasoning_content false --n-worker 1 --model-engine vLLM --model-format fp8 --size-in-billions 35 --quantization FP8 --gpu-idx 0,1 --model-path /pdata/models/Qwen/Qwen3.5-35B-A3B-FP8

### Reproduction / 复现过程

如上说明

### Expected behavior / 期待表现

与vllm单独启动的效果一致

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

xinference v2.4.0+vllm0.18.0 工具自动调用问题 #4761

System Info / 系統信息

Running Xinference with Docker? / 是否使用 Docker 运行 Xinfernece？

Version info / 版本信息

The command used to start Xinference / 用以启动 xinference 的命令

Reproduction / 复现过程

Expected behavior / 期待表现

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

xinference v2.4.0+vllm0.18.0 工具自动调用问题 #4761

Description

System Info / 系統信息

Running Xinference with Docker? / 是否使用 Docker 运行 Xinfernece？

Version info / 版本信息

The command used to start Xinference / 用以启动 xinference 的命令

Reproduction / 复现过程

Expected behavior / 期待表现

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions