System Info / 系統信息
Environment: xinference v2.4.0 + vLLM 0.17.0
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
Version info / 版本信息
xprobe/xinference:v2.4.0
docker run --rm xprobe/xinference:v2.4.0 pip list | grep vllm
vllm 0.17.1
The command used to start Xinference / 用以启动 xinference 的命令
curl -X POST http://localhost:9997/v1/models \
  -H "Content-Type: application/json" \
  -d '{
    "model_uid": "qwen3-vl-embed-2b",
    "model_name": "Qwen3-VL-Embedding-2B",
    "model_type": "embedding",
    "model_format": "pytorch",
    "model_engine": "vllm",
    "enable_virtual_env": true,
    "virtual_env_packages": [],
    "envs": {
      "CUDA_VISIBLE_DEVICES": "0",
      "XINFERENCE_MODEL_SRC": "modelscope"
    },
    "vllm_engine_kwargs": {
      "gpu_memory_utilization": 0.85,
      "max_model_len": 4096,
      "enforce_eager": true
    }
  }'
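The launch fails with the import error shown below. A quick way to check whether the `transformers` installed in the container actually exposes the class the loader asks for is a small probe script (the `has_class` helper here is hypothetical, not part of Xinference):

```python
import importlib


def has_class(module_name: str, class_name: str) -> bool:
    """Return True if `class_name` is importable from `module_name`.

    Returns False both when the module itself is missing and when the
    module imports but lacks the attribute (the situation in this issue).
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, class_name)


# Run inside the container; if this prints False while transformers is
# installed, the bundled transformers predates Qwen3-VL support.
print(has_class("transformers", "Qwen3VLForConditionalGeneration"))
```

Running this inside the `xprobe/xinference:v2.4.0` container (e.g. via `docker run --rm xprobe/xinference:v2.4.0 python -c ...`) would distinguish a missing package from a too-old one.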
Reproduction / 复现过程
cannot import name 'Qwen3VLForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.12/dist-packages/transformers/__init__.py)
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/xinference/api/restful_api.py", line 560, in launch_model
model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
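If the root cause is indeed an outdated `transformers` in the image, one possible workaround (untested; the exact minimum version is an assumption, Qwen3-VL support landed in a recent transformers release) would be to pin a newer `transformers` through the launch request's virtual env fields already present in the command above:

```json
"enable_virtual_env": true,
"virtual_env_packages": ["transformers>=4.57.0"]
```

With `enable_virtual_env` set, Xinference installs the listed packages into the model's virtual environment, which should shadow the version bundled in the image.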
Expected behavior / 期待表现
Why does it show 0.17.1? Did I pull the wrong image?