Commit ff47701

[BugFix][PD Disaggregation][KVCache] Fix low cache hit rate in PD split scenario (#7364)
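## Motivation

In the PD disaggregation scenario, a decode node that received a request forwarded by a prefill node did not update the hit information for its cache blocks in time. This led to a low prefix cache hit rate and degraded inference performance.

## Modifications

1. In the `_free_blocks_when_stop` method, additionally exclude prefill nodes (`splitwise_role == "prefill"`) from the cache block update, so that a prefill node does not update the cache a second time and leave it in an inconsistent state.
2. After a request is successfully allocated on the decode node (`_alloc_requests_with_cache`), explicitly call `update_cache_blocks` with `need_prefill_tokens` to update the cache block information, so that the decode node correctly recognizes the prefix cache blocks that were hit.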
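The role check described in item 1 can be sketched in isolation. This is a minimal illustration, not the FastDeploy implementation: the helper name `should_update_on_stop` is hypothetical, and the role string `"mixed"` is assumed here only as a stand-in for any role other than `"prefill"` and `"decode"`.

```python
def should_update_on_stop(enable_prefix_caching: bool, splitwise_role: str) -> bool:
    """Hypothetical helper mirroring the condition in the first hunk.

    The cache block update on the stop path runs only when prefix caching
    is enabled and the node is neither a decode node nor a prefill node.
    """
    return (
        enable_prefix_caching
        and splitwise_role != "decode"
        and splitwise_role != "prefill"  # the condition added by this commit
    )


if __name__ == "__main__":
    # "mixed" is an assumed placeholder for a non-split role.
    for role in ("mixed", "prefill", "decode"):
        print(role, should_update_on_stop(True, role))
```

Under these assumptions, only the non-split role still updates cache blocks on the stop path. Both split roles skip it, and the allocation path added in the second hunk performs the update instead.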
1 parent 9c23e61 commit ff47701

1 file changed: +6 -0 lines changed


fastdeploy/engine/sched/resource_manager_v1.py

Lines changed: 6 additions & 0 deletions
@@ -927,6 +927,7 @@ def _allocate_decode_and_extend():
 if (
     self.config.cache_config.enable_prefix_caching
     and self.config.scheduler_config.splitwise_role != "decode"
+    and self.config.scheduler_config.splitwise_role != "prefill"
 ):
     self.cache_manager.update_cache_blocks(
         request, self.config.cache_config.block_size, request.num_computed_tokens
@@ -1374,6 +1375,11 @@ def preallocate_resource_in_p(self, request: Request):
     self.stop_flags[request.idx] = False
     self.requests[request.request_id] = request
     self.req_dict[request.request_id] = allocated_position
+
+    self.cache_manager.update_cache_blocks(
+        request, self.config.cache_config.block_size, request.need_prefill_tokens
+    )
+
     return True
 else:
     self._free_blocks(request)
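
To see why registering blocks right after a successful allocation affects the hit rate, here is a toy model of a block-level prefix cache. It is a sketch under stated assumptions, not FastDeploy's cache manager: `ToyPrefixCache`, `match_prefix`, and `admit` are hypothetical, and the `update_cache_blocks` method here only borrows the name and takes a simplified argument list.

```python
from typing import List, Set, Tuple


class ToyPrefixCache:
    """Hypothetical stand-in for a prefix cache; not FastDeploy's cache manager."""

    def __init__(self) -> None:
        # Each entry is the full token prefix that ends at a block boundary.
        self._known: Set[Tuple[int, ...]] = set()

    def update_cache_blocks(self, tokens: List[int], block_size: int, num_tokens: int) -> None:
        # Register every full block covered by the first num_tokens tokens.
        full = (min(num_tokens, len(tokens)) // block_size) * block_size
        for end in range(block_size, full + 1, block_size):
            self._known.add(tuple(tokens[:end]))

    def match_prefix(self, tokens: List[int], block_size: int) -> int:
        # Return how many leading tokens are covered by registered blocks.
        hit = 0
        for end in range(block_size, len(tokens) + 1, block_size):
            if tuple(tokens[:end]) not in self._known:
                break
            hit = end
        return hit


def admit(cache: ToyPrefixCache, tokens: List[int], block_size: int, register: bool) -> int:
    """Look up the prefix, then optionally register the request's blocks."""
    hit = cache.match_prefix(tokens, block_size)
    if register:
        cache.update_cache_blocks(tokens, block_size, len(tokens))
    return hit


if __name__ == "__main__":
    prompt = list(range(10))
    without = ToyPrefixCache()
    print([admit(without, prompt, 4, register=False) for _ in range(2)])
    with_update = ToyPrefixCache()
    print([admit(with_update, prompt, 4, register=True) for _ in range(2)])
```

In this toy model, a cache that never registers blocks on admission reports zero hits for a repeated prompt, while one that registers them reuses every full block on the second request. The commit message describes the same effect on the decode node.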
