
Optimize scheduler for chunk prefill #7408

Open
rainyfly wants to merge 1 commit into PaddlePaddle:develop from rainyfly:optimize_chunk_prefill_scheduler_for_long_input

rainyfly (Collaborator) commented Apr 15, 2026

Motivation

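Optimize the scheduling logic for prefill requests.
When a request triggers its first chunk prefill, it moves from the waiting queue to the running queue.
If resources for a later chunk prefill are insufficient, the request that is still in chunk prefill is handled like a decode request and triggers preemption.
The scheduler then falls into a chunk prefill -> preemption -> chunk prefill loop, so every decode step also carries a chunk prefill and performance drops severely.
The scheduling problem for long-input, short-output workloads needs to be solved.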

Modifications

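For prefill requests, check the chunk prefill admission condition on every step. The admission check is now also applied to prefill requests already in the running queue, so they do not keep running chunk prefill when resources are insufficient.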
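A minimal sketch of the idea, assuming a toy block allocator. The flag name chunk_prefill_in_running_not_satisfied is taken from the PR diff quoted in the review; Request, extra_blocks, schedule_step and BLOCK_SIZE are hypothetical stand-ins, not FastDeploy's API, and decode requests plus the real preemption path are left out.

```python
from collections import deque
from dataclasses import dataclass

BLOCK_SIZE = 64  # hypothetical number of tokens per KV-cache block


@dataclass
class Request:  # hypothetical stand-in, not FastDeploy's Request class
    req_id: str
    prompt_len: int
    num_computed_tokens: int = 0

    @property
    def in_prefill(self) -> bool:
        return self.num_computed_tokens < self.prompt_len


def extra_blocks(num_computed: int, num_new: int) -> int:
    """Blocks to allocate so that num_computed + num_new tokens fit."""
    def ceil_div(a: int, b: int) -> int:
        return -(-a // b)
    return ceil_div(num_computed + num_new, BLOCK_SIZE) - ceil_div(num_computed, BLOCK_SIZE)


def schedule_step(running: list, waiting: deque, free_blocks: int, chunk_size: int):
    """One toy scheduling step; returns (scheduled request ids, free blocks left)."""
    scheduled = []
    chunk_prefill_in_running_not_satisfied = False

    # 1) Running queue: re-check the chunk prefill admission condition every step.
    for req in running:
        if not req.in_prefill:
            continue  # decode requests and their preemption path are omitted here
        num_new_tokens = min(chunk_size, req.prompt_len - req.num_computed_tokens)
        need = extra_blocks(req.num_computed_tokens, num_new_tokens)
        if need <= free_blocks:
            free_blocks -= need
            req.num_computed_tokens += num_new_tokens
            scheduled.append(req.req_id)
        else:  # Not enough blocks to allocate: wait instead of preempting
            chunk_prefill_in_running_not_satisfied = True

    # 2) Waiting queue: admit new requests only if no running prefill is blocked.
    while waiting and not chunk_prefill_in_running_not_satisfied:
        req = waiting[0]
        num_new_tokens = min(chunk_size, req.prompt_len - req.num_computed_tokens)
        need = extra_blocks(req.num_computed_tokens, num_new_tokens)
        if need > free_blocks:
            break
        waiting.popleft()
        free_blocks -= need
        req.num_computed_tokens += num_new_tokens
        running.append(req)
        scheduled.append(req.req_id)

    return scheduled, free_blocks


long_req = Request("long", prompt_len=1000, num_computed_tokens=512)
short_req = Request("short", prompt_len=64)
print(schedule_step([long_req], deque([short_req]), free_blocks=2, chunk_size=512))  # ([], 2)
```

Here the long request needs 8 more blocks for its remaining 488 prompt tokens but only 2 are free, so it simply waits for blocks freed by finishing requests instead of preempting, and the short request stays queued. That last effect is the utilization trade-off raised in the AI review of this PR.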

Usage or Command

None

Accuracy Tests

None

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. If no unit tests are added, explain why in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot (bot) commented Apr 15, 2026

Thanks for your contribution!

PaddlePaddle-bot left a comment

🤖 AI Code Review | 2026-04-15

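📋 Review Summary

PR overview: optimizes the scheduling strategy for chunk prefill, removing the preemption logic to reduce context switching

Scope of changes: fastdeploy/engine/sched/resource_manager_v1.py

Affected area tag: [Scheduler]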

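📝 PR Convention Check

  • Title: missing an official tag; suggest changing it to [Scheduler] Optimize scheduler for chunk prefill or [Optimization] Optimize scheduler for chunk prefill
  • Description: Motivation and Modifications are empty; suggest adding the following:

Example Motivation:

  • In long-input chunk prefill scenarios, the preemption strategy causes frequent request switching and increases scheduling overhead
  • Skipping preemption reduces context switching and improves throughput

Example Modifications:

  • Add a chunk_prefill_in_running_not_satisfied flag
  • When a running request's chunk prefill block demand cannot be met, do not trigger preemption; skip scheduling of the remaining waiting queue instead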

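Issues

Level | File | Summary
❓ Question | resource_manager_v1.py:919 | May reduce resource utilization
🟡 Suggestion | resource_manager_v1.py:919 | Missing test cases

Overall Assessment

The change logic is clear, but it may reduce resource utilization. The author is advised to evaluate the impact on affected scenarios and add tests.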

    # Prepare prefill task
    scheduled_reqs.append(self._prepare_prefill_task(request, num_new_tokens))
else:  # Not enough blocks to allocate
    chunk_prefill_in_running_not_satisfied = True
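❓ Question: This change may reduce resource utilization. When a chunk prefill request is skipped for lack of blocks, other requests in the waiting queue (which may need fewer blocks) are not scheduled either.

Have you considered the following scenario?

  • The running queue holds a long-input chunk prefill request that needs 1000 blocks
  • The system has 500 blocks left
  • The waiting queue holds several short-input requests, each needing only 50 blocks

Under the new logic, none of the waiting requests can be scheduled. Is a finer-grained check needed?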
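The reviewer's scenario can be worked through with a toy policy. In this sketch admitted_waiting is hypothetical: strict=True approximates the PR's behavior (a blocked running chunk prefill stops the waiting queue), and strict=False is one possible finer-grained alternative that greedily admits waiting requests into whatever blocks remain. The count of 20 short requests is an assumption standing in for "several".

```python
def admitted_waiting(free_blocks: int, running_need: int,
                     waiting_needs: list, strict: bool) -> int:
    """Number of waiting requests admitted in one step under a toy policy."""
    if running_need <= free_blocks:
        free_blocks -= running_need  # the running chunk prefill fits and goes first
    elif strict:
        return 0  # blocked running prefill: skip the whole waiting queue
    admitted = 0
    for need in waiting_needs:  # FIFO admission until blocks run out
        if need > free_blocks:
            break
        free_blocks -= need
        admitted += 1
    return admitted


# Reviewer's numbers: running prefill needs 1000 blocks, 500 are free,
# each short request needs 50 blocks (20 queued requests is an assumed count).
print(admitted_waiting(500, 1000, [50] * 20, strict=True))   # 0
print(admitted_waiting(500, 1000, [50] * 20, strict=False))  # 10
```

The greedy variant fills all 500 free blocks with 10 short requests, which in turn keeps the long request waiting longer, so any finer-grained check would have to balance utilization against progress for the blocked prefill.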


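🟡 Suggestion: This change has no test cases. Consider adding tests that cover the following scenarios:

  1. Behavior when a chunk prefill request in the running queue does not get enough blocks
  2. Whether scheduling of the waiting queue is correctly skipped in that case
  3. Throughput comparison under resource pressure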

@codecov-commenter

Codecov Report

❌ Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@8995a38).

Files with missing lines | Patch % | Lines
fastdeploy/engine/sched/resource_manager_v1.py | 66.66% | 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7408   +/-   ##
==========================================
  Coverage           ?   73.84%           
==========================================
  Files              ?      383           
  Lines              ?    53630           
  Branches           ?     8415           
==========================================
  Hits               ?    39604           
  Misses             ?    11327           
  Partials           ?     2699           
Flag | Coverage Δ
GPU | 73.84% <66.66%> (?)
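The percentages in this report are consistent with a simple ratio. A minimal sketch, assuming Codecov computes coverage as hits / (hits + misses + partials), so partial lines count against coverage, and rounds down to two decimals by default (round: down, precision: 2 in codecov.yml); codecov_percent is a hypothetical helper, and the patch's 4 hits are inferred from the reported figures rather than stated in the report.

```python
def codecov_percent(hits: int, misses: int, partials: int) -> float:
    """Coverage ratio hits / (hits + misses + partials), rounded down to 2 decimals."""
    lines = hits + misses + partials
    return (hits * 10000 // lines) / 100  # integer floor avoids float rounding surprises


# Project totals from the coverage diff: hits + misses + partials == lines.
print(39604 + 11327 + 2699)                 # 53630
print(codecov_percent(39604, 11327, 2699))  # 73.84
# Patch: 0 misses and 2 partials at 66.66% imply 4 hits out of 6 changed lines.
print(codecov_percent(4, 0, 2))             # 66.66
```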

Flags with carried forward coverage won't be shown.



Labels: None yet
Projects: None yet
3 participants