
feat: add prefer_partial_sort config to enable PartialSortExec for bo…#21805

Open
SubhamSinghal wants to merge 3 commits into apache:main from
SubhamSinghal:partial-sort-bounded-inputs


@SubhamSinghal
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

  • New config option: datafusion.optimizer.prefer_partial_sort (default false) in OptimizerOptions
  • Optimizer change: replace_with_partial_sort in EnforceSorting now accepts ConfigOptions and allows bounded inputs
    when the config is enabled
  • Unit tests: two new tests in enforce_sorting.rs: bounded input with the config enabled (expects PartialSortExec), and
    bounded input with no common prefix (stays SortExec)
  • SLT tests: Three scenarios covering different prefix/suffix combinations:
    • prefix=2, suffix=1: sorted on (a, b), ORDER BY a, b, c
    • prefix=1, suffix=2: sorted on (a), ORDER BY a, b, c
    • prefix=3, suffix=1: sorted on (a, b, c), ORDER BY a, b, c, d
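
To make the prefix/suffix terminology in the scenarios above concrete, here is a minimal std-only sketch of the decision they exercise. It is not the code in this PR: `SortChoice` and `choose_sort` are hypothetical names, and orderings are reduced to plain column names, whereas the optimizer's real check also has to account for sort options and equivalences.

```rust
/// Hypothetical outcome of comparing an input's existing ordering with the
/// ordering a query requires. Names are illustrative, not DataFusion types.
#[derive(Debug, PartialEq)]
enum SortChoice {
    /// The existing ordering already covers the required one.
    NoSortNeeded,
    /// The first `prefix_len` required columns are already sorted; only the
    /// remaining suffix needs sorting within each prefix group.
    Partial { prefix_len: usize },
    /// No leading column is shared, so a full sort is required.
    Full,
}

/// Number of leading columns shared by both orderings.
fn common_prefix_len(existing: &[&str], required: &[&str]) -> usize {
    existing
        .iter()
        .zip(required.iter())
        .take_while(|(e, r)| e == r)
        .count()
}

/// Simplified decision: compares column names only (an assumption of this sketch).
fn choose_sort(existing: &[&str], required: &[&str]) -> SortChoice {
    let prefix_len = common_prefix_len(existing, required);
    if prefix_len == required.len() {
        SortChoice::NoSortNeeded
    } else if prefix_len == 0 {
        SortChoice::Full
    } else {
        SortChoice::Partial { prefix_len }
    }
}
```

Under this sketch, an input sorted on (a, b) with ORDER BY a, b, c gives `Partial { prefix_len: 2 }`, leaving a suffix of one column, which corresponds to the first SLT scenario.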

We can enable datafusion.optimizer.prefer_partial_sort by default after adding spill support.

Are these changes tested?

Yes.

  • 2 new unit tests in enforce_sorting.rs (bounded with config, no-prefix with config)
  • 1 new SLT test file partial_sort_bounded.slt with 3 scenarios covering correctness, EXPLAIN plans, LIMIT, and config reset
  • All existing tests pass unchanged (default is false, so no behavior change)

Are there any user-facing changes?

Yes, a new configuration option:

SET datafusion.optimizer.prefer_partial_sort = true;

When enabled, a query whose ORDER BY extends an existing sort prefix uses PartialSortExec instead of SortExec for
bounded inputs (e.g., Parquet files with a declared sort order). Because only rows that share the same prefix values
need to be buffered and sorted together, this reduces memory usage and enables streaming output. The default is false,
so existing behavior is unchanged.
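
The memory and streaming claim can be illustrated with a small std-only sketch of the partial-sort idea. This is not PartialSortExec itself: the `Row` type, the `partial_sort` function, and the callback-based output are assumptions made for illustration. The input is assumed to be already sorted on the first column, and each group of equal prefix values is sorted on the remaining columns and emitted before the next group is read.

```rust
/// Illustrative row of three integer columns (a, b, c).
type Row = (i32, i32, i32);

/// Sorts an input that is already ordered on `a` into (a, b, c) order.
/// Each completed prefix group is handed to `emit` as soon as the next
/// group starts. Returns the largest number of rows buffered at once.
fn partial_sort<I, F>(input: I, mut emit: F) -> usize
where
    I: IntoIterator<Item = Row>,
    F: FnMut(Vec<Row>),
{
    let mut group: Vec<Row> = Vec::new();
    let mut max_buffered = 0;

    for row in input {
        // A change in the prefix column means the current group is complete.
        let boundary = group.first().map_or(false, |first| first.0 != row.0);
        if boundary {
            group.sort_by_key(|r| (r.1, r.2));
            emit(std::mem::take(&mut group));
        }
        group.push(row);
        max_buffered = max_buffered.max(group.len());
    }

    // Flush the final group.
    if !group.is_empty() {
        group.sort_by_key(|r| (r.1, r.2));
        emit(group);
    }
    max_buffered
}
```

In this sketch the buffer never holds more than the largest prefix group, while a full sort would have to hold the entire input before producing its first row.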

@github-actions github-actions Bot added documentation Improvements or additions to documentation optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Apr 23, 2026