feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys (#21362)
## Which issue does this PR close?
- Closes #21361.
## Rationale for this change
This PR adds functional-dependency-based simplification for `ORDER BY`
clauses. When an earlier sort key already functionally determines a
later key, the later key is redundant and can be removed without
changing query semantics. This reduces unnecessary sorting work and
avoids carrying extra sort keys through planning and execution.
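As a minimal, self-contained sketch of the pruning rule (not the PR's code), the example below assumes that sort keys are plain column indices and that functional dependencies arrive as "source columns determine target columns" pairs. `FunctionalDependency`, `is_determined`, and `prune_sort_keys` are hypothetical names, not DataFusion's API. A key is dropped when the attribute closure of the keys before it already contains that key. Such a key holds a single value within each group of equal earlier-key values, so the rule applies whatever its direction, and it also removes exact duplicates.

```rust
use std::collections::HashSet;

/// Hypothetical functional dependency: the `source` columns together
/// determine every column in `target`.
struct FunctionalDependency {
    source: Vec<usize>,
    target: Vec<usize>,
}

/// Returns true if the columns in `known` determine `col`. Computes the
/// attribute closure of `known` by repeatedly adding the targets of any
/// dependency whose source columns are all already in the closure.
fn is_determined(known: &HashSet<usize>, col: usize, deps: &[FunctionalDependency]) -> bool {
    let mut closure_set = known.clone();
    loop {
        let before = closure_set.len();
        for dep in deps {
            if dep.source.iter().all(|c| closure_set.contains(c)) {
                closure_set.extend(dep.target.iter().copied());
            }
        }
        if closure_set.len() == before {
            return closure_set.contains(&col);
        }
    }
}

/// Keeps only the sort keys that are not determined by the keys before them.
fn prune_sort_keys(keys: &[usize], deps: &[FunctionalDependency]) -> Vec<usize> {
    let mut earlier: HashSet<usize> = HashSet::new();
    let mut kept = Vec::new();
    for &key in keys {
        // A determined key is constant within each group of equal
        // earlier-key values, so sorting on it changes nothing.
        if !is_determined(&earlier, key, deps) {
            kept.push(key);
        }
        earlier.insert(key);
    }
    kept
}

fn main() {
    // Assumed schema: 0 = id (unique), 1 = name, 2 = city; id determines the rest.
    let deps = vec![FunctionalDependency { source: vec![0], target: vec![1, 2] }];

    // ORDER BY id, name  =>  ORDER BY id
    assert_eq!(prune_sort_keys(&[0, 1], &deps), vec![0]);
    // ORDER BY name, id  =>  unchanged: name does not determine id
    assert_eq!(prune_sort_keys(&[1, 0], &deps), vec![1, 0]);
    println!("pruning behaves as expected");
}
```

The second assertion shows the negative case: a dependent column that precedes its determinant cannot be pruned.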
## What changes are included in this PR?
This PR extends the existing functional dependency utilities with a
helper for pruning redundant sort keys, and wires that helper into
`eliminate_duplicated_expr` so `Sort` nodes can be simplified during
optimization. It also adds regression coverage for both the positive
case, where a trailing sort key is removed, and the negative case, where
the order of the sort keys prevents pruning.
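To show how a pruning helper might rewrite a whole `Sort` node, the sketch below uses hypothetical stand-in types (`SortExpr`, `Sort`, `Fd`) in place of DataFusion's logical plan types and assumes the caller supplies the dependencies. Surviving keys keep their direction and null ordering, and the `fetch` limit is carried over unchanged. The dependency `rn1 → c9` in `main` is an assumption chosen to mirror the plans in this PR.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a sort expression: a column plus its options.
#[derive(Debug, Clone, PartialEq)]
struct SortExpr {
    column: &'static str,
    asc: bool,
    nulls_first: bool,
}

/// Hypothetical stand-in for a logical `Sort` node.
#[derive(Debug, PartialEq)]
struct Sort {
    exprs: Vec<SortExpr>,
    fetch: Option<usize>,
}

/// Hypothetical functional dependency: `source` columns determine `target`.
struct Fd {
    source: Vec<&'static str>,
    target: Vec<&'static str>,
}

/// Attribute closure of `known` under `deps`.
fn closure(known: &HashSet<&'static str>, deps: &[Fd]) -> HashSet<&'static str> {
    let mut out = known.clone();
    loop {
        let before = out.len();
        for fd in deps {
            if fd.source.iter().all(|c| out.contains(c)) {
                out.extend(fd.target.iter().copied());
            }
        }
        if out.len() == before {
            return out;
        }
    }
}

/// Drops every key whose column is already determined by the kept keys
/// before it (exact duplicates included); `fetch` is passed through.
fn simplify_sort(sort: Sort, deps: &[Fd]) -> Sort {
    let mut kept_cols: HashSet<&'static str> = HashSet::new();
    let mut exprs = Vec::new();
    for expr in sort.exprs {
        if closure(&kept_cols, deps).contains(&expr.column) {
            continue; // redundant: constant within each group of earlier keys
        }
        kept_cols.insert(expr.column);
        exprs.push(expr);
    }
    Sort { exprs, fetch: sort.fetch }
}

fn main() {
    // Assumed dependency: `rn1` (a per-row number) determines `c9`.
    let deps = vec![Fd { source: vec!["rn1"], target: vec!["c9"] }];
    let sort = Sort {
        exprs: vec![
            SortExpr { column: "rn1", asc: true, nulls_first: false },
            SortExpr { column: "c9", asc: false, nulls_first: true },
        ],
        fetch: Some(5),
    };
    let simplified = simplify_sort(sort, &deps);
    assert_eq!(simplified.exprs.len(), 1);
    assert_eq!(simplified.exprs[0].column, "rn1");
    assert!(simplified.exprs[0].asc && !simplified.exprs[0].nulls_first);
    assert_eq!(simplified.fetch, Some(5));
    println!("{:?}", simplified);
}
```

The closure is recomputed for each key for clarity. An optimizer would more likely maintain it incrementally.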
## Are these changes tested?
Yes. I added unit tests covering:
- removal of a functionally redundant trailing `ORDER BY` key
- preservation of ordering when the dependent column appears before its
determinant
I also ran `cargo test -p datafusion-optimizer eliminate_duplicated_expr -- --nocapture`
successfully, and `cargo fmt --all` passes.
## Are there any user-facing changes?
Yes, but only in query planning behavior. Some queries with redundant
`ORDER BY` keys may produce simpler plans and run more efficiently.
There are no public API changes.
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
```diff
-01)Sort: rn1 ASC NULLS LAST, aggregate_test_100.c9 ASC NULLS LAST, fetch=5
+01)Sort: rn1 ASC NULLS LAST, fetch=5
 02)--Projection: aggregate_test_100.c9, row_number() ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS rn1
 03)----WindowAggr: windowExpr=[[row_number() ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]
-02)--ProjectionExec: expr=[c9@0 as c9, row_number() ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@1 as rn1]
+01)ProjectionExec: expr=[c9@0 as c9, row_number() ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@1 as rn1]
+02)--GlobalLimitExec: skip=0, fetch=5
 03)----BoundedWindowAggExec: wdw=[row_number() ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Field { "row_number() ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW": UInt64 }, frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW], mode=[Sorted]
```
```diff
-01)Sort: rn1 ASC NULLS LAST, aggregate_test_100.c9 DESC NULLS FIRST, fetch=5
+01)Sort: rn1 ASC NULLS LAST, fetch=5
 02)--Projection: aggregate_test_100.c9, row_number() ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS rn1
 03)----WindowAggr: windowExpr=[[row_number() ORDER BY [aggregate_test_100.c9 DESC NULLS FIRST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]
```
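In both plan changes, `rn1` is `row_number()` over a window with no `PARTITION BY`, so every output row gets a distinct `rn1`. A unique column functionally determines every other column, which is presumably why the trailing `c9` key is dropped in both cases, whether it was `ASC` or `DESC`. The sketch below checks that property on made-up sample values. It models `row_number()` in plain Rust, and `with_row_number_desc` is a hypothetical helper, not DataFusion code.

```rust
use std::collections::HashSet;

/// Returns `(rn1, c9)` pairs, assigning 1-based row numbers in descending
/// `c9` order. This models `row_number() OVER (ORDER BY c9 DESC)` with no
/// PARTITION BY; tied values still receive distinct numbers.
fn with_row_number_desc(c9: &[u64]) -> Vec<(u64, u64)> {
    let mut sorted = c9.to_vec();
    sorted.sort_by(|a, b| b.cmp(a));
    sorted
        .into_iter()
        .enumerate()
        .map(|(i, v)| (i as u64 + 1, v))
        .collect()
}

fn main() {
    // Made-up sample values, including a tie on 7.
    let mut rows = with_row_number_desc(&[42, 7, 99, 7, 15]);
    rows.reverse(); // scramble the order so the sorts below do real work

    // rn1 is unique per row, so it is a key that determines c9.
    let distinct: HashSet<u64> = rows.iter().map(|&(rn, _)| rn).collect();
    assert_eq!(distinct.len(), rows.len());

    // Sorting by rn1 alone ...
    let mut by_rn = rows.clone();
    by_rn.sort_by_key(|&(rn, _)| rn);

    // ... matches sorting by (rn1 ASC, c9 ASC) and by (rn1 ASC, c9 DESC).
    let mut by_rn_c9_asc = rows.clone();
    by_rn_c9_asc.sort_by(|a, b| a.0.cmp(&b.0).then(a.1.cmp(&b.1)));
    let mut by_rn_c9_desc = rows.clone();
    by_rn_c9_desc.sort_by(|a, b| a.0.cmp(&b.0).then(b.1.cmp(&a.1)));

    assert_eq!(by_rn, by_rn_c9_asc);
    assert_eq!(by_rn, by_rn_c9_desc);
    println!("{:?}", by_rn);
}
```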