Commit 9c6a35f
feat: add ExtractLeafExpressions optimizer rule for get_field pushdown (#20117)
## Summary
Adds a two-pass optimizer pipeline (`ExtractLeafExpressions` +
`PushDownLeafProjections`) that pushes cheap `MoveTowardsLeafNodes`
expressions (like `get_field` for struct field access) closer to data
sources, enabling source-level optimizations such as Parquet column
pruning for nested struct fields.
### Motivation
Previously, `get_field(s, 'label')` stayed in the top-level
`ProjectionExec`, forcing the scan to read the entire struct column `s`.
With this change, `get_field` is pushed all the way down to
`DataSourceExec`, allowing Parquet to read only the referenced
sub-columns.
### Example
```sql
SELECT id, s['label'] FROM t WHERE s['value'] > 150
```
**Before:** `get_field(s, 'label')` stayed in `ProjectionExec`, so the
scan read the full struct `s`
**After:** Both `get_field` expressions pushed to DataSourceExec:
```
DataSourceExec: projection=[get_field(s, value) as __datafusion_extracted_1, get_field(s, label) as __datafusion_extracted_2, id]
```
### How It Works
**Pass 1 — `ExtractLeafExpressions`** (top-down):
For non-projection nodes (Filter, Sort, Limit, Aggregate, Join),
extracts `MoveTowardsLeafNodes` sub-expressions into **extraction
projections** below the node with `__datafusion_extracted_N` aliases,
and adds **recovery projections** above to restore the original output
schema.
```text
-- Before:
Filter: user['status'] = 'active'
  TableScan: t [id, user]
-- After:
Projection: id, user ← recovery projection
  Filter: __datafusion_extracted_1 = 'active'
    Projection: user['status'] AS __datafusion_extracted_1, id, user ← extraction projection
      TableScan: t [id, user]
```
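The Pass 1 rewrite of a filter predicate can be sketched on a toy expression tree. This is only an illustration under stated assumptions: `Expr`, `Extractor`, and `extract_from_filter` are hypothetical stand-ins, not DataFusion's types or the rule's actual code. Only the `__datafusion_extracted_N` alias scheme and the deduplication behavior come from the description above.

```rust
// Toy expression type for illustration; NOT DataFusion's `Expr`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    // Stands in for any `MoveTowardsLeafNodes` expression such as `get_field`.
    GetField(Box<Expr>, String),
    Eq(Box<Expr>, Box<Expr>),
}

// Collects (alias, expression) pairs destined for the extraction projection.
#[derive(Default)]
struct Extractor {
    extracted: Vec<(String, Expr)>,
}

impl Extractor {
    // Reuses the alias of an identical expression (deduplication),
    // otherwise allocates the next `__datafusion_extracted_N`.
    fn alias_for(&mut self, expr: &Expr) -> String {
        if let Some((alias, _)) = self.extracted.iter().find(|(_, e)| e == expr) {
            return alias.clone();
        }
        let alias = format!("__datafusion_extracted_{}", self.extracted.len() + 1);
        self.extracted.push((alias.clone(), expr.clone()));
        alias
    }

    // Replaces every extractable sub-expression with a column reference
    // to its alias and leaves everything else untouched.
    fn rewrite(&mut self, expr: &Expr) -> Expr {
        match expr {
            Expr::GetField(..) => Expr::Column(self.alias_for(expr)),
            Expr::Eq(l, r) => Expr::Eq(Box::new(self.rewrite(l)), Box::new(self.rewrite(r))),
            other => other.clone(),
        }
    }
}

// Hypothetical helper: returns the rewritten predicate plus the aliased
// expressions that the extraction projection below the filter must compute.
fn extract_from_filter(predicate: &Expr) -> (Expr, Vec<(String, Expr)>) {
    let mut extractor = Extractor::default();
    let rewritten = extractor.rewrite(predicate);
    (rewritten, extractor.extracted)
}

fn main() {
    let get_status = Expr::GetField(Box::new(Expr::Column("user".into())), "status".into());
    let predicate = Expr::Eq(Box::new(get_status), Box::new(Expr::Literal("active".into())));
    let (rewritten, extracted) = extract_from_filter(&predicate);
    println!("filter predicate: {:?}", rewritten);
    for (alias, expr) in &extracted {
        println!("extraction projection adds: {:?} AS {}", expr, alias);
    }
}
```

The recovery projection is omitted here; in this toy model it would simply list the filter input's original columns.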
**Pass 2 — `PushDownLeafProjections`** (top-down):
Pushes extraction projections down through schema-preserving nodes
(Filter, Sort, Limit) and merges them into existing projections. Also
handles:
- **Mixed projections** containing `MoveTowardsLeafNodes`
sub-expressions — splits them into recovery + extraction, then pushes
the extraction down
- **Multi-input nodes** (Join, SubqueryAlias) — routes each extracted
expression to the correct input based on column references
- **SubqueryAlias** — remaps qualifiers from alias-space to input-space
before routing
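The routing decision for a two-input node can be sketched as follows. This is a minimal model, not the rule's actual code: `Side` and `route` are hypothetical names, schemas are reduced to sets of column names, and leaving an expression in place when it references both inputs is an assumption of this sketch.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Side {
    Left,
    Right,
}

// Hypothetical helper: pick the join input that can evaluate an extracted
// expression, given the column names the expression references.
fn route(referenced: &[&str], left: &HashSet<&str>, right: &HashSet<&str>) -> Option<Side> {
    if referenced.is_empty() {
        return None; // nothing to anchor the expression to
    }
    if referenced.iter().all(|c| left.contains(*c)) {
        Some(Side::Left)
    } else if referenced.iter().all(|c| right.contains(*c)) {
        Some(Side::Right)
    } else {
        // Assumption: an expression spanning both inputs is not pushed down.
        None
    }
}

fn main() {
    let left = HashSet::from(["id", "user"]);
    let right = HashSet::from(["order_id", "payload"]);
    println!("{:?}", route(&["user"], &left, &right));
    println!("{:?}", route(&["payload"], &left, &right));
    println!("{:?}", route(&["user", "payload"], &left, &right));
}
```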
After both passes, `OptimizeProjections` (which runs next) merges
consecutive projections and pushes `get_field` to the scan.
### Changes by file
- **`extract_leaf_expressions.rs`** (new, ~2800 lines): Both optimizer
rules plus extensive unit tests
- **`push_down_filter.rs`**: Teaches `PushDownFilter` not to push
filters through extraction projections (doing so would undo the
extraction by rewriting `__datafusion_extracted_1 > 150` back to
`get_field(s, 'value') > 150`). Adds 2 unit tests for this behavior.
- **`utils.rs`**: Adds `EXTRACTED_EXPR_PREFIX` constant and
`is_extracted_expr_projection()` detection helper
- **`optimizer.rs`**: Registers both new rules after
`CommonSubexprEliminate` and before `OptimizeProjections`
- **`expr.rs`**: Makes `Expr::Alias` delegate to the inner expression
for `placement()`, so aliases around `get_field` are correctly
classified as `MoveTowardsLeafNodes`
- **`test/mod.rs`**: Adds `test_table_scan_with_struct()` test helper
- **SLT files**: Updated explain plans reflecting extraction aliases in
logical plans and pushed-down `get_field` in physical plans
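The `expr.rs` change (an alias delegating `placement()` to its inner expression) can be illustrated with a toy model. The `Placement` enum, its `KeepInPlace` variant, and this `Expr` type are hypothetical simplifications for the sketch; only `MoveTowardsLeafNodes` and the delegation behavior are taken from the description above.

```rust
// Toy placement classification; `KeepInPlace` is a hypothetical name.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Placement {
    MoveTowardsLeafNodes,
    KeepInPlace,
}

// Toy expression type for illustration; NOT DataFusion's `Expr`.
#[allow(dead_code)]
enum Expr {
    Column(String),
    GetField(Box<Expr>, String),
    Alias(Box<Expr>, String),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn placement(&self) -> Placement {
        match self {
            // Cheap struct field access is worth pushing towards the scan.
            Expr::GetField(..) => Placement::MoveTowardsLeafNodes,
            // An alias only renames, so it inherits the inner classification.
            Expr::Alias(inner, _) => inner.placement(),
            _ => Placement::KeepInPlace,
        }
    }
}

fn main() {
    let field = Expr::GetField(Box::new(Expr::Column("s".into())), "label".into());
    let aliased = Expr::Alias(Box::new(field), "__datafusion_extracted_2".into());
    println!("{:?}", aliased.placement());
}
```

Without the delegation arm, the aliased `get_field` would fall through to the default and never be considered for pushdown.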
### Interaction with other optimizer rules
- **`PushDownFilter`**: Extraction projections are detected via
`is_extracted_expr_projection()` and filters are NOT pushed through them
- **`CommonSubexprEliminate`**: Runs before extraction; CSE aliases
(`__common_expr_N`) are preserved and correctly handled during merge
- **`OptimizeProjections`**: Runs after extraction; merges the recovery
+ extraction projections and pushes `get_field` to the scan
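How the `PushDownFilter` guard might recognize an extraction projection can be sketched by checking output names against the prefix. Everything below is an assumption made for illustration: the constant's value is inferred from the `__datafusion_extracted_N` aliases shown earlier, and the real `is_extracted_expr_projection()` presumably inspects a plan node rather than a slice of names.

```rust
// Assumed value, inferred from the `__datafusion_extracted_N` aliases.
const EXTRACTED_EXPR_PREFIX: &str = "__datafusion_extracted";

// Simplified stand-in for the detection helper: a projection counts as an
// extraction projection if any output name carries the prefix.
fn is_extracted_expr_projection(output_names: &[&str]) -> bool {
    output_names
        .iter()
        .any(|name| name.starts_with(EXTRACTED_EXPR_PREFIX))
}

// Hypothetical guard: pushing a filter through an extraction projection
// would inline the aliased expression again, so it is refused.
fn can_push_filter_through(projection_output_names: &[&str]) -> bool {
    !is_extracted_expr_projection(projection_output_names)
}

fn main() {
    let extraction = ["__datafusion_extracted_1", "id", "s"];
    let cse = ["__common_expr_1", "id"];
    println!("push through extraction: {}", can_push_filter_through(&extraction));
    println!("push through CSE projection: {}", can_push_filter_through(&cse));
}
```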
## Test plan
- [x] ~1400 lines of unit tests in `extract_leaf_expressions.rs`
covering: Filter, Sort, Limit, Aggregate, Join, SubqueryAlias, Union,
nested projections, deduplication, idempotency, mixed projections, and
multi-input routing
- [x] 2 new unit tests in `push_down_filter.rs` for
filter-through-extraction blocking
- [x] Updated sqllogictest expectations in `projection_pushdown.slt`,
`push_down_filter.slt`, `explain.slt`, `projection.slt`, `struct.slt`,
`unnest.slt`
- [x] All optimizer tests pass (`cargo test -p datafusion-optimizer`)
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

1 parent f48dc72 commit 9c6a35f
9 files changed
Lines changed: 1951 additions & 286 deletions