Commit 23b88fb
Allow filters on struct fields to be pushed down into Parquet scan (#20822)
## Which issue does this PR close?
- Related to #20603
## Rationale for this change
This PR enables Parquet row-level filter pushdown for struct field
access expressions. Previously these filters fell back to a full scan
followed by a separate filtering pass, a significant perf penalty for
queries filtering on struct fields in large Parquet files (like Variant
types!).
Filters on struct fields like `WHERE s['foo'] > 67` were not being
pushed into the Parquet decoder. This is because `PushdownChecker` sees
the underlying `Column("s")` has a `Struct` type and unconditionally
rejects it, without considering that `get_field` resolves to a primitive
leaf. With this change, deeply nested access like `s['outer']['inner']`
will also get pushed down because the logical simplifier flattens it
before it reaches the physical plan.
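
To illustrate the check described above, here is a minimal, self-contained sketch. It is a toy model, not the code in this PR: `DataType`, `Expr`, `resolved_type`, `can_push_down`, and the helper constructors are hypothetical stand-ins, and the real `PushdownChecker` works on DataFusion's physical expressions. The idea it shows is that the decision is made on the type the field access resolves to, rather than on the type of the underlying column.

```rust
// Toy model only: every type and function here is a hypothetical stand-in,
// not DataFusion's actual API.

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<(String, DataType)>),
}

#[derive(Debug, Clone)]
enum Expr {
    /// Reference to a top-level column by name.
    Column(String),
    /// Struct field access with an already flattened path,
    /// e.g. s['outer']['inner'] becomes path = ["outer", "inner"].
    GetField { base: Box<Expr>, path: Vec<String> },
    /// Integer literal.
    Literal(i64),
    /// Greater-than comparison.
    Gt(Box<Expr>, Box<Expr>),
}

type Schema = Vec<(String, DataType)>;

fn lookup<'a>(fields: &'a [(String, DataType)], name: &str) -> Option<&'a DataType> {
    fields.iter().find(|(n, _)| n == name).map(|(_, t)| t)
}

/// Type that a column reference or field access resolves to, if any.
fn resolved_type(expr: &Expr, schema: &Schema) -> Option<DataType> {
    match expr {
        Expr::Column(name) => lookup(schema, name).cloned(),
        Expr::GetField { base, path } => {
            let mut ty = resolved_type(base, schema)?;
            // Walk the path one struct level at a time.
            for name in path {
                ty = match ty {
                    DataType::Struct(fields) => lookup(&fields, name)?.clone(),
                    _ => return None, // field access on a non-struct value
                };
            }
            Some(ty)
        }
        _ => None,
    }
}

/// A filter is accepted only if every column-like operand resolves to a
/// primitive leaf. A bare struct column is still rejected.
fn can_push_down(expr: &Expr, schema: &Schema) -> bool {
    match expr {
        Expr::Literal(_) => true,
        Expr::Gt(l, r) => can_push_down(l, schema) && can_push_down(r, schema),
        Expr::Column(_) | Expr::GetField { .. } => matches!(
            resolved_type(expr, schema),
            Some(DataType::Int64) | Some(DataType::Utf8)
        ),
    }
}

// Small constructors to keep the examples readable.
fn get_field(column: &str, path: &[&str]) -> Expr {
    Expr::GetField {
        base: Box::new(Expr::Column(column.to_string())),
        path: path.iter().map(|p| p.to_string()).collect(),
    }
}

fn gt(left: Expr, right: i64) -> Expr {
    Expr::Gt(Box::new(left), Box::new(Expr::Literal(right)))
}

/// s: Struct { foo: Int64, name: Utf8, outer: Struct { inner: Int64 } }
fn example_schema() -> Schema {
    let outer = DataType::Struct(vec![("inner".to_string(), DataType::Int64)]);
    let s = DataType::Struct(vec![
        ("foo".to_string(), DataType::Int64),
        ("name".to_string(), DataType::Utf8),
        ("outer".to_string(), outer),
    ]);
    vec![("s".to_string(), s)]
}

fn main() {
    let schema = example_schema();
    let leaf = gt(get_field("s", &["foo"]), 67);
    let whole_struct = gt(Expr::Column("s".to_string()), 67);
    println!("s['foo'] > 67 pushable: {}", can_push_down(&leaf, &schema));
    println!("s > 67 pushable: {}", can_push_down(&whole_struct, &schema));
}
```

In this model a check that looked only at `Column("s")` would reject both filters, whereas resolving the access path accepts `s['foo'] > 67` and still rejects a comparison against the whole struct.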
Note: this does not address the projection side and should not be
blocked by it. `SELECT s['foo']` still reads the entire struct rather
than just the needed leaf column. That requires separate changes to how
the opener builds its projection mask.
1 parent af79d14 commit 23b88fb
4 files changed
Lines changed: 336 additions & 20 deletions
File tree
- datafusion
- datasource-parquet
- src
- sqllogictest/test_files