
Commit 78b9eac

adriangb and claude committed
feat(pruning): add StatisticsSource trait with two-phase resolve/evaluate API
Introduces a new expression-based statistics API for pruning that separates async data resolution from sync predicate evaluation.

- StatisticsSource trait: accepts &[Expr], returns Vec<Option<ArrayRef>>
- ResolvedStatistics: HashMap<Expr, ArrayRef> cache for pre-resolved stats
- PruningPredicate::evaluate(): sync evaluation against the pre-resolved cache
- PruningPredicate::all_required_expressions(): exposes the list of Exprs the predicate needs
- Blanket impl bridges existing PruningStatistics implementations
- prune() refactored to delegate through resolve_all_sync + evaluate

This enables async statistics sources (external metastores, runtime sampling) while keeping the evaluation path synchronous for use in Stream::poll_next() contexts like EarlyStoppingStream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
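The two-phase resolve/evaluate split described above can be illustrated with a small, dependency-free sketch. It is not the code in this commit, and the real signatures of `StatisticsSource`, `ResolvedStatistics`, and `PruningPredicate::evaluate` may differ. `Expr` and `ArrayRef` are simplified hypothetical stand-ins for the DataFusion/Arrow types. `PruningStats`, `GreaterThan`, `InMemoryStats`, `resolve`, and `block_on` are invented for illustration. Native `async fn` in traits (Rust 1.75+) and a tiny no-op-waker executor take the place of `async-trait` and `tokio`.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-ins for `Expr` and `ArrayRef`: an "array" holds one
// statistic per container (e.g. per row group); None = unknown.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Min(String), // min(column)
    Max(String), // max(column)
}
type ArrayRef = Arc<Vec<Option<i64>>>;

/// Async resolution side: one optional array per requested expression.
trait StatisticsSource {
    async fn statistics(&self, exprs: &[Expr]) -> Vec<Option<ArrayRef>>;
}

/// Simplified, hypothetical version of the existing synchronous trait.
trait PruningStats {
    fn min_values(&self, column: &str) -> Option<ArrayRef>;
    fn max_values(&self, column: &str) -> Option<ArrayRef>;
}

/// Blanket impl: every synchronous source is also a StatisticsSource.
impl<T: PruningStats> StatisticsSource for T {
    async fn statistics(&self, exprs: &[Expr]) -> Vec<Option<ArrayRef>> {
        exprs
            .iter()
            .map(|e| match e {
                Expr::Min(c) => self.min_values(c),
                Expr::Max(c) => self.max_values(c),
            })
            .collect()
    }
}

/// Cache of pre-resolved statistics, keyed by expression.
#[derive(Default)]
struct ResolvedStatistics { cache: HashMap<Expr, ArrayRef> }

impl ResolvedStatistics {
    /// Phase 1 (async): fetch every required expression once and cache it.
    async fn resolve(source: &impl StatisticsSource, exprs: &[Expr]) -> Self {
        let arrays = source.statistics(exprs).await;
        let cache = exprs.iter().cloned().zip(arrays)
            .filter_map(|(e, a)| a.map(|a| (e, a)))
            .collect();
        Self { cache }
    }
}

/// Hypothetical predicate `column > value`.
struct GreaterThan { column: String, value: i64 }

impl GreaterThan {
    fn all_required_expressions(&self) -> Vec<Expr> {
        vec![Expr::Max(self.column.clone())]
    }

    /// Phase 2 (sync, no I/O): true = keep the container, it may match.
    fn evaluate(&self, stats: &ResolvedStatistics, num_containers: usize) -> Vec<bool> {
        match stats.cache.get(&Expr::Max(self.column.clone())) {
            // Missing statistics: stay conservative and keep everything.
            None => vec![true; num_containers],
            // Prune only when the known max cannot exceed `value`.
            Some(maxes) => maxes.iter().map(|m| m.map_or(true, |max| max > self.value)).collect(),
        }
    }
}

/// Hypothetical in-memory source holding per-column max values.
struct InMemoryStats(HashMap<String, ArrayRef>);
impl PruningStats for InMemoryStats {
    fn min_values(&self, _column: &str) -> Option<ArrayRef> { None }
    fn max_values(&self, column: &str) -> Option<ArrayRef> { self.0.get(column).cloned() }
}

/// Tiny executor standing in for tokio: re-polls with a no-op waker.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

For example, resolving `max(x) = [5, 20, unknown]` for the predicate `x > 10` and then calling `evaluate(&resolved, 3)` yields `[false, true, true]`, so only the first container is pruned. Because `evaluate` only reads the cache and never awaits, it can be called from a synchronous `poll_next()`.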
1 parent fb12029 commit 78b9eac

5 files changed

Lines changed: 817 additions & 30 deletions


Cargo.lock

Lines changed: 3 additions & 0 deletions

datafusion/pruning/Cargo.toml

Lines changed: 5 additions & 0 deletions
```diff
@@ -17,9 +17,13 @@ workspace = true
 
 [dependencies]
 arrow = { workspace = true }
+async-trait = { workspace = true }
 datafusion-common = { workspace = true, default-features = true }
 datafusion-datasource = { workspace = true }
+datafusion-expr = { workspace = true, default-features = true }
 datafusion-expr-common = { workspace = true, default-features = true }
+datafusion-functions-aggregate = { workspace = true, default-features = true }
+datafusion-functions-nested = { workspace = true, default-features = true }
 datafusion-physical-expr = { workspace = true }
 datafusion-physical-expr-common = { workspace = true }
 datafusion-physical-plan = { workspace = true }
@@ -30,3 +34,4 @@ datafusion-expr = { workspace = true }
 datafusion-functions-nested = { workspace = true }
 insta = { workspace = true }
 itertools = { workspace = true }
+tokio = { workspace = true }
```

datafusion/pruning/src/lib.rs

Lines changed: 2 additions & 0 deletions
```diff
@@ -19,9 +19,11 @@
 
 mod file_pruner;
 mod pruning_predicate;
+mod statistics;
 
 pub use file_pruner::FilePruner;
 pub use pruning_predicate::{
     PredicateRewriter, PruningPredicate, PruningStatistics, RequiredColumns,
     UnhandledPredicateHook, build_pruning_predicate,
 };
+pub use statistics::{ResolvedStatistics, StatisticsSource};
```
