Commit 9b91988

Clarify enable_expression_analyzer scope in config and doc comments
1 parent 3f10a1d commit 9b91988

8 files changed

Lines changed: 40 additions & 25 deletions

datafusion/common/src/config.rs

Lines changed: 5 additions & 4 deletions
@@ -1113,11 +1113,12 @@ config_namespace! {
 /// So if you disable `enable_topk_dynamic_filter_pushdown`, then enable `enable_dynamic_filter_pushdown`, the `enable_topk_dynamic_filter_pushdown` will be overridden.
 pub enable_dynamic_filter_pushdown: bool, default = true

-/// When set to true, the physical planner will use the ExpressionAnalyzer
+/// When set to true, the physical planner uses the ExpressionAnalyzer
 /// framework for expression-level statistics estimation (NDV, selectivity,
-/// min/max, null fraction). When false, existing behavior without
-/// expression-level statistics support is used.
-pub enable_expression_analyzer: bool, default = false
+/// min/max, null fraction). When `use_statistics_registry` is also enabled,
+/// the registry providers (filters, projections) also use it.
+/// When false, existing behavior is unchanged.
+pub use_expression_analyzer: bool, default = false

 /// When set to true, the optimizer will insert filters before a join between
 /// a nullable and non-nullable column to filter out nulls on the nullable side. This
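Note: the new doc comment describes two flags that combine. A minimal sketch of that interaction follows, assuming the reading that the planner consults the analyzer whenever `use_expression_analyzer` is true, while the registry providers consult it only when `use_statistics_registry` is true as well. `OptimizerFlags` and both method names are hypothetical and are not part of DataFusion.

```rust
/// Hypothetical model of the two optimizer flags named in the doc comment.
/// This is not the DataFusion `OptimizerOptions` struct.
#[derive(Debug, Clone, Copy, Default)]
pub struct OptimizerFlags {
    pub use_expression_analyzer: bool,
    pub use_statistics_registry: bool,
}

impl OptimizerFlags {
    /// The physical planner uses the analyzer whenever the analyzer flag is set.
    pub fn planner_uses_analyzer(&self) -> bool {
        self.use_expression_analyzer
    }

    /// The registry providers (filters, projections) use the analyzer only
    /// when both flags are set.
    pub fn registry_providers_use_analyzer(&self) -> bool {
        self.use_expression_analyzer && self.use_statistics_registry
    }
}
```

With both flags at their default of `false`, both methods return `false`, which matches "existing behavior is unchanged".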

datafusion/core/src/physical_planner.rs

Lines changed: 2 additions & 2 deletions
@@ -2865,7 +2865,7 @@ impl DefaultPhysicalPlanner {
 session_state
     .config_options()
     .optimizer
-    .enable_expression_analyzer
+    .use_expression_analyzer
     .then(|| Arc::clone(session_state.expression_analyzer_registry()))
 }

@@ -2949,7 +2949,7 @@ impl DefaultPhysicalPlanner {
 if session_state
     .config_options()
     .optimizer
-    .enable_expression_analyzer
+    .use_expression_analyzer
 {
     new_proj_exec = new_proj_exec.with_expression_analyzer_registry(
         Arc::clone(session_state.expression_analyzer_registry()),
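Note: both hunks in this file gate registry injection on the renamed flag. The first relies on `bool::then`, which evaluates its closure only when the flag is true. A self-contained sketch of that pattern follows; `AnalyzerRegistry`, `OptimizerOptions`, and `registry_if_enabled` are hypothetical stand-ins, not the DataFusion types.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `ExpressionAnalyzerRegistry`.
#[derive(Debug, Default)]
pub struct AnalyzerRegistry;

/// Hypothetical stand-in for the optimizer options; only the flag touched
/// by this commit is modeled.
#[derive(Debug, Default)]
pub struct OptimizerOptions {
    pub use_expression_analyzer: bool,
}

/// Returns a shared handle to the registry only when the flag is enabled.
pub fn registry_if_enabled(
    options: &OptimizerOptions,
    registry: &Arc<AnalyzerRegistry>,
) -> Option<Arc<AnalyzerRegistry>> {
    // `bool::then` runs the closure only when the receiver is true, so the
    // reference count is not bumped when the analyzer is disabled.
    options.use_expression_analyzer.then(|| Arc::clone(registry))
}
```

Returning `Option<Arc<_>>` lets the caller hand the result to a builder without a separate `if` branch.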

datafusion/physical-expr/src/projection.rs

Lines changed: 8 additions & 6 deletions
@@ -201,12 +201,9 @@ impl ProjectionExprs {

 /// Set the expression analyzer registry for statistics estimation.
 ///
-/// The physical planner injects the registry from `SessionState` when
-/// creating projections. Projections created later by optimizer rules
-/// do not receive the registry and fall back to
-/// `DefaultExpressionAnalyzer`. Propagating the registry to all
-/// operator construction sites requires an operator-level statistics
-/// registry, which is orthogonal to this work.
+/// The physical planner injects the registry at plan creation time and
+/// re-injects it after each physical optimizer rule, so projections
+/// created by optimizer rules also receive the registry.
 pub fn with_expression_analyzer_registry(
     mut self,
     registry: Arc<ExpressionAnalyzerRegistry>,
@@ -883,6 +880,11 @@ impl Projector {
 ) {
     self.projection.expression_analyzer_registry = Some(registry);
 }
+
+/// Get the expression analyzer registry, if set
+pub fn expression_analyzer_registry(&self) -> Option<&ExpressionAnalyzerRegistry> {
+    self.projection.expression_analyzer_registry.as_deref()
+}
 }

 /// Describes an immutable reference counted projection.
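Note: the new getter returns `Option<&ExpressionAnalyzerRegistry>` through `as_deref()`. A minimal sketch of why that works is below, assuming the field is an `Option<Arc<...>>` as the `Arc`-taking setter in the previous hunk suggests. `AnalyzerRegistry` and `Projection` are hypothetical stand-ins, not the DataFusion types.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `ExpressionAnalyzerRegistry`.
#[derive(Debug, Default, PartialEq)]
pub struct AnalyzerRegistry {
    pub name: String,
}

/// Hypothetical stand-in for a projection that optionally carries a registry.
#[derive(Debug, Default)]
pub struct Projection {
    expression_analyzer_registry: Option<Arc<AnalyzerRegistry>>,
}

impl Projection {
    /// Builder-style setter, mirroring the shape of the method in the diff.
    pub fn with_expression_analyzer_registry(
        mut self,
        registry: Arc<AnalyzerRegistry>,
    ) -> Self {
        self.expression_analyzer_registry = Some(registry);
        self
    }

    /// `Option<Arc<T>>::as_deref` yields `Option<&T>`: the caller borrows the
    /// registry without cloning the `Arc` or touching its reference count.
    pub fn expression_analyzer_registry(&self) -> Option<&AnalyzerRegistry> {
        self.expression_analyzer_registry.as_deref()
    }
}
```

Callers that only need to read from the registry can take the borrowed form; callers that need ownership can still clone the `Arc` at the site that holds it.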

datafusion/physical-plan/src/filter.rs

Lines changed: 13 additions & 7 deletions
@@ -183,13 +183,12 @@ impl FilterExecBuilder {

 /// Set the expression analyzer registry for selectivity estimation.
 ///
-/// Same limitation as [`ProjectionExprs::with_expression_analyzer_registry`]:
-/// the planner injects this from `SessionState`, but filters created
-/// by optimizer rules (e.g., filter pushdown into unions) fall back to
-/// the default selectivity. An operator-level statistics registry is
-/// needed for full coverage.
-///
-/// [`ProjectionExprs::with_expression_analyzer_registry`]: datafusion_physical_expr::projection::ProjectionExprs::with_expression_analyzer_registry
+/// The physical planner injects the registry from `SessionState` when
+/// creating filters. When `use_statistics_registry` is also enabled,
+/// [`FilterStatisticsProvider`](crate::operator_statistics::FilterStatisticsProvider)
+/// uses this registry for all filters it handles. Filters created by
+/// optimizer rules that do not call this method fall back to the
+/// default selectivity.
 pub fn with_expression_analyzer_registry(
     mut self,
     registry: Arc<
@@ -338,6 +337,13 @@ impl FilterExec {
     &self.projection
 }

+/// Expression analyzer registry for selectivity estimation
+pub fn expression_analyzer_registry(
+    &self,
+) -> Option<&datafusion_physical_expr::expression_analyzer::ExpressionAnalyzerRegistry> {
+    self.expression_analyzer_registry.as_deref()
+}
+
 /// Calculates `Statistics` for `FilterExec`, by applying selectivity (either default, or estimated) to input statistics.
 pub(crate) fn statistics_helper(
     schema: &SchemaRef,

datafusion/physical-plan/src/operator_statistics/mod.rs

Lines changed: 1 addition & 3 deletions
@@ -549,7 +549,7 @@ impl StatisticsProvider for FilterStatisticsProvider {
 input_stats,
 filter.predicate(),
 filter.default_selectivity(),
-// TODO: pass filter.expression_analyzer_registry() once #21122 lands
+filter.expression_analyzer_registry(),
 )?;

 // Adjust distinct_count for each column using the selectivity ratio
@@ -600,8 +600,6 @@ impl StatisticsProvider for ProjectionStatisticsProvider {

 let input_stats = (*child_stats[0].base).clone();
 let output_schema = proj.schema();
-// TODO: pass proj.expression_analyzer_registry() once #21122 lands,
-// so expression-level NDV/min/max feeds into projected column stats.
 let stats = proj
     .projection_expr()
     .project_statistics(input_stats, &output_schema)?;
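Note: the filter provider now passes an optional registry alongside the default selectivity. The sketch below illustrates the fallback described in the doc comments, where a filter without a registry (or without a usable estimate) uses the default selectivity. Everything here is hypothetical: `AnalyzerRegistry`, its `selectivity` method, `filter_selectivity`, and `estimate_rows` are not DataFusion APIs, and the default is assumed to be a percentage, in line with `default_filter_selectivity 20` in the test output.

```rust
/// Hypothetical stand-in for an analyzer registry that may or may not be
/// able to estimate a predicate's selectivity.
pub struct AnalyzerRegistry {
    pub estimate: Option<f64>,
}

impl AnalyzerRegistry {
    /// Hypothetical method: a selectivity in [0, 1] if one is available.
    pub fn selectivity(&self, _predicate: &str) -> Option<f64> {
        self.estimate
    }
}

/// Picks the selectivity used to scale the input statistics.
/// `default_selectivity` is assumed to be a percentage in 0..=100.
pub fn filter_selectivity(
    registry: Option<&AnalyzerRegistry>,
    predicate: &str,
    default_selectivity: u8,
) -> f64 {
    registry
        // No registry, or no estimate from it: fall through to the default.
        .and_then(|r| r.selectivity(predicate))
        // Keep any estimate inside the valid range.
        .map(|s| s.clamp(0.0, 1.0))
        .unwrap_or(f64::from(default_selectivity) / 100.0)
}

/// Scales an input row count by the chosen selectivity.
pub fn estimate_rows(input_rows: usize, selectivity: f64) -> usize {
    (input_rows as f64 * selectivity).round() as usize
}
```

Taking `Option<&AnalyzerRegistry>` keeps the no-registry path and the registry path in one function, which is the shape the new call site in the diff suggests.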

datafusion/physical-plan/src/projection.rs

Lines changed: 8 additions & 0 deletions
@@ -189,6 +189,14 @@ impl ProjectionExec {
     self
 }

+/// Get the expression analyzer registry, if set
+pub fn expression_analyzer_registry(
+    &self,
+) -> Option<&datafusion_physical_expr::expression_analyzer::ExpressionAnalyzerRegistry>
+{
+    self.projector.expression_analyzer_registry()
+}
+
 /// This function creates the cache object that stores the plan properties such as schema, equivalence properties, ordering, partitioning, etc.
 fn compute_properties(
     input: &Arc<dyn ExecutionPlan>,

datafusion/sqllogictest/test_files/information_schema.slt

Lines changed: 2 additions & 2 deletions
@@ -300,7 +300,7 @@ datafusion.optimizer.default_filter_selectivity 20
 datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown true
 datafusion.optimizer.enable_distinct_aggregation_soft_limit true
 datafusion.optimizer.enable_dynamic_filter_pushdown true
-datafusion.optimizer.enable_expression_analyzer false
+datafusion.optimizer.use_expression_analyzer false
 datafusion.optimizer.enable_join_dynamic_filter_pushdown true
 datafusion.optimizer.enable_leaf_expression_pushdown true
 datafusion.optimizer.enable_piecewise_merge_join false
@@ -446,7 +446,7 @@ datafusion.optimizer.default_filter_selectivity 20 The default filter selectivit
 datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown true When set to true, the optimizer will attempt to push down Aggregate dynamic filters into the file scan phase.
 datafusion.optimizer.enable_distinct_aggregation_soft_limit true When set to true, the optimizer will push a limit operation into grouped aggregations which have no aggregate expressions, as a soft limit, emitting groups once the limit is reached, before all rows in the group are read.
 datafusion.optimizer.enable_dynamic_filter_pushdown true When set to true attempts to push down dynamic filters generated by operators (TopK, Join & Aggregate) into the file scan phase. For example, for a query such as `SELECT * FROM t ORDER BY timestamp DESC LIMIT 10`, the optimizer will attempt to push down the current top 10 timestamps that the TopK operator references into the file scans. This means that if we already have 10 timestamps in the year 2025 any files that only have timestamps in the year 2024 can be skipped / pruned at various stages in the scan. The config will suppress `enable_join_dynamic_filter_pushdown`, `enable_topk_dynamic_filter_pushdown` & `enable_aggregate_dynamic_filter_pushdown` So if you disable `enable_topk_dynamic_filter_pushdown`, then enable `enable_dynamic_filter_pushdown`, the `enable_topk_dynamic_filter_pushdown` will be overridden.
-datafusion.optimizer.enable_expression_analyzer false When set to true, the physical planner will use the ExpressionAnalyzer framework for expression-level statistics estimation (NDV, selectivity, min/max, null fraction). When false, existing behavior without expression-level statistics support is used.
+datafusion.optimizer.use_expression_analyzer false When set to true, the physical planner uses the ExpressionAnalyzer framework for expression-level statistics estimation (NDV, selectivity, min/max, null fraction). When `use_statistics_registry` is also enabled, the registry providers (filters, projections) also use it. When false, existing behavior is unchanged.
 datafusion.optimizer.enable_join_dynamic_filter_pushdown true When set to true, the optimizer will attempt to push down Join dynamic filters into the file scan phase.
 datafusion.optimizer.enable_leaf_expression_pushdown true When set to true, the optimizer will extract leaf expressions (such as `get_field`) from filter/sort/join nodes into projections closer to the leaf table scans, and push those projections down towards the leaf nodes.
 datafusion.optimizer.enable_piecewise_merge_join false When set to true, piecewise merge join is enabled. PiecewiseMergeJoin is currently experimental. Physical planner will opt for PiecewiseMergeJoin when there is only one range filter.

docs/source/user-guide/configs.md

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ The following configuration settings are available:
 | datafusion.optimizer.enable_join_dynamic_filter_pushdown | true | When set to true, the optimizer will attempt to push down Join dynamic filters into the file scan phase. |
 | datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown | true | When set to true, the optimizer will attempt to push down Aggregate dynamic filters into the file scan phase. |
 | datafusion.optimizer.enable_dynamic_filter_pushdown | true | When set to true attempts to push down dynamic filters generated by operators (TopK, Join & Aggregate) into the file scan phase. For example, for a query such as `SELECT * FROM t ORDER BY timestamp DESC LIMIT 10`, the optimizer will attempt to push down the current top 10 timestamps that the TopK operator references into the file scans. This means that if we already have 10 timestamps in the year 2025 any files that only have timestamps in the year 2024 can be skipped / pruned at various stages in the scan. The config will suppress `enable_join_dynamic_filter_pushdown`, `enable_topk_dynamic_filter_pushdown` & `enable_aggregate_dynamic_filter_pushdown` So if you disable `enable_topk_dynamic_filter_pushdown`, then enable `enable_dynamic_filter_pushdown`, the `enable_topk_dynamic_filter_pushdown` will be overridden. |
-| datafusion.optimizer.enable_expression_analyzer | false | When set to true, the physical planner will use the ExpressionAnalyzer framework for expression-level statistics estimation (NDV, selectivity, min/max, null fraction). When false, existing behavior without expression-level statistics support is used. |
+| datafusion.optimizer.use_expression_analyzer | false | When set to true, the physical planner uses the ExpressionAnalyzer framework for expression-level statistics estimation (NDV, selectivity, min/max, null fraction). When `use_statistics_registry` is also enabled, the registry providers (filters, projections) also use it. When false, existing behavior is unchanged. |
 | datafusion.optimizer.filter_null_join_keys | false | When set to true, the optimizer will insert filters before a join between a nullable and non-nullable column to filter out nulls on the nullable side. This filter can add additional overhead when the file format does not fully support predicate push down. |
 | datafusion.optimizer.repartition_aggregations | true | Should DataFusion repartition data using the aggregate keys to execute aggregates in parallel using the provided `target_partitions` level |
 | datafusion.optimizer.repartition_file_min_size | 10485760 | Minimum total files size in bytes to perform file scan repartitioning. |
