Skip to content

Commit a02e683

Browse files
authored
Reduce number of traversals per node in PhysicalExprSimplifier (#20082)
## Which issue does this PR close? - Closes #20081. ## Rationale for this change Ran into this while investigating #20078, in my benchmarks this saves around %10, which can quickly add up to a few ms per query. The benchmarking code is [here](https://gist.github.com/AdamGS/4fc6612814786a95a673378de7d61202), measuring it on my laptop the effect is: ``` tpc-ds/q76/cs/16 time: [927.65 µs 928.86 µs 930.14 µs] change: [−12.309% −11.983% −11.667%] (p = 0.00 < 0.05) Performance has improved. tpc-ds/q76/ws/16 time: [929.00 µs 930.51 µs 932.13 µs] change: [−12.102% −11.808% −11.475%] (p = 0.00 < 0.05) tpc-ds/q76/cs/128 time: [6.8376 ms 6.8460 ms 6.8552 ms] change: [−12.626% −12.411% −12.207%] (p = 0.00 < 0.05) tpc-ds/q76/ws/128 time: [6.8394 ms 6.8536 ms 6.8710 ms] change: [−12.039% −11.813% −11.575%] (p = 0.00 < 0.05) ``` ## What changes are included in this PR? I've also changed `[cfg(test)]` to a debug assertion, which seems more in the spirit of #18001, and will make the check fail even if a test in another crate triggers, which effectively increases the coverage. ## Are these changes tested? The functionality is identical, and in addition to existing tests I've changed the defensive assertion into a debug check and ran tests for other crates that make use of this codepath (like `datafusion-datasource-parquet` and `datafusion-pruning`). ## Are there any user-facing changes? None
1 parent 4a63659 commit a02e683

2 files changed

Lines changed: 19 additions & 4 deletions

File tree

datafusion/physical-expr/src/simplifier/const_evaluator.rs

Lines changed: 17 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,6 @@ use arrow::record_batch::RecordBatch;
2525
use datafusion_common::tree_node::{Transformed, TreeNode, TreeNodeRecursion};
2626
use datafusion_common::{Result, ScalarValue};
2727
use datafusion_expr_common::columnar_value::ColumnarValue;
28-
use datafusion_physical_expr_common::physical_expr::is_volatile;
2928

3029
use crate::PhysicalExpr;
3130
use crate::expressions::{Column, Literal};
@@ -43,7 +42,7 @@ use crate::expressions::{Column, Literal};
4342
pub fn simplify_const_expr(
4443
expr: &Arc<dyn PhysicalExpr>,
4544
) -> Result<Transformed<Arc<dyn PhysicalExpr>>> {
46-
if is_volatile(expr) || has_column_references(expr) {
45+
if !can_evaluate_as_constant(expr) {
4746
return Ok(Transformed::no(Arc::clone(expr)));
4847
}
4948

@@ -73,6 +72,22 @@ pub fn simplify_const_expr(
7372
}
7473
}
7574

75+
fn can_evaluate_as_constant(expr: &Arc<dyn PhysicalExpr>) -> bool {
76+
let mut can_evaluate = true;
77+
78+
expr.apply(|e| {
79+
if e.as_any().is::<Column>() || e.is_volatile_node() {
80+
can_evaluate = false;
81+
Ok(TreeNodeRecursion::Stop)
82+
} else {
83+
Ok(TreeNodeRecursion::Continue)
84+
}
85+
})
86+
.expect("apply should not fail");
87+
88+
can_evaluate
89+
}
90+
7691
/// Create a 1-row dummy RecordBatch for evaluating constant expressions.
7792
///
7893
/// The batch is never actually accessed for data - it's just needed because

datafusion/physical-expr/src/simplifier/mod.rs

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -53,7 +53,7 @@ impl<'a> PhysicalExprSimplifier<'a> {
5353
while count < MAX_LOOP_COUNT {
5454
count += 1;
5555
let result = current_expr.transform(|node| {
56-
#[cfg(test)]
56+
#[cfg(debug_assertions)]
5757
let original_type = node.data_type(schema).unwrap();
5858

5959
// Apply NOT expression simplification first, then unwrap cast optimization,
@@ -64,7 +64,7 @@ impl<'a> PhysicalExprSimplifier<'a> {
6464
})?
6565
.transform_data(|node| const_evaluator::simplify_const_expr(&node))?;
6666

67-
#[cfg(test)]
67+
#[cfg(debug_assertions)]
6868
assert_eq!(
6969
rewritten.data.data_type(schema).unwrap(),
7070
original_type,

0 commit comments

Comments
 (0)