Skip to content

Commit fdd36d0

Browse files
authored
Update comments on OptimizerRule about function name matching (#20346)
## Which issue does this PR close? - Related to #20180 ## Rationale for this change I gave feedback to @devanshu0987 https://github.com/apache/datafusion/pull/20180/changes#r2800720037 that it was not a good idea to check for function names in optimizer rules, but then I realized that the rationale for this is not written down anywhere. ## What changes are included in this PR? Document why checking for function names in optimizer rules is not good and offer alternatives ## Are these changes tested? By CI ## Are there any user-facing changes? Just docs, no functional changes
1 parent b16ad9b commit fdd36d0

1 file changed

Lines changed: 39 additions & 7 deletions

File tree

datafusion/optimizer/src/optimizer.rs

Lines changed: 39 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -58,12 +58,12 @@ use crate::simplify_expressions::SimplifyExpressions;
5858
use crate::single_distinct_to_groupby::SingleDistinctToGroupBy;
5959
use crate::utils::log_plan;
6060

61-
/// `OptimizerRule`s transforms one [`LogicalPlan`] into another which
62-
/// computes the same results, but in a potentially more efficient
63-
/// way. If there are no suitable transformations for the input plan,
64-
/// the optimizer should simply return it unmodified.
61+
/// Transforms one [`LogicalPlan`] into another which computes the same results,
62+
/// but in a potentially more efficient way.
6563
///
66-
/// To change the semantics of a `LogicalPlan`, see [`AnalyzerRule`]
64+
/// See notes on [`Self::rewrite`] for details on how to implement an `OptimizerRule`.
65+
///
66+
/// To change the semantics of a `LogicalPlan`, see [`AnalyzerRule`].
6767
///
6868
/// Use [`SessionState::add_optimizer_rule`] to register additional
6969
/// `OptimizerRule`s.
@@ -88,8 +88,40 @@ pub trait OptimizerRule: Debug {
8888
true
8989
}
9090

91-
/// Try to rewrite `plan` to an optimized form, returning `Transformed::yes`
92-
/// if the plan was rewritten and `Transformed::no` if it was not.
91+
/// Try to rewrite `plan` to an optimized form, returning [`Transformed::yes`]
92+
/// if the plan was rewritten and [`Transformed::no`] if it was not.
93+
///
94+
/// # Notes for implementations:
95+
///
96+
/// ## Return the same plan if no changes were made
97+
///
98+
/// If there are no suitable transformations for the input plan,
99+
/// the optimizer should simply return it unmodified.
100+
///
101+
/// The optimizer will call `rewrite` several times until a fixed point is
102+
/// reached, so it is important that `rewrite` return [`Transformed::no`] if
103+
/// the output is the same.
104+
///
105+
/// ## Matching on functions
106+
///
107+
/// The rule should avoid function-specific transformations, and instead use
108+
/// methods on [`ScalarUDFImpl`] and [`AggregateUDFImpl`]. Specifically, the
109+
/// rule should not check function names as functions can be overridden, and
110+
/// may not have the same semantics as the functions provided with
111+
/// DataFusion.
112+
///
113+
/// For example, if a rule rewrites a function based on the check
114+
/// `func.name() == "sum"`, it may rewrite the plan incorrectly if the
115+
/// registered `sum` function has different semantics (for example, the
116+
/// `sum` function from the `datafusion-spark` crate).
117+
///
118+
/// There are still several cases that rely on function name checking in
119+
/// the rules included with DataFusion. Please see [#18643] for more details
120+
/// and to help remove these cases.
121+
///
122+
/// [`ScalarUDFImpl`]: datafusion_expr::ScalarUDFImpl
123+
/// [`AggregateUDFImpl`]: datafusion_expr::ScalarUDFImpl
124+
/// [#18643]: https://github.com/apache/datafusion/issues/18643
93125
fn rewrite(
94126
&self,
95127
_plan: LogicalPlan,

0 commit comments

Comments
 (0)