|
| 1 | +<!--- |
| 2 | + Licensed to the Apache Software Foundation (ASF) under one |
| 3 | + or more contributor license agreements. See the NOTICE file |
| 4 | + distributed with this work for additional information |
| 5 | + regarding copyright ownership. The ASF licenses this file |
| 6 | + to you under the Apache License, Version 2.0 (the |
| 7 | + "License"); you may not use this file except in compliance |
| 8 | + with the License. You may obtain a copy of the License at |
| 9 | +
|
| 10 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 11 | +
|
| 12 | + Unless required by applicable law or agreed to in writing, |
| 13 | + software distributed under the License is distributed on an |
| 14 | + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| 15 | + KIND, either express or implied. See the License for the |
| 16 | + specific language governing permissions and limitations |
| 17 | + under the License. |
| 18 | +--> |
| 19 | + |
| 20 | +## Built-in Optimizer Rules |
| 21 | + |
| 22 | +DataFusion applies a default analyzer, logical optimizer, and physical |
| 23 | +optimizer pipeline. |
| 24 | + |
| 25 | +The rule names listed here match the names shown by `EXPLAIN VERBOSE`. |
| 26 | + |
| 27 | +Rule order matters. The default pipeline may change between releases. |
| 28 | + |
| 29 | +### Analyzer Rules |
| 30 | + |
| 31 | +| order | rule | summary | |
| 32 | +| ----- | --------------------------- | --------------------------------------------------------------------------------------- | |
| 33 | +| 1 | `resolve_grouping_function` | Rewrites `GROUPING(...)` calls into expressions over DataFusion's internal grouping id. | |
| 34 | +| 2 | `type_coercion` | Adds implicit casts so operators and functions receive valid input types. | |
| 35 | + |
| 36 | +### Logical Optimizer Rules |
| 37 | + |
| 38 | +| order | rule | summary | |
| 39 | +| ----- | ----------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- | |
| 40 | +| 1 | `rewrite_set_comparison` | Rewrites `ANY` and `ALL` set-comparison subqueries into `EXISTS`-based boolean expressions with correct SQL NULL semantics. | |
| 41 | +| 2 | `optimize_unions` | Flattens nested unions and removes unions with a single input. | |
| 42 | +| 3 | `simplify_expressions` | Constant-folds and simplifies expressions while preserving output names. | |
| 43 | +| 4 | `replace_distinct_aggregate` | Rewrites `DISTINCT` and `DISTINCT ON` operators into aggregate-based plans that later rules can optimize further. | |
| 44 | +| 5 | `eliminate_join` | Replaces keyless inner joins with a literal `false` filter by an empty relation. | |
| 45 | +| 6 | `decorrelate_predicate_subquery` | Converts eligible `IN` and `EXISTS` predicate subqueries into semi or anti joins. | |
| 46 | +| 7 | `scalar_subquery_to_join` | Rewrites eligible scalar subqueries into joins and adds schema-preserving projections. | |
| 47 | +| 8 | `decorrelate_lateral_join` | Rewrites eligible lateral joins into regular joins. | |
| 48 | +| 9 | `extract_equijoin_predicate` | Splits join filters into equijoin keys and residual predicates. | |
| 49 | +| 10 | `eliminate_duplicated_expr` | Removes duplicate expressions from projections, aggregates, and similar operators. | |
| 50 | +| 11 | `eliminate_filter` | Drops always-true filters and replaces always-false or NULL filters with empty relations. | |
| 51 | +| 12 | `eliminate_cross_join` | Uses filter predicates to replace cross joins with inner joins when join keys can be found. | |
| 52 | +| 13 | `eliminate_limit` | Removes no-op limits and simplifies trivial limit shapes. | |
| 53 | +| 14 | `propagate_empty_relation` | Pushes empty-relation knowledge upward so operators fed by no rows collapse early. | |
| 54 | +| 15 | `filter_null_join_keys` | Adds `IS NOT NULL` filters to nullable equijoin keys that can never match. | |
| 55 | +| 16 | `eliminate_outer_join` | Rewrites outer joins to inner joins when later filters reject the NULL-extended rows. | |
| 56 | +| 17 | `push_down_limit` | Moves literal limits closer to scans and unions and merges adjacent limits. | |
| 57 | +| 18 | `push_down_filter` | Moves filters as early as possible through filter-commutative operators. | |
| 58 | +| 19 | `single_distinct_aggregation_to_group_by` | Rewrites single-column `DISTINCT` aggregations into two-stage `GROUP BY` plans. | |
| 59 | +| 20 | `eliminate_group_by_constant` | Removes constant or functionally redundant expressions from `GROUP BY`. | |
| 60 | +| 21 | `common_sub_expression_eliminate` | Computes repeated subexpressions once and reuses the result. | |
| 61 | +| 22 | `extract_leaf_expressions` | Pulls cheap leaf expressions closer to data sources so later pruning and filter rules can act earlier. | |
| 62 | +| 23 | `push_down_leaf_projections` | Pushes the helper projections created by leaf extraction toward leaf inputs. | |
| 63 | +| 24 | `optimize_projections` | Prunes unused columns and removes unnecessary logical projections. | |
| 64 | + |
| 65 | +### Physical Optimizer Rules |
| 66 | + |
| 67 | +The same rule name may appear more than once when the default pipeline runs it |
| 68 | +in multiple phases. |
| 69 | + |
| 70 | +| order | rule | phase | summary | |
| 71 | +| ----- | ------------------------------ | ----------------------- | ------------------------------------------------------------------------------------------------------------ | |
| 72 | +| 1 | `OutputRequirements` | add phase | Adds helper nodes so output requirements survive later physical rewrites. | |
| 73 | +| 2 | `aggregate_statistics` | - | Uses exact source statistics to answer some aggregates without scanning data. | |
| 74 | +| 3 | `join_selection` | - | Chooses join implementation, build side, and partition mode from statistics and stream properties. | |
| 75 | +| 4 | `LimitedDistinctAggregation` | - | Pushes limit hints into grouped distinct-style aggregations when only a small result is needed. | |
| 76 | +| 5 | `FilterPushdown` | pre-optimization phase | Pushes supported physical filters down toward data sources before distribution and sorting are enforced. | |
| 77 | +| 6 | `EnforceDistribution` | - | Adds repartitioning only where needed to satisfy physical distribution requirements. | |
| 78 | +| 7 | `CombinePartialFinalAggregate` | - | Collapses adjacent partial and final aggregates when the distributed shape makes them redundant. | |
| 79 | +| 8 | `EnforceSorting` | - | Adds or removes local sorts to satisfy required input orderings. | |
| 80 | +| 9 | `OptimizeAggregateOrder` | - | Updates aggregate expressions to use the best ordering once sort requirements are known. | |
| 81 | +| 10 | `WindowTopN` | - | Replaces eligible row-number window and filter patterns with per-partition TopK execution. | |
| 82 | +| 11 | `ProjectionPushdown` | early pass | Pushes projections toward inputs before later physical rewrites add more limit and TopK structure. | |
| 83 | +| 12 | `OutputRequirements` | remove phase | Removes the temporary output-requirement helper nodes after requirement-sensitive planning is done. | |
| 84 | +| 13 | `LimitAggregation` | - | Passes a limit hint into eligible aggregations so they can keep fewer accumulator buckets. | |
| 85 | +| 14 | `LimitPushPastWindows` | - | Pushes fetch limits through bounded window operators when doing so keeps the result correct. | |
| 86 | +| 15 | `HashJoinBuffering` | - | Adds buffering on the probe side of hash joins so probing can start before build completion. | |
| 87 | +| 16 | `LimitPushdown` | - | Moves physical limits into child operators or fetch-enabled variants to cut data early. | |
| 88 | +| 17 | `TopKRepartition` | - | Pushes TopK below hash repartition when the partition key is a prefix of the sort key. | |
| 89 | +| 18 | `ProjectionPushdown` | late pass | Runs projection pushdown again after limit and TopK rewrites expose new pruning opportunities. | |
| 90 | +| 19 | `PushdownSort` | - | Pushes sort requirements into data sources that can already return sorted output. | |
| 91 | +| 20 | `EnsureCooperative` | - | Wraps non-cooperative plan parts so long-running tasks yield fairly. | |
| 92 | +| 21 | `FilterPushdown(Post)` | post-optimization phase | Pushes dynamic filters at the end of optimization, after plan references stop moving. | |
| 93 | +| 22 | `SanityCheckPlan` | - | Validates that the final physical plan meets ordering, distribution, and infinite-input safety requirements. | |
0 commit comments