
Commit c046597

committed
change enable_unions_to_filter to false
1 parent 03145bd commit c046597

5 files changed

Lines changed: 39 additions & 5 deletions


datafusion/common/src/config.rs

Lines changed: 1 addition & 1 deletion
@@ -1354,7 +1354,7 @@ config_namespace! {
 /// with a combined filter. This optimization is conservative and only applies when the
 /// branches share the same source and compatible wrapper nodes such as identical
 /// projections or aliases.
-pub enable_unions_to_filter: bool, default = true
+pub enable_unions_to_filter: bool, default = false
 }
 }

datafusion/sqllogictest/test_files/information_schema.slt

Lines changed: 2 additions & 2 deletions
@@ -308,7 +308,7 @@ datafusion.optimizer.enable_sort_pushdown true
 datafusion.optimizer.enable_topk_aggregation true
 datafusion.optimizer.enable_topk_dynamic_filter_pushdown true
 datafusion.optimizer.enable_topk_repartition true
-datafusion.optimizer.enable_unions_to_filter true
+datafusion.optimizer.enable_unions_to_filter false
 datafusion.optimizer.enable_window_limits true
 datafusion.optimizer.enable_window_topn false
 datafusion.optimizer.expand_views_at_output false
@@ -456,7 +456,7 @@ datafusion.optimizer.enable_sort_pushdown true Enable sort pushdown optimization
 datafusion.optimizer.enable_topk_aggregation true When set to true, the optimizer will attempt to perform limit operations during aggregations, if possible
 datafusion.optimizer.enable_topk_dynamic_filter_pushdown true When set to true, the optimizer will attempt to push down TopK dynamic filters into the file scan phase.
 datafusion.optimizer.enable_topk_repartition true When set to true, the optimizer will push TopK (Sort with fetch) below hash repartition when the partition key is a prefix of the sort key, reducing data volume before the shuffle.
-datafusion.optimizer.enable_unions_to_filter true When set to true, the logical optimizer will rewrite `UNION DISTINCT` branches that read from the same source and differ only by filter predicates into a single branch with a combined filter. This optimization is conservative and only applies when the branches share the same source and compatible wrapper nodes such as identical projections or aliases.
+datafusion.optimizer.enable_unions_to_filter false When set to true, the logical optimizer will rewrite `UNION DISTINCT` branches that read from the same source and differ only by filter predicates into a single branch with a combined filter. This optimization is conservative and only applies when the branches share the same source and compatible wrapper nodes such as identical projections or aliases.
 datafusion.optimizer.enable_window_limits true When set to true, the optimizer will attempt to push limit operations past window functions, if possible
 datafusion.optimizer.enable_window_topn false When set to true, the optimizer will replace Filter(rn<=K) → Window(ROW_NUMBER) → Sort patterns with a PartitionedTopKExec that maintains per-partition heaps, avoiding a full sort of the input. When the window partition key has low cardinality, enabling this optimization can improve performance. However, for high cardinality keys, it may cause regressions in both memory usage and runtime.
 datafusion.optimizer.expand_views_at_output false When set to true, if the returned type is a view type then the output will be coerced to a non-view. Coerces `Utf8View` to `LargeUtf8`, and `BinaryView` to `LargeBinary`.

datafusion/sqllogictest/test_files/union.slt

Lines changed: 1 addition & 1 deletion
@@ -297,7 +297,7 @@ physical_plan
 04)--ProjectionExec: expr=[name@0 || _new as name]
 05)----DataSourceExec: partitions=1, partition_sizes=[1]
 
-# unions_to_filter is enabled by default
+# unions_to_filter is disabled by default
 
 statement ok
 set datafusion.optimizer.enable_unions_to_filter = false;

docs/source/library-user-guide/upgrading/54.0.0.md

Lines changed: 34 additions & 0 deletions
@@ -372,3 +372,37 @@ impl Default for MyTreeNode {
 }
 }
 ```
+
+[#21075]: https://github.com/apache/datafusion/pull/21075
+
+### `UnionsToFilter` optimizer rule is now disabled by default
+
+The `datafusion.optimizer.enable_unions_to_filter` option now defaults to
+`false`. When enabled, the rule rewrites `UNION DISTINCT` branches that read the
+same source and differ only by filter predicates into a single scan with a
+combined `OR` predicate:
+
+```sql
+-- Before: two separate scans
+SELECT * FROM t WHERE a = 1
+UNION
+SELECT * FROM t WHERE a = 2
+
+-- After: one scan
+SELECT DISTINCT * FROM t WHERE a = 1 OR a = 2
+```
+
+**Who is affected:**
+
+- Queries using `UNION` against the same table with different filter
+  conditions may benefit from enabling this rule.
+
+**Migration guide:**
+
+Enable the rule when your `UNION` queries scan the same large table
+multiple times with different predicates. Avoid it when the data source handles individual equality predicates more efficiently than
+a combined `OR` (e.g., index-backed sources).
+
+```sql
+SET datafusion.optimizer.enable_unions_to_filter = true;
+```
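The equivalence behind the rewrite described in the upgrade guide can be checked with a small, self-contained Rust model. This is a sketch, not DataFusion's `UnionsToFilter` implementation: the `Row` type, the sample `table()`, and the helpers `eq`, `ge`, `or3`, `filter`, `union_distinct`, and `union_all` are hypothetical stand-ins. They assume a `WHERE` clause keeps a row only when its predicate evaluates to `TRUE` (so rows with a `NULL` in `a` drop out of both plans), and they model `DISTINCT` with a `BTreeSet`.

```rust
use std::collections::BTreeSet;

// Hypothetical toy row for table `t`: column `a` (None = SQL NULL) and a label.
type Row = (Option<i64>, &'static str);

// Hypothetical sample data; note the duplicate row and the NULL in `a`.
fn table() -> Vec<Row> {
    vec![(Some(1), "x"), (Some(1), "x"), (Some(2), "y"), (Some(3), "z"), (None, "n")]
}

// `a = lit` under SQL three-valued logic: NULL compared to anything is NULL (None).
fn eq(a: Option<i64>, lit: i64) -> Option<bool> {
    a.map(|v| v == lit)
}

// `a >= lit`, used only for the overlapping-predicate UNION ALL check.
fn ge(a: Option<i64>, lit: i64) -> Option<bool> {
    a.map(|v| v >= lit)
}

// SQL `p OR q`: TRUE if either side is TRUE, FALSE if both are FALSE, else NULL.
fn or3(p: Option<bool>, q: Option<bool>) -> Option<bool> {
    match (p, q) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

// WHERE clause: keep a row only when the predicate evaluates to TRUE.
fn filter<F: Fn(&Row) -> Option<bool>>(t: &[Row], pred: F) -> Vec<Row> {
    t.iter().copied().filter(|r| pred(r) == Some(true)).collect()
}

// UNION (DISTINCT): concatenate both branches, then deduplicate.
fn union_distinct(l: Vec<Row>, r: Vec<Row>) -> BTreeSet<Row> {
    l.into_iter().chain(r).collect()
}

// UNION ALL: concatenate both branches and keep duplicates.
fn union_all(l: Vec<Row>, r: Vec<Row>) -> Vec<Row> {
    l.into_iter().chain(r).collect()
}

fn main() {
    let t = table();

    // Before: SELECT * FROM t WHERE a = 1 UNION SELECT * FROM t WHERE a = 2
    let before = union_distinct(filter(&t, |r| eq(r.0, 1)), filter(&t, |r| eq(r.0, 2)));
    // After: SELECT DISTINCT * FROM t WHERE a = 1 OR a = 2
    let after: BTreeSet<Row> = filter(&t, |r| or3(eq(r.0, 1), eq(r.0, 2))).into_iter().collect();
    assert_eq!(before, after);
    println!("{:?}", after); // {(Some(1), "x"), (Some(2), "y")}

    // UNION ALL with overlapping predicates (a = 1, a >= 1) is NOT equivalent
    // to a single OR scan: rows matching both branches are emitted by each branch.
    let all = union_all(filter(&t, |r| eq(r.0, 1)), filter(&t, |r| ge(r.0, 1)));
    let single = filter(&t, |r| or3(eq(r.0, 1), ge(r.0, 1)));
    assert_ne!(all.len(), single.len());
    println!("{} vs {}", all.len(), single.len()); // 6 vs 4
}
```

The second check in `main` illustrates why the option is scoped to `UNION DISTINCT`: under `UNION ALL`, each source row that satisfies both branch predicates appears once per branch, while the single `OR` scan returns it only once, so the row counts differ (6 versus 4 for this sample data).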

docs/source/user-guide/configs.md

Lines changed: 1 addition & 1 deletion
@@ -173,7 +173,7 @@ The following configuration settings are available:
 | datafusion.optimizer.expand_views_at_output | false | When set to true, if the returned type is a view type then the output will be coerced to a non-view. Coerces `Utf8View` to `LargeUtf8`, and `BinaryView` to `LargeBinary`. |
 | datafusion.optimizer.enable_sort_pushdown | true | Enable sort pushdown optimization. When enabled, attempts to push sort requirements down to data sources that can natively handle them (e.g., by reversing file/row group read order). Returns **inexact ordering**: Sort operator is kept for correctness, but optimized input enables early termination for TopK queries (ORDER BY ... LIMIT N), providing significant speedup. Memory: No additional overhead (only changes read order). Future: Will add option to detect perfectly sorted data and eliminate Sort completely. Default: true |
 | datafusion.optimizer.enable_leaf_expression_pushdown | true | When set to true, the optimizer will extract leaf expressions (such as `get_field`) from filter/sort/join nodes into projections closer to the leaf table scans, and push those projections down towards the leaf nodes. |
-| datafusion.optimizer.enable_unions_to_filter | true | When set to true, the logical optimizer will rewrite `UNION DISTINCT` branches that read from the same source and differ only by filter predicates into a single branch with a combined filter. This optimization is conservative and only applies when the branches share the same source and compatible wrapper nodes such as identical projections or aliases. |
+| datafusion.optimizer.enable_unions_to_filter | false | When set to true, the logical optimizer will rewrite `UNION DISTINCT` branches that read from the same source and differ only by filter predicates into a single branch with a combined filter. This optimization is conservative and only applies when the branches share the same source and compatible wrapper nodes such as identical projections or aliases. |
 | datafusion.explain.logical_plan_only | false | When set to true, the explain statement will only print logical plans |
 | datafusion.explain.physical_plan_only | false | When set to true, the explain statement will only print physical plans |
 | datafusion.explain.show_statistics | false | When set to true, the explain statement will print operator statistics for physical plans |
