feat: adaptive filter selectivity tracking for Parquet pushdown

Introduces a runtime, adaptive filter selectivity tracking system for
Parquet pushdown. Each filter is monitored with Welford online
statistics and moves through a state machine: New -> RowFilter | PostScan
-> (promoted / demoted / dropped).
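The tracking scheme above can be illustrated with a small Rust sketch. It is a simplified illustration under stated assumptions, not the code in selectivity.rs: `WelfordStats`, `Verdict`, and `next_state` are hypothetical names, the `FilterState` variants are inferred from the state machine described here, and the rule that only optional filters (for example, dynamic join filters wrapped in OptionalFilterPhysicalExpr) may be dropped is an assumption.

```rust
/// Welford's online algorithm: running mean and sample variance in O(1) memory.
/// Hypothetical stand-in; not the SelectivityStats type from selectivity.rs.
#[derive(Debug, Default, Clone)]
pub struct WelfordStats {
    count: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean
}

impl WelfordStats {
    /// Fold one observation (e.g. a per-batch selectivity) into the stats.
    pub fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        // Welford's update: old deviation times the deviation from the new mean.
        self.m2 += delta * (x - self.mean);
    }

    pub fn count(&self) -> u64 {
        self.count
    }

    pub fn mean(&self) -> f64 {
        self.mean
    }

    /// Sample variance; None until at least two observations exist.
    pub fn variance(&self) -> Option<f64> {
        (self.count >= 2).then(|| self.m2 / (self.count - 1) as f64)
    }
}

/// Hypothetical placement states inferred from
/// New -> RowFilter | PostScan -> (promoted / demoted / dropped).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterState {
    New,
    RowFilter, // evaluated inside the Parquet decoder (pushed down)
    PostScan,  // evaluated on decoded batches after the scan
    Dropped,   // no longer evaluated at all
}

/// Hypothetical verdict derived from the statistics.
#[derive(Debug, Clone, Copy)]
pub enum Verdict {
    Effective,
    Ineffective,
    Undecided,
}

/// One step of the state machine. `optional` marks filters that are not
/// needed for correctness; only those are ever dropped (assumption).
pub fn next_state(state: FilterState, verdict: Verdict, optional: bool) -> FilterState {
    use FilterState::*;
    match (state, verdict) {
        (New, Verdict::Effective) => RowFilter,
        (New, _) => PostScan, // assumption: start post-scan unless proven effective
        (PostScan, Verdict::Effective) => RowFilter, // promoted
        (RowFilter, Verdict::Ineffective) => PostScan, // demoted
        (PostScan, Verdict::Ineffective) if optional => Dropped, // dropped
        (s, _) => s, // otherwise keep the current placement
    }
}
```

Keeping a running variance, not just a mean, is what lets a tracker distinguish a consistently selective filter from one that merely looked good on a few batches.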
Key changes:
- New selectivity.rs module (SelectivityTracker, TrackerConfig,
  SelectivityStats, FilterState, PartitionedFilters, FilterId).
- New OptionalFilterPhysicalExpr wrapper in physical_expr_common;
  HashJoinExec wraps its dynamic join filters in it.
- Removes the reorder_filters config option and its supporting code.
- Adds the filter_pushdown_min_bytes_per_sec,
  filter_collecting_byte_ratio_threshold, and filter_confidence_z
  config options.
- Predicate storage on ParquetSource/ParquetOpener changes from
  Option<Arc<dyn PhysicalExpr>> to
  Option<Vec<(FilterId, Arc<dyn PhysicalExpr>)>>.
- build_row_filter takes a Vec<(FilterId,...)> and a SelectivityTracker
  and returns a RowFilterWithMetrics. DatafusionArrowPredicate reports
  per-batch stats back to the tracker.
- ParquetOpener calls tracker.partition_filters() and
  apply_post_scan_filters_with_stats, and records filter_apply_time.
- Proto: reserves field tag 6 (formerly reorder_filters) and adds 3 new
  optional fields.
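The commit message does not spell out how the new thresholds are combined. One plausible reading, sketched here purely as an assumption, treats each filter's per-batch benefit as bytes saved per second of evaluation time, uses filter_confidence_z as the z-score of a normal-approximation confidence interval around the mean of those samples, and compares that interval against filter_pushdown_min_bytes_per_sec. `Placement` and `decide` are hypothetical names, and filter_collecting_byte_ratio_threshold is not modeled.

```rust
/// Hypothetical outcome of comparing a filter's measured benefit to the threshold.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Placement {
    RowFilter, // confidently worth evaluating during the scan
    PostScan,  // confidently not worth pushing down
    Undecided, // keep collecting samples
}

/// `mean`, `sample_variance`, and `count` summarize per-batch benefit samples
/// (assumed unit: bytes saved per second of filter evaluation), e.g. as
/// produced by a Welford accumulator.
pub fn decide(
    mean: f64,
    sample_variance: f64,
    count: u64,
    min_bytes_per_sec: f64,
    confidence_z: f64,
) -> Placement {
    if count < 2 {
        return Placement::Undecided; // variance is undefined with fewer samples
    }
    // Standard error of the mean, then a two-sided interval mean ± z * SE.
    let std_err = (sample_variance / count as f64).sqrt();
    let lower = mean - confidence_z * std_err;
    let upper = mean + confidence_z * std_err;
    if lower >= min_bytes_per_sec {
        Placement::RowFilter // even the pessimistic bound clears the bar
    } else if upper < min_bytes_per_sec {
        Placement::PostScan // even the optimistic bound misses it
    } else {
        Placement::Undecided
    }
}
```

Requiring the whole interval to sit on one side of the threshold before moving a filter avoids flip-flopping between placements on noisy batches; a larger z simply waits for more evidence.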
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>