refactor(pruning): remove column param from PruningStatistics::row_counts (#21369)
## Which issue does this PR close?
N/A — standalone API improvement, prerequisite for #21157.
## Rationale for this change
`PruningStatistics::row_counts(&self, column: &Column)` takes a column
parameter, but row counts are container-level (the same for all columns).
Of the 11 implementations, 8 ignore the parameter by naming it `_column`.
The Parquet impl (`RowGroupPruningStatistics`) unnecessarily constructs a
`StatisticsConverter` from the column just to call
`row_group_row_counts()`, which doesn't use the column at all.
The existing code even has a comment acknowledging this:
> "row counts are the same for all columns in a row group"
And a test comment:
> "This is debatable, personally I think `row_count` should not take a
`Column` as an argument at all since all columns should have the same
number of rows."
## What changes are included in this PR?
**Breaking change**: `fn row_counts(&self, column: &Column) ->
Option<ArrayRef>` becomes `fn row_counts(&self) -> Option<ArrayRef>`.
- Remove `column` parameter from trait definition and all 11
implementations
- `RowGroupPruningStatistics`: read `num_rows()` directly from row group
metadata instead of routing through `StatisticsConverter`
- `PrunableStatistics`: remove column-exists validation (row count is
container-level)
- Update all call sites and tests
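
As a rough illustration of the new shape, the sketch below models the trait after this change and a row-group implementation that reads the row count straight from metadata. It is a simplified stand-in, not the DataFusion code: `PruningStatisticsSketch`, `RowGroupMeta`, `RowGroupStatsSketch`, and the `Vec<u64>` return type are assumptions made so the example compiles without `arrow` or `parquet` (the real method returns `Option<ArrayRef>`).

```rust
// Stand-in for arrow's `ArrayRef`; the real trait returns `Option<ArrayRef>`.
type RowCountArray = Vec<u64>;

/// Simplified model of the trait after this PR: `row_counts` takes no column.
trait PruningStatisticsSketch {
    fn num_containers(&self) -> usize;
    fn row_counts(&self) -> Option<RowCountArray>;
}

/// Hypothetical stand-in for Parquet row group metadata.
struct RowGroupMeta {
    num_rows: i64,
}

/// Hypothetical stand-in for `RowGroupPruningStatistics`.
struct RowGroupStatsSketch {
    row_groups: Vec<RowGroupMeta>,
}

impl PruningStatisticsSketch for RowGroupStatsSketch {
    fn num_containers(&self) -> usize {
        self.row_groups.len()
    }

    fn row_counts(&self) -> Option<RowCountArray> {
        // Read the count directly from each row group; no per-column
        // converter is needed. A negative count yields `None` overall.
        self.row_groups
            .iter()
            .map(|rg| u64::try_from(rg.num_rows).ok())
            .collect()
    }
}
```

Each container (row group) contributes one entry, so the result has `num_containers()` elements regardless of which columns a predicate references.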
## Are these changes tested?
Yes. All existing tests are updated and passing. The behavior changes are:
- `row_counts()` on `PrunableStatistics` now returns data even for
non-existent columns (correct, since row count is container-level)
- `RowGroupPruningStatistics::row_counts()` always returns row counts
(previously could fail if column wasn't in Parquet schema)
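
The two behavior changes above can be modeled with a small sketch. It assumes the old code path returned `None` when the column was unknown, which is how the description reads; `ContainerStatsSketch`, `row_counts_old`, and `row_counts_new` are hypothetical names used only for this illustration.

```rust
use std::collections::HashSet;

/// Hypothetical container-level statistics: a set of known column names
/// plus one row count per container.
struct ContainerStatsSketch {
    columns: HashSet<String>,
    num_rows: Vec<u64>,
}

impl ContainerStatsSketch {
    /// Models the old behavior: the column is validated first, so an
    /// unknown column produces no row counts.
    fn row_counts_old(&self, column: &str) -> Option<Vec<u64>> {
        if !self.columns.contains(column) {
            return None;
        }
        Some(self.num_rows.clone())
    }

    /// Models the new behavior: row counts do not depend on any column.
    fn row_counts_new(&self) -> Option<Vec<u64>> {
        Some(self.num_rows.clone())
    }
}
```

With this model, `row_counts_old("missing")` yields `None` while `row_counts_new()` yields the counts, which mirrors the difference callers may observe after upgrading.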
## Are there any user-facing changes?
Yes. This is a breaking change to the `PruningStatistics` trait. Downstream
implementations need to remove the `column` parameter from their
`row_counts` method, and callers need to drop the argument.
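
For a downstream implementation, the migration is likely to be a one-line signature change. The sketch below shows a before and after side by side under simplified assumptions: `Column`, the two traits, and the `FileStats*` types are stand-ins defined here so the example is self-contained, and `Vec<u64>` replaces the real `ArrayRef`.

```rust
/// Stand-in for `datafusion_common::Column` (simplified for this sketch).
pub struct Column {
    pub name: String,
}

/// Models the trait method before this PR.
pub trait PruningStatisticsBefore {
    fn row_counts(&self, column: &Column) -> Option<Vec<u64>>;
}

/// Models the trait method after this PR.
pub trait PruningStatisticsAfter {
    fn row_counts(&self) -> Option<Vec<u64>>;
}

/// Hypothetical downstream type written against the old trait.
pub struct FileStatsBefore {
    pub rows_per_file: Vec<u64>,
}

impl PruningStatisticsBefore for FileStatsBefore {
    // The parameter was already unused, hence the `_column` name.
    fn row_counts(&self, _column: &Column) -> Option<Vec<u64>> {
        Some(self.rows_per_file.clone())
    }
}

/// The same type migrated to the new trait.
pub struct FileStatsAfter {
    pub rows_per_file: Vec<u64>,
}

impl PruningStatisticsAfter for FileStatsAfter {
    // Only the signature changes; the body stays the same.
    fn row_counts(&self) -> Option<Vec<u64>> {
        Some(self.rows_per_file.clone())
    }
}
```

Implementations that did use the column, for example to validate that it exists, would need to decide whether that check still belongs elsewhere.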
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>