Skip to content

Commit 3d1f008

Browse files
committed
fix: address review feedback for approx_top_k
- Replace HashMap<Vec<u8>, usize> with hashbrown::HashTable<(u64, usize)> to eliminate key cloning, double-hashing, and O(n) rebuild allocations - Fix merge() to use Parallel Space-Saving m1/m2 algorithm (min counter count correction) matching ClickHouse's SpaceSaving::merge() - Fix serialize() to select top-N counters and fold evicted ones into alpha_map when requested_capacity < counters.len() < target_capacity - Harden from_bytes() with validation: requested_capacity > 0, num_counters <= requested_capacity, alpha_map_size is power-of-two, and overflow-safe bounds checking throughout - Replace all raw += on count/error/alpha with saturating_add - Track item_heap_bytes incrementally for O(1) size() accounting - Pre-allocate serialize() output buffer - Add TIMEZONE_WILDCARD timestamp variants to signature - Remove unused k and data_type from state_fields/state/merge_batch - Replace all manual DataFusionError construction with exec_err!/plan_err! - Replace .unwrap() downcasts in update_batch with proper error returns - Fix test_accumulator_large_utf8_input to use DataType::LargeUtf8 - Update #[user_doc] to document approximate counts, NULL handling, supported types, and return shape - Use with_timezone() instead of arrow::compute::cast for timestamps - Adapt to upstream API changes (remove as_any from AggregateUDFImpl, use direct downcast_ref for Literal)
1 parent d8d171c commit 3d1f008

5 files changed

Lines changed: 327 additions & 271 deletions

File tree

Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

datafusion/functions-aggregate/Cargo.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -50,6 +50,7 @@ datafusion-functions-aggregate-common = { workspace = true }
5050
datafusion-macros = { workspace = true }
5151
datafusion-physical-expr = { workspace = true }
5252
datafusion-physical-expr-common = { workspace = true }
53+
hashbrown = { workspace = true }
5354
foldhash = "0.2"
5455
half = { workspace = true }
5556
log = { workspace = true }

0 commit comments

Comments
 (0)