Commit 3d1f008
committed
fix: address review feedback for approx_top_k
- Replace HashMap<Vec<u8>, usize> with hashbrown::HashTable<(u64, usize)>
to eliminate key cloning, double-hashing, and O(n) rebuild allocations
- Fix merge() to use Parallel Space-Saving m1/m2 algorithm (min counter
count correction) matching ClickHouse's SpaceSaving::merge()
- Fix serialize() to select top-N counters and fold evicted ones into
alpha_map when requested_capacity < counters.len() < target_capacity
- Harden from_bytes() with validation: requested_capacity > 0,
num_counters <= requested_capacity, alpha_map_size is power-of-two,
and overflow-safe bounds checking throughout
- Replace all raw += on count/error/alpha with saturating_add
- Track item_heap_bytes incrementally for O(1) size() accounting
- Pre-allocate serialize() output buffer
- Add TIMEZONE_WILDCARD timestamp variants to signature
- Remove unused k and data_type from state_fields/state/merge_batch
- Replace all manual DataFusionError construction with exec_err!/plan_err!
- Replace .unwrap() downcasts in update_batch with proper error returns
- Fix test_accumulator_large_utf8_input to use DataType::LargeUtf8
- Update #[user_doc] to document approximate counts, NULL handling,
supported types, and return shape
- Use with_timezone() instead of arrow::compute::cast for timestamps
- Adapt to upstream API changes (remove as_any from AggregateUDFImpl,
use direct downcast_ref for Literal)1 parent d8d171c commit 3d1f008
5 files changed
Lines changed: 327 additions & 271 deletions
File tree
- datafusion/functions-aggregate
- src
- docs/source/user-guide/sql
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
50 | 50 | | |
51 | 51 | | |
52 | 52 | | |
| 53 | + | |
53 | 54 | | |
54 | 55 | | |
55 | 56 | | |
| |||
0 commit comments