Commit d8d171c
feat: add approx_top_k aggregate function
Add a new approx_top_k(expression, k) aggregate function that returns
the approximate top-k most frequent values with their estimated counts,
using the Filtered Space-Saving algorithm.
The implementation uses a capacity multiplier of 3 (matching ClickHouse's
default) and includes an alpha map for improved accuracy by filtering
low-frequency noise before it enters the main summary.
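To make the paragraph above concrete, here is a minimal, simplified sketch of a Filtered Space-Saving summary in plain Rust. It is not the code from this commit: the type `FilteredSpaceSaving`, the constant `ALPHA_SLOTS`, the use of `DefaultHasher`, and the linear scan for the minimum counter are all assumptions made for illustration. Only the capacity multiplier of 3 and the idea of an alpha map that absorbs low-frequency values come from the commit message.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Multiplier applied to k to size the monitored set (3, per the commit message).
const CAPACITY_MULTIPLIER: usize = 3;
/// Hypothetical alpha-map size, chosen only for this sketch.
const ALPHA_SLOTS: usize = 64;

struct FilteredSpaceSaving<T> {
    capacity: usize,
    /// value -> (estimated count, maximum overestimation error)
    monitored: HashMap<T, (u64, u64)>,
    /// Per-slot counts absorbed from values that are not monitored.
    alpha: Vec<u64>,
}

impl<T: Hash + Eq + Clone> FilteredSpaceSaving<T> {
    fn new(k: usize) -> Self {
        Self {
            capacity: k.max(1) * CAPACITY_MULTIPLIER,
            monitored: HashMap::new(),
            alpha: vec![0; ALPHA_SLOTS],
        }
    }

    /// Maps a value to its alpha-map slot.
    fn slot(value: &T) -> usize {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        (hasher.finish() as usize) % ALPHA_SLOTS
    }

    fn insert(&mut self, value: T) {
        // Already monitored: bump its estimated count.
        if let Some(entry) = self.monitored.get_mut(&value) {
            entry.0 += 1;
            return;
        }
        // Room left in the summary: start monitoring with an exact count.
        if self.monitored.len() < self.capacity {
            self.monitored.insert(value, (1, 0));
            return;
        }
        // Summary is full: locate the smallest monitored counter.
        let (min_key, min_count) = self
            .monitored
            .iter()
            .min_by_key(|(_, (count, _))| *count)
            .map(|(key, (count, _))| (key.clone(), *count))
            .expect("capacity is at least 1");
        let slot = Self::slot(&value);
        let candidate = self.alpha[slot] + 1;
        if candidate < min_count {
            // Filter: the value is still rarer than every monitored one,
            // so record it in the alpha map and leave the summary untouched.
            self.alpha[slot] = candidate;
            return;
        }
        // Evict the minimum, remember its count in its own alpha slot, and
        // admit the new value with the alpha-derived estimate and error.
        self.monitored.remove(&min_key);
        self.alpha[Self::slot(&min_key)] = min_count;
        self.monitored.insert(value, (candidate, candidate - 1));
    }

    /// Returns up to k (value, estimated count) pairs, highest count first.
    fn top_k(&self, k: usize) -> Vec<(T, u64)> {
        let mut entries: Vec<(T, u64)> = self
            .monitored
            .iter()
            .map(|(value, (count, _))| (value.clone(), *count))
            .collect();
        entries.sort_by(|a, b| b.1.cmp(&a.1));
        entries.truncate(k);
        entries
    }
}

fn main() {
    let mut sketch = FilteredSpaceSaving::new(2);
    for word in ["a", "b", "a", "c", "a", "b", "a"] {
        sketch.insert(word);
    }
    for (value, count) in sketch.top_k(2) {
        println!("{value}: {count}");
    }
}
```

The sketch favours readability over speed: a production implementation would typically keep the counters ordered so the minimum can be found without a scan, and would also need a way to merge partial summaries.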
Return type is List(Struct({value: T, count: UInt64})) ordered by count
descending, where T matches the input column type.
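As a rough illustration of the result shape described above, the following plain-Rust sketch mirrors one `{value, count}` entry and the descending order by count. The names `TopKEntry` and `to_top_k_list` are hypothetical and do not appear in the commit, and the tie-breaking behaviour shown here is an assumption, since the commit message does not specify one.

```rust
/// Plain-Rust stand-in for one element of List(Struct({value: T, count: UInt64})).
#[derive(Debug, Clone, PartialEq)]
struct TopKEntry<T> {
    value: T,
    count: u64,
}

/// Orders (value, estimated count) pairs by count descending and keeps the first k.
fn to_top_k_list<T>(estimates: Vec<(T, u64)>, k: usize) -> Vec<TopKEntry<T>> {
    let mut entries: Vec<TopKEntry<T>> = estimates
        .into_iter()
        .map(|(value, count)| TopKEntry { value, count })
        .collect();
    // Stable sort: entries with equal counts keep their input order
    // (an assumption of this sketch, not documented behaviour).
    entries.sort_by(|a, b| b.count.cmp(&a.count));
    entries.truncate(k);
    entries
}

fn main() {
    let estimates = vec![("GET", 120), ("POST", 45), ("PUT", 7), ("DELETE", 45)];
    let top = to_top_k_list(estimates, 3);
    println!("{top:?}");
}
```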
Closes #20967
1 parent ff844be commit d8d171c
7 files changed
Lines changed: 1566 additions & 4 deletions
File tree
- datafusion
  - core/tests/dataframe
  - functions-aggregate/src
  - proto/tests/cases
  - sqllogictest/test_files
- docs/source/user-guide
  - sql