Commit 1624d63
## Which issue does this PR close?
- Closes #17789.
## Rationale for this change
`string_agg` previously didn't support the `GroupsAccumulator` API;
adding support for it can significantly improve performance,
particularly when there are many groups.
Benchmarks (M4 Max):
- string_agg_query_group_by_few_groups (~10): 645 µs → 564 µs, -11%
- string_agg_query_group_by_mid_groups (~1,000): 2,692 µs → 871 µs, -68%
- string_agg_query_group_by_many_groups (~65,000): 16,606 µs → 1,147 µs, -93%
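
To illustrate why a grouped accumulator helps as the group count grows, here is a minimal, dependency-free sketch of the idea. It is not the code from this PR: `GroupedStringAgg` and its fields are hypothetical names, and the method shapes only loosely follow the `GroupsAccumulator` style of passing one group index per input row. The point is that a single object holds the state for every group in one `Vec`, so a batch is folded in one pass instead of dispatching to a separate accumulator per group.

```rust
/// Hypothetical, simplified stand-in for a grouped `string_agg` accumulator.
/// One instance holds the state of every group, indexed by group id.
struct GroupedStringAgg {
    delimiter: String,
    /// `None` means the group has not seen a non-null value yet.
    states: Vec<Option<String>>,
}

impl GroupedStringAgg {
    fn new(delimiter: &str) -> Self {
        Self {
            delimiter: delimiter.to_string(),
            states: Vec::new(),
        }
    }

    /// Fold one batch into the per-group states.
    /// `values[i]` belongs to group `group_indices[i]`.
    fn update_batch(
        &mut self,
        values: &[Option<&str>],
        group_indices: &[usize],
        total_num_groups: usize,
    ) {
        assert_eq!(values.len(), group_indices.len());

        // Grow the state vector so every group id seen so far has a slot.
        if self.states.len() < total_num_groups {
            self.states.resize(total_num_groups, None);
        }

        for (value, group) in values.iter().copied().zip(group_indices.iter().copied()) {
            // Null inputs are skipped (assumed here to match `string_agg` semantics).
            let Some(value) = value else { continue };

            match self.states[group].as_mut() {
                // Group already has content: append delimiter, then the value.
                Some(acc) => {
                    acc.push_str(&self.delimiter);
                    acc.push_str(value);
                }
                // First non-null value for this group.
                None => self.states[group] = Some(value.to_string()),
            }
        }
    }

    /// Emit one result per group; groups with no non-null input stay `None`.
    fn evaluate(self) -> Vec<Option<String>> {
        self.states
    }
}
```

In this sketch, feeding the rows `a, NULL, b, c` with group indices `0, 1, 0, 2` and then `d` for group `1` leaves group 0 holding `a,b`, group 1 holding `d`, and group 2 holding `c`. The real implementation works on Arrow arrays and also has to handle state merging and filters, which this sketch omits.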
## What changes are included in this PR?
* Add end-to-end benchmark for `string_agg`
* Implement `GroupsAccumulator` API for `string_agg`
* Add unit tests
* Minor code cleanup for existing `string_agg` code paths
## Are these changes tested?
Yes.
## Are there any user-facing changes?
No, other than a change to an error message string.
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
1 parent fb12029 commit 1624d63
3 files changed
Lines changed: 384 additions & 56 deletions
File tree
- datafusion
  - core/benches
  - functions-aggregate/src
  - sqllogictest/test_files