Commit 3a23bb2
## Which issue does this PR close?
- Closes #20465.
- Closes #17446.
## Rationale for this change
This PR optimizes the performance of `array_agg()` by adding support for
the `GroupsAccumulator` API.
The design minimizes the per-batch work done in `update_batch()`: we store
a reference to the batch, without copying its values, plus a
`(group_idx, row_idx)` pair for each row. In `evaluate()`, all the
requested output is assembled with a single `interleave` call.
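The `update_batch()`/`evaluate()` split described above can be illustrated with a minimal, std-only Rust sketch. It is not the PR's code. Arrow arrays are replaced by `Arc<Vec<i64>>`, the local `interleave` helper only mimics the shape of arrow's `interleave` (which gathers rows given `(array_index, row_index)` pairs), and `ArrayAggSketch` and its methods are hypothetical names. Instead of building a `ListArray`, `evaluate()` returns the flattened values and the list offsets such an array would be built from.

```rust
use std::sync::Arc;

/// Stand-in for an Arrow array (hypothetical; real code uses `ArrayRef`).
type Batch = Arc<Vec<i64>>;

/// Mimics arrow's `interleave`: gathers `values[batch][row]` for each
/// `(batch, row)` pair, in order, in a single pass.
fn interleave(values: &[Batch], indices: &[(usize, usize)]) -> Vec<i64> {
    indices.iter().map(|&(b, r)| values[b][r]).collect()
}

/// Hypothetical accumulator state for this sketch.
#[derive(Default)]
struct ArrayAggSketch {
    /// Every input batch, kept by reference (values are never copied here).
    batches: Vec<Batch>,
    /// For each stored batch, one `(group_idx, row_idx)` pair per input row.
    rows: Vec<Vec<(usize, usize)>>,
    num_groups: usize,
}

impl ArrayAggSketch {
    /// Simplified `update_batch`: no filter, a single input column.
    fn update_batch(&mut self, values: Batch, group_indices: &[usize], total_num_groups: usize) {
        assert_eq!(values.len(), group_indices.len());
        let pairs: Vec<(usize, usize)> = group_indices
            .iter()
            .copied()
            .enumerate()
            .map(|(row, group)| (group, row))
            .collect();
        self.batches.push(values);
        self.rows.push(pairs);
        self.num_groups = self.num_groups.max(total_num_groups);
    }

    /// Returns `(flattened values, list offsets)` covering all groups.
    fn evaluate(&self) -> (Vec<i64>, Vec<usize>) {
        // Count rows per group, then turn the counts into list offsets.
        let mut counts = vec![0usize; self.num_groups];
        for pairs in &self.rows {
            for &(g, _) in pairs {
                counts[g] += 1;
            }
        }
        let mut offsets = Vec::with_capacity(self.num_groups + 1);
        let mut total = 0;
        offsets.push(0);
        for &c in &counts {
            total += c;
            offsets.push(total);
        }

        // Write each row's (batch_idx, row_idx) into its group's next free slot.
        let mut next = offsets[..self.num_groups].to_vec();
        let mut indices = vec![(0, 0); total];
        for (b, pairs) in self.rows.iter().enumerate() {
            for &(g, r) in pairs {
                indices[next[g]] = (b, r);
                next[g] += 1;
            }
        }

        // A single gather call assembles the whole output.
        (interleave(&self.batches, &indices), offsets)
    }
}
```

For inputs `[10, 20, 30]` (groups `[0, 1, 0]`) and `[40, 50]` (groups `[1, 0]`), `evaluate()` yields values `[10, 30, 50, 20, 40]` with offsets `[0, 3, 5]`, i.e. group 0 is `[10, 30, 50]` and group 1 is `[20, 40]`. The per-row bookkeeping in `update_batch()` is just a small vector of index pairs; the value copying is deferred to one gather at the end.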
This turns out to be significantly faster, because we copy much less
data and assembling the results can be vectorized more effectively. For
example, on a benchmark with 5000 groups and 5000 int64 values per
group, this approach is roughly 190x faster than the previous approach.
Releasing memory after a partial emit is more involved than in the
previous approach, because a stored batch may still be referenced by
groups that have not been emitted yet, but it is still possible.
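One possible way to release memory after a partial emit can be sketched in std-only Rust. This is an illustration, not necessarily the PR's strategy: Arrow arrays are replaced by `Arc<Vec<i64>>`, and the `State` type and its `emit_first` method are hypothetical names. Emitting the first `n` groups (the idea behind DataFusion's `EmitTo::First(n)`) removes those groups' `(group_idx, row_idx)` entries, shifts the remaining group indices down by `n`, and drops any stored batch that no remaining group still references.

```rust
use std::sync::Arc;

/// Stand-in for an Arrow array (hypothetical; real code uses `ArrayRef`).
type Batch = Arc<Vec<i64>>;

/// Hypothetical state: stored batches plus, per batch, one
/// `(group_idx, row_idx)` pair per input row.
struct State {
    batches: Vec<Batch>,
    rows: Vec<Vec<(usize, usize)>>,
}

impl State {
    /// Emits groups `0..n` (returning each group's values), then keeps only
    /// what the remaining groups still need.
    fn emit_first(&mut self, n: usize) -> Vec<Vec<i64>> {
        let mut out = vec![Vec::new(); n];
        let mut kept_batches = Vec::new();
        let mut kept_rows = Vec::new();

        for (batch, pairs) in self.batches.drain(..).zip(self.rows.drain(..)) {
            let mut remaining = Vec::new();
            for (g, r) in pairs {
                if g < n {
                    out[g].push(batch[r]);
                } else {
                    // Renumber: group `g` becomes `g - n` after the emit.
                    remaining.push((g - n, r));
                }
            }
            // A batch that no remaining row points into is dropped here,
            // which frees it once no other `Arc` holds it.
            if !remaining.is_empty() {
                kept_batches.push(batch);
                kept_rows.push(remaining);
            }
        }

        self.batches = kept_batches;
        self.rows = kept_rows;
        out
    }
}
```

This sketch keeps a batch alive if even one remaining row points into it. An implementation could additionally compact such partially referenced batches, trading an extra copy for lower memory use.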
## What changes are included in this PR?
* Implement the `GroupsAccumulator` API for `array_agg()`
* Add a benchmark for `array_agg` of a named struct over a
  dictionary-encoded column, following the workload in #17446
* Add unit tests
* Improve SLT test coverage
* Remove a redundant SLT test
## Are these changes tested?
Yes, and benchmarked.
## Are there any user-facing changes?
No.
## AI usage
Iterated with the help of multiple AI tools; I've reviewed and
understand the resulting code.
Parent commit: 73fbd48
5 files changed (773 additions, 17 deletions)
Changed paths:
- datafusion
- core/benches
- data_utils
- functions-aggregate-common/src/aggregate/groups_accumulator
- functions-aggregate/src
- sqllogictest/test_files