Commit 1624d63

neilconway and alamb authored
perf: Add support for GroupsAccumulator to string_agg (#21154)
## Which issue does this PR close?

- Closes #17789.

## Rationale for this change

`string_agg` previously didn't support the `GroupsAccumulator` API; adding support for it can significantly improve performance, particularly when there are many groups.

Benchmarks (M4 Max):

- string_agg_query_group_by_few_groups (~10): 645 µs → 564 µs, -11%
- string_agg_query_group_by_mid_groups (~1,000): 2,692 µs → 871 µs, -68%
- string_agg_query_group_by_many_groups (~65,000): 16,606 µs → 1,147 µs, -93%

## What changes are included in this PR?

* Add end-to-end benchmark for `string_agg`
* Implement `GroupsAccumulator` API for `string_agg`
* Add unit tests
* Minor code cleanup for existing `string_agg` code paths

## Are these changes tested?

Yes.

## Are there any user-facing changes?

No, other than a change to an error message string.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
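To illustrate why a groups-style accumulator helps as the group count grows, here is a minimal standalone sketch. It is not the code from this commit and uses neither DataFusion nor Arrow types: `StringAggGroups`, its fields, and the slice-based `update_batch` signature are hypothetical simplifications. The idea it shows is that one object holds the partial state for every group and folds a whole batch in a single pass, rather than allocating and dispatching to a separate accumulator per group.

```rust
/// Hypothetical, simplified stand-in for a groups-style `string_agg` state.
/// One instance stores the partial result for every group.
struct StringAggGroups {
    delimiter: String,
    /// `states[g]` is the running concatenation for group `g`;
    /// `None` means the group has seen no non-null value yet.
    states: Vec<Option<String>>,
}

impl StringAggGroups {
    fn new(delimiter: &str) -> Self {
        Self {
            delimiter: delimiter.to_string(),
            states: Vec::new(),
        }
    }

    /// Fold one batch into the per-group states.
    /// `values[i]` belongs to group `group_indices[i]`; `None` models SQL NULL.
    fn update_batch(
        &mut self,
        values: &[Option<&str>],
        group_indices: &[usize],
        total_num_groups: usize,
    ) {
        assert_eq!(values.len(), group_indices.len());
        // Grow the state vector if this batch introduced new groups.
        if self.states.len() < total_num_groups {
            self.states.resize(total_num_groups, None);
        }
        for (&value, &group) in values.iter().zip(group_indices) {
            // NULL inputs are skipped.
            let Some(value) = value else { continue };
            let slot = &mut self.states[group];
            match slot {
                Some(acc) => {
                    acc.push_str(&self.delimiter);
                    acc.push_str(value);
                }
                None => *slot = Some(value.to_string()),
            }
        }
    }

    /// Consume the accumulator and return one result per group.
    fn evaluate(self) -> Vec<Option<String>> {
        self.states
    }
}

fn main() {
    let mut acc = StringAggGroups::new(",");
    // First batch: rows map to groups 0, 1, 0.
    acc.update_batch(&[Some("a"), Some("b"), Some("c")], &[0, 1, 0], 2);
    // Second batch introduces group 2 and carries a NULL for group 1.
    acc.update_batch(&[None, Some("d"), Some("e")], &[1, 2, 0], 3);
    for (group, value) in acc.evaluate().iter().enumerate() {
        println!("group {group}: {value:?}");
    }
}
```

In this sketch the per-row work is an index into a vector plus a string append, which is the kind of cost profile that would be consistent with the larger speedups reported above for the mid- and many-group benchmarks.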
1 parent fb12029 commit 1624d63

3 files changed

Lines changed: 384 additions & 56 deletions


datafusion/core/benches/aggregate_query_sql.rs

Lines changed: 33 additions & 0 deletions
@@ -295,6 +295,39 @@ fn criterion_benchmark(c: &mut Criterion) {
             )
         })
     });
+
+    c.bench_function("string_agg_query_group_by_few_groups", |b| {
+        b.iter(|| {
+            query(
+                ctx.clone(),
+                &rt,
+                "SELECT u64_narrow, string_agg(utf8, ',') \
+                 FROM t GROUP BY u64_narrow",
+            )
+        })
+    });
+
+    c.bench_function("string_agg_query_group_by_mid_groups", |b| {
+        b.iter(|| {
+            query(
+                ctx.clone(),
+                &rt,
+                "SELECT u64_mid, string_agg(utf8, ',') \
+                 FROM t GROUP BY u64_mid",
+            )
+        })
+    });
+
+    c.bench_function("string_agg_query_group_by_many_groups", |b| {
+        b.iter(|| {
+            query(
+                ctx.clone(),
+                &rt,
+                "SELECT u64_wide, string_agg(utf8, ',') \
+                 FROM t GROUP BY u64_wide",
+            )
+        })
+    });
 }
 
 criterion_group!(benches, criterion_benchmark);
