perf: Optimize `array_min`, `array_max` for arrays of primitive types by neilconway · Pull Request #21101 · apache/datafusion

neilconway · 2026-03-22T16:10:49Z

Which issue does this PR close?

Closes #21100 (Optimize array_min, array_max for primitive types).

Rationale for this change

In the current implementation, we construct a PrimitiveArray for each row, feed it to the Arrow min / max kernel, and then collect the resulting ScalarValues in a Vec. We then construct a final PrimitiveArray for the result via ScalarValue::iter_to_array of the Vec.

We can do better for ListArrays of primitive types. First, we can iterate directly over the flat values buffer of the ListArray for the batch and compute the min/max from each row's slice directly. Second, Arrow's min / max kernels have a reasonable amount of per-call overhead; for small arrays, it is more efficient to compute the min/max ourselves via direct iteration.
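To make the first point concrete, here is a minimal sketch of computing a per-row minimum directly from a flat values buffer. It is not the PR's code: it uses plain slices instead of Arrow types, `usize` offsets instead of Arrow's integer offsets, and a `&[bool]` validity slice instead of a null bitmap. The function names `row_mins` and `row_mins_with_nulls` are hypothetical, and treating an empty (or all-null) row as `None` is an assumption of this sketch.

```rust
/// Per-row minimum over a flat values buffer (illustrative, not Arrow's API).
/// `offsets` has one more entry than there are rows; row `i` spans
/// `values[offsets[i]..offsets[i + 1]]`. An empty row yields `None`.
fn row_mins(values: &[i64], offsets: &[usize]) -> Vec<Option<i64>> {
    offsets
        .windows(2)
        // Slice the shared buffer; no per-row array is allocated.
        .map(|w| values[w[0]..w[1]].iter().copied().min())
        .collect()
}

/// Same idea when elements can be null. `valid[i]` is `true` when
/// `values[i]` is non-null (a stand-in for a validity bitmap).
fn row_mins_with_nulls(
    values: &[i64],
    valid: &[bool],
    offsets: &[usize],
) -> Vec<Option<i64>> {
    offsets
        .windows(2)
        .map(|w| {
            (w[0]..w[1])
                .filter(|&i| valid[i]) // skip null elements
                .map(|i| values[i])
                .min()
        })
        .collect()
}
```

With offsets `[0, 3, 4, 4, 6]`, the buffer `[3, 1, 2, 10, 7, 8]` is read as four rows: `[3, 1, 2]`, `[10]`, `[]`, and `[7, 8]`.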

Benchmarks (8192 rows, arrays of int64 values, M4 Max):

no_nulls / list_size=10: 309 µs → 26.6 µs (11.6x faster)
no_nulls / list_size=100: 392 µs → 150 µs (2.6x faster)
no_nulls / list_size=1000: 1.20 ms → 951 µs (1.26x faster)
nulls / list_size=10: 385 µs → 69.0 µs (5.6x faster)
nulls / list_size=100: 790 µs → 616 µs (1.28x faster)
nulls / list_size=1000: 5.34 ms → 5.21 ms (1.02x faster)

What changes are included in this PR?

Add benchmark for array_max
Expand SLT test coverage
Implement optimization

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

neilconway · 2026-03-22T16:11:25Z

We could add a similar fastpath for arrays of strings, although maybe not worth it because array_min / max on arrays of strings is not particularly common?

neilconway · 2026-03-22T18:42:48Z

On an M4 Max, it looks like the crossover point between direct iteration and using the Arrow kernel is 32-40 list elements:

  ┌───────────┬──────────┬──────────┬─────────────────────┐
  │ List size │  Scalar  │  Kernel  │  Kernel vs Scalar   │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 8         │ 54.8 µs  │ 172.7 µs │ scalar 3.2x faster  │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 16        │ 105.3 µs │ 188.1 µs │ scalar 1.8x faster  │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 32        │ 232.5 µs │ 253.2 µs │ scalar 1.09x faster │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 48        │ 362.6 µs │ 329.6 µs │ kernel 1.10x faster │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 64        │ 492.8 µs │ 444.2 µs │ kernel 1.11x faster │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 96        │ 761.7 µs │ 589.0 µs │ kernel 1.29x faster │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 128       │ 1.032 ms │ 782.0 µs │ kernel 1.32x faster │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 256       │ 2.076 ms │ 1.428 ms │ kernel 1.45x faster │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 512       │ 4.138 ms │ 2.728 ms │ kernel 1.52x faster │
  └───────────┴──────────┴──────────┴─────────────────────┘

So I lowered the iteration -> kernel switchover threshold to 32.
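The switchover can be sketched as a length check per row. This is only an illustration under stated assumptions: the constant name `KERNEL_THRESHOLD`, the function names, and the choice of `<` rather than `<=` at the boundary are hypothetical, and `kernel_max_stand_in` is a placeholder for the Arrow kernel call, not the kernel itself. Only the value 32 comes from the thread.

```rust
/// Rows shorter than this use the scalar loop (value 32 is from the thread;
/// the name and the exact boundary comparison are assumptions).
const KERNEL_THRESHOLD: usize = 32;

/// Direct iteration: cheap to start, so it suits short rows.
fn scalar_max(row: &[i64]) -> Option<i64> {
    let mut iter = row.iter().copied();
    let first = iter.next()?; // an empty row has no maximum
    Some(iter.fold(first, |acc, v| if v > acc { v } else { acc }))
}

/// Placeholder for the kernel path. In the real code this would call the
/// Arrow max kernel, which has more per-call overhead but scales better.
fn kernel_max_stand_in(row: &[i64]) -> Option<i64> {
    row.iter().copied().max()
}

/// Picks a strategy based on the row length.
fn row_max(row: &[i64]) -> Option<i64> {
    if row.len() < KERNEL_THRESHOLD {
        scalar_max(row)
    } else {
        kernel_max_stand_in(row)
    }
}
```

Both paths return the same result; only the cost profile differs, which is why the threshold is tuned from benchmarks like the table above.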

coderfender · 2026-03-22T21:33:49Z

These are great numbers, @neilconway! Could we also try removing some of the `if` conditions and see whether that helps? For example:

1. A separate implementation for non-null arrays (to avoid `if` checks on every iteration of the inner function)
2. The `ARROW_COMPUTE_THRESHOLD` `if` check in the hot loop
3. The min/max check (separate max vs. min implementations)

neilconway · 2026-03-23T00:51:27Z

@coderfender Thanks for the feedback!

I quickly checked 1 and 3 and they don't yield any improvement; I'd suspect the compiler will hoist loop-invariant branches like this out of the loop. The threshold check should be similar: it should be branch-predicted effectively.
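For readers unfamiliar with the two shapes being compared in suggestion 3, here is a hedged sketch. The function names are hypothetical and this is not the PR's code. In the first version the min/max choice is a runtime flag that never changes inside the loop (a loop-invariant branch); in the second it is a const generic, so each instantiation is compiled without the flag. Whether the optimizer actually hoists the branch in the first version is not guaranteed and should be confirmed by benchmarking, as was done above.

```rust
/// Min or max selected by a runtime flag. `is_min` is loop-invariant,
/// so an optimizer may hoist the branch out of the loop.
fn extreme_runtime_flag(row: &[i64], is_min: bool) -> Option<i64> {
    let mut iter = row.iter().copied();
    let mut best = iter.next()?;
    for v in iter {
        let better = if is_min { v < best } else { v > best };
        if better {
            best = v;
        }
    }
    Some(best)
}

/// Min or max selected at compile time. Each value of `IS_MIN` produces
/// its own specialized function, so no flag is checked at run time.
fn extreme_const<const IS_MIN: bool>(row: &[i64]) -> Option<i64> {
    let mut iter = row.iter().copied();
    let mut best = iter.next()?;
    for v in iter {
        let better = if IS_MIN { v < best } else { v > best };
        if better {
            best = v;
        }
    }
    Some(best)
}
```

Calling `extreme_const::<true>(row)` and `extreme_runtime_flag(row, true)` yields the same value; the difference is only in where the branch is resolved.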

Lmk if you disagree!


coderfender · 2026-03-23T05:45:26Z

Sure, @neilconway. Thank you for trying those approaches. When we tried to improve cast ops in Comet (string to integer), taking out the `if` condition and implementing 3 separate functions for the various eval modes (ANSI, Try, and Legacy) helped with performance. Let me look that PR up and link it here for reference. Let me also pull the branch and see if there are other potential wins. Also, I was wondering whether the magic number 32 (the cutoff between the Arrow kernel and the for-loop implementation) depends on the hardware?

coderfender · 2026-03-23T05:50:22Z

+10
+NULL
+7
+100


nit: perhaps we could also validate the result's datatype?

coderfender · 2026-03-23T05:51:36Z

+----
+NaN
+
+query R


nit: maybe we could check +inf / -inf as well

coderfender · 2026-03-23T05:52:09Z

+
+# array_min with Int32 (exercises a different primitive type than Int64)
+query I
+select array_min(arrow_cast(make_array(10, -5, 3), 'List(Int32)'));


nice 👍🏽

alamb · 2026-03-24T20:42:23Z

Thanks @neilconway and @coderfender and @Dandandan

I thought @coderfender 's comments were good suggestions -- maybe we can make a follow on PR to add them

neilconway added 2 commits March 22, 2026 10:26

Add benchmark for array_max

f0ec26f

.

53ab1bd

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels Mar 22, 2026

neilconway added 2 commits March 22, 2026 12:32

Benchmark on 8192 row batches, for better fidelity

2f68986

Lower iteration -> kernel threshold from 64 -> 32

4153eb1

Merge remote-tracking branch 'origin/main' into neilc/optimize-array-…

640ce56

…min-max

coderfender reviewed Mar 23, 2026


coderfender approved these changes Mar 23, 2026


Dandandan approved these changes Mar 23, 2026


alamb added this pull request to the merge queue Mar 24, 2026

Merged via the queue into apache:main with commit 4d5aea4 Mar 24, 2026
33 checks passed

neilconway deleted the neilc/optimize-array-min-max branch March 25, 2026 13:16

alamb merged 5 commits into apache:main from neilconway:neilc/optimize-array-min-max


4 participants
