perf: Optimize array_min, array_max for arrays of primitive types #21101

Merged
alamb merged 5 commits into apache:main from neilconway:neilc/optimize-array-min-max
Mar 24, 2026

Conversation

@neilconway
Contributor

@neilconway neilconway commented Mar 22, 2026

Which issue does this PR close?

Rationale for this change

In the current implementation, we construct a PrimitiveArray for each row, feed it to the Arrow min / max kernel, and then collect the resulting ScalarValues in a Vec. We then construct a final PrimitiveArray for the result via ScalarValue::iter_to_array of the Vec.

We can do better for ListArrays of primitive types. First, we can iterate directly over the flat values buffer of the ListArray for the batch and compute the min/max from each row's slice directly. Second, Arrow's min / max kernels have a reasonable amount of per-call overhead; for small arrays, it is more efficient to compute the min/max ourselves via direct iteration.
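A minimal sketch of the first idea, assuming a simplified stand-in for the Arrow layout: the list column is modeled as a flat values slice plus an offsets slice, where row i spans values[offsets[i]..offsets[i + 1]]. The function name list_row_min and the use of usize offsets are illustrative assumptions, not the code in this PR, and null elements and float NaN handling are left out.

```rust
/// Per-row minimum over a list column stored as one flat `values` buffer
/// plus `offsets` (row i covers values[offsets[i]..offsets[i + 1]]).
/// Hypothetical helper for illustration; not DataFusion's actual function.
fn list_row_min(offsets: &[usize], values: &[i64]) -> Vec<Option<i64>> {
    offsets
        // Each adjacent pair of offsets delimits one row.
        .windows(2)
        // Slice the shared buffer directly; no per-row array is allocated.
        // An empty row yields None.
        .map(|w| values[w[0]..w[1]].iter().copied().min())
        .collect()
}
```

With offsets [0, 3, 3, 5] and values [10, -5, 3, 7, 100], the rows are [10, -5, 3], an empty list, and [7, 100]. The point of the sketch is that each row is only a borrowed slice of the shared buffer, so the per-row array construction described above is avoided.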

Benchmarks (8192 rows, arrays of int64 values, M4 Max):

  • no_nulls / list_size=10: 309 µs → 26.6 µs (11.6x faster)
  • no_nulls / list_size=100: 392 µs → 150 µs (2.6x faster)
  • no_nulls / list_size=1000: 1.20 ms → 951 µs (1.26x faster)
  • nulls / list_size=10: 385 µs → 69.0 µs (5.6x faster)
  • nulls / list_size=100: 790 µs → 616 µs (1.28x faster)
  • nulls / list_size=1000: 5.34 ms → 5.21 ms (1.02x faster)

What changes are included in this PR?

  • Add benchmark for array_max
  • Expand SLT test coverage
  • Implement optimization

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Mar 22, 2026
@neilconway
Contributor Author

We could add a similar fastpath for arrays of strings, although maybe not worth it because array_min / max on arrays of strings is not particularly common?

@neilconway
Contributor Author

On an M4 Max, it looks like the crossover point between direct iteration and using the Arrow kernel is 32-40 list elements:

  ┌───────────┬──────────┬──────────┬─────────────────────┐
  │ List size │  Scalar  │  Kernel  │  Kernel vs Scalar   │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 8         │ 54.8 µs  │ 172.7 µs │ scalar 3.2x faster  │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 16        │ 105.3 µs │ 188.1 µs │ scalar 1.8x faster  │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 32        │ 232.5 µs │ 253.2 µs │ scalar 1.09x faster │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 48        │ 362.6 µs │ 329.6 µs │ kernel 1.10x faster │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 64        │ 492.8 µs │ 444.2 µs │ kernel 1.11x faster │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 96        │ 761.7 µs │ 589.0 µs │ kernel 1.29x faster │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 128       │ 1.032 ms │ 782.0 µs │ kernel 1.32x faster │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 256       │ 2.076 ms │ 1.428 ms │ kernel 1.45x faster │
  ├───────────┼──────────┼──────────┼─────────────────────┤
  │ 512       │ 4.138 ms │ 2.728 ms │ kernel 1.52x faster │
  └───────────┴──────────┴──────────┴─────────────────────┘

So I lowered the iteration -> kernel switchover threshold to 32.
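Roughly, the switchover could look like the sketch below. It is an illustration under assumptions, not the merged code: the constant name is borrowed from the review discussion, the strict less-than comparison is a guess, and max_kernel_stand_in is a placeholder for the call into the Arrow max kernel so that the example stays dependency-free.

```rust
/// Switchover point taken from the measurements above (assumed name).
const ARROW_COMPUTE_THRESHOLD: usize = 32;

/// Direct iteration: cheap to start, so it wins on short rows.
fn max_direct(row: &[i64]) -> Option<i64> {
    let mut iter = row.iter().copied();
    let mut best = iter.next()?; // empty row -> None
    for v in iter {
        if v > best {
            best = v;
        }
    }
    Some(best)
}

/// Placeholder for the Arrow kernel call (assumption: the real code would
/// invoke the Arrow max kernel on the row's slice here).
fn max_kernel_stand_in(row: &[i64]) -> Option<i64> {
    row.iter().copied().max()
}

/// Pick the path per row based on its length.
fn row_max(row: &[i64]) -> Option<i64> {
    if row.len() < ARROW_COMPUTE_THRESHOLD {
        max_direct(row)
    } else {
        max_kernel_stand_in(row)
    }
}
```

Both paths return the same answer; only the per-call cost differs, which is why the threshold is a tuning parameter rather than a correctness concern.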

@coderfender
Contributor

These are great numbers, @neilconway! Could we also try removing some of the if conditions and see whether that helps? For example:

  1. A separate implementation for non-null arrays (to avoid the null check inside the inner loop)
  2. The ARROW_COMPUTE_THRESHOLD if check in the hot loop
  3. The min/max check (separate max vs. min implementations)

@neilconway
Contributor Author

@coderfender Thanks for the feedback!

I quickly checked 1 and 3 and they don't yield any improvement; I'd suspect the compiler will hoist loop-invariant branches like this out of the loop. The threshold check should be similar: it should be branch-predicted effectively.

Lmk if you disagree!
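For reference, one way to guarantee the min/max specialization at compile time, instead of relying on the optimizer to hoist the branch, is a const generic parameter. This is a hypothetical sketch (the name extreme and the IS_MAX parameter are made up for illustration), not what this PR does:

```rust
/// Min or max of a row, selected at compile time via `IS_MAX`.
/// Hypothetical helper; each instantiation is compiled separately.
fn extreme<const IS_MAX: bool>(row: &[i64]) -> Option<i64> {
    let mut iter = row.iter().copied();
    let mut best = iter.next()?; // empty row -> None
    for v in iter {
        // IS_MAX is a constant within each instantiation, so the compiler
        // can drop the unused comparison from the loop body.
        let better = if IS_MAX { v > best } else { v < best };
        if better {
            best = v;
        }
    }
    Some(best)
}
```

Callers would write extreme::<true>(row) for max and extreme::<false>(row) for min. Whether this beats a runtime flag is an empirical question; the measurements reported above suggest the compiler already handles the loop-invariant branch well here.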

@coderfender
Contributor

coderfender commented Mar 23, 2026

Sure, @neilconway. Thank you for trying those approaches. When we worked on improving cast ops in Comet (string to integer), removing the if condition and implementing three separate functions for the eval modes (ANSI, Try, and Legacy) helped performance. Let me find that PR and link it here for reference. I'll also pull the branch and see if there are other potential wins. Also, I was wondering whether the magic number 32 (the switchover between the Arrow kernel and the for-loop implementation) depends on the hardware?

10
NULL
7
100
Contributor

@coderfender coderfender Mar 23, 2026

nit: perhaps we could also validate the result's data type?

----
NaN

query R
Contributor

nit: maybe we could check +inf / -inf as well


# array_min with Int32 (exercises a different primitive type than Int64)
query I
select array_min(arrow_cast(make_array(10, -5, 3), 'List(Int32)'));
Contributor

nice 👍🏽

@alamb alamb added this pull request to the merge queue Mar 24, 2026
Merged via the queue into apache:main with commit 4d5aea4 Mar 24, 2026
33 checks passed
@alamb
Contributor

alamb commented Mar 24, 2026

Thanks @neilconway and @coderfender and @Dandandan

I thought @coderfender's comments were good suggestions; maybe we can make a follow-on PR to add them.

@neilconway neilconway deleted the neilc/optimize-array-min-max branch March 25, 2026 13:16
Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize array_min, array_max for primitive types

4 participants