perf: Optimize array_min, array_max for arrays of primitive types#21101
perf: Optimize array_min, array_max for arrays of primitive types#21101alamb merged 5 commits intoapache:mainfrom
array_min, array_max for arrays of primitive types#21101Conversation
|
We could add a similar fastpath for arrays of strings, although maybe not worth it because |
|
On an M4 Max, it looks like the crossover point between direct iteration and using the Arrow kernel is 32-40 list elements: So I lowered the iteration -> kernel switchover threshold to 32. |
|
These are great numbers ! @neilconway . Could we perhaps also remove if conditions as well and see if those help out. Example :
|
|
@coderfender Thanks for the feedback! I quickly checked 1 and 3 and they don't yield any improvement; I'd suspect the compiler will hoist loop-invariant branches like this out of the loop. The threshold check should be similar: it should be branch-predicted effectively. Lmk if you disagree! |
|
Sure @neilconway . Thank you for trying the approaches. When we tried to improve cast ops in comet (string to integer), taking out the if condition and implementing 3 separate functions for various eval modes ( ANSI , Try and Legacy modes) helped with the performance. Let me find that PR up and link it here for reference for reference. Let me pull the branch and see if there are other potential wins. Also, I was wondering the magic number 32 (between using arrow kernel vs for loop implementation) is dependent on the hardware ? |
| 10 | ||
| NULL | ||
| 7 | ||
| 100 |
There was a problem hiding this comment.
nit : perhaps we could also validate the result's datatype ?
| ---- | ||
| NaN | ||
|
|
||
| query R |
There was a problem hiding this comment.
nit : may be we could check + inf / - inf as well
|
|
||
| # array_min with Int32 (exercises a different primitive type than Int64) | ||
| query I | ||
| select array_min(arrow_cast(make_array(10, -5, 3), 'List(Int32)')); |
|
Thanks @neilconway and @coderfender and @Dandandan I thought @coderfender 's comments were good suggestions -- maybe we can make a follow on PR to add them |
Which issue does this PR close?
array_min,array_maxfor primitive types #21100.Rationale for this change
In the current implementation, we construct a
PrimitiveArrayfor each row, feed it to the Arrowmin/maxkernel, and then collect the resultingScalarValues in aVec. We then construct a finalPrimitiveArrayfor the result viaScalarValue::iter_to_arrayof theVec.We can do better for ListArrays of primitive types. First, we can iterate directly over the flat values buffer of the
ListArrayfor the batch and compute the min/max from each row's slice directly. Second, Arrow'smin/maxkernels have a reasonable amount of per-call overhead; for small arrays, it is more efficient to compute the min/max ourselves via direct iteration.Benchmarks (8192 rows, arrays of int64 values, M4 Max):
What changes are included in this PR?
array_maxAre these changes tested?
Yes.
Are there any user-facing changes?
No.