
Commit 0294a22

neilconway and Jefffrey authored
perf: Optimize array_has() for scalar needle (#20374)
## Which issue does this PR close?

- Closes #20377.

## Rationale for this change

`compare_with_eq()` checks for matching array elements in a single pass over the entire flat values buffer, which is reasonably fast. The previous implementation then determined per-row results by creating a `BooleanArray` slice for each row and calling `true_count()` on it to check for any matches. That turns out to be quite a lot of per-row work.

Instead, we use `BooleanBuffer::set_indices()` to iterate over the set bits of the comparison result in a single forward pass. We walk this iterator in lockstep with the row offsets to determine whether each row contains a match, which does much less work per row.

This can be substantially faster, especially for short arrays. For example, for 10-element arrays of int64, it is 5-10x faster than the previous approach, and 10-element string arrays are 1.8-5x faster. The improvement is smaller but non-zero for larger arrays (e.g., ~1.2x faster for 500-element arrays).

## What changes are included in this PR?

In addition to the optimization, this commit adjusts the `array_has` benchmark code to actually benchmark `array_has` evaluation (!). The previous benchmark only constructed an `Expr`.

## Are these changes tested?

Yes. Passes existing tests. Performance was validated via several benchmark runs.

## Are there any user-facing changes?

No.

---------

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
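The lockstep walk over set bits and row offsets can be sketched in plain Rust. This is a minimal std-only model under stated assumptions, not DataFusion's actual code: the comparison result is assumed to be a little-endian bitmap packed into `u64` words (as Arrow stores booleans), offsets are plain `usize` values where row `i` spans `offsets[i]..offsets[i + 1]`, and null rows are ignored. The helper names `set_indices`, `rows_with_match`, `rows_with_match_per_row`, and `pack` are hypothetical; `set_indices` here only imitates the behavior of arrow-rs `BooleanBuffer::set_indices()` and is not that API.

```rust
/// Hypothetical stand-in for arrow-rs `BooleanBuffer::set_indices()`:
/// yields the index of every set bit, in increasing order.
fn set_indices(words: &[u64], len: usize) -> impl Iterator<Item = usize> + '_ {
    words
        .iter()
        .enumerate()
        .flat_map(|(w, &word)| {
            let mut bits = word;
            std::iter::from_fn(move || {
                if bits == 0 {
                    return None;
                }
                let tz = bits.trailing_zeros() as usize;
                bits &= bits - 1; // clear the lowest set bit
                Some(w * 64 + tz)
            })
        })
        .take_while(move |&i| i < len)
}

/// New approach (sketch): one forward pass over the set bits, advancing
/// the current row in lockstep with the offsets. Null rows are not modeled.
fn rows_with_match(words: &[u64], len: usize, offsets: &[usize]) -> Vec<bool> {
    let num_rows = offsets.len().saturating_sub(1);
    let mut result = vec![false; num_rows];
    let mut row = 0;
    for idx in set_indices(words, len) {
        // Skip rows that end at or before this matching element.
        while row < num_rows && offsets[row + 1] <= idx {
            row += 1;
        }
        if row == num_rows {
            break; // remaining matches lie past the last row
        }
        // Guard for offsets that do not start at 0 (e.g. a sliced list).
        if idx >= offsets[row] {
            result[row] = true;
        }
    }
    result
}

/// Old approach (model): inspect each row's slice and count matches,
/// mirroring "slice per row, then `true_count()`".
fn rows_with_match_per_row(bits: &[bool], offsets: &[usize]) -> Vec<bool> {
    offsets
        .windows(2)
        .map(|w| bits[w[0]..w[1]].iter().filter(|&&b| b).count() > 0)
        .collect()
}

/// Test helper: pack bools into little-endian u64 words.
fn pack(bits: &[bool]) -> Vec<u64> {
    let mut words = vec![0u64; bits.len().div_ceil(64)];
    for (i, &b) in bits.iter().enumerate() {
        if b {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}
```

In this model the lockstep loop moves `row` forward monotonically and visits each set bit once, so its cost is linear in the number of rows plus the number of matches, with no per-row slice construction. For example, with offsets `[0, 3, 5, 9]` and matches at element indices 1, 5, and 8, both functions report `[true, false, true]`.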
1 parent 0022d8e commit 0294a22

2 files changed

Lines changed: 455 additions & 268 deletions
