Commit e9bcfb4

perf: Optimize compare_element_to_list (#20323)
## Which issue does this PR close?

- Closes #20322

## Rationale for this change

`compare_element_to_list` is a utility function used by several of the array-related UDFs (e.g., `array_position`, `array_positions`, `array_remove`, and `array_replace`). The current implementation extracts a scalar from an array using `arrow::compute::take()`. This is slow; we can just use `slice` directly, which also avoids allocating an intermediate array of indices.

## What changes are included in this PR?

## Are these changes tested?

Yes; microbenchmarks indicate 15-50% performance improvement for `array_remove`.

## Are there any user-facing changes?

No.
1 parent f5a2ac3 commit e9bcfb4

2 files changed

Lines changed: 2 additions & 4 deletions

datafusion/functions-nested/src/position.rs

Lines changed: 0 additions & 1 deletion

@@ -164,7 +164,6 @@ fn general_position_dispatch<O: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<Ar
     let arr_from = if args.len() == 3 {
         as_int64_array(&args[2])?
             .values()
-            .to_vec()
             .iter()
             .map(|&x| x - 1)
             .collect::<Vec<_>>()

datafusion/functions-nested/src/utils.rs

Lines changed: 2 additions & 3 deletions

@@ -22,7 +22,7 @@ use std::sync::Arc;
 use arrow::datatypes::{DataType, Field, Fields};

 use arrow::array::{
-    Array, ArrayRef, BooleanArray, GenericListArray, OffsetSizeTrait, Scalar, UInt32Array,
+    Array, ArrayRef, BooleanArray, GenericListArray, OffsetSizeTrait, Scalar,
 };
 use arrow::buffer::OffsetBuffer;
 use datafusion_common::cast::{
@@ -161,8 +161,7 @@ pub(crate) fn compare_element_to_list(
         );
     }

-    let indices = UInt32Array::from(vec![row_index as u32]);
-    let element_array_row = arrow::compute::take(element_array, &indices, None)?;
+    let element_array_row = element_array.slice(row_index, 1);

     // Compute all positions in list_row_array (that is itself an
     // array) that are equal to `from_array_row`
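
The reasoning behind swapping `take` for `slice` can be illustrated with a small standard-library model. This is a sketch under stated assumptions, not Arrow's implementation: `SharedArray` and its `slice`, `take`, and `shares_buffer_with` methods are hypothetical stand-ins that mimic only the relevant behavior (a slice reuses the existing buffer with a new offset and length, while a gather by index allocates and fills a new buffer).

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for an Arrow primitive array:
/// a reference-counted buffer plus an (offset, len) window into it.
#[derive(Clone, Debug)]
struct SharedArray {
    buffer: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl SharedArray {
    fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        Self { buffer: Arc::new(values), offset: 0, len }
    }

    /// The elements visible through this array's window.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Zero-copy: clones the `Arc` (a refcount bump) and narrows the window.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// Gather by index: needs an index list and copies the selected
    /// elements into a freshly allocated buffer.
    fn take(&self, indices: &[u32]) -> Self {
        let gathered: Vec<i64> = indices
            .iter()
            .map(|&i| self.values()[i as usize])
            .collect();
        Self::new(gathered)
    }

    /// True when both arrays point at the same underlying allocation.
    fn shares_buffer_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.buffer, &other.buffer)
    }
}
```

In this model, `arr.slice(row_index, 1)` and `arr.take(&[row_index as u32])` expose the same single element, but only the slice shares the original buffer; the gather also needs its index list built first, which corresponds to the intermediate `UInt32Array` removed in the diff above.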
