Commit 85cdf53

Authored by notashes, Dandandan, and Jefffrey
perf: various optimizations to eliminate branch misprediction in hash_utils (#20168)
## Which issue does this PR close?

- Part of #20152

## Rationale for this change

Compile-time monomorphization helps bring the `rehash` branch outside the hot loop, where it's not required.

## What changes are included in this PR?

Currently the PR adds a specialized `hash_dictionary_inner()` function with const generic parameters that control whether nulls are checked in keys and values. It also handles the specific edge cases of nulls in only the keys or only the values.

## Are these changes tested?

There are no additional tests yet, but I will add them as I continue. The benchmark results seem promising.
Here's `cargo bench --bench with_hashes -- dictionary` for both branches:

<details>
<summary>origin/main</summary>

```
Gnuplot not found, using plotters backend
Benchmarking dictionary_utf8_int32: single, no nulls
Benchmarking dictionary_utf8_int32: single, no nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: single, no nulls: Collecting 100 samples in estimated 5.0461 s (470k iterations)
Benchmarking dictionary_utf8_int32: single, no nulls: Analyzing
dictionary_utf8_int32: single, no nulls
                        time:   [10.668 µs 10.700 µs 10.734 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

Benchmarking dictionary_utf8_int32: single, nulls
Benchmarking dictionary_utf8_int32: single, nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: single, nulls: Collecting 100 samples in estimated 5.0428 s (409k iterations)
Benchmarking dictionary_utf8_int32: single, nulls: Analyzing
dictionary_utf8_int32: single, nulls
                        time:   [12.269 µs 12.293 µs 12.322 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking dictionary_utf8_int32: multiple, no nulls
Benchmarking dictionary_utf8_int32: multiple, no nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: multiple, no nulls: Collecting 100 samples in estimated 5.0864 s (162k iterations)
Benchmarking dictionary_utf8_int32: multiple, no nulls: Analyzing
dictionary_utf8_int32: multiple, no nulls
                        time:   [31.357 µs 31.426 µs 31.506 µs]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking dictionary_utf8_int32: multiple, nulls
Benchmarking dictionary_utf8_int32: multiple, nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: multiple, nulls: Collecting 100 samples in estimated 5.0842 s (141k iterations)
Benchmarking dictionary_utf8_int32: multiple, nulls: Analyzing
dictionary_utf8_int32: multiple, nulls
                        time:   [36.060 µs 36.135 µs 36.220 µs]
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe
```

</details>

<details>
<summary>feat/brunch-prediction</summary>

```
Gnuplot not found, using plotters backend
Benchmarking dictionary_utf8_int32: single, no nulls
Benchmarking dictionary_utf8_int32: single, no nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: single, no nulls: Collecting 100 samples in estimated 5.0176 s (1.1M iterations)
Benchmarking dictionary_utf8_int32: single, no nulls: Analyzing
dictionary_utf8_int32: single, no nulls
                        time:   [4.7186 µs 4.7496 µs 4.7821 µs]
                        change: [−55.829% −55.537% −55.240%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking dictionary_utf8_int32: single, nulls
Benchmarking dictionary_utf8_int32: single, nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: single, nulls: Collecting 100 samples in estimated 5.0295 s (712k iterations)
Benchmarking dictionary_utf8_int32: single, nulls: Analyzing
dictionary_utf8_int32: single, nulls
                        time:   [6.9647 µs 7.0426 µs 7.1281 µs]
                        change: [−43.806% −43.445% −42.993%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  10 (10.00%) high severe

Benchmarking dictionary_utf8_int32: multiple, no nulls
Benchmarking dictionary_utf8_int32: multiple, no nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: multiple, no nulls: Collecting 100 samples in estimated 5.0600 s (348k iterations)
Benchmarking dictionary_utf8_int32: multiple, no nulls: Analyzing
dictionary_utf8_int32: multiple, no nulls
                        time:   [13.365 µs 13.384 µs 13.404 µs]
                        change: [−57.610% −57.464% −57.313%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low severe
  4 (4.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

Benchmarking dictionary_utf8_int32: multiple, nulls
Benchmarking dictionary_utf8_int32: multiple, nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: multiple, nulls: Collecting 100 samples in estimated 5.0569 s (242k iterations)
Benchmarking dictionary_utf8_int32: multiple, nulls: Analyzing
dictionary_utf8_int32: multiple, nulls
                        time:   [20.785 µs 20.962 µs 21.173 µs]
                        change: [−42.370% −42.001% −41.579%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  14 (14.00%) high severe
```

</details>

## Are there any user-facing changes?

---------

Co-authored-by: Daniël Heres <danielheres@gmail.com>
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
1 parent 5fccac1 commit 85cdf53

1 file changed

Lines changed: 148 additions & 60 deletions

File tree

datafusion/common/src/hash_utils.rs

```diff
@@ -392,33 +392,22 @@ fn hash_generic_byte_view_array<T: ByteViewType>(
     }
 }
 
-/// Helper function to update hash for a dictionary key if the value is valid
-#[cfg(not(feature = "force_hash_collisions"))]
-#[inline]
-fn update_hash_for_dict_key(
-    hash: &mut u64,
-    dict_hashes: &[u64],
-    dict_values: &dyn Array,
-    idx: usize,
-    multi_col: bool,
-) {
-    if dict_values.is_valid(idx) {
-        if multi_col {
-            *hash = combine_hashes(dict_hashes[idx], *hash);
-        } else {
-            *hash = dict_hashes[idx];
-        }
-    }
-    // no update for invalid dictionary value
-}
-
-/// Hash the values in a dictionary array
-#[cfg(not(feature = "force_hash_collisions"))]
-fn hash_dictionary<K: ArrowDictionaryKeyType>(
+/// Hash dictionary array with compile-time specialization for null handling.
+///
+/// Uses const generics to eliminate runtime branching in the hot loop:
+/// - `HAS_NULL_KEYS`: Whether to check for null dictionary keys
+/// - `HAS_NULL_VALUES`: Whether to check for null dictionary values
+/// - `MULTI_COL`: Whether to combine with existing hash (true) or initialize (false)
+#[inline(never)]
+fn hash_dictionary_inner<
+    K: ArrowDictionaryKeyType,
+    const HAS_NULL_KEYS: bool,
+    const HAS_NULL_VALUES: bool,
+    const MULTI_COL: bool,
+>(
     array: &DictionaryArray<K>,
     random_state: &RandomState,
     hashes_buffer: &mut [u64],
-    multi_col: bool,
 ) -> Result<()> {
     // Hash each dictionary value once, and then use that computed
     // hash for each key value to avoid a potentially expensive
@@ -427,22 +416,91 @@ fn hash_dictionary<K: ArrowDictionaryKeyType>(
     let mut dict_hashes = vec![0; dict_values.len()];
     create_hashes([dict_values], random_state, &mut dict_hashes)?;
 
-    // combine hash for each index in values
-    for (hash, key) in hashes_buffer.iter_mut().zip(array.keys().iter()) {
-        if let Some(key) = key {
+    if HAS_NULL_KEYS {
+        for (hash, key) in hashes_buffer.iter_mut().zip(array.keys().iter()) {
+            if let Some(key) = key {
+                let idx = key.as_usize();
+                if !HAS_NULL_VALUES || dict_values.is_valid(idx) {
+                    if MULTI_COL {
+                        *hash = combine_hashes(dict_hashes[idx], *hash);
+                    } else {
+                        *hash = dict_hashes[idx];
+                    }
+                }
+            }
+        }
+    } else {
+        for (hash, key) in hashes_buffer.iter_mut().zip(array.keys().values()) {
             let idx = key.as_usize();
-            update_hash_for_dict_key(
-                hash,
-                &dict_hashes,
-                dict_values.as_ref(),
-                idx,
-                multi_col,
-            );
-        } // no update for Null key
+            if !HAS_NULL_VALUES || dict_values.is_valid(idx) {
+                if MULTI_COL {
+                    *hash = combine_hashes(dict_hashes[idx], *hash);
+                } else {
+                    *hash = dict_hashes[idx];
+                }
+            }
+        }
     }
     Ok(())
 }
 
+/// Hash the values in a dictionary array
+#[cfg(not(feature = "force_hash_collisions"))]
+fn hash_dictionary<K: ArrowDictionaryKeyType>(
+    array: &DictionaryArray<K>,
+    random_state: &RandomState,
+    hashes_buffer: &mut [u64],
+    multi_col: bool,
+) -> Result<()> {
+    let has_null_keys = array.keys().null_count() != 0;
+    let has_null_values = array.values().null_count() != 0;
+
+    // Dispatch based on null presence and multi-column mode
+    // Should reduce branching within hot loops
+    match (has_null_keys, has_null_values, multi_col) {
+        (false, false, false) => hash_dictionary_inner::<K, false, false, false>(
+            array,
+            random_state,
+            hashes_buffer,
+        ),
+        (false, false, true) => hash_dictionary_inner::<K, false, false, true>(
+            array,
+            random_state,
+            hashes_buffer,
+        ),
+        (false, true, false) => hash_dictionary_inner::<K, false, true, false>(
+            array,
+            random_state,
+            hashes_buffer,
+        ),
+        (false, true, true) => hash_dictionary_inner::<K, false, true, true>(
+            array,
+            random_state,
+            hashes_buffer,
+        ),
+        (true, false, false) => hash_dictionary_inner::<K, true, false, false>(
+            array,
+            random_state,
+            hashes_buffer,
+        ),
+        (true, false, true) => hash_dictionary_inner::<K, true, false, true>(
+            array,
+            random_state,
+            hashes_buffer,
+        ),
+        (true, true, false) => hash_dictionary_inner::<K, true, true, false>(
+            array,
+            random_state,
+            hashes_buffer,
+        ),
+        (true, true, true) => hash_dictionary_inner::<K, true, true, true>(
+            array,
+            random_state,
+            hashes_buffer,
+        ),
+    }
+}
+
 #[cfg(not(feature = "force_hash_collisions"))]
 fn hash_struct_array(
     array: &StructArray,
@@ -452,19 +510,21 @@ fn hash_struct_array(
     let nulls = array.nulls();
     let row_len = array.len();
 
-    let valid_row_indices: Vec<usize> = if let Some(nulls) = nulls {
-        nulls.valid_indices().collect()
-    } else {
-        (0..row_len).collect()
-    };
-
     // Create hashes for each row that combines the hashes over all the columns at that row.
     let mut values_hashes = vec![0u64; row_len];
     create_hashes(array.columns(), random_state, &mut values_hashes)?;
 
-    for i in valid_row_indices {
-        let hash = &mut hashes_buffer[i];
-        *hash = combine_hashes(*hash, values_hashes[i]);
+    // Separate paths to avoid allocating Vec when there are no nulls
+    if let Some(nulls) = nulls {
+        for i in nulls.valid_indices() {
+            let hash = &mut hashes_buffer[i];
+            *hash = combine_hashes(*hash, values_hashes[i]);
+        }
+    } else {
+        for i in 0..row_len {
+            let hash = &mut hashes_buffer[i];
+            *hash = combine_hashes(*hash, values_hashes[i]);
+        }
     }
 
     Ok(())
@@ -663,12 +723,17 @@ fn hash_fixed_list_array(
     Ok(())
 }
 
+/// Inner hash function for RunArray
+#[inline(never)]
 #[cfg(not(feature = "force_hash_collisions"))]
-fn hash_run_array<R: RunEndIndexType>(
+fn hash_run_array_inner<
+    R: RunEndIndexType,
+    const HAS_NULL_VALUES: bool,
+    const REHASH: bool,
+>(
     array: &RunArray<R>,
    random_state: &RandomState,
     hashes_buffer: &mut [u64],
-    rehash: bool,
 ) -> Result<()> {
     // We find the relevant runs that cover potentially sliced arrays, so we can only hash those
     // values. Then we find the runs that refer to the original runs and ensure that we apply
@@ -706,25 +771,23 @@ fn hash_run_array<R: RunEndIndexType>(
         .iter()
         .enumerate()
     {
-        let is_null_value = sliced_values.is_null(adjusted_physical_index);
         let absolute_run_end = absolute_run_end.as_usize();
-
         let end_in_slice = (absolute_run_end - array_offset).min(array_len);
 
-        if rehash {
-            if !is_null_value {
-                let value_hash = values_hashes[adjusted_physical_index];
-                for hash in hashes_buffer
-                    .iter_mut()
-                    .take(end_in_slice)
-                    .skip(start_in_slice)
-                {
-                    *hash = combine_hashes(value_hash, *hash);
-                }
+        if HAS_NULL_VALUES && sliced_values.is_null(adjusted_physical_index) {
+            start_in_slice = end_in_slice;
+            continue;
+        }
+
+        let value_hash = values_hashes[adjusted_physical_index];
+        let run_slice = &mut hashes_buffer[start_in_slice..end_in_slice];
+
+        if REHASH {
+            for hash in run_slice.iter_mut() {
+                *hash = combine_hashes(value_hash, *hash);
             }
         } else {
-            let value_hash = values_hashes[adjusted_physical_index];
-            hashes_buffer[start_in_slice..end_in_slice].fill(value_hash);
+            run_slice.fill(value_hash);
         }
 
         start_in_slice = end_in_slice;
@@ -733,6 +796,31 @@
     Ok(())
 }
 
+#[cfg(not(feature = "force_hash_collisions"))]
+fn hash_run_array<R: RunEndIndexType>(
+    array: &RunArray<R>,
+    random_state: &RandomState,
+    hashes_buffer: &mut [u64],
+    rehash: bool,
+) -> Result<()> {
+    let has_null_values = array.values().null_count() != 0;
+
+    match (has_null_values, rehash) {
+        (false, false) => {
+            hash_run_array_inner::<R, false, false>(array, random_state, hashes_buffer)
+        }
+        (false, true) => {
+            hash_run_array_inner::<R, false, true>(array, random_state, hashes_buffer)
+        }
+        (true, false) => {
+            hash_run_array_inner::<R, true, false>(array, random_state, hashes_buffer)
+        }
+        (true, true) => {
+            hash_run_array_inner::<R, true, true>(array, random_state, hashes_buffer)
+        }
+    }
+}
+
 /// Internal helper function that hashes a single array and either initializes or combines
 /// the hash values in the buffer.
 #[cfg(not(feature = "force_hash_collisions"))]
```
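The `hash_struct_array` hunk above applies a related trick to allocation rather than branch prediction: instead of materializing a `Vec<usize>` of valid indices, it loops over each case directly. A standalone sketch of that shape (illustrative names, and a plain `bool` slice standing in for Arrow's null buffer):

```rust
// Two explicit paths instead of collecting valid indices into a Vec<usize>:
// the all-valid case allocates nothing and has no per-row validity check.
fn combine_valid(values: &[u64], validity: Option<&[bool]>, out: &mut [u64]) {
    match validity {
        Some(mask) => {
            // Only combine rows the validity mask marks as valid.
            for (i, &ok) in mask.iter().enumerate() {
                if ok {
                    out[i] = out[i].wrapping_add(values[i]);
                }
            }
        }
        None => {
            // No nulls: combine every row unconditionally.
            for (i, &v) in values.iter().enumerate() {
                out[i] = out[i].wrapping_add(v);
            }
        }
    }
}

fn main() {
    let values = [1u64, 2, 3];
    let mut out = vec![10u64, 10, 10];

    let mask = [true, false, true];
    combine_valid(&values, Some(mask.as_slice()), &mut out);
    assert_eq!(out, vec![11, 10, 13]); // row 1 skipped as invalid

    combine_valid(&values, None, &mut out);
    assert_eq!(out, vec![12, 12, 16]); // every row combined
}
```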

0 commit comments