Commit 3f38609

perf: Optimize replace() fastpath by avoiding alloc (#20344)
## Which issue does this PR close?

- Closes #20343.

## Rationale for this change

We already have a fastpath for when `from` and `to` are both single ASCII characters, but this fastpath could be further optimized by avoiding the `Vec<u8>` allocation.

## What changes are included in this PR?

Implement the described optimization.

## Are these changes tested?

Yes, no new tests or benchmarks warranted. This PR yields a 10-50% performance improvement for the relevant microbenchmarks.

## Are there any user-facing changes?

No.

1 parent f471aaf commit 3f38609

1 file changed

Lines changed: 11 additions & 9 deletions

datafusion/functions/src/string/replace.rs

@@ -228,19 +228,21 @@ fn replace_into_string(buffer: &mut String, string: &str, from: &str, to: &str)
         return;
     }
 
-    // Fast path for replacing a single ASCII character with another single ASCII character
-    // This matches Rust's str::replace() optimization and enables vectorization
+    // Fast path for replacing a single ASCII character with another single ASCII character.
+    // Extends the buffer's underlying Vec<u8> directly, for performance.
     if let ([from_byte], [to_byte]) = (from.as_bytes(), to.as_bytes())
         && from_byte.is_ascii()
         && to_byte.is_ascii()
     {
-        // SAFETY: We're replacing ASCII with ASCII, which preserves UTF-8 validity
-        let replaced: Vec<u8> = string
-            .as_bytes()
-            .iter()
-            .map(|b| if *b == *from_byte { *to_byte } else { *b })
-            .collect();
-        buffer.push_str(unsafe { std::str::from_utf8_unchecked(&replaced) });
+        // SAFETY: Replacing an ASCII byte with another ASCII byte preserves UTF-8 validity.
+        unsafe {
+            buffer.as_mut_vec().extend(
+                string
+                    .as_bytes()
+                    .iter()
+                    .map(|&b| if b == *from_byte { *to_byte } else { b }),
+            );
+        }
         return;
     }
 
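The technique in this diff can be tried in isolation. The following is a minimal sketch, not the DataFusion code: the function name `replace_ascii_byte_into` and the sample strings are hypothetical, and the `assert!` on the inputs stands in for the `is_ascii()` checks in the real `if let` condition. It appends the mapped bytes straight onto the output `String`'s backing `Vec<u8>`, so no temporary `Vec<u8>` is built, and it compares the result against the standard library's `str::replace`.

```rust
/// Hypothetical standalone version of the fast path, for illustration only.
/// Appends `string` to `buffer`, replacing every `from` byte with `to`.
fn replace_ascii_byte_into(buffer: &mut String, string: &str, from: u8, to: u8) {
    // Stand-in for the `is_ascii()` checks that guard the real fast path.
    assert!(from.is_ascii() && to.is_ascii());

    // SAFETY: every byte of a multi-byte UTF-8 sequence is >= 0x80, so an
    // ASCII `from` byte can only match a complete one-byte character.
    // Swapping it for another ASCII byte keeps the appended bytes valid UTF-8.
    unsafe {
        buffer
            .as_mut_vec()
            .extend(string.as_bytes().iter().map(|&b| if b == from { to } else { b }));
    }
}

fn main() {
    let mut buffer = String::from("out: ");
    replace_ascii_byte_into(&mut buffer, "a-b-c é-ü", b'-', b'_');

    // Non-ASCII characters pass through untouched.
    assert_eq!(buffer, "out: a_b_c é_ü");
    // Same result as the standard library's allocation-based replace.
    assert_eq!(buffer, format!("out: {}", "a-b-c é-ü".replace('-', "_")));
    // The buffer is still valid UTF-8 after the unsafe extend.
    assert!(std::str::from_utf8(buffer.as_bytes()).is_ok());
}
```

Because the iterator over a byte slice reports an exact length, `Vec::extend` can reserve the needed capacity up front, which is presumably part of why dropping the intermediate `collect()` pays off.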
