Commit 91dc88a

Dandandan and claude committed
Eliminate capture groups from regexp_replace optimization
Split anchored ^prefix(capture)suffix.*$ patterns into separate prefix and content regexes (no capture groups). Uses two find() calls instead of captures() + expand(), avoiding capture-group tracking overhead and String allocation in the hot loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
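
The two-step extraction described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from regexpreplace.rs: the `AnchoredFind` alias, the `extract` method, and the `find_scheme` / `find_host` helpers are hypothetical, and plain functions stand in for the two compiled, `^`-anchored regexes so that the sketch needs only the standard library. The fields of the real `ExtractParts` are not shown in this diff, and suffix handling is left out because the message does not detail it.

```rust
/// Hypothetical stand-in for an anchored `Regex::find`: returns the end offset
/// of a match that starts at byte 0 of the haystack, or None if there is none.
type AnchoredFind = fn(&str) -> Option<usize>;

/// Illustrative counterpart of the commit's `ExtractParts` (fields assumed).
struct ExtractParts {
    prefix: AnchoredFind,
    content: AnchoredFind,
}

impl ExtractParts {
    /// Two anchored finds followed by slicing: no capture groups are tracked
    /// and the result borrows from `val`, so no String is allocated.
    fn extract<'a>(&self, val: &'a str) -> Option<&'a str> {
        let prefix_end = (self.prefix)(val)?;
        let rest = &val[prefix_end..];
        let content_end = (self.content)(rest)?;
        Some(&rest[..content_end])
    }
}

/// Stand-in for the prefix regex `^https?://`.
fn find_scheme(s: &str) -> Option<usize> {
    s.strip_prefix("https://")
        .or_else(|| s.strip_prefix("http://"))
        .map(|rest| s.len() - rest.len())
}

/// Stand-in for the content regex `^[^/]+` (one or more non-slash characters).
fn find_host(s: &str) -> Option<usize> {
    let end = s.find('/').unwrap_or(s.len());
    if end > 0 { Some(end) } else { None }
}

fn main() {
    let parts = ExtractParts { prefix: find_scheme, content: find_host };
    let val = "https://example.com/path?q=1";
    // Fall back to the original value when the pattern does not match.
    let host = parts.extract(val).unwrap_or(val);
    println!("{host}"); // prints "example.com"
}
```

Because the second matcher runs on the slice that starts where the prefix ended, anchoring it at the start of that slice plays the role of the capture group's position in the original pattern.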
1 parent ab44f24 commit 91dc88a

1 file changed

Lines changed: 2 additions & 7 deletions


datafusion/functions/src/regex/regexpreplace.rs

@@ -277,10 +277,7 @@ fn try_build_extract_parts(pattern: &str, replacement: &str) -> Option<ExtractPa
 
     // Build content regex: ^(capture inner) — anchored so it matches
     // right where the prefix ended
-    let content_hir = Hir::concat(vec![
-        Hir::look(Look::Start),
-        (*cap.sub).clone(),
-    ]);
+    let content_hir = Hir::concat(vec![Hir::look(Look::Start), (*cap.sub).clone()]);
     let content_re = Regex::new(&content_hir.to_string()).ok()?;
 
     Some(ExtractParts {
@@ -588,9 +585,7 @@ fn _regexp_replace_static_pattern_replace<T: OffsetSizeTrait>(
                         None
                     }
                 });
-                vals.append_slice(
-                    extracted.unwrap_or(val).as_bytes(),
-                );
+                vals.append_slice(extracted.unwrap_or(val).as_bytes());
             }
             new_offsets.append(T::from_usize(vals.len()).unwrap());
         });
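
For readers unfamiliar with the loop in the second hunk, the following is a rough, standard-library-only sketch of the same append pattern: each row's bytes go into one flat values buffer, and an offset is recorded after every row, including null rows. The names `build_column` and `digits_after_id` are hypothetical, `Vec<u8>` and `Vec<i32>` stand in for the builders used in the real function, and the null bitmap is omitted.

```rust
use std::convert::TryFrom;

/// Hypothetical extractor standing in for a pattern like `^id=([0-9]+).*$`:
/// returns the run of ASCII digits after "id=", borrowed from `val`.
fn digits_after_id(val: &str) -> Option<&str> {
    let rest = val.strip_prefix("id=")?;
    let end = rest.find(|c: char| !c.is_ascii_digit()).unwrap_or(rest.len());
    if end > 0 { Some(&rest[..end]) } else { None }
}

/// Builds a flat values buffer plus an offsets array for a string column.
/// Plain vectors are used here as an assumption for illustration.
fn build_column(
    rows: &[Option<&str>],
    extract: impl Fn(&str) -> Option<&str>,
) -> (Vec<u8>, Vec<i32>) {
    let mut vals: Vec<u8> = Vec::new();
    let mut new_offsets: Vec<i32> = vec![0];
    for &row in rows {
        if let Some(val) = row {
            let extracted = extract(val);
            // No match means the value is copied through unchanged.
            vals.extend_from_slice(extracted.unwrap_or(val).as_bytes());
        }
        // An offset is appended for every row, so a null row has zero length.
        new_offsets.push(i32::try_from(vals.len()).expect("offset fits in i32"));
    }
    (vals, new_offsets)
}

fn main() {
    let rows = [Some("id=42;x"), None, Some("nope")];
    let (vals, offsets) = build_column(&rows, digits_after_id);
    assert_eq!(vals, b"42nope".to_vec());
    assert_eq!(offsets, vec![0, 2, 2, 6]);
}
```

Since the extractor returns a borrowed slice, the only copy per row is the one into the values buffer.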
