Commit ec92925
perf: Optimize `substr_index` to use bulk-NULL string builder (#21877)
## Which issue does this PR close?
- Closes #21876.
## Rationale for this change
As with other recent optimizations, we can optimize NULL handling in
`substr_index` by using the new bulk-NULL string builders.
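The sketch below shows the general idea in plain Rust, without an Arrow dependency. It assumes the bulk-NULL builders work roughly this way. `BulkNullStringBuilder` and `combine_validity` are hypothetical names, not the DataFusion or arrow-rs API. The output validity is computed once from the validity of all inputs. The per-row loop then only appends values, writing a zero-length placeholder for null rows instead of calling a separate per-row `append_null`.

```rust
/// Hypothetical, simplified stand-in for a "bulk-NULL" string builder
/// (not the real DataFusion/arrow-rs type). Validity is supplied once,
/// up front, instead of being recorded row by row.
struct BulkNullStringBuilder {
    validity: Vec<bool>,
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl BulkNullStringBuilder {
    fn with_validity(validity: Vec<bool>) -> Self {
        let mut offsets = Vec::with_capacity(validity.len() + 1);
        offsets.push(0);
        Self { validity, offsets, values: Vec::new() }
    }

    /// Append the value for the next row. Rows already marked null in
    /// `validity` get a placeholder (e.g. ""); no extra null bookkeeping here.
    fn append_value(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        self.offsets.push(self.values.len() as i32);
    }

    /// Materialize rows, applying the precomputed validity in one place.
    fn finish(self) -> Vec<Option<String>> {
        assert_eq!(self.offsets.len(), self.validity.len() + 1, "one value per row");
        (0..self.validity.len())
            .map(|i| {
                if self.validity[i] {
                    let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
                    Some(String::from_utf8(self.values[start..end].to_vec()).expect("valid UTF-8"))
                } else {
                    None
                }
            })
            .collect()
    }
}

/// Output row is valid only if the row is valid in every input
/// (hypothetical helper; the real code would AND validity bitmaps).
fn combine_validity(inputs: &[&[bool]]) -> Vec<bool> {
    let len = inputs.first().map_or(0, |v| v.len());
    (0..len).map(|i| inputs.iter().all(|v| v[i])).collect()
}
```

Writing a zero-length value for null rows keeps the offsets buffer aligned with the row count. The validity can then be attached once at the end rather than branched on for every row.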
Benchmarks (before → after; a negative change means faster):
Utf8
- utf8_100_array_long_delimiter: 10.0 µs → 10.1 µs (+1.00%)
- utf8_100_array_single_delimiter: 2.9 µs → 2.5 µs (−13.79%)
- utf8_100_scalar_long_delimiter_neg: 4.1 µs → 3.5 µs (−14.63%)
- utf8_100_scalar_long_delimiter_pos: 2.9 µs → 2.7 µs (−6.90%)
- utf8_100_scalar_single_delimiter_neg: 2.2 µs → 1.993 µs (−9.41%)
- utf8_100_scalar_single_delimiter_pos: 2.1 µs → 1.845 µs (−12.13%)
- utf8_1000_array_long_delimiter: 101.0 µs → 101.1 µs (+0.10%)
- utf8_1000_array_single_delimiter: 36.8 µs → 31.7 µs (−13.86%)
- utf8_1000_scalar_long_delimiter_neg: 38.9 µs → 36.9 µs (−5.14%)
- utf8_1000_scalar_long_delimiter_pos: 25.1 µs → 23.3 µs (−7.17%)
- utf8_1000_scalar_single_delimiter_neg: 19.3 µs → 17.7 µs (−8.29%)
- utf8_1000_scalar_single_delimiter_pos: 18.2 µs → 16.6 µs (−8.79%)
- utf8_10000_array_long_delimiter: 1083.4 µs → 1038.2 µs (−4.17%)
- utf8_10000_array_single_delimiter: 461.8 µs → 414.7 µs (−10.20%)
- utf8_10000_scalar_long_delimiter_neg: 392.4 µs → 379.3 µs (−3.34%)
- utf8_10000_scalar_long_delimiter_pos: 246.5 µs → 227.4 µs (−7.75%)
- utf8_10000_scalar_single_delimiter_neg: 191.3 µs → 177.5 µs (−7.21%)
- utf8_10000_scalar_single_delimiter_pos: 179.4 µs → 168.8 µs (−5.91%)
Utf8View
- utf8view_100_array_long_delimiter: 9.5 µs → 9.8 µs (+3.16%)
- utf8view_100_array_single_delimiter: 2.6 µs → 2.6 µs (0.00%)
- utf8view_100_scalar_long_delimiter_neg: 4.0 µs → 4.0 µs (0.00%)
- utf8view_100_scalar_long_delimiter_pos: 2.8 µs → 2.8 µs (0.00%)
- utf8view_100_scalar_single_delimiter_neg: 2.3 µs → 2.3 µs (0.00%)
- utf8view_100_scalar_single_delimiter_pos: 2.2 µs → 2.1 µs (−4.55%)
- utf8view_1000_array_long_delimiter: 94.8 µs → 99.2 µs (+4.64%)
- utf8view_1000_array_single_delimiter: 31.5 µs → 32.0 µs (+1.59%)
- utf8view_1000_scalar_long_delimiter_neg: 38.7 µs → 39.0 µs (+0.78%)
- utf8view_1000_scalar_long_delimiter_pos: 25.4 µs → 25.4 µs (0.00%)
- utf8view_1000_scalar_single_delimiter_neg: 21.4 µs → 21.8 µs (+1.87%)
- utf8view_1000_scalar_single_delimiter_pos: 20.8 µs → 20.9 µs (+0.48%)
- utf8view_10000_array_long_delimiter: 998.4 µs → 1025.4 µs (+2.70%)
- utf8view_10000_array_single_delimiter: 414.9 µs → 415.7 µs (+0.19%)
- utf8view_10000_scalar_long_delimiter_neg: 393.7 µs → 395.9 µs (+0.56%)
- utf8view_10000_scalar_long_delimiter_pos: 253.4 µs → 252.7 µs (−0.28%)
- utf8view_10000_scalar_single_delimiter_neg: 214.5 µs → 217.3 µs (+1.31%)
- utf8view_10000_scalar_single_delimiter_pos: 207.9 µs → 208.7 µs (+0.38%)
This PR doesn't touch the Utf8View code path, so the small Utf8View
differences above (all within ±5%) are most likely measurement noise.
## What changes are included in this PR?
* Optimize `substr_index` by switching from Arrow string builders to
bulk-NULL string builders
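The sketch below shows how a `substr_index` kernel might pair with a precomputed null mask. It is a rough illustration, not the code changed in this PR. It assumes MySQL `SUBSTRING_INDEX`-style semantics: a positive count takes the text before the n-th delimiter from the left, and a negative count takes the text after the n-th delimiter from the right. A zero count, an empty delimiter, or an empty string yields "". The helper `substr_index_column` and its `(validity, offsets, values)` return layout are hypothetical.

```rust
/// Assumed SUBSTRING_INDEX-style semantics (illustrative only).
fn substr_index<'a>(s: &'a str, delim: &str, count: i64) -> &'a str {
    if s.is_empty() || delim.is_empty() || count == 0 {
        return "";
    }
    let n = count.unsigned_abs() as usize;
    if count > 0 {
        // Text before the n-th occurrence from the left, or all of `s`.
        match s.match_indices(delim).nth(n - 1) {
            Some((idx, _)) => &s[..idx],
            None => s,
        }
    } else {
        // Text after the n-th occurrence from the right, or all of `s`.
        match s.rmatch_indices(delim).nth(n - 1) {
            Some((idx, _)) => &s[idx + delim.len()..],
            None => s,
        }
    }
}

/// Hypothetical column kernel: compute the output null mask once, then
/// write every row's bytes (a zero-length placeholder for null rows).
fn substr_index_column(
    strings: &[Option<&str>],
    delims: &[Option<&str>],
    counts: &[Option<i64>],
) -> (Vec<bool>, Vec<i32>, Vec<u8>) {
    assert!(delims.len() == strings.len() && counts.len() == strings.len());
    // Output is valid only where all three inputs are non-null.
    let validity: Vec<bool> = (0..strings.len())
        .map(|i| strings[i].is_some() && delims[i].is_some() && counts[i].is_some())
        .collect();
    let mut offsets = vec![0i32];
    let mut values = Vec::new();
    for i in 0..strings.len() {
        let out = match (strings[i], delims[i], counts[i]) {
            (Some(s), Some(d), Some(c)) => substr_index(s, d, c),
            _ => "", // null row: placeholder only, no per-row null call
        };
        values.extend_from_slice(out.as_bytes());
        offsets.push(values.len() as i32);
    }
    (validity, offsets, values)
}
```

Under these assumptions, `substr_index("www.apache.org", ".", 2)` gives `"www.apache"` and a count of `-1` gives `"org"`.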
## Are these changes tested?
Yes, covered by existing tests.
## Are there any user-facing changes?
No.
---------
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

PR #21877 · parent 22bb4e6 · 1 file changed, 47 additions and 31 deletions