Commit 0144570
Add a DataFusion-side trait that abstracts over the bulk-NULL string
array builders (GenericStringArrayBuilder<O> and
StringViewArrayBuilder), so that functions which dispatch over
Utf8/LargeUtf8/Utf8View can adopt the new builders without giving up
their single-bodied generic implementation.
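
A minimal sketch of the idea, using only the standard library. The trait name `BulkNullBuilder`, the two toy builders, and the `fill` helper are hypothetical stand-ins for illustration; the real `BulkNullStringArrayBuilder` signature may differ. Only the method names `append_value` and `append_placeholder` come from this commit message.

```rust
/// Hypothetical stand-in for the trait described above.
trait BulkNullBuilder {
    /// Append a non-null value.
    fn append_value(&mut self, value: &str);
    /// Append an empty slot for a row whose nullness is tracked elsewhere.
    fn append_placeholder(&mut self);
    /// Number of rows appended so far.
    fn len(&self) -> usize;
}

/// Toy offset-based builder (loosely models the Utf8/LargeUtf8 layout).
#[derive(Default)]
struct OffsetBuilder {
    data: String,
    /// End offset of each row within `data`.
    offsets: Vec<usize>,
}

impl BulkNullBuilder for OffsetBuilder {
    #[inline]
    fn append_value(&mut self, value: &str) {
        self.data.push_str(value);
        self.offsets.push(self.data.len());
    }

    #[inline]
    fn append_placeholder(&mut self) {
        // A placeholder adds a row but no bytes.
        self.offsets.push(self.data.len());
    }

    fn len(&self) -> usize {
        self.offsets.len()
    }
}

/// Toy per-row builder (loosely stands in for a view-style layout).
#[derive(Default)]
struct RowBuilder {
    rows: Vec<String>,
}

impl BulkNullBuilder for RowBuilder {
    #[inline]
    fn append_value(&mut self, value: &str) {
        self.rows.push(value.to_owned());
    }

    #[inline]
    fn append_placeholder(&mut self) {
        self.rows.push(String::new());
    }

    fn len(&self) -> usize {
        self.rows.len()
    }
}

/// One generic body serves every builder type. `valid[i] == false`
/// marks row `i` as null; the builder only receives a placeholder.
fn fill<B: BulkNullBuilder>(values: &[&str], valid: &[bool], builder: &mut B) {
    for (value, is_valid) in values.iter().zip(valid) {
        if *is_valid {
            builder.append_value(value);
        } else {
            builder.append_placeholder();
        }
    }
}
```

Calling `fill` with an `OffsetBuilder` or a `RowBuilder` runs the same loop body; the caller keeps the validity mask and would attach it when finishing the array.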
Convert `repeat` as the first call site. The output is null iff either
input is null, so the per-row null match becomes a single
NullBuffer::union over the input null buffers, evaluated once before the
loop.
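
The null handling described above can be sketched with plain boolean masks. This is a simplified model, not the code in this commit: `union_validity` and `repeat_rows` are hypothetical names, `true` means "valid", `None` means "no nulls", and treating a negative count as an empty string is an assumption made for the example.

```rust
/// Combine two optional validity masks: a row is valid only if it is
/// valid in both inputs. `None` means the input has no nulls.
fn union_validity(a: Option<&[bool]>, b: Option<&[bool]>) -> Option<Vec<bool>> {
    match (a, b) {
        (None, None) => None,
        (Some(v), None) | (None, Some(v)) => Some(v.to_vec()),
        (Some(x), Some(y)) => Some(x.iter().zip(y).map(|(p, q)| *p && *q).collect()),
    }
}

/// Repeat each string `counts[i]` times. The combined validity is
/// computed once, before the loop; null rows get an empty placeholder.
fn repeat_rows(
    strings: &[&str],
    string_valid: Option<&[bool]>,
    counts: &[i64],
    count_valid: Option<&[bool]>,
) -> (Vec<String>, Option<Vec<bool>>) {
    let validity = union_validity(string_valid, count_valid);
    let mut out = Vec::with_capacity(strings.len());
    for (i, (s, n)) in strings.iter().zip(counts).enumerate() {
        let is_valid = validity.as_ref().map_or(true, |v| v[i]);
        if is_valid {
            // Assumption: a negative count yields an empty string.
            let times = usize::try_from(*n).unwrap_or(0);
            out.push(s.repeat(times));
        } else {
            out.push(String::new());
        }
    }
    (out, validity)
}
```

The loop never inspects the two inputs' null masks separately; it consults only the precomputed union, and the same mask is returned alongside the values as the output's validity.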
Also mark the inherent append_value/append_placeholder methods on the
new builders as #[inline]; without this, calls through the trait wrapper
end up going through a non-inlined inherent and slow down small-output
paths.
## Which issue does this PR close?
- Closes #21853.
## Rationale for this change
Optimize NULL handling in `repeat` using the bulk-NULL string builders
that have recently been added. This requires adding
`BulkNullStringArrayBuilder`, a trait that is similar in spirit to
Arrow's `StringLikeArrayBuilder`.
Benchmarks:
- repeat_string overflow [size=1024, repeat_times=1073741824]: 1022.5ns
→ 1054.5ns (+3.13%)
- repeat_string overflow [size=4096, repeat_times=1073741824]: 1016.6ns
→ 1055.3ns (+3.81%)
- repeat_large_string [size=1024, repeat_times=3]: 32.4µs → 26.6µs
(−17.90%)
- repeat_large_string [size=4096, repeat_times=3]: 127.4µs → 104.0µs
(−18.37%)
- repeat_string [size=1024, repeat_times=3]: 32.6µs → 26.8µs (−17.79%)
- repeat_string [size=4096, repeat_times=3]: 127.4µs → 105.5µs (−17.19%)
- repeat_string_view [size=1024, repeat_times=3]: 37.3µs → 31.7µs
(−15.01%)
- repeat_string_view [size=4096, repeat_times=3]: 146.5µs → 124.5µs
(−15.02%)
- repeat_large_string [size=1024, repeat_times=30]: 82.0µs → 80.4µs
(−1.95%)
- repeat_large_string [size=4096, repeat_times=30]: 344.2µs → 338.7µs
(−1.60%)
- repeat_string [size=1024, repeat_times=30]: 81.7µs → 79.7µs (−2.45%)
- repeat_string [size=4096, repeat_times=30]: 352.2µs → 334.7µs (−4.97%)
- repeat_string_view [size=1024, repeat_times=30]: 88.1µs → 83.1µs
(−5.68%)
- repeat_string_view [size=4096, repeat_times=30]: 368.8µs → 342.6µs
(−7.10%)
- repeat/scalar_utf8: 174.7ns → 179.2ns (+2.58%)
- repeat/scalar_utf8view: 174.5ns → 180.5ns (+3.44%)
## What changes are included in this PR?
* Add `BulkNullStringArrayBuilder`
* Optimize `repeat` using `BulkNullStringArrayBuilder`
* Mark the inherent `append_value`/`append_placeholder` methods on the new
builders as `#[inline]`; benchmarking suggests this is a win
## Are these changes tested?
Yes.
## Are there any user-facing changes?
No.
---------
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
1 parent 0bb17bc commit 0144570
3 files changed
Lines changed: 282 additions & 47 deletions
File tree
- datafusion
- functions/src
- string
- sqllogictest/test_files/string