Commit eb33141
feat: unify left and right functions and benches (#20114)
## Which issue does this PR close?
- Closes #20103
## Rationale for this change
This is a refactoring follow-up to the performance improvement PRs for
`left` (#19749) and `right` (#20068).
## What changes are included in this PR?
1. Removed much of the code duplication by extracting a common
implementation for `StringArray` and `StringViewArray`. The `left` and
`right` UDF entry points are now leaner. They differ only in slicing (from
the left or from the right), which is implemented via a generic trait
parameter, following the design of trim.
2. Switched `left` to use `make_view` to avoid buffer tinkering in
DataFusion code.
3. Combined the left and right benches.
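
The generic-trait design in item 1 can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: the names `SliceSide`, `FromLeft`, `FromRight`, and `slice_all` are hypothetical, and a plain `&[Option<&str>]` stands in for the Arrow string arrays. It assumes the usual semantics, where a negative `n` means "all but the last (or first) `|n|` characters".

```rust
use std::ops::Range;

/// Hypothetical trait (not DataFusion's actual API): picks the byte range of
/// `s` to keep for a character count `n`.
trait SliceSide {
    fn byte_range(s: &str, n: i64) -> Range<usize>;
}

struct FromLeft;
struct FromRight;

/// Byte offset just after the first `k` characters (or `s.len()` if shorter).
fn offset_from_start(s: &str, k: usize) -> usize {
    s.char_indices().nth(k).map_or(s.len(), |(i, _)| i)
}

/// Byte offset where the last `k` characters begin (or 0 if shorter).
fn offset_from_end(s: &str, k: usize) -> usize {
    if k == 0 {
        return s.len();
    }
    s.char_indices().rev().nth(k - 1).map_or(0, |(i, _)| i)
}

impl SliceSide for FromLeft {
    fn byte_range(s: &str, n: i64) -> Range<usize> {
        let k = n.unsigned_abs() as usize;
        // n >= 0: first k chars; n < 0: everything except the last k chars.
        if n >= 0 { 0..offset_from_start(s, k) } else { 0..offset_from_end(s, k) }
    }
}

impl SliceSide for FromRight {
    fn byte_range(s: &str, n: i64) -> Range<usize> {
        let k = n.unsigned_abs() as usize;
        // n >= 0: last k chars; n < 0: everything except the first k chars.
        if n >= 0 { offset_from_end(s, k)..s.len() } else { offset_from_start(s, k)..s.len() }
    }
}

/// Shared implementation: only the `S` type parameter differs between callers.
fn slice_all<S: SliceSide>(values: &[Option<&str>], n: i64) -> Vec<Option<String>> {
    values
        .iter()
        .map(|v| v.map(|s| s[S::byte_range(s, n)].to_string()))
        .collect()
}
```

In this sketch a `left`-style entry point would call `slice_all::<FromLeft>(values, n)` and a `right`-style one `slice_all::<FromRight>(values, n)`, so the null handling and iteration are written once and only the range computation is specialised.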
## Are these changes tested?
- Existing unit tests
- Existing SLTs passed
- Benches show the same performance improvement as before, roughly 57-89% in the results below

Bench results against pre-optimisation commit 458b491:
<details>
left size=1024/string_array positive n/1024
time: [34.150 µs 34.694 µs 35.251 µs]
change: [−71.694% −70.722% −69.818%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
left size=1024/string_array negative n/1024
time: [30.860 µs 31.396 µs 31.998 µs]
change: [−85.846% −85.294% −84.759%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
left size=4096/string_array positive n/4096
time: [112.19 µs 114.28 µs 116.98 µs]
change: [−71.673% −70.934% −70.107%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
left size=4096/string_array negative n/4096
time: [126.71 µs 129.06 µs 131.26 µs]
change: [−84.204% −83.809% −83.455%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) low mild
2 (2.00%) high mild
left size=1024/string_view_array positive n/1024
time: [30.249 µs 30.887 µs 31.461 µs]
change: [−75.288% −74.499% −73.743%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) low mild
1 (1.00%) high mild
left size=1024/string_view_array negative n/1024
time: [48.404 µs 49.007 µs 49.608 µs]
change: [−66.827% −65.727% −64.652%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
left size=4096/string_view_array positive n/4096
time: [145.25 µs 148.47 µs 151.85 µs]
change: [−68.913% −67.836% −66.770%] (p = 0.00 < 0.05)
Performance has improved.
left size=4096/string_view_array negative n/4096
time: [203.11 µs 206.31 µs 209.98 µs]
change: [−57.411% −56.773% −56.142%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low mild
13 (13.00%) high mild
1 (1.00%) high severe
right size=1024/string_array positive n/1024
time: [30.820 µs 31.674 µs 32.627 µs]
change: [−84.230% −83.842% −83.402%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
right size=1024/string_array negative n/1024
time: [32.434 µs 33.170 µs 33.846 µs]
change: [−88.796% −88.460% −88.164%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
right size=4096/string_array positive n/4096
time: [124.71 µs 126.54 µs 128.27 µs]
change: [−83.321% −82.902% −82.537%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
right size=4096/string_array negative n/4096
time: [125.05 µs 127.67 µs 130.35 µs]
change: [−89.376% −89.193% −89.004%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
right size=1024/string_view_array positive n/1024
time: [29.110 µs 29.608 µs 30.141 µs]
change: [−79.807% −79.330% −78.683%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
right size=1024/string_view_array negative n/1024
time: [44.883 µs 45.656 µs 46.511 µs]
change: [−71.157% −70.546% −69.874%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
right size=4096/string_view_array positive n/4096
time: [139.57 µs 142.18 µs 144.96 µs]
change: [−75.610% −75.088% −74.549%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
right size=4096/string_view_array negative n/4096
time: [221.47 µs 224.47 µs 227.72 µs]
change: [−64.625% −64.047% −63.504%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
</details>
## Are there any user-facing changes?
1 parent 2f90194 commit eb33141
8 files changed
Lines changed: 327 additions & 580 deletions
File tree
- datafusion/functions
- benches
- src/unicode
This file was deleted.