Commit e2d08ef
GH-3503: Optimize ByteStreamSplitValuesWriter with batched scatter writes
The current ByteStreamSplitValuesWriter.writeFloat/writeDouble/writeInteger/
writeLong path allocates a new byte[4] or byte[8] per value via
BytesUtils.intToBytes / BytesUtils.longToBytes, then dispatches one
single-byte CapacityByteArrayOutputStream.write(int) call per byte per
value (4 calls per float/int, 8 per double/long). For a 100k-value page
that is up to 800k single-byte virtual dispatches plus 100k short-lived
byte[] allocations.
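
For illustration, the per-value path described above can be sketched roughly as follows. This is a simplified model, not the actual Parquet source: java.io.ByteArrayOutputStream stands in for CapacityByteArrayOutputStream, the local intToBytes helper is a hypothetical stand-in for BytesUtils.intToBytes, and the class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;

// Simplified model of the pre-change hot path (names are hypothetical).
final class PerValueScatterSketch {

    // Stand-in for BytesUtils.intToBytes: allocates a fresh byte[4] per call.
    static byte[] intToBytes(int v) {
        return new byte[] {(byte) v, (byte) (v >>> 8), (byte) (v >>> 16), (byte) (v >>> 24)};
    }

    // One stream per byte position; byte i of every value goes to stream i.
    static ByteArrayOutputStream[] newStreams(int elementSizeInBytes) {
        ByteArrayOutputStream[] streams = new ByteArrayOutputStream[elementSizeInBytes];
        for (int i = 0; i < streams.length; i++) {
            streams[i] = new ByteArrayOutputStream();
        }
        return streams;
    }

    // One allocation plus four single-byte write(int) calls per value.
    static void writeInt(ByteArrayOutputStream[] streams, int v) {
        byte[] bytes = intToBytes(v);
        for (int i = 0; i < bytes.length; i++) {
            streams[i].write(bytes[i]);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream[] streams = newStreams(4);
        writeInt(streams, 0x04030201);
        writeInt(streams, 0x08070605);
        System.out.println(java.util.Arrays.toString(streams[0].toByteArray()));
    }
}
```

In this model both costs named above are visible: the temporary array in intToBytes and the inner loop of single-byte writes.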

This change collapses that hot path in two stacked steps:

1. Eliminate the per-value byte[] allocation by inlining the
   little-endian decomposition with bit shifts into helper methods
   bufferInt(int) / bufferLong(long), instead of going through
   BytesUtils.intToBytes / BytesUtils.longToBytes, which allocate
   byte[4] / byte[8] on every call.
2. Batch values into a small per-instance scratch buffer (BATCH_SIZE = 128)
   and flush them as one bulk write(byte[], off, len) call per stream per
   flush. For a batch of N values this replaces N * elementSizeInBytes
   single-byte virtual dispatches with elementSizeInBytes bulk writes.
   The batch is flushed automatically when full and on getBytes(), and it
   is included in getBufferedSize() so page sizing decisions remain
   correct. reset() and close() clear the pending batch. The constant was
   selected by sweeping 16/32/64/128/256/512/1024; 128 maximises FLOAT
   throughput while still capturing most of the DOUBLE/LONG gains.
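
The two steps above can be sketched together as follows. This is a minimal model under stated assumptions, not the committed code: java.io.ByteArrayOutputStream stands in for CapacityByteArrayOutputStream, only the int path is shown, getBytes() returns a concatenated byte[] rather than a BytesInput, and the scratch field and class name are hypothetical. BATCH_SIZE, bufferInt, intBatch, getBufferedSize, getBytes and reset are the names used in this message.

```java
import java.io.ByteArrayOutputStream;

// Simplified model of the batched scatter write for 4-byte values.
final class BatchedScatterSketch {
    static final int BATCH_SIZE = 128; // value stated in the commit message

    private final ByteArrayOutputStream[] streams = new ByteArrayOutputStream[4];
    private final int[] intBatch = new int[BATCH_SIZE];
    private final byte[] scratch = new byte[BATCH_SIZE]; // hypothetical reusable buffer
    private int batchCount;

    BatchedScatterSketch() {
        for (int i = 0; i < streams.length; i++) {
            streams[i] = new ByteArrayOutputStream();
        }
    }

    // Step 1: no byte[] per value; the raw bits are simply parked in the batch.
    void bufferInt(int v) {
        intBatch[batchCount++] = v;
        if (batchCount == BATCH_SIZE) {
            flush();
        }
    }

    // Step 2: one bulk write per stream per flush.
    void flush() {
        if (batchCount == 0) {
            return;
        }
        for (int s = 0; s < streams.length; s++) {
            int shift = 8 * s; // little-endian: stream 0 receives the low byte
            for (int i = 0; i < batchCount; i++) {
                scratch[i] = (byte) (intBatch[i] >>> shift);
            }
            streams[s].write(scratch, 0, batchCount);
        }
        batchCount = 0;
    }

    // Counts pending (unflushed) values so size-based decisions stay correct.
    long getBufferedSize() {
        long size = (long) batchCount * streams.length;
        for (ByteArrayOutputStream s : streams) {
            size += s.size();
        }
        return size;
    }

    // Flushes the pending batch, then concatenates the streams in order.
    byte[] getBytes() {
        flush();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (ByteArrayOutputStream s : streams) {
            out.write(s.toByteArray(), 0, s.size());
        }
        return out.toByteArray();
    }

    // Drops both the pending batch and the already-flushed bytes.
    void reset() {
        batchCount = 0;
        for (ByteArrayOutputStream s : streams) {
            s.reset();
        }
    }
}
```

Under this model, a 100k-value page of ints would need 782 flushes (100,000 / 128, rounded up), that is 3,128 bulk writes in place of 400,000 single-byte writes.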
Only one of intBatch / longBatch is used per writer instance; the four
numeric subclasses (Float/Double/Integer/Long) each call exactly one of
bufferInt / bufferLong via their writeXxx implementations. The
FixedLenByteArrayByteStreamSplitValuesWriter still uses scatterBytes(byte[])
since its values arrive as already-laid-out byte arrays.
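
As a rough illustration of that split, the sketch below shows how a float or double can be reduced to raw int or long bits and then decomposed by shifts, next to a per-byte scatter for values that already arrive as byte arrays. The use of Float.floatToIntBits / Double.doubleToLongBits is an assumption about how the subclasses feed bufferInt / bufferLong, the helper names are hypothetical, and java.io.ByteArrayOutputStream again stands in for the real stream class.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helpers illustrating the numeric paths versus the byte[] path.
final class SplitPathsSketch {

    // Byte destined for stream `stream` (0..3) of a float, via its int bits.
    static byte floatStreamByte(float v, int stream) {
        return (byte) (Float.floatToIntBits(v) >>> (8 * stream));
    }

    // Byte destined for stream `stream` (0..7) of a double, via its long bits.
    static byte doubleStreamByte(double v, int stream) {
        return (byte) (Double.doubleToLongBits(v) >>> (8 * stream));
    }

    // Fixed-length byte arrays are already laid out: byte i goes to stream i.
    static void scatterBytes(byte[] value, ByteArrayOutputStream[] streams) {
        for (int i = 0; i < value.length; i++) {
            streams[i].write(value[i]);
        }
    }

    public static void main(String[] args) {
        for (int s = 0; s < 4; s++) {
            System.out.printf("stream %d: 0x%02x%n", s, floatStreamByte(1.5f, s) & 0xff);
        }
    }
}
```

The shift-based decomposition yields the same bytes as a little-endian java.nio.ByteBuffer would for ordinary (non-NaN) values, which is one way to sanity-check such a helper.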
Benchmark (new ByteStreamSplitEncodingBenchmark, 100k values per
invocation, JDK 18, JMH -wi 5 -i 10 -f 3, 30 samples per row):

Type    Before (ops/s)  After (ops/s)  Improvement    Alloc B/op
Float   15,080,427      65,060,920     +331% (4.31x)  33.27 -> 9.27 (-72%)
Double   6,994,501      49,475,535     +608% (7.07x)  42.54 -> 18.55 (-56%)
Int     15,641,334      68,128,560     +335% (4.36x)  33.27 -> 9.27 (-72%)
Long     7,090,154      53,225,645     +651% (7.51x)  42.54 -> 18.55 (-56%)

The remaining per-op allocation (~9 B/op for Int/Float, ~19 B/op for
Long/Double) is the BytesInput[] returned by getBytes() and the streams'
internal slabs, which are amortised across the page rather than per value.
All 573 parquet-column tests pass.

1 parent 53d7842 commit e2d08ef
1 file changed
Lines changed: 88 additions & 6 deletions
File tree
- parquet-column/src/main/java/org/apache/parquet/column/values/bytestreamsplit