Commit 23633d4
Fix massive spill files for StringView/BinaryView columns
Add garbage collection for StringView and BinaryView arrays before spilling
to disk. This prevents sliced arrays from carrying their entire original
buffers when written to spill files.
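
The buffer-retention problem described above can be illustrated with a small standard-library model. This is only a sketch: `ViewColumn` and its methods are hypothetical names invented here, not the arrow-rs types or the code in this commit, and the model ignores details of the real view layout such as inlined short strings. It shows that a zero-copy slice still shares the full backing buffer, while a compaction pass copies only the referenced bytes.

```rust
use std::sync::Arc;

/// Simplified stand-in for a view array: each view is an (offset, len)
/// pair into one shared, reference-counted data buffer.
/// Hypothetical type for illustration; not the arrow-rs layout.
#[derive(Clone)]
struct ViewColumn {
    buffer: Arc<Vec<u8>>,
    views: Vec<(usize, usize)>,
}

impl ViewColumn {
    /// Build a column by appending every value to a single buffer.
    fn from_strs(values: &[&str]) -> Self {
        let mut buffer = Vec::new();
        let mut views = Vec::with_capacity(values.len());
        for v in values {
            views.push((buffer.len(), v.len()));
            buffer.extend_from_slice(v.as_bytes());
        }
        Self { buffer: Arc::new(buffer), views }
    }

    /// Zero-copy slice: keeps a subset of the views but still shares
    /// the entire original buffer.
    fn slice(&self, start: usize, len: usize) -> Self {
        Self {
            buffer: Arc::clone(&self.buffer),
            views: self.views[start..start + len].to_vec(),
        }
    }

    /// Compaction ("GC"): copy only the bytes the remaining views
    /// reference into a fresh buffer and rewrite the offsets.
    fn gc(&self) -> Self {
        let mut buffer = Vec::new();
        let mut views = Vec::with_capacity(self.views.len());
        for &(offset, len) in &self.views {
            views.push((buffer.len(), len));
            buffer.extend_from_slice(&self.buffer[offset..offset + len]);
        }
        Self { buffer: Arc::new(buffer), views }
    }

    /// Bytes that would be written if the backing buffer were serialized whole.
    fn buffer_bytes(&self) -> usize {
        self.buffer.len()
    }

    /// Read the i-th value through its view.
    fn value(&self, i: usize) -> &str {
        let (offset, len) = self.views[i];
        std::str::from_utf8(&self.buffer[offset..offset + len]).unwrap()
    }
}
```

In this model, slicing one value out of a four-value column leaves `buffer_bytes()` unchanged, and calling `gc()` on the slice shrinks it to the length of that single value.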
Changes:
- Add gc_view_arrays() function to apply GC on view arrays
- Integrate GC into InProgressSpillFile::append_batch()
- Use simple threshold-based heuristic (100+ rows, 10KB+ buffer size)
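
A threshold check of the kind named in the last bullet might look like the sketch below. The constant and function names are hypothetical, and two details are assumptions rather than facts from this commit: that both conditions must hold together, and that "10KB" means 10 * 1024 bytes.

```rust
/// Thresholds taken from the commit message; the names are hypothetical.
const MIN_ROWS_FOR_GC: usize = 100;
/// Assumes "10KB" means 10 * 1024 bytes.
const MIN_BUFFER_BYTES_FOR_GC: usize = 10 * 1024;

/// Decide whether compacting a view array before spilling is likely
/// to pay off. Assumes both thresholds must be met; the commit
/// message does not say how the two conditions are combined.
fn should_gc(num_rows: usize, buffer_bytes: usize) -> bool {
    num_rows >= MIN_ROWS_FOR_GC && buffer_bytes >= MIN_BUFFER_BYTES_FOR_GC
}
```

Under these assumptions, small batches and batches with little buffer data skip the copy, since compaction would cost more than it saves.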
Fixes #19414, where GROUP BY on StringView columns created 820MB spill files
instead of 33MB because sliced arrays kept references to their original buffers.
Testing shows 80-98% reduction in spill file sizes for typical GROUP BY workloads.
Parent commit: 2818abb
2 files changed
Lines changed: 520 additions & 3 deletions
Lines changed: 6 additions & 2 deletions