Commit e4e11b2
perf: move stats init before RG pruning so first file also benefits
Move try_init_topk_threshold() from build_stream() to
prune_row_groups(), before prune_by_statistics(). This way:
- File 1: stats init sets threshold from ALL its RG statistics,
then prune_by_statistics uses it to prune file 1's own RGs.
Only the best RG(s) are read, rest skipped with zero I/O.
- File 2+: dynamic filter already has tight threshold from file 1,
most RGs pruned immediately.
This effectively achieves dynamic RG pruning without needing morsel-
level scheduling: the threshold is computed from statistics (no data
read), then used to prune RGs in the same file.

1 parent bccc42b commit e4e11b2
1 file changed
Lines changed: 16 additions & 17 deletions
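
The ordering described in the commit message can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the code from this diff: `RowGroupStats`, `init_threshold_from_stats`, and `prune_by_threshold` are hypothetical names, the query is assumed to be `ORDER BY col DESC LIMIT k`, and each row group (RG) is assumed to expose min, max, and row count for the sort column with no nulls. The real `try_init_topk_threshold()` and `prune_by_statistics()` may differ.

```rust
/// Hypothetical per-row-group statistics for the sort column.
/// Assumes no nulls, so all `num_rows` values lie in `[min, max]`.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
    num_rows: usize,
}

/// Derive a lower bound on the k-th largest value from statistics alone.
/// Row groups are visited in descending order of `min`; once the visited
/// groups hold at least `k` rows, at least `k` rows are >= that `min`.
/// Returns `None` if the file holds fewer than `k` rows.
fn init_threshold_from_stats(rgs: &[RowGroupStats], k: usize) -> Option<i64> {
    let mut sorted: Vec<&RowGroupStats> = rgs.iter().collect();
    sorted.sort_by(|a, b| b.min.cmp(&a.min));
    let mut rows = 0usize;
    for rg in sorted {
        rows += rg.num_rows;
        if rows >= k {
            return Some(rg.min);
        }
    }
    None
}

/// Return the indices of row groups that still have to be read.
/// A group whose `max` is below the threshold cannot contribute to the
/// top k, so it is skipped without any data I/O.
fn prune_by_threshold(rgs: &[RowGroupStats], threshold: Option<i64>) -> Vec<usize> {
    rgs.iter()
        .enumerate()
        .filter(|(_, rg)| threshold.map_or(true, |t| rg.max >= t))
        .map(|(i, _)| i)
        .collect()
}
```

Calling `init_threshold_from_stats` before `prune_by_threshold` on the first file mirrors the reordering in this commit: the threshold exists before that file's own RGs are pruned. Passing the same threshold to `prune_by_threshold` for later files models the "File 2+" case.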
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 905-907 | 905-907 | unchanged context |
| | 908-923 | + 16 lines added |
| 908-910 | 924-926 | unchanged context |
| 1087-1089 | 1103-1105 | unchanged context |
| 1090-1106 | | - 17 lines removed |
| 1107-1109 | 1106-1108 | unchanged context |