feat: statistics-driven TopK optimization for parquet (file reorder + RG reorder + threshold init + cumulative prune) #21580
Conversation
run benchmarks

🤖 Benchmark runs (GKE) comparing feat/reorder-row-groups-by-stats (a013bf6) to 29c5dd5 (merge-base): tpch, clickbench_partitioned, and tpcds.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR improves TopK performance for Parquet scans when sort pushdown is Inexact by enabling row-group reordering based on statistics, so likely “best” row groups are read earlier and dynamic filters can tighten sooner.
Changes:
- Thread an optional `LexOrdering` from `ParquetSource::try_pushdown_sort` through `ParquetMorselizer` to the access-plan preparation step.
- Add `PreparedAccessPlan::reorder_by_statistics` to reorder `row_group_indexes` using Parquet statistics.
- Add unit tests covering reorder/skip behavior for multiple edge cases.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| datafusion/datasource-parquet/src/source.rs | Plumbs sort ordering into the file source for later row-group reordering. |
| datafusion/datasource-parquet/src/opener.rs | Carries optional sort ordering into the opener and applies reorder_by_statistics during plan preparation. |
| datafusion/datasource-parquet/src/access_plan.rs | Implements row-group reordering by statistics and adds focused unit tests. |
```rust
let sort_order = LexOrdering::new(order.iter().cloned());
let mut new_source = self.clone().with_reverse_row_groups(true);
new_source.sort_order_for_reorder = sort_order;
```
LexOrdering::new(...) appears to return a Result<LexOrdering, _> (as used with .unwrap() in the new unit tests), but here it’s assigned directly without ?/unwrap, and then assigned to sort_order_for_reorder: Option<LexOrdering> without wrapping in Some(...). This should be changed to construct a LexOrdering with error propagation and store it as Some(sort_order) (or skip setting the field on error). Otherwise this won’t compile.
Suggested change:

```rust
let sort_order = LexOrdering::new(order.iter().cloned())?;
let mut new_source = self.clone().with_reverse_row_groups(true);
new_source.sort_order_for_reorder = Some(sort_order);
```
```rust
// LexOrdering is guaranteed non-empty, so first() returns &PhysicalSortExpr
let first_sort_expr = sort_order.first();
```
sort_order.first() (if LexOrdering is Vec-like) returns Option<&PhysicalSortExpr>, but the code uses it as if it were &PhysicalSortExpr (first_sort_expr.expr...). This is likely a compile error. A concrete fix is to obtain the first element via iteration and handle the empty case (e.g., early-return Ok(self) if no sort expressions), then use the returned &PhysicalSortExpr.
Suggested change:

```rust
let first_sort_expr = match sort_order.iter().next() {
    Some(expr) => expr,
    None => {
        debug!("Skipping RG reorder: empty sort order");
        return Ok(self);
    }
};
```
```rust
    }
};

let descending = first_sort_expr.options.descending;
```
For DESC ordering, reordering by min values is often a poor proxy for “row group likely contains the largest values first”; typically you want to sort by max when descending == true (and by min when ascending). This can significantly reduce the intended TopK benefit (and can even choose a worse first row group when ranges overlap). Consider switching to row_group_maxs(...) for descending order, and update the doc comment (currently mentions “min/max”) and the DESC unit test accordingly.
```rust
// Get min values for the selected row groups
let rg_metadata: Vec<&RowGroupMetaData> = self
    .row_group_indexes
    .iter()
    .map(|&idx| file_metadata.row_group(idx))
    .collect();

let min_values = match converter.row_group_mins(rg_metadata.iter().copied()) {
    Ok(vals) => vals,
    Err(e) => {
        debug!("Skipping RG reorder: cannot get min values: {e}");
        return Ok(self);
    }
};

// Sort indices by min values
let sort_options = arrow::compute::SortOptions {
    descending,
    nulls_first: first_sort_expr.options.nulls_first,
};
let sorted_indices = match arrow::compute::sort_to_indices(
    &min_values,
```
Suggested change:

```rust
// Get values for the selected row groups: mins for ASC, maxs for DESC
let rg_metadata: Vec<&RowGroupMetaData> = self
    .row_group_indexes
    .iter()
    .map(|&idx| file_metadata.row_group(idx))
    .collect();

let sort_values = match if descending {
    converter.row_group_maxs(rg_metadata.iter().copied())
} else {
    converter.row_group_mins(rg_metadata.iter().copied())
} {
    Ok(vals) => vals,
    Err(e) => {
        debug!("Skipping RG reorder: cannot get min/max values: {e}");
        return Ok(self);
    }
};

// Sort indices by the statistics that best match the requested order
let sort_options = arrow::compute::SortOptions {
    descending,
    nulls_first: first_sort_expr.options.nulls_first,
};
let sorted_indices = match arrow::compute::sort_to_indices(
    &sort_values,
```
Yes, this is a good point.
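To illustrate why max is the better key for DESC, here is a hypothetical standalone sketch (not DataFusion's actual API) that orders row-group indexes from (min, max) statistics:

```rust
/// Hypothetical sketch: order row-group indexes for a TopK scan.
/// For ASC we want the smallest values first, so sort by `min`; for DESC we
/// want the largest values first, so sort by `max` descending. Sorting DESC
/// by `min` can pick a worse first row group when ranges overlap.
fn order_row_groups(ranges: &[(i64, i64)], descending: bool) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..ranges.len()).collect();
    if descending {
        // Largest max first: that RG is most likely to hold the top values.
        idx.sort_by_key(|&i| std::cmp::Reverse(ranges[i].1));
    } else {
        // Smallest min first.
        idx.sort_by_key(|&i| ranges[i].0);
    }
    idx
}

fn main() {
    // RG 0 covers [0, 1000], RG 1 covers [500, 600]: overlapping ranges.
    let ranges = [(0, 1000), (500, 600)];
    // DESC by max correctly visits RG 0 first (it contains the global max);
    // DESC by min would have visited RG 1 first.
    assert_eq!(order_row_groups(&ranges, true), vec![0, 1]);
    assert_eq!(order_row_groups(&ranges, false), vec![0, 1]);
}
```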
```rust
let sorted_indices = match arrow::compute::sort_to_indices(
    &min_values,
    Some(sort_options),
    None,
) {
```
If multiple row groups share the same min (or max) statistic, sort_to_indices may not guarantee a deterministic/stable tie-breaker across platforms/versions. Since row-group order can affect scan reproducibility and performance debugging, consider adding a stable secondary key (e.g., original row group index) when statistics are equal.
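A sketch of what a stable tie-break could look like (hypothetical helper, not the PR's code), using the original row-group index as a secondary key:

```rust
/// Hypothetical sketch: make the reorder deterministic by breaking statistic
/// ties with the original row-group index, instead of relying on whatever
/// tie order `sort_to_indices` happens to produce.
fn sort_with_stable_tiebreak(mins: &[i64]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..mins.len()).collect();
    // Composite key: (statistic, original index). Equal statistics keep
    // their original relative order on every platform/version.
    idx.sort_by_key(|&i| (mins[i], i));
    idx
}

fn main() {
    // Two row groups share min = 5; RG 1 deterministically precedes RG 3.
    assert_eq!(sort_with_stable_tiebreak(&[9, 5, 2, 5]), vec![2, 1, 3, 0]);
}
```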
```rust
/// - 0 or 1 row groups (nothing to reorder)
/// - Sort expression is not a simple column reference
/// - Statistics are unavailable
pub(crate) fn reorder_by_statistics(
```
I think @adriangb had the great idea to also order by grouping keys which can
- reduce cardinality within partitions (partition-local state can be smaller)
- allow for better cache locality (row groups with more equal keys are grouped together)
Doesn't have to be in this PR but perhaps we can think about how it fits in.
Thanks @Dandandan for review! That's a great extension. The reorder_by_statistics method is generic enough to take any LexOrdering — it doesn't need to be tied to TopK specifically. So extending this for GROUP BY should be a matter of:
- Computing a preferred RG ordering from grouping keys in the aggregate planner
- Passing it through to ParquetSource::sort_order_for_reorder
Happy to track this as a follow-up issue. Will open one after this PR lands.
Thanks @Dandandan! Created #21581 to track this. The existing infrastructure from this PR should be directly reusable — mainly needs the aggregate planner to populate sort_order_for_reorder from grouping keys.
run benchmarks

🤖 Benchmark runs (GKE) comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) completed for tpch, tpcds, clickbench_partitioned, and clickbench_extended.
I wonder if the ordering should be done before the files / row groups are assigned to partitions, so they are more globally sorted instead of just sorted per partition? It seems now they are sorted within each partition, which should help, but perhaps not nearly as much as it would if all the partitions contained the optimal row groups. This would also help in the case of #21581.
🤖 Benchmark completed (GKE): clickbench_extended.
Great point @Dandandan — you're right that global reorder is much more effective than per-partition reorder. With global reorder + round-robin distribution, each partition's first RG is close to the global optimum.

The current per-partition reorder is limited because even after sorting, partition 0's "best" RG might be much worse than the global best (which may have landed in partition 2). Moving to global reorder would require changes at the planning / EnforceDistribution layer to load RG statistics and redistribute RGs across partitions. I'd prefer to keep this PR as an incremental step (per-partition) and track global reorder as a follow-up; it would benefit both #21317 and #21581. Does this make sense?
Sure, makes sense.
Bring back stats init with all issues fixed:
- GtEq/LtEq instead of Gt/Lt (include boundary values)
- Use df.fetch() as limit (the TopK K value, not the scan limit); when K > single RG rows, stats init skips and cumulative prune handles it
- Cast threshold to the column data type (parquet vs table schema mismatch)
- Null-aware filter for NULLS FIRST
- Generation check prevents overwrite by later partitions
- Restricted to sort pushdown + pure DynamicFilter (no WHERE)

Stats init and cumulative prune are complementary:
- Stats init: updates PruningPredicate, pruning at the RG statistics level
- Cumulative prune: truncates after reorder + reverse, pruning by row count

Both work together without conflict when using df.fetch().
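The cumulative-prune half of that pairing can be sketched in isolation (hypothetical helper; the real code also checks ordering and pure-DynamicFilter guards before truncating):

```rust
/// Hypothetical sketch of the cumulative-prune idea: after row groups are
/// reordered so the most promising ones come first, keep only a prefix whose
/// row count covers the TopK `k`, and drop the rest before any I/O happens.
fn cumulative_keep_count(rg_row_counts: &[usize], k: usize) -> usize {
    let mut cumulative = 0usize;
    for (kept, &rows) in rg_row_counts.iter().enumerate() {
        cumulative += rows;
        if cumulative >= k {
            // This prefix already holds at least k rows; under the guards
            // above, later RGs cannot contribute to the TopK result.
            return kept + 1;
        }
    }
    rg_row_counts.len()
}

fn main() {
    // 61 RGs of 100k rows each, LIMIT 100: a single RG suffices.
    assert_eq!(cumulative_keep_count(&vec![100_000; 61], 100), 1);
    // LIMIT 250_000 needs the first three RGs.
    assert_eq!(cumulative_keep_count(&vec![100_000; 61], 250_000), 3);
}
```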
run benchmark sort_pushdown_inexact

🤖 Benchmark (GKE) comparing feat/reorder-row-groups-by-stats (839ab5a) to 466c3ea (merge-base) completed: sort_pushdown_inexact.
create_filter() was called before new_sort.fetch was set, so DynamicFilterPhysicalExpr.fetch was always 0 (or None from the old self). Fixed by setting fetch before creating the filter. This was the root cause of stats init and cumulative prune not triggering on CI: fetch=0 meant "no rows needed", so both were skipped.
run benchmark sort_pushdown_inexact
run benchmark clickbench_1

🤖 Benchmarks (GKE) comparing feat/reorder-row-groups-by-stats (a269ffd) to 466c3ea (merge-base) completed: sort_pushdown_inexact, clickbench_1.
For GROUP BY + ORDER BY queries, the TopK sort column is an aggregate output (e.g. COUNT(*)) that doesn't exist in the parquet file schema. Previously we still created ReorderByStatistics which tried to look up the column in statistics — wasted work. Now check column existence in file schema before creating the optimizer. This eliminates overhead for non-scan-level TopK queries (ClickBench Q40-Q42 regression fix).
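A minimal sketch of the bail-fast check (hypothetical names; the real code inspects the Arrow file schema):

```rust
use std::collections::HashSet;

/// Hypothetical sketch: skip the reorder optimizer entirely when the sort
/// column (e.g. an aggregate output like COUNT(*)) does not exist in the
/// parquet file schema, so non-scan-level TopK queries pay no overhead.
fn should_create_reorderer(file_schema_fields: &[&str], sort_column: &str) -> bool {
    let fields: HashSet<&str> = file_schema_fields.iter().copied().collect();
    fields.contains(sort_column)
}

fn main() {
    let fields = ["l_orderkey", "l_extendedprice"];
    assert!(should_create_reorderer(&fields, "l_orderkey"));
    // COUNT(*) is an aggregate output, not a file column: bail fast.
    assert!(!should_create_reorderer(&fields, "count(*)"));
}
```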
…pache#21711)

## Which issue does this PR close?

Related to apache#21580

## Rationale for this change

The sort pushdown benchmark had two problems:

1. **Broken data generation**: The single-file ORDER BY approach caused the parquet writer to merge rows from adjacent chunks at RG boundaries, widening RG ranges to ~6M. The per-file split fix gave each file only 1 RG, so `reorder_by_statistics` (intra-file optimization) had nothing to reorder.
2. **Missing DESC LIMIT queries**: The `sort_pushdown` benchmark only had ASC queries (sort elimination). No queries tested the reverse scan + TopK path (Inexact sort pushdown), which is where RG reorder, stats init, and cumulative pruning provide a 20-58x improvement.

## What changes are included in this PR?

### 1. Fix benchmark data generation

Generate **multiple files with multiple scrambled RGs each**:

- `inexact`: 3 files x ~20 RGs each
- `overlap`: 5 files x ~12 RGs each

Uses pyarrow to redistribute RGs from a sorted temp file into multiple output files with scrambled RG order. Each RG has a narrow `l_orderkey` range (~100K) but appears in scrambled order within its file.

### 2. Add DESC LIMIT queries to sort_pushdown benchmark

New q5-q8 for `sort_pushdown` (sorted data, `WITH ORDER`):

| Query | Description |
|-------|-------------|
| q5 | `ORDER BY l_orderkey DESC LIMIT 100` (narrow projection) |
| q6 | `ORDER BY l_orderkey DESC LIMIT 1000` (narrow projection) |
| q7 | `SELECT * ORDER BY l_orderkey DESC LIMIT 100` (wide projection) |
| q8 | `SELECT * ORDER BY l_orderkey DESC LIMIT 1000` (wide projection) |

These test the Inexact sort pushdown path: reverse scan + TopK + dynamic filter, which benefits from the optimizations in apache#21580.

## Are these changes tested?

Benchmark changes only. Verified locally:

- Data generation produces correct multi-file multi-RG output
- DESC LIMIT queries return correct results
- q5-q8 show 20-58x improvement with apache#21580 optimizations

## Are there any user-facing changes?

No.
Adds pyarrow as a dependency for generating benchmark datasets (`pip install pyarrow`).
run benchmark sort_pushdown_inexact

🤖 Benchmark (GKE) comparing feat/reorder-row-groups-by-stats (bb87ff6) to 64619a6 (merge-base) completed: sort_pushdown_inexact.
run benchmark clickbench_1
run benchmark sort_pushdown_inexact

🤖 Benchmarks (GKE) comparing feat/reorder-row-groups-by-stats (2081071) to 64619a6 (merge-base) completed: clickbench_1, sort_pushdown_inexact.
cc @adriangb @Dandandan @alamb The latest CI benchmark results show 2x-3x faster on sort_pushdown_inexact with no regression on ClickBench. A local benchmark on a sorted single file (61 RGs, narrow ranges) shows 17-60x faster: the full optimization chain (stats init + cumulative RG prune) skips 60 of 61 RGs with zero I/O. Would appreciate a review when you get a chance; the PR is ready for review.
```rust
let mut cumulative = 0usize;
let mut keep_count = 0;
for &idx in rg_indexes {
    cumulative += file_metadata.row_group(idx).num_rows() as usize;
```
The parquet opener first turns the dynamic filter into a pushed-down row_filter / row_selection, but later the cumulative cutoff still sums raw row_group.num_rows() and truncates once that raw count reaches fetch. That is unsafe when early RGs contain many rows, but few rows survive the filter.
Good point. Currently create_filter and fetch are set in the same method (with_fetch), and we fixed the ordering so fetch is set before create_filter is called. There's no separate code path that updates fetch without recreating the filter.

But you're right that this coupling is fragile: if a future optimizer calls with_fetch independently, the filter's fetch would go stale. As a follow-up, I'll consider making fetch on DynamicFilterPhysicalExpr read directly from SortExec.fetch (via a shared reference) instead of copying the value at creation time.
```rust
/// Keep only the first `count` row groups, dropping the rest.
/// Used for TopK cumulative pruning after reorder + reverse.
pub(crate) fn truncate_row_groups(mut self, count: usize) -> Self {
```
It drops the existing row_selection entirely after truncation, which can widen the scan back to full row groups or discard exact page-level pruning state.
Good catch @xudong963. Currently truncate_row_groups drops row_selection entirely, which loses page-level pruning state for the retained RGs. I'll fix this by skipping truncation when a row_selection is present: cumulative prune will only apply when there's no page-level pruning active. This is safe because page pruning is already reducing I/O within those RGs.
```rust
new_sort.filter = fetch.is_some().then(|| {
    // If we already have a filter, keep it. Otherwise, create a new one.
    // Must be called after setting fetch so DynamicFilter gets the K value.
    self.filter
```
SortExec::with_fetch can leave the embedded dynamic filter’s fetch stale. The new parquet optimizations read df.fetch() for threshold init and cumulative prune, so if any later optimizer rewrites fetch through with_fetch, parquet may prune using the old K
This is actually safe in the current implementation because cumulative prune is guarded by is_pure_dynamic_filter — it only fires when the predicate is purely the DynamicFilterPhysicalExpr (no WHERE
clause). And cumulative prune runs before any data is read, so row_filter hasn't filtered any rows yet — num_rows() is accurate at this point.
If we extend this to WHERE queries in the future (e.g. with dynamic RG pruning at runtime via #21399), we'd need to switch from static row counting to runtime early termination.
```rust
// Get the first sort expression
// LexOrdering is guaranteed non-empty, so first() returns &PhysicalSortExpr
let first_sort_expr = sort_order.first();
```
what about multi-key ORDER BY?
Thanks @xudong963 for the review! Yes, currently only a single sort key is supported; I will support multi-key ORDER BY as a follow-up.
Which issue does this PR close?
Closes #21691
Partial fix for #21399
Rationale for this change
TopK queries (`ORDER BY col DESC/ASC LIMIT K`) on parquet data have several inefficiencies; in particular, the dynamic filter starts as `lit(true)`, so early RGs are never pruned.

What changes are included in this PR?

A chain of composable optimizations that minimize I/O for TopK queries:
1. Global file reorder (`FileSource::reorder_files`)

Sort files in the shared work queue by column statistics. DESC: highest min first; ASC: lowest max first. Works for ALL TopK via `DynamicFilterPhysicalExpr.sort_options`. Bails fast when the sort column is not in the file schema (GROUP BY + ORDER BY).

2. RG reorder within file (`reorder_by_statistics`)

Reorder row groups by min values (ASC). Works for all TopK via DynamicFilter `sort_options` (with a file schema check). Combined with reverse for DESC queries.

3. TopK threshold init from statistics (`try_init_topk_threshold`)

Before reading data, compute the threshold from RG min/max stats. Runs BEFORE the `PruningPredicate` build so the threshold is compiled into the predicate. Uses `GtEq`/`LtEq` to include boundary values. Null-aware filter for NULLS FIRST. Uses `df.fetch()` (the TopK K value) so stats init skips when K spans multiple RGs. Restricted to sort pushdown + no WHERE (pure DynamicFilter predicate).

4. Cumulative RG pruning (`truncate_row_groups`)

After reorder + reverse, accumulate rows from the front until >= K, then prune the rest. For non-sort-pushdown TopK, guarded by a non-overlap check (`max(i) <= min(i+1)`). Only when the predicate is pure DynamicFilter (no WHERE).

5. Compose reorder + reverse

Sequential steps instead of mutually exclusive. Reverse only triggers when reorder succeeds (sort column found in the file schema).
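The threshold-init idea from step 3 can be sketched as a standalone function (hypothetical and simplified to a single DESC column; the real code builds a PruningPredicate expression and handles nulls and type casts):

```rust
/// Hypothetical sketch of threshold init for a DESC TopK: if the most
/// promising row group alone holds at least `k` rows, the k-th largest value
/// overall is at least that RG's min statistic, so `col >= min` (GtEq, to
/// keep boundary values) is a safe initial threshold. When k spans multiple
/// RGs the sketch bails, mirroring the "stats init skips" behavior.
fn init_desc_threshold(best_rg_rows: usize, best_rg_min: i64, k: usize) -> Option<i64> {
    if best_rg_rows >= k {
        Some(best_rg_min) // predicate: col >= best_rg_min
    } else {
        None // K spans multiple RGs; leave it to cumulative pruning
    }
}

fn main() {
    // RG with 100_000 rows in [900_000, 1_000_000], LIMIT 100: threshold 900_000.
    assert_eq!(init_desc_threshold(100_000, 900_000, 100), Some(900_000));
    // LIMIT larger than the RG: skip stats init.
    assert_eq!(init_desc_threshold(100_000, 900_000, 200_000), None);
}
```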
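The non-overlap guard from step 4 can also be sketched standalone (hypothetical helper operating on per-RG (min, max) pairs):

```rust
/// Hypothetical sketch of the non-overlap guard: cumulative pruning for
/// non-sort-pushdown TopK is only safe when consecutive (already ordered)
/// row groups have disjoint value ranges, i.e. max(i) <= min(i+1).
fn ranges_non_overlapping(ranges: &[(i64, i64)]) -> bool {
    ranges.windows(2).all(|w| w[0].1 <= w[1].0)
}

fn main() {
    assert!(ranges_non_overlapping(&[(0, 10), (10, 20), (21, 30)]));
    // RG 1's range dips below RG 0's max: ranges overlap, pruning must bail.
    assert!(!ranges_non_overlapping(&[(0, 15), (10, 20)]));
}
```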
How they work together
Coverage matrix
Local benchmark (single file, 61 sorted RGs, DESC LIMIT, 1 partition)
Key bug fix: `SortExec.fetch` ordering

`create_filter()` was called before `new_sort.fetch` was set, so `DynamicFilterPhysicalExpr.fetch` was always 0. Fixed by setting fetch before creating the filter.

Changes to `DynamicFilterPhysicalExpr`:
- `sort_options: Option<Vec<SortOptions>>`: sort direction for each child
- `fetch: Option<usize>`: TopK K value for cumulative pruning
- `new_with_sort_options()` constructor, `sort_options()` and `fetch()` getters
- Populated by `SortExec::create_filter()` for all TopK queries
Are these changes tested?

- Unit tests in `datafusion-datasource-parquet` (all pass)
- `test_fuzz_topk_filter_pushdown`: updated with tiebreaker columns for deterministic ORDER BY

Are there any user-facing changes?

No. Transparent optimization: same results, faster TopK on parquet with statistics.