feat: statistics-driven TopK optimization for parquet (file reorder + RG reorder + threshold init + cumulative prune)#21580

Open
zhuqi-lucas wants to merge 36 commits into apache:main from zhuqi-lucas:feat/reorder-row-groups-by-stats

@zhuqi-lucas zhuqi-lucas commented Apr 13, 2026

Which issue does this PR close?

Closes #21691
Partial fix for #21399

Rationale for this change

TopK queries (ORDER BY col DESC/ASC LIMIT K) on parquet data have several inefficiencies:

  • Files and RGs are read in arbitrary order, not optimized for the sort direction
  • The dynamic filter threshold starts as lit(true), so early RGs are never pruned
  • All RGs are opened even when the top-K values are concentrated in a few RGs

What changes are included in this PR?

A chain of composable optimizations that minimize I/O for TopK queries:

1. Global file reorder (FileSource::reorder_files)

Sort files in the shared work queue by column statistics: for DESC, highest min first; for ASC, lowest max first. Works for all TopK queries via DynamicFilterPhysicalExpr.sort_options. Bails out early when the sort column is not in the file schema (GROUP BY + ORDER BY).
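
As a rough illustration of the file-level ordering rule, here is a minimal std-only sketch. `FileStats`, `cmp_stat` and this free-standing `reorder_files` are hypothetical stand-ins rather than the types or signatures used in the PR, and placing files without statistics last is an assumption made for the example.

```rust
use std::cmp::Ordering;

/// Hypothetical per-file statistics for the sort column (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    path: &'static str,
    /// `None` models a file without usable statistics.
    min: Option<i64>,
    max: Option<i64>,
}

/// Compare two optional statistics. Assumption: files without statistics sort last.
fn cmp_stat(a: Option<i64>, b: Option<i64>, descending: bool) -> Ordering {
    match (a, b) {
        (Some(x), Some(y)) if descending => y.cmp(&x),
        (Some(x), Some(y)) => x.cmp(&y),
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    }
}

/// DESC: highest min first. ASC: lowest max first.
/// `sort_by` is stable, so files with equal statistics keep their queue order.
fn reorder_files(mut files: Vec<FileStats>, descending: bool) -> Vec<FileStats> {
    files.sort_by(|a, b| {
        if descending {
            cmp_stat(a.min, b.min, true)
        } else {
            cmp_stat(a.max, b.max, false)
        }
    });
    files
}
```

Calling `reorder_files(files, true)` on files covering 0..9, 20..29 and 10..19 would place the 20..29 file at the front of the queue.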

2. RG reorder within file (reorder_by_statistics)

Reorder row groups by min values (ASC). Works for all TopK via DynamicFilter sort_options (with file schema check). Combined with reverse for DESC queries.

3. TopK threshold init from statistics (try_init_topk_threshold)

Before reading data, compute threshold from RG min/max stats. Runs BEFORE PruningPredicate build so the threshold is compiled into the predicate. Uses GtEq/LtEq to include boundary values. Null-aware filter for NULLS FIRST. Uses df.fetch() (TopK K value) so stats init skips when K spans multiple RGs. Restricted to sort pushdown + no WHERE (pure DynamicFilter predicate).
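
One way to picture the threshold derivation is the std-only sketch below. It assumes integer statistics and ignores NULL ordering; `RgStats`, `Threshold` and `init_topk_threshold` are hypothetical names, and the selection rule (use only a row group that alone holds at least K non-null rows) is an assumption consistent with "skips when K spans multiple RGs", not a copy of the PR's logic.

```rust
/// Hypothetical row-group statistics for the sort column.
#[derive(Debug, Clone, Copy)]
struct RgStats {
    min: i64,
    max: i64,
    non_null_rows: usize,
}

/// Initial bound for the dynamic filter; boundary values are included.
#[derive(Debug, PartialEq)]
enum Threshold {
    /// `col >= v`, used for DESC
    GtEq(i64),
    /// `col <= v`, used for ASC
    LtEq(i64),
}

/// If one row group alone holds at least K non-null rows, its min (DESC) or
/// max (ASC) is a safe bound: the K-th best value cannot be worse than it.
/// Returns `None` when K spans multiple row groups.
fn init_topk_threshold(rgs: &[RgStats], k: usize, descending: bool) -> Option<Threshold> {
    if k == 0 {
        return None;
    }
    let candidates = rgs.iter().filter(|rg| rg.non_null_rows >= k);
    if descending {
        // Tightest safe bound: the largest min among qualifying row groups.
        candidates.map(|rg| rg.min).max().map(Threshold::GtEq)
    } else {
        // Tightest safe bound: the smallest max among qualifying row groups.
        candidates.map(|rg| rg.max).min().map(Threshold::LtEq)
    }
}
```

With three row groups of 100 rows covering 0..99, 100..199 and 200..299, a DESC LIMIT 10 query would start from `col >= 200`, while LIMIT 150 yields no initial threshold.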

4. Cumulative RG pruning (truncate_row_groups)

After reorder + reverse, accumulate row counts from the front until the total reaches K, then prune the remaining row groups. For non-sort-pushdown TopK, this is guarded by a non-overlap check (max(i) <= min(i+1)). Applies only when the predicate is a pure DynamicFilter (no WHERE).
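
The cumulative prune can be sketched as follows. This is a std-only illustration for row groups already in ascending read order; `RgStats`, `non_overlapping` and this free-standing `truncate_row_groups` are hypothetical, and always applying the non-overlap guard is a simplification of the conditions described above.

```rust
/// Hypothetical row-group summary for the sort column.
#[derive(Debug, Clone, Copy)]
struct RgStats {
    index: usize,
    min: i64,
    max: i64,
    rows: usize,
}

/// True when consecutive row groups do not overlap: max(i) <= min(i+1).
/// Assumes `ordered` is in ascending read order.
fn non_overlapping(ordered: &[RgStats]) -> bool {
    ordered.windows(2).all(|w| w[0].max <= w[1].min)
}

/// Keep the shortest prefix whose row counts sum to at least `k`.
/// When ranges overlap, nothing is pruned because later row groups
/// could still contain top-K values.
fn truncate_row_groups(ordered: &[RgStats], k: usize) -> Vec<usize> {
    if !non_overlapping(ordered) {
        return ordered.iter().map(|rg| rg.index).collect();
    }
    let mut kept = Vec::new();
    let mut rows = 0usize;
    for rg in ordered {
        kept.push(rg.index);
        rows += rg.rows;
        if rows >= k {
            break;
        }
    }
    kept
}
```

For three non-overlapping row groups of 100 rows each, K = 150 keeps the first two and K = 50 keeps only the first.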

5. Compose reorder + reverse

Sequential steps instead of mutually exclusive. Reverse only triggers when reorder succeeds (sort column found in file schema).
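
A minimal std-only sketch of composing the two steps, assuming integer min statistics. `reorder_by_min` and `plan_row_groups` are hypothetical helpers, not the PR's functions: the reorder sorts by min ascending, and the reverse is applied only if the reorder succeeded.

```rust
/// Sort row-group indexes by their min statistic, ascending.
/// Returns `None` (no reorder) when any selected row group lacks a min value.
fn reorder_by_min(indexes: &[usize], mins: &[Option<i64>]) -> Option<Vec<usize>> {
    let mut keyed: Vec<(i64, usize)> = Vec::with_capacity(indexes.len());
    for &idx in indexes {
        // The first `?` handles an out-of-range index, the second a missing statistic.
        keyed.push(((*mins.get(idx)?)?, idx));
    }
    // Sorting (min, original index) tuples gives a deterministic order on ties.
    keyed.sort();
    Some(keyed.into_iter().map(|(_, idx)| idx).collect())
}

/// Reorder first, then reverse for DESC, but only if the reorder succeeded.
/// Otherwise the original order is returned unchanged.
fn plan_row_groups(indexes: &[usize], mins: &[Option<i64>], descending: bool) -> Vec<usize> {
    match reorder_by_min(indexes, mins) {
        Some(mut ordered) => {
            if descending {
                ordered.reverse();
            }
            ordered
        }
        None => indexes.to_vec(),
    }
}
```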

How they work together

File reorder (best file first in shared queue)
  → RG reorder (best RG first within file)
    → Reverse (flip for DESC)
      → Stats init (threshold from RG stats → PruningPredicate)
        → RG pruning (60 of 61 RGs skipped, zero I/O!)
          → Cumulative prune (confirm enough rows for K)
            → Read only 1 RG

Coverage matrix

Scenario File reorder RG reorder Reverse Stats init Cumulative prune
Non-overlapping + no WHERE ✅ (17-60x)
Non-overlapping + WHERE
Overlapping RGs
Sort column not in parquet ❌ fast bail ❌ fast bail

Local benchmark (single file, 61 sorted RGs, DESC LIMIT, 1 partition)

| Query | Baseline | With optimizations | Speedup |
| --- | --- | --- | --- |
| Q1 (DESC LIMIT 100) | 28.48 ms | 1.64 ms | 17.4x |
| Q2 (DESC LIMIT 1000) | 22.24 ms | 0.37 ms | 60.1x |
| Q3 (SELECT * LIMIT 100) | 22.51 ms | 0.66 ms | 34.1x |
| Q4 (SELECT * LIMIT 1000) | 22.37 ms | 0.61 ms | 36.7x |

Key bug fix: SortExec.fetch ordering

create_filter() was called before new_sort.fetch was set, so DynamicFilterPhysicalExpr.fetch was always 0. Fixed by setting fetch before creating the filter.
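
The ordering problem can be shown with a toy model. `ToySort` and its fields are hypothetical and are not the real SortExec API; the toy only demonstrates that a filter created before `fetch` is assigned captures a stale, unset value.

```rust
/// Toy stand-in for an operator that snapshots `fetch` into its filter.
#[derive(Debug, Default)]
struct ToySort {
    fetch: Option<usize>,
    /// What the filter saw at creation time.
    filter_fetch: Option<usize>,
}

impl ToySort {
    /// Snapshot the current `fetch` into the filter.
    fn create_filter(&mut self) {
        self.filter_fetch = self.fetch;
    }
}

/// Buggy order: the filter is created before `fetch` is set.
fn build_buggy(k: usize) -> ToySort {
    let mut sort = ToySort::default();
    sort.create_filter();
    sort.fetch = Some(k);
    sort
}

/// Fixed order: set `fetch` first, then create the filter.
fn build_fixed(k: usize) -> ToySort {
    let mut sort = ToySort::default();
    sort.fetch = Some(k);
    sort.create_filter();
    sort
}
```

In this toy, `build_buggy(10)` leaves the filter without a K value, while `build_fixed(10)` lets it see `Some(10)`.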

Changes to DynamicFilterPhysicalExpr

  • sort_options: Option<Vec<SortOptions>> — sort direction for each child
  • fetch: Option<usize> — TopK K value for cumulative pruning
  • new_with_sort_options() constructor, sort_options() and fetch() getters
  • Set by SortExec::create_filter() for all TopK queries

Are these changes tested?

  • 110 unit tests in datafusion-datasource-parquet (all pass)
  • SLT tests: sort_pushdown.slt (Tests H/I/J/K), push_down_filter_parquet.slt, explain_analyze.slt, topk.slt (3 files)
  • Fuzz test: test_fuzz_topk_filter_pushdown — updated with tiebreaker columns for deterministic ORDER BY
  • ClickBench: no regression (fast bail for GROUP BY + ORDER BY queries)

Are there any user-facing changes?

No. Transparent optimization — same results, faster TopK on parquet with statistics.

Copilot AI review requested due to automatic review settings April 13, 2026 06:40
@github-actions github-actions Bot added the datasource Changes to the datasource crate label Apr 13, 2026
@zhuqi-lucas zhuqi-lucas force-pushed the feat/reorder-row-groups-by-stats branch from 3700464 to a013bf6 Compare April 13, 2026 06:42
@zhuqi-lucas
Contributor Author

cc @alamb @adriangb

@Dandandan
Contributor

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234363943-1136-k2bb2 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/reorder-row-groups-by-stats (a013bf6) to 29c5dd5 (merge-base) diff using: tpch
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234363943-1134-pcn8f 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (a013bf6) to 29c5dd5 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234363943-1135-k9dwc 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (a013bf6) to 29c5dd5 (merge-base) diff using: tpcds
Results will be posted here when complete


Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves TopK performance for Parquet scans when sort pushdown is Inexact by enabling row-group reordering based on statistics, so likely “best” row groups are read earlier and dynamic filters can tighten sooner.

Changes:

  • Thread an optional LexOrdering from ParquetSource::try_pushdown_sort through ParquetMorselizer to the access-plan preparation step.
  • Add PreparedAccessPlan::reorder_by_statistics to reorder row_group_indexes using Parquet statistics.
  • Add unit tests covering reorder/skip behavior for multiple edge cases.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| datafusion/datasource-parquet/src/source.rs | Plumbs sort ordering into the file source for later row-group reordering. |
| datafusion/datasource-parquet/src/opener.rs | Carries optional sort ordering into the opener and applies reorder_by_statistics during plan preparation. |
| datafusion/datasource-parquet/src/access_plan.rs | Implements row-group reordering by statistics and adds focused unit tests. |

Comment on lines +824 to +826
let sort_order = LexOrdering::new(order.iter().cloned());
let mut new_source = self.clone().with_reverse_row_groups(true);
new_source.sort_order_for_reorder = sort_order;
Copilot AI Apr 13, 2026

LexOrdering::new(...) appears to return a Result<LexOrdering, _> (as used with .unwrap() in the new unit tests), but here it’s assigned directly without ?/unwrap, and then assigned to sort_order_for_reorder: Option<LexOrdering> without wrapping in Some(...). This should be changed to construct a LexOrdering with error propagation and store it as Some(sort_order) (or skip setting the field on error). Otherwise this won’t compile.

Suggested change
- let sort_order = LexOrdering::new(order.iter().cloned());
- let mut new_source = self.clone().with_reverse_row_groups(true);
- new_source.sort_order_for_reorder = sort_order;
+ let sort_order = LexOrdering::new(order.iter().cloned())?;
+ let mut new_source = self.clone().with_reverse_row_groups(true);
+ new_source.sort_order_for_reorder = Some(sort_order);

Comment on lines +414 to +415
// LexOrdering is guaranteed non-empty, so first() returns &PhysicalSortExpr
let first_sort_expr = sort_order.first();
Copilot AI Apr 13, 2026

sort_order.first() (if LexOrdering is Vec-like) returns Option<&PhysicalSortExpr>, but the code uses it as if it were &PhysicalSortExpr (first_sort_expr.expr...). This is likely a compile error. A concrete fix is to obtain the first element via iteration and handle the empty case (e.g., early-return Ok(self) if no sort expressions), then use the returned &PhysicalSortExpr.

Suggested change
- // LexOrdering is guaranteed non-empty, so first() returns &PhysicalSortExpr
- let first_sort_expr = sort_order.first();
+ let first_sort_expr = match sort_order.iter().next() {
+     Some(expr) => expr,
+     None => {
+         debug!("Skipping RG reorder: empty sort order");
+         return Ok(self);
+     }
+ };

}
};

let descending = first_sort_expr.options.descending;
Copilot AI Apr 13, 2026

For DESC ordering, reordering by min values is often a poor proxy for “row group likely contains the largest values first”; typically you want to sort by max when descending == true (and by min when ascending). This can significantly reduce the intended TopK benefit (and can even choose a worse first row group when ranges overlap). Consider switching to row_group_maxs(...) for descending order, and update the doc comment (currently mentions “min/max”) and the DESC unit test accordingly.
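
A small worked example of the overlap case, as a sketch with made-up ranges. `RgRange` and both helper functions are hypothetical and exist only to compare the two choices of statistic.

```rust
/// Hypothetical min/max range of the sort column in one row group.
#[derive(Debug, Clone, Copy)]
struct RgRange {
    min: i64,
    max: i64,
}

/// Row group that would be read first for DESC when ranking by min.
fn first_for_desc_by_min(rgs: &[RgRange]) -> Option<usize> {
    rgs.iter()
        .enumerate()
        .max_by_key(|(_, rg)| rg.min)
        .map(|(i, _)| i)
}

/// Row group that would be read first for DESC when ranking by max.
fn first_for_desc_by_max(rgs: &[RgRange]) -> Option<usize> {
    rgs.iter()
        .enumerate()
        .max_by_key(|(_, rg)| rg.max)
        .map(|(i, _)| i)
}
```

With row group 0 covering 0..100 and row group 1 covering 10..20, ranking by min picks row group 1 first even though the largest values live in row group 0; ranking by max picks row group 0.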

Comment on lines +442 to +463
// Get min values for the selected row groups
let rg_metadata: Vec<&RowGroupMetaData> = self
.row_group_indexes
.iter()
.map(|&idx| file_metadata.row_group(idx))
.collect();

let min_values = match converter.row_group_mins(rg_metadata.iter().copied()) {
Ok(vals) => vals,
Err(e) => {
debug!("Skipping RG reorder: cannot get min values: {e}");
return Ok(self);
}
};

// Sort indices by min values
let sort_options = arrow::compute::SortOptions {
descending,
nulls_first: first_sort_expr.options.nulls_first,
};
let sorted_indices = match arrow::compute::sort_to_indices(
&min_values,
Copilot AI Apr 13, 2026

For DESC ordering, reordering by min values is often a poor proxy for “row group likely contains the largest values first”; typically you want to sort by max when descending == true (and by min when ascending). This can significantly reduce the intended TopK benefit (and can even choose a worse first row group when ranges overlap). Consider switching to row_group_maxs(...) for descending order, and update the doc comment (currently mentions “min/max”) and the DESC unit test accordingly.

Suggested change
- // Get min values for the selected row groups
- let rg_metadata: Vec<&RowGroupMetaData> = self
-     .row_group_indexes
-     .iter()
-     .map(|&idx| file_metadata.row_group(idx))
-     .collect();
- let min_values = match converter.row_group_mins(rg_metadata.iter().copied()) {
-     Ok(vals) => vals,
-     Err(e) => {
-         debug!("Skipping RG reorder: cannot get min values: {e}");
-         return Ok(self);
-     }
- };
- // Sort indices by min values
- let sort_options = arrow::compute::SortOptions {
-     descending,
-     nulls_first: first_sort_expr.options.nulls_first,
- };
- let sorted_indices = match arrow::compute::sort_to_indices(
-     &min_values,
+ // Get values for the selected row groups: mins for ASC, maxs for DESC
+ let rg_metadata: Vec<&RowGroupMetaData> = self
+     .row_group_indexes
+     .iter()
+     .map(|&idx| file_metadata.row_group(idx))
+     .collect();
+ let sort_values = match if descending {
+     converter.row_group_maxs(rg_metadata.iter().copied())
+ } else {
+     converter.row_group_mins(rg_metadata.iter().copied())
+ } {
+     Ok(vals) => vals,
+     Err(e) => {
+         debug!("Skipping RG reorder: cannot get min/max values: {e}");
+         return Ok(self);
+     }
+ };
+ // Sort indices by the statistics that best match the requested order
+ let sort_options = arrow::compute::SortOptions {
+     descending,
+     nulls_first: first_sort_expr.options.nulls_first,
+ };
+ let sorted_indices = match arrow::compute::sort_to_indices(
+     &sort_values,

Contributor

This is a good point

Contributor Author

Yes, this is a good point.

Comment on lines +462 to +466
let sorted_indices = match arrow::compute::sort_to_indices(
&min_values,
Some(sort_options),
None,
) {
Copilot AI Apr 13, 2026

If multiple row groups share the same min (or max) statistic, sort_to_indices may not guarantee a deterministic/stable tie-breaker across platforms/versions. Since row-group order can affect scan reproducibility and performance debugging, consider adding a stable secondary key (e.g., original row group index) when statistics are equal.
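
A sketch of the suggested tie-breaker using only the standard library. It is shown on plain `i64` statistics rather than Arrow arrays, and `sort_indices_stable` is a hypothetical helper, not an Arrow or DataFusion function.

```rust
/// Return indexes of `stats` ordered by value, breaking ties by the
/// original index so the result is deterministic.
fn sort_indices_stable(stats: &[i64], descending: bool) -> Vec<usize> {
    let mut indexes: Vec<usize> = (0..stats.len()).collect();
    indexes.sort_by(|&a, &b| {
        let primary = if descending {
            stats[b].cmp(&stats[a])
        } else {
            stats[a].cmp(&stats[b])
        };
        // Secondary key: original row-group index, always ascending.
        primary.then(a.cmp(&b))
    });
    indexes
}
```

For statistics `[5, 1, 5, 1]` the ascending order is `[1, 3, 0, 2]` and the descending order is `[0, 2, 1, 3]`, regardless of how the underlying sort treats equal keys.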

@github-actions github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label Apr 13, 2026
/// - 0 or 1 row groups (nothing to reorder)
/// - Sort expression is not a simple column reference
/// - Statistics are unavailable
pub(crate) fn reorder_by_statistics(
Contributor

@Dandandan Dandandan Apr 13, 2026

I think @adriangb had the great idea to also order by grouping keys which can

  • reduce cardinality within partitions (partition-local state can be smaller)
  • allow for better cache locality (row groups with more equal keys are grouped together)

Doesn't have to be in this PR but perhaps we can think about how it fits in.

Contributor Author

Thanks @Dandandan for review! That's a great extension. The reorder_by_statistics method is generic enough to take any LexOrdering — it doesn't need to be tied to TopK specifically. So extending this for GROUP BY should be a matter of:

  1. Computing a preferred RG ordering from grouping keys in the aggregate planner
  2. Passing it through to ParquetSource::sort_order_for_reorder

Happy to track this as a follow-up issue. Will open one after this PR lands.

Contributor Author

Thanks @Dandandan! Created #21581 to track this. The existing infrastructure from this PR should be directly reusable — mainly needs the aggregate planner to populate sort_order_for_reorder from grouping keys.

@Dandandan
Contributor

run benchmarks

env:
    PUSHDOWN_FILTERS: true
    REORDER_FILTERS: true

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234422910-1137-vvpxc 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234422910-1138-qjxtt 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234422910-1139-wpv6n 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) diff using: tpch
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃         feat_reorder-row-groups-by-stats ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │              6.76 / 7.21 ±0.73 / 8.66 ms │              6.66 / 7.05 ±0.75 / 8.56 ms │    no change │
│ QQuery 2  │        145.75 / 146.85 ±1.06 / 148.46 ms │        144.70 / 146.59 ±1.26 / 148.61 ms │    no change │
│ QQuery 3  │        114.48 / 115.73 ±0.96 / 117.37 ms │        113.70 / 114.51 ±0.53 / 115.15 ms │    no change │
│ QQuery 4  │    1336.62 / 1383.93 ±28.36 / 1413.36 ms │    1341.10 / 1369.70 ±25.80 / 1400.67 ms │    no change │
│ QQuery 5  │        172.96 / 173.76 ±0.93 / 175.52 ms │        172.87 / 173.71 ±1.06 / 175.76 ms │    no change │
│ QQuery 6  │       831.80 / 869.64 ±22.94 / 893.03 ms │       817.74 / 885.67 ±36.62 / 924.77 ms │    no change │
│ QQuery 7  │        343.27 / 346.44 ±2.73 / 351.44 ms │        343.10 / 347.19 ±4.26 / 354.07 ms │    no change │
│ QQuery 8  │        115.88 / 117.19 ±0.88 / 118.45 ms │        115.12 / 116.82 ±1.09 / 118.14 ms │    no change │
│ QQuery 9  │        101.81 / 105.19 ±5.49 / 116.11 ms │       100.38 / 109.94 ±10.62 / 129.61 ms │    no change │
│ QQuery 10 │        105.57 / 107.18 ±0.90 / 108.19 ms │        106.58 / 108.12 ±1.52 / 110.94 ms │    no change │
│ QQuery 11 │        951.72 / 963.82 ±6.42 / 968.96 ms │        964.70 / 976.63 ±7.17 / 986.01 ms │    no change │
│ QQuery 12 │           44.04 / 47.17 ±1.95 / 49.81 ms │           46.18 / 47.24 ±0.73 / 48.28 ms │    no change │
│ QQuery 13 │        403.35 / 405.32 ±1.43 / 407.21 ms │        403.48 / 405.90 ±1.57 / 407.89 ms │    no change │
│ QQuery 14 │     1009.53 / 1015.83 ±5.78 / 1026.74 ms │    1002.02 / 1020.87 ±15.11 / 1047.87 ms │    no change │
│ QQuery 15 │           16.12 / 17.91 ±1.76 / 20.55 ms │           17.16 / 19.05 ±1.98 / 22.65 ms │ 1.06x slower │
│ QQuery 16 │              7.31 / 7.59 ±0.20 / 7.91 ms │              7.88 / 8.59 ±0.73 / 9.70 ms │ 1.13x slower │
│ QQuery 17 │        229.47 / 231.19 ±1.44 / 233.08 ms │        240.36 / 243.29 ±2.25 / 246.13 ms │ 1.05x slower │
│ QQuery 18 │        126.61 / 128.79 ±1.68 / 131.55 ms │        134.44 / 135.44 ±0.69 / 136.56 ms │ 1.05x slower │
│ QQuery 19 │        156.40 / 157.51 ±0.85 / 159.00 ms │        162.88 / 164.97 ±1.38 / 166.21 ms │    no change │
│ QQuery 20 │           13.75 / 14.66 ±0.67 / 15.79 ms │           15.45 / 15.77 ±0.24 / 15.99 ms │ 1.08x slower │
│ QQuery 21 │           19.51 / 20.16 ±0.51 / 20.76 ms │           21.05 / 21.45 ±0.35 / 22.06 ms │ 1.06x slower │
│ QQuery 22 │        481.56 / 489.10 ±4.25 / 493.73 ms │        491.92 / 498.86 ±8.92 / 516.06 ms │    no change │
│ QQuery 23 │        873.33 / 884.81 ±6.37 / 892.89 ms │       887.79 / 896.87 ±10.62 / 916.66 ms │    no change │
│ QQuery 24 │        381.97 / 385.40 ±3.50 / 391.57 ms │        382.98 / 387.53 ±2.66 / 390.54 ms │    no change │
│ QQuery 25 │        340.86 / 344.45 ±2.19 / 346.83 ms │        340.45 / 343.50 ±1.73 / 345.12 ms │    no change │
│ QQuery 26 │           80.84 / 81.79 ±0.82 / 83.32 ms │           81.27 / 82.66 ±0.83 / 83.86 ms │    no change │
│ QQuery 27 │              6.88 / 7.52 ±0.74 / 8.95 ms │              6.87 / 7.23 ±0.48 / 8.16 ms │    no change │
│ QQuery 28 │        148.59 / 150.91 ±1.93 / 153.50 ms │        148.77 / 150.19 ±1.12 / 152.08 ms │    no change │
│ QQuery 29 │        282.92 / 285.15 ±1.50 / 287.45 ms │        283.28 / 284.50 ±1.29 / 286.63 ms │    no change │
│ QQuery 30 │           43.82 / 45.47 ±1.49 / 48.04 ms │           42.04 / 44.32 ±1.23 / 45.71 ms │    no change │
│ QQuery 31 │        169.84 / 172.47 ±2.27 / 176.48 ms │        169.29 / 171.30 ±1.68 / 173.99 ms │    no change │
│ QQuery 32 │           57.32 / 58.30 ±0.57 / 59.04 ms │           57.74 / 58.68 ±0.98 / 60.05 ms │    no change │
│ QQuery 33 │        140.09 / 143.20 ±1.85 / 145.61 ms │        141.03 / 142.63 ±0.92 / 143.86 ms │    no change │
│ QQuery 34 │              6.96 / 7.30 ±0.30 / 7.84 ms │              7.00 / 7.23 ±0.26 / 7.61 ms │    no change │
│ QQuery 35 │        107.83 / 109.50 ±1.10 / 110.70 ms │        106.64 / 109.99 ±2.04 / 112.55 ms │    no change │
│ QQuery 36 │              6.51 / 6.99 ±0.48 / 7.88 ms │              6.53 / 6.71 ±0.20 / 7.07 ms │    no change │
│ QQuery 37 │             8.73 / 9.25 ±0.67 / 10.56 ms │              8.21 / 8.81 ±0.46 / 9.57 ms │    no change │
│ QQuery 38 │           85.23 / 88.80 ±4.94 / 98.57 ms │           84.88 / 87.11 ±3.72 / 94.52 ms │    no change │
│ QQuery 39 │        125.77 / 129.50 ±4.44 / 137.69 ms │        127.22 / 128.59 ±1.12 / 130.62 ms │    no change │
│ QQuery 40 │        111.88 / 117.77 ±5.93 / 128.78 ms │        110.11 / 117.40 ±8.84 / 134.20 ms │    no change │
│ QQuery 41 │           15.66 / 16.18 ±0.59 / 17.31 ms │           14.86 / 16.06 ±1.18 / 18.18 ms │    no change │
│ QQuery 42 │        108.25 / 110.15 ±1.58 / 112.51 ms │        107.58 / 109.71 ±1.45 / 111.73 ms │    no change │
│ QQuery 43 │              5.98 / 6.31 ±0.27 / 6.73 ms │              6.03 / 6.53 ±0.80 / 8.12 ms │    no change │
│ QQuery 44 │           11.71 / 12.19 ±0.68 / 13.53 ms │           11.79 / 12.14 ±0.20 / 12.35 ms │    no change │
│ QQuery 45 │           51.06 / 52.40 ±0.78 / 53.28 ms │           50.16 / 52.05 ±1.38 / 54.23 ms │    no change │
│ QQuery 46 │              8.65 / 8.89 ±0.17 / 9.16 ms │              8.63 / 8.88 ±0.20 / 9.11 ms │    no change │
│ QQuery 47 │        710.73 / 722.40 ±6.30 / 729.86 ms │        733.32 / 745.39 ±6.92 / 754.00 ms │    no change │
│ QQuery 48 │        289.87 / 294.72 ±4.74 / 300.92 ms │        288.52 / 294.73 ±6.45 / 306.78 ms │    no change │
│ QQuery 49 │        251.64 / 252.97 ±1.41 / 255.48 ms │        255.34 / 255.97 ±0.56 / 256.78 ms │    no change │
│ QQuery 50 │        222.58 / 228.41 ±4.03 / 235.01 ms │        226.54 / 233.03 ±5.00 / 240.31 ms │    no change │
│ QQuery 51 │        180.59 / 184.08 ±2.83 / 187.39 ms │        183.67 / 186.36 ±1.96 / 188.77 ms │    no change │
│ QQuery 52 │        107.91 / 109.06 ±0.89 / 110.34 ms │        110.03 / 111.42 ±0.81 / 112.47 ms │    no change │
│ QQuery 53 │        104.18 / 105.27 ±1.15 / 107.27 ms │        105.42 / 106.80 ±1.39 / 109.23 ms │    no change │
│ QQuery 54 │        147.22 / 148.77 ±1.46 / 151.25 ms │        149.43 / 151.07 ±0.93 / 152.31 ms │    no change │
│ QQuery 55 │        108.14 / 109.84 ±1.73 / 112.49 ms │        108.46 / 110.09 ±1.84 / 113.65 ms │    no change │
│ QQuery 56 │        141.89 / 144.13 ±1.81 / 146.71 ms │        142.50 / 144.45 ±1.14 / 145.70 ms │    no change │
│ QQuery 57 │        172.45 / 174.79 ±1.90 / 177.95 ms │        176.55 / 178.66 ±1.32 / 180.38 ms │    no change │
│ QQuery 58 │        292.06 / 297.55 ±2.86 / 299.80 ms │        290.26 / 298.13 ±6.17 / 309.20 ms │    no change │
│ QQuery 59 │        199.68 / 202.81 ±4.20 / 210.76 ms │        196.18 / 201.51 ±2.94 / 205.01 ms │    no change │
│ QQuery 60 │        145.63 / 146.61 ±1.30 / 149.18 ms │        144.93 / 145.97 ±0.58 / 146.66 ms │    no change │
│ QQuery 61 │           13.09 / 13.42 ±0.22 / 13.74 ms │           13.26 / 13.48 ±0.20 / 13.79 ms │    no change │
│ QQuery 62 │      903.17 / 972.36 ±73.07 / 1110.97 ms │       893.58 / 940.09 ±28.77 / 975.15 ms │    no change │
│ QQuery 63 │        104.29 / 106.12 ±1.11 / 107.45 ms │        106.05 / 107.70 ±1.58 / 110.72 ms │    no change │
│ QQuery 64 │        684.26 / 697.36 ±8.94 / 709.76 ms │        684.46 / 694.14 ±8.98 / 708.42 ms │    no change │
│ QQuery 65 │        251.36 / 257.09 ±3.42 / 260.31 ms │        254.62 / 257.41 ±1.68 / 259.77 ms │    no change │
│ QQuery 66 │        239.53 / 250.02 ±7.11 / 258.78 ms │        242.28 / 255.27 ±8.91 / 265.85 ms │    no change │
│ QQuery 67 │        312.00 / 314.58 ±2.53 / 317.92 ms │        315.48 / 320.21 ±4.64 / 327.33 ms │    no change │
│ QQuery 68 │             8.86 / 9.83 ±1.00 / 11.23 ms │            8.85 / 10.18 ±0.80 / 10.96 ms │    no change │
│ QQuery 69 │        101.72 / 103.97 ±1.51 / 106.48 ms │        102.05 / 104.18 ±2.49 / 108.67 ms │    no change │
│ QQuery 70 │        349.78 / 359.97 ±8.05 / 369.60 ms │        341.41 / 352.18 ±7.53 / 362.53 ms │    no change │
│ QQuery 71 │        135.76 / 139.31 ±3.34 / 145.38 ms │        136.47 / 138.33 ±2.01 / 141.85 ms │    no change │
│ QQuery 72 │       610.09 / 628.55 ±11.61 / 639.76 ms │        618.47 / 625.85 ±7.96 / 638.13 ms │    no change │
│ QQuery 73 │              7.84 / 8.48 ±0.63 / 9.55 ms │             6.99 / 9.50 ±2.08 / 13.02 ms │ 1.12x slower │
│ QQuery 74 │        578.61 / 594.14 ±8.66 / 602.76 ms │        598.69 / 607.55 ±7.77 / 620.04 ms │    no change │
│ QQuery 75 │        276.80 / 279.31 ±2.06 / 282.54 ms │        280.18 / 283.49 ±2.96 / 288.22 ms │    no change │
│ QQuery 76 │        133.52 / 135.27 ±1.41 / 137.64 ms │        134.75 / 136.00 ±1.23 / 138.35 ms │    no change │
│ QQuery 77 │        187.89 / 190.27 ±1.66 / 192.45 ms │        188.51 / 190.82 ±2.30 / 194.65 ms │    no change │
│ QQuery 78 │        339.14 / 344.57 ±3.86 / 351.08 ms │        335.65 / 342.70 ±4.54 / 349.86 ms │    no change │
│ QQuery 79 │        235.13 / 237.76 ±1.41 / 239.31 ms │        235.50 / 239.71 ±2.39 / 242.04 ms │    no change │
│ QQuery 80 │        321.80 / 323.60 ±1.62 / 326.10 ms │        318.67 / 323.18 ±2.71 / 327.21 ms │    no change │
│ QQuery 81 │           26.78 / 27.93 ±1.53 / 30.94 ms │           26.92 / 27.55 ±0.60 / 28.44 ms │    no change │
│ QQuery 82 │        200.18 / 203.94 ±2.03 / 205.90 ms │        198.56 / 201.76 ±2.24 / 204.67 ms │    no change │
│ QQuery 83 │           38.87 / 40.48 ±1.80 / 43.37 ms │           38.94 / 40.31 ±0.99 / 41.65 ms │    no change │
│ QQuery 84 │           49.28 / 50.21 ±0.71 / 51.04 ms │           48.23 / 49.02 ±0.65 / 49.90 ms │    no change │
│ QQuery 85 │        146.57 / 149.44 ±1.63 / 151.12 ms │        149.68 / 151.20 ±0.91 / 152.20 ms │    no change │
│ QQuery 86 │           39.01 / 40.64 ±0.95 / 41.99 ms │           38.73 / 41.24 ±1.97 / 44.03 ms │    no change │
│ QQuery 87 │           88.77 / 90.74 ±2.50 / 95.56 ms │           87.04 / 89.26 ±2.80 / 94.74 ms │    no change │
│ QQuery 88 │        100.76 / 101.56 ±0.68 / 102.61 ms │        101.35 / 102.49 ±0.61 / 103.04 ms │    no change │
│ QQuery 89 │        118.94 / 121.61 ±1.96 / 124.28 ms │        120.01 / 120.84 ±0.76 / 121.77 ms │    no change │
│ QQuery 90 │           24.31 / 25.44 ±1.09 / 27.13 ms │           24.29 / 24.83 ±0.42 / 25.47 ms │    no change │
│ QQuery 91 │           60.69 / 63.74 ±1.69 / 65.64 ms │           62.38 / 65.54 ±1.79 / 67.28 ms │    no change │
│ QQuery 92 │           57.86 / 58.66 ±0.63 / 59.28 ms │           58.08 / 59.35 ±1.21 / 61.29 ms │    no change │
│ QQuery 93 │        187.91 / 189.87 ±2.11 / 193.21 ms │        187.01 / 189.06 ±1.65 / 192.02 ms │    no change │
│ QQuery 94 │           60.96 / 62.29 ±1.12 / 63.81 ms │           61.39 / 62.58 ±0.65 / 63.32 ms │    no change │
│ QQuery 95 │        129.33 / 129.84 ±0.28 / 130.12 ms │        129.13 / 130.31 ±1.16 / 132.24 ms │    no change │
│ QQuery 96 │           73.64 / 75.06 ±0.88 / 76.02 ms │           72.31 / 74.37 ±1.26 / 76.19 ms │    no change │
│ QQuery 97 │        124.76 / 127.56 ±2.34 / 131.18 ms │        124.99 / 127.31 ±1.47 / 128.75 ms │    no change │
│ QQuery 98 │        152.55 / 155.51 ±2.30 / 158.79 ms │        152.39 / 155.81 ±2.16 / 159.14 ms │    no change │
│ QQuery 99 │ 10799.15 / 10864.77 ±35.59 / 10898.37 ms │ 10810.76 / 10852.07 ±29.64 / 10887.35 ms │    no change │
└───────────┴──────────────────────────────────────────┴──────────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 31773.55ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 31858.42ms │
│ Average Time (HEAD)                             │   320.94ms │
│ Average Time (feat_reorder-row-groups-by-stats) │   321.80ms │
│ Queries Faster                                  │          0 │
│ Queries Slower                                  │          7 │
│ Queries with No Change                          │         92 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 159.2s |
| Peak memory | 5.5 GiB |
| Avg memory | 4.5 GiB |
| CPU user | 262.4s |
| CPU sys | 17.8s |
| Peak spill | 0 B |

tpcds — branch

| Metric | Value |
|---|---|
| Wall time | 159.6s |
| Peak memory | 5.6 GiB |
| Avg memory | 4.5 GiB |
| CPU user | 263.9s |
| CPU sys | 17.1s |
| Peak spill | 0 B |


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃      feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.23 / 4.55 ±6.48 / 17.51 ms │          1.19 / 4.44 ±6.37 / 17.18 ms │     no change │
│ QQuery 1  │        14.40 / 14.69 ±0.23 / 14.92 ms │        14.23 / 14.59 ±0.20 / 14.82 ms │     no change │
│ QQuery 2  │        44.12 / 44.31 ±0.17 / 44.54 ms │        44.11 / 44.28 ±0.11 / 44.47 ms │     no change │
│ QQuery 3  │        41.87 / 44.88 ±2.83 / 48.21 ms │        45.19 / 46.05 ±0.92 / 47.82 ms │     no change │
│ QQuery 4  │     301.57 / 305.93 ±4.33 / 312.84 ms │     283.62 / 292.46 ±7.51 / 301.96 ms │     no change │
│ QQuery 5  │     343.76 / 349.26 ±2.90 / 351.52 ms │     340.90 / 346.09 ±3.65 / 350.98 ms │     no change │
│ QQuery 6  │          5.00 / 7.74 ±2.08 / 10.58 ms │          5.80 / 8.85 ±4.45 / 17.63 ms │  1.14x slower │
│ QQuery 7  │        16.79 / 17.42 ±0.40 / 17.97 ms │        16.80 / 16.96 ±0.18 / 17.30 ms │     no change │
│ QQuery 8  │     417.96 / 426.38 ±7.76 / 436.12 ms │     421.02 / 426.12 ±4.84 / 434.24 ms │     no change │
│ QQuery 9  │     666.36 / 676.75 ±8.41 / 686.88 ms │    655.03 / 663.43 ±10.04 / 682.29 ms │     no change │
│ QQuery 10 │        92.21 / 93.67 ±2.08 / 97.80 ms │       93.16 / 96.01 ±4.39 / 104.70 ms │     no change │
│ QQuery 11 │     104.40 / 105.92 ±1.09 / 107.50 ms │     103.33 / 108.14 ±4.01 / 115.55 ms │     no change │
│ QQuery 12 │     345.12 / 351.62 ±5.08 / 358.61 ms │     338.51 / 349.23 ±6.79 / 358.13 ms │     no change │
│ QQuery 13 │    454.95 / 466.86 ±13.39 / 492.97 ms │    459.79 / 482.10 ±32.75 / 546.44 ms │     no change │
│ QQuery 14 │     344.61 / 348.93 ±4.09 / 356.53 ms │     343.57 / 349.57 ±3.91 / 354.96 ms │     no change │
│ QQuery 15 │    354.08 / 376.26 ±20.59 / 412.84 ms │    353.05 / 376.01 ±22.93 / 414.75 ms │     no change │
│ QQuery 16 │    717.04 / 731.91 ±17.98 / 766.45 ms │    714.64 / 749.96 ±28.15 / 784.01 ms │     no change │
│ QQuery 17 │     711.73 / 718.92 ±4.04 / 723.31 ms │     713.17 / 717.82 ±5.10 / 727.41 ms │     no change │
│ QQuery 18 │ 1419.90 / 1476.97 ±45.70 / 1523.25 ms │  1361.04 / 1376.04 ±9.64 / 1390.97 ms │ +1.07x faster │
│ QQuery 19 │       35.97 / 46.26 ±19.56 / 85.37 ms │        35.78 / 38.31 ±1.93 / 41.70 ms │ +1.21x faster │
│ QQuery 20 │    712.30 / 733.16 ±16.60 / 755.99 ms │     707.03 / 714.65 ±8.59 / 731.42 ms │     no change │
│ QQuery 21 │     767.93 / 773.25 ±4.38 / 778.46 ms │     757.44 / 762.39 ±4.21 / 769.10 ms │     no change │
│ QQuery 22 │  1137.01 / 1149.77 ±8.70 / 1162.30 ms │  1134.94 / 1140.41 ±5.69 / 1150.59 ms │     no change │
│ QQuery 23 │ 3090.99 / 3109.19 ±13.77 / 3131.70 ms │ 3079.16 / 3106.21 ±14.77 / 3123.90 ms │     no change │
│ QQuery 24 │     100.24 / 103.67 ±2.55 / 106.89 ms │     100.11 / 102.94 ±1.70 / 105.16 ms │     no change │
│ QQuery 25 │     139.49 / 141.35 ±1.43 / 143.90 ms │     137.95 / 141.81 ±2.79 / 146.51 ms │     no change │
│ QQuery 26 │      98.88 / 101.65 ±1.99 / 104.34 ms │      98.97 / 103.22 ±2.21 / 104.74 ms │     no change │
│ QQuery 27 │     852.53 / 858.70 ±9.74 / 878.11 ms │     846.79 / 851.11 ±4.26 / 857.73 ms │     no change │
│ QQuery 28 │ 3273.16 / 3306.19 ±16.95 / 3319.21 ms │ 3289.86 / 3315.39 ±20.65 / 3344.13 ms │     no change │
│ QQuery 29 │        50.27 / 54.97 ±4.49 / 62.93 ms │        50.24 / 56.60 ±5.57 / 65.85 ms │     no change │
│ QQuery 30 │     361.99 / 367.45 ±5.71 / 374.86 ms │     354.82 / 363.42 ±7.55 / 376.71 ms │     no change │
│ QQuery 31 │     354.41 / 371.28 ±9.10 / 378.15 ms │    361.59 / 378.76 ±12.41 / 394.38 ms │     no change │
│ QQuery 32 │ 1214.59 / 1260.10 ±34.96 / 1305.41 ms │ 1041.72 / 1056.56 ±15.17 / 1084.81 ms │ +1.19x faster │
│ QQuery 33 │ 1515.52 / 1570.85 ±38.41 / 1634.04 ms │  1469.34 / 1474.38 ±7.24 / 1488.32 ms │ +1.07x faster │
│ QQuery 34 │ 1485.86 / 1532.37 ±26.82 / 1565.04 ms │  1477.09 / 1487.24 ±7.28 / 1496.50 ms │     no change │
│ QQuery 35 │    393.36 / 426.17 ±54.55 / 534.85 ms │     391.43 / 401.95 ±7.89 / 411.93 ms │ +1.06x faster │
│ QQuery 36 │     115.02 / 120.80 ±3.82 / 125.19 ms │     118.07 / 122.65 ±3.64 / 128.83 ms │     no change │
│ QQuery 37 │        49.52 / 51.49 ±1.92 / 55.07 ms │        47.48 / 50.62 ±1.83 / 52.79 ms │     no change │
│ QQuery 38 │        74.07 / 76.49 ±1.25 / 77.52 ms │        75.14 / 77.58 ±1.56 / 79.76 ms │     no change │
│ QQuery 39 │     209.85 / 215.78 ±4.18 / 220.73 ms │     203.18 / 218.92 ±8.79 / 228.06 ms │     no change │
│ QQuery 40 │        24.46 / 25.99 ±1.19 / 27.44 ms │        21.42 / 23.66 ±1.70 / 26.54 ms │ +1.10x faster │
│ QQuery 41 │        20.66 / 22.64 ±2.61 / 27.69 ms │        19.87 / 21.02 ±1.09 / 22.36 ms │ +1.08x faster │
│ QQuery 42 │        19.06 / 19.93 ±0.46 / 20.34 ms │        19.08 / 20.03 ±0.64 / 21.10 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 23002.46ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 22498.00ms │
│ Average Time (HEAD)                             │   534.94ms │
│ Average Time (feat_reorder-row-groups-by-stats) │   523.21ms │
│ Queries Faster                                  │          7 │
│ Queries Slower                                  │          1 │
│ Queries with No Change                          │         35 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 115.8s |
| Peak memory | 36.4 GiB |
| Avg memory | 27.0 GiB |
| CPU user | 1079.4s |
| CPU sys | 98.2s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|---|---|
| Wall time | 113.5s |
| Peak memory | 40.1 GiB |
| Avg memory | 33.2 GiB |
| CPU user | 1075.7s |
| CPU sys | 81.0s |
| Peak spill | 0 B |


@Dandandan
Contributor

run benchmark clickbench_partitioned clickbench_extended

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234477729-1140-pwvsf 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4234477729-1141-9x5wm 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing feat/reorder-row-groups-by-stats (5018882) to 29c5dd5 (merge-base) diff using: clickbench_extended
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃         feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │              6.60 / 7.07 ±0.83 / 8.73 ms │              6.60 / 7.06 ±0.83 / 8.71 ms │     no change │
│ QQuery 2  │        143.88 / 144.99 ±1.22 / 147.28 ms │        145.49 / 146.39 ±0.75 / 147.36 ms │     no change │
│ QQuery 3  │        114.20 / 115.49 ±1.27 / 117.17 ms │        113.54 / 114.31 ±0.75 / 115.71 ms │     no change │
│ QQuery 4  │    1407.17 / 1446.02 ±28.01 / 1488.27 ms │    1346.64 / 1364.53 ±15.80 / 1393.25 ms │ +1.06x faster │
│ QQuery 5  │        171.78 / 174.25 ±2.64 / 179.25 ms │        172.57 / 174.59 ±1.43 / 176.25 ms │     no change │
│ QQuery 6  │       849.48 / 877.16 ±22.87 / 905.49 ms │       828.07 / 868.05 ±30.14 / 901.99 ms │     no change │
│ QQuery 7  │        343.32 / 346.59 ±2.93 / 350.49 ms │        341.03 / 345.45 ±3.39 / 351.37 ms │     no change │
│ QQuery 8  │        117.21 / 119.39 ±1.66 / 121.19 ms │        117.29 / 118.25 ±1.00 / 119.81 ms │     no change │
│ QQuery 9  │        101.98 / 105.46 ±2.06 / 107.89 ms │        101.30 / 103.91 ±2.47 / 107.10 ms │     no change │
│ QQuery 10 │        105.17 / 106.91 ±1.15 / 108.76 ms │        104.99 / 106.26 ±0.68 / 106.86 ms │     no change │
│ QQuery 11 │       950.10 / 964.75 ±15.56 / 992.71 ms │       952.07 / 966.13 ±10.18 / 977.82 ms │     no change │
│ QQuery 12 │           49.17 / 50.87 ±1.79 / 53.14 ms │           44.24 / 45.58 ±1.32 / 48.07 ms │ +1.12x faster │
│ QQuery 13 │        400.79 / 408.52 ±5.63 / 417.80 ms │        401.25 / 405.70 ±3.28 / 410.32 ms │     no change │
│ QQuery 14 │     1004.14 / 1009.46 ±3.40 / 1013.29 ms │     1004.37 / 1006.86 ±1.62 / 1008.42 ms │     no change │
│ QQuery 15 │           15.60 / 16.29 ±0.72 / 17.64 ms │           15.34 / 16.91 ±1.14 / 18.64 ms │     no change │
│ QQuery 16 │              7.29 / 7.58 ±0.23 / 7.82 ms │              7.31 / 7.77 ±0.29 / 8.12 ms │     no change │
│ QQuery 17 │        229.09 / 231.19 ±1.64 / 233.94 ms │        227.78 / 229.37 ±1.34 / 231.54 ms │     no change │
│ QQuery 18 │        129.42 / 129.76 ±0.34 / 130.31 ms │        126.89 / 128.72 ±1.18 / 130.47 ms │     no change │
│ QQuery 19 │        154.96 / 157.06 ±1.46 / 158.84 ms │        155.11 / 156.36 ±1.00 / 157.47 ms │     no change │
│ QQuery 20 │           13.40 / 14.08 ±0.44 / 14.76 ms │           13.71 / 14.26 ±0.30 / 14.57 ms │     no change │
│ QQuery 21 │           18.94 / 19.61 ±0.35 / 19.91 ms │           19.53 / 19.89 ±0.31 / 20.34 ms │     no change │
│ QQuery 22 │        486.41 / 490.25 ±2.46 / 492.89 ms │        485.59 / 488.57 ±2.14 / 491.78 ms │     no change │
│ QQuery 23 │        881.75 / 888.47 ±6.90 / 897.22 ms │        874.89 / 884.92 ±8.92 / 901.54 ms │     no change │
│ QQuery 24 │        382.00 / 384.91 ±2.95 / 389.94 ms │        381.40 / 383.88 ±3.14 / 389.86 ms │     no change │
│ QQuery 25 │        340.39 / 342.38 ±1.35 / 343.76 ms │        336.89 / 340.18 ±2.75 / 343.84 ms │     no change │
│ QQuery 26 │           82.02 / 82.93 ±0.75 / 84.22 ms │           81.69 / 83.69 ±2.44 / 87.06 ms │     no change │
│ QQuery 27 │              7.14 / 7.67 ±0.78 / 9.20 ms │              6.75 / 6.99 ±0.29 / 7.51 ms │ +1.10x faster │
│ QQuery 28 │        148.11 / 151.11 ±2.46 / 155.58 ms │        148.59 / 150.08 ±0.99 / 151.50 ms │     no change │
│ QQuery 29 │        280.02 / 283.14 ±1.85 / 285.10 ms │        278.47 / 282.20 ±2.15 / 284.39 ms │     no change │
│ QQuery 30 │           43.46 / 46.42 ±1.97 / 48.60 ms │           43.38 / 45.04 ±1.56 / 47.82 ms │     no change │
│ QQuery 31 │        169.78 / 171.51 ±1.02 / 172.58 ms │        171.26 / 173.61 ±1.70 / 175.62 ms │     no change │
│ QQuery 32 │           56.82 / 58.73 ±1.23 / 60.51 ms │           57.19 / 57.73 ±0.63 / 58.94 ms │     no change │
│ QQuery 33 │        141.79 / 142.90 ±0.89 / 144.49 ms │        140.06 / 142.63 ±2.83 / 147.88 ms │     no change │
│ QQuery 34 │              7.10 / 7.27 ±0.16 / 7.54 ms │             7.31 / 8.11 ±1.00 / 10.04 ms │  1.12x slower │
│ QQuery 35 │        105.24 / 108.18 ±1.55 / 109.74 ms │        113.10 / 114.31 ±1.13 / 115.81 ms │  1.06x slower │
│ QQuery 36 │              6.52 / 6.61 ±0.11 / 6.82 ms │              6.69 / 7.12 ±0.48 / 8.04 ms │  1.08x slower │
│ QQuery 37 │             8.66 / 9.39 ±0.80 / 10.84 ms │             8.66 / 9.51 ±0.66 / 10.70 ms │     no change │
│ QQuery 38 │           86.45 / 88.58 ±2.96 / 94.37 ms │           87.34 / 90.25 ±4.29 / 98.75 ms │     no change │
│ QQuery 39 │        125.56 / 128.65 ±2.68 / 132.66 ms │        126.42 / 130.74 ±3.44 / 136.60 ms │     no change │
│ QQuery 40 │        108.75 / 116.53 ±6.97 / 129.42 ms │        120.88 / 127.63 ±9.32 / 145.94 ms │  1.10x slower │
│ QQuery 41 │           14.34 / 15.28 ±0.58 / 16.07 ms │           14.30 / 15.82 ±1.19 / 17.47 ms │     no change │
│ QQuery 42 │        108.24 / 109.86 ±1.55 / 112.63 ms │        108.34 / 109.85 ±0.93 / 110.82 ms │     no change │
│ QQuery 43 │              6.00 / 6.12 ±0.12 / 6.31 ms │              5.93 / 6.03 ±0.12 / 6.27 ms │     no change │
│ QQuery 44 │           11.93 / 12.85 ±0.98 / 14.29 ms │           11.79 / 12.23 ±0.34 / 12.81 ms │     no change │
│ QQuery 45 │           51.59 / 52.20 ±0.71 / 53.58 ms │           50.50 / 51.48 ±0.80 / 52.59 ms │     no change │
│ QQuery 46 │              8.37 / 8.86 ±0.32 / 9.30 ms │              8.22 / 8.55 ±0.21 / 8.79 ms │     no change │
│ QQuery 47 │        730.15 / 735.98 ±6.90 / 748.40 ms │        705.59 / 712.82 ±4.86 / 720.66 ms │     no change │
│ QQuery 48 │        293.14 / 296.48 ±3.12 / 301.21 ms │        294.01 / 296.74 ±2.30 / 300.54 ms │     no change │
│ QQuery 49 │        250.28 / 253.44 ±3.22 / 259.53 ms │        251.81 / 253.12 ±1.05 / 254.43 ms │     no change │
│ QQuery 50 │        226.01 / 230.32 ±4.01 / 235.24 ms │        220.67 / 223.64 ±2.76 / 228.09 ms │     no change │
│ QQuery 51 │        183.04 / 185.25 ±2.09 / 189.07 ms │        178.31 / 181.98 ±1.95 / 184.09 ms │     no change │
│ QQuery 52 │        107.65 / 110.58 ±3.03 / 116.28 ms │        108.42 / 110.26 ±2.22 / 114.63 ms │     no change │
│ QQuery 53 │        102.87 / 103.59 ±0.90 / 105.24 ms │        103.27 / 104.20 ±1.01 / 106.01 ms │     no change │
│ QQuery 54 │        144.26 / 147.65 ±2.00 / 150.36 ms │        145.75 / 148.22 ±2.27 / 152.02 ms │     no change │
│ QQuery 55 │        107.20 / 108.13 ±0.76 / 109.28 ms │        107.44 / 109.68 ±1.38 / 111.81 ms │     no change │
│ QQuery 56 │        141.05 / 142.32 ±1.01 / 144.15 ms │        140.48 / 142.52 ±1.42 / 144.84 ms │     no change │
│ QQuery 57 │        172.82 / 175.12 ±1.39 / 176.89 ms │        174.64 / 176.19 ±1.47 / 178.51 ms │     no change │
│ QQuery 58 │        286.62 / 296.24 ±6.87 / 305.53 ms │       285.31 / 298.28 ±13.20 / 317.51 ms │     no change │
│ QQuery 59 │        199.23 / 200.95 ±1.69 / 204.20 ms │        195.69 / 199.36 ±3.05 / 203.59 ms │     no change │
│ QQuery 60 │        144.67 / 145.48 ±0.66 / 146.41 ms │        142.34 / 143.44 ±1.31 / 145.79 ms │     no change │
│ QQuery 61 │           12.99 / 13.45 ±0.35 / 13.95 ms │           12.73 / 13.06 ±0.22 / 13.34 ms │     no change │
│ QQuery 62 │       904.73 / 932.43 ±16.55 / 947.84 ms │       901.55 / 934.20 ±25.10 / 966.87 ms │     no change │
│ QQuery 63 │        103.15 / 106.72 ±3.02 / 110.78 ms │        103.85 / 105.22 ±1.02 / 106.83 ms │     no change │
│ QQuery 64 │        683.07 / 685.79 ±2.81 / 690.87 ms │        680.75 / 687.10 ±3.46 / 690.59 ms │     no change │
│ QQuery 65 │        246.22 / 253.56 ±4.22 / 258.12 ms │        252.05 / 256.03 ±3.55 / 262.20 ms │     no change │
│ QQuery 66 │       234.63 / 253.48 ±10.76 / 265.83 ms │        247.60 / 256.44 ±7.16 / 265.72 ms │     no change │
│ QQuery 67 │        307.25 / 316.77 ±5.71 / 323.28 ms │       319.99 / 334.45 ±14.63 / 357.79 ms │  1.06x slower │
│ QQuery 68 │           10.40 / 11.74 ±1.30 / 14.02 ms │            9.81 / 10.88 ±0.79 / 12.24 ms │ +1.08x faster │
│ QQuery 69 │        100.32 / 103.93 ±2.11 / 106.32 ms │        102.81 / 105.31 ±1.32 / 106.40 ms │     no change │
│ QQuery 70 │       342.77 / 354.40 ±11.96 / 373.37 ms │        337.23 / 344.94 ±6.28 / 351.91 ms │     no change │
│ QQuery 71 │        134.41 / 137.03 ±1.43 / 138.73 ms │        136.55 / 137.88 ±1.11 / 139.85 ms │     no change │
│ QQuery 72 │        611.97 / 618.14 ±5.14 / 627.10 ms │       605.50 / 623.82 ±12.23 / 637.66 ms │     no change │
│ QQuery 73 │              7.45 / 8.17 ±0.58 / 9.07 ms │             7.32 / 8.36 ±1.07 / 10.11 ms │     no change │
│ QQuery 74 │        581.34 / 592.24 ±8.46 / 606.84 ms │        574.83 / 587.08 ±9.45 / 597.06 ms │     no change │
│ QQuery 75 │        277.59 / 280.04 ±2.61 / 285.00 ms │        275.81 / 279.40 ±2.65 / 283.37 ms │     no change │
│ QQuery 76 │        131.53 / 133.57 ±1.57 / 136.07 ms │        131.98 / 133.99 ±1.18 / 135.67 ms │     no change │
│ QQuery 77 │        188.69 / 190.76 ±1.26 / 192.15 ms │        189.33 / 190.25 ±0.58 / 191.04 ms │     no change │
│ QQuery 78 │        340.49 / 343.98 ±3.33 / 350.02 ms │        339.79 / 342.76 ±2.82 / 346.34 ms │     no change │
│ QQuery 79 │        233.09 / 234.57 ±1.62 / 237.15 ms │        233.94 / 236.02 ±1.24 / 237.23 ms │     no change │
│ QQuery 80 │        320.55 / 323.94 ±2.76 / 327.30 ms │        321.31 / 326.39 ±2.91 / 329.11 ms │     no change │
│ QQuery 81 │           26.33 / 27.38 ±0.68 / 28.20 ms │           26.48 / 27.22 ±0.62 / 28.21 ms │     no change │
│ QQuery 82 │        197.82 / 199.31 ±2.29 / 203.86 ms │        198.55 / 200.71 ±2.16 / 203.59 ms │     no change │
│ QQuery 83 │           39.37 / 41.36 ±2.24 / 45.22 ms │           38.52 / 39.36 ±1.33 / 42.00 ms │     no change │
│ QQuery 84 │           48.63 / 49.58 ±0.88 / 50.80 ms │           48.77 / 49.40 ±0.39 / 49.92 ms │     no change │
│ QQuery 85 │        147.39 / 148.66 ±1.16 / 150.63 ms │        147.83 / 148.63 ±0.66 / 149.75 ms │     no change │
│ QQuery 86 │           38.52 / 40.01 ±1.14 / 41.54 ms │           39.86 / 40.90 ±0.87 / 42.04 ms │     no change │
│ QQuery 87 │           85.60 / 88.73 ±3.70 / 95.81 ms │           85.60 / 88.35 ±3.32 / 94.88 ms │     no change │
│ QQuery 88 │        100.63 / 101.95 ±0.96 / 103.51 ms │         99.93 / 101.22 ±1.04 / 102.68 ms │     no change │
│ QQuery 89 │        118.81 / 119.79 ±1.26 / 122.07 ms │        118.70 / 119.92 ±1.02 / 121.42 ms │     no change │
│ QQuery 90 │           23.99 / 24.20 ±0.20 / 24.55 ms │           22.99 / 24.11 ±0.67 / 24.90 ms │     no change │
│ QQuery 91 │           61.98 / 64.38 ±1.66 / 66.73 ms │           62.03 / 64.30 ±2.30 / 68.74 ms │     no change │
│ QQuery 92 │           57.67 / 58.07 ±0.31 / 58.44 ms │           57.81 / 59.39 ±1.21 / 61.43 ms │     no change │
│ QQuery 93 │        184.73 / 185.90 ±0.88 / 187.18 ms │        185.38 / 188.28 ±1.90 / 190.69 ms │     no change │
│ QQuery 94 │           61.74 / 62.66 ±0.75 / 63.87 ms │           60.38 / 62.32 ±1.48 / 64.92 ms │     no change │
│ QQuery 95 │        127.91 / 128.82 ±0.55 / 129.40 ms │        127.77 / 128.56 ±0.72 / 129.74 ms │     no change │
│ QQuery 96 │           73.22 / 74.44 ±0.77 / 75.59 ms │           73.32 / 74.75 ±1.10 / 76.65 ms │     no change │
│ QQuery 97 │        125.16 / 126.41 ±0.79 / 127.42 ms │        124.06 / 127.60 ±2.47 / 130.65 ms │     no change │
│ QQuery 98 │        154.18 / 156.03 ±1.73 / 159.27 ms │        153.08 / 156.89 ±2.23 / 159.74 ms │     no change │
│ QQuery 99 │ 10778.40 / 10822.79 ±36.20 / 10879.92 ms │ 10738.74 / 10797.52 ±47.31 / 10877.80 ms │     no change │
└───────────┴──────────────────────────────────────────┴──────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 31720.03ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 31590.96ms │
│ Average Time (HEAD)                             │   320.40ms │
│ Average Time (feat_reorder-row-groups-by-stats) │   319.10ms │
│ Queries Faster                                  │          4 │
│ Queries Slower                                  │          5 │
│ Queries with No Change                          │         90 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘
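The Change column in the table above can be reproduced from the two timing cells with a little arithmetic. The sketch below assumes that each cell reads `min / mean ±stddev / max ms` and that a row is reported as "no change" when the ratio of the means stays within a 5% band. Both assumptions are inferred from the rows shown here, not taken from the benchmark runner's source, and the function names are hypothetical.

```rust
/// Extract the mean from a cell such as "6.60 / 7.07 ±0.83 / 8.73 ms".
/// Assumption: the cell layout is `min / mean ±stddev / max ms`.
fn parse_mean_ms(cell: &str) -> Option<f64> {
    // The mean sits between the first '/' and the '±' sign.
    let after_min = cell.split('/').nth(1)?;
    let mean = after_min.split('±').next()?;
    mean.trim().parse().ok()
}

/// Classify a HEAD-vs-branch pair of means.
/// Assumption: `noise` is the width of the "no change" band (0.05 matches the rows above).
fn classify(head_mean_ms: f64, branch_mean_ms: f64, noise: f64) -> String {
    let ratio = branch_mean_ms / head_mean_ms;
    if ratio >= 1.0 - noise && ratio <= 1.0 + noise {
        "no change".to_string()
    } else if ratio < 1.0 {
        // Branch is faster: report how many times faster than HEAD.
        format!("+{:.2}x faster", 1.0 / ratio)
    } else {
        // Branch is slower: report the slowdown factor directly.
        format!("{:.2}x slower", ratio)
    }
}
```

For example, `classify(1446.02, 1364.53, 0.05)` yields `+1.06x faster` (QQuery 4), `classify(7.27, 8.11, 0.05)` yields `1.12x slower` (QQuery 34), and `classify(735.98, 712.82, 0.05)` yields `no change` (QQuery 47). Several of the flagged rows run for under 10 ms, where a fixed relative band is easily crossed by run-to-run noise.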

Resource Usage

tpcds — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 158.9s |
| Peak memory | 5.5 GiB |
| Avg memory | 4.5 GiB |
| CPU user | 261.6s |
| CPU sys | 17.7s |
| Peak spill | 0 B |

tpcds — branch

| Metric | Value |
|---|---|
| Wall time | 158.3s |
| Peak memory | 5.5 GiB |
| Avg memory | 4.7 GiB |
| CPU user | 260.4s |
| CPU sys | 17.4s |
| Peak spill | 0 B |


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃      feat_reorder-row-groups-by-stats ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.19 / 4.47 ±6.41 / 17.29 ms │          1.18 / 4.49 ±6.46 / 17.40 ms │    no change │
│ QQuery 1  │        14.15 / 14.60 ±0.26 / 14.84 ms │        14.15 / 14.56 ±0.22 / 14.82 ms │    no change │
│ QQuery 2  │        44.34 / 44.86 ±0.43 / 45.63 ms │        43.30 / 43.63 ±0.25 / 44.03 ms │    no change │
│ QQuery 3  │        44.54 / 45.82 ±1.10 / 47.71 ms │        43.32 / 44.20 ±1.01 / 46.01 ms │    no change │
│ QQuery 4  │     292.18 / 299.58 ±6.01 / 307.53 ms │     286.50 / 297.56 ±6.69 / 305.63 ms │    no change │
│ QQuery 5  │     347.46 / 350.76 ±2.10 / 353.17 ms │     346.62 / 348.87 ±1.91 / 351.08 ms │    no change │
│ QQuery 6  │          5.72 / 7.22 ±1.67 / 10.45 ms │         5.59 / 10.51 ±5.76 / 21.61 ms │ 1.46x slower │
│ QQuery 7  │        16.96 / 17.07 ±0.12 / 17.27 ms │        16.65 / 16.92 ±0.19 / 17.22 ms │    no change │
│ QQuery 8  │     417.45 / 427.68 ±7.65 / 440.48 ms │     426.14 / 431.89 ±5.38 / 441.00 ms │    no change │
│ QQuery 9  │     677.82 / 684.73 ±7.88 / 698.33 ms │    648.52 / 655.74 ±10.01 / 675.29 ms │    no change │
│ QQuery 10 │        94.83 / 95.71 ±0.80 / 97.09 ms │        90.54 / 93.26 ±2.43 / 97.71 ms │    no change │
│ QQuery 11 │     107.39 / 107.95 ±0.67 / 108.92 ms │     104.16 / 105.15 ±0.72 / 106.34 ms │    no change │
│ QQuery 12 │     349.32 / 356.12 ±4.19 / 361.45 ms │     338.19 / 342.31 ±2.21 / 344.83 ms │    no change │
│ QQuery 13 │    452.34 / 466.84 ±13.61 / 486.92 ms │    441.75 / 464.03 ±17.71 / 491.08 ms │    no change │
│ QQuery 14 │     348.39 / 351.36 ±3.27 / 356.18 ms │     348.22 / 351.57 ±1.94 / 353.55 ms │    no change │
│ QQuery 15 │    357.19 / 373.85 ±17.96 / 408.62 ms │     362.86 / 368.16 ±5.64 / 376.07 ms │    no change │
│ QQuery 16 │     714.87 / 726.32 ±6.98 / 736.12 ms │    741.65 / 757.99 ±13.25 / 781.76 ms │    no change │
│ QQuery 17 │    716.44 / 748.86 ±25.39 / 773.58 ms │     721.88 / 729.91 ±6.46 / 738.41 ms │    no change │
│ QQuery 18 │ 1373.87 / 1427.78 ±45.55 / 1482.21 ms │ 1434.15 / 1503.93 ±35.02 / 1525.61 ms │ 1.05x slower │
│ QQuery 19 │        35.59 / 36.41 ±0.62 / 37.03 ms │        36.31 / 38.22 ±1.99 / 41.73 ms │    no change │
│ QQuery 20 │    713.38 / 725.99 ±13.06 / 742.01 ms │    716.03 / 731.70 ±15.57 / 761.50 ms │    no change │
│ QQuery 21 │     765.34 / 768.87 ±3.09 / 772.97 ms │     762.06 / 764.19 ±1.70 / 767.11 ms │    no change │
│ QQuery 22 │  1134.09 / 1142.01 ±5.27 / 1147.72 ms │  1132.10 / 1137.95 ±4.16 / 1143.66 ms │    no change │
│ QQuery 23 │ 3094.85 / 3120.29 ±14.36 / 3137.68 ms │ 3077.09 / 3115.46 ±20.88 / 3134.31 ms │    no change │
│ QQuery 24 │     100.75 / 103.14 ±1.95 / 106.05 ms │     100.97 / 103.96 ±2.97 / 108.54 ms │    no change │
│ QQuery 25 │     139.98 / 141.47 ±1.44 / 144.06 ms │     138.30 / 140.65 ±1.57 / 142.80 ms │    no change │
│ QQuery 26 │     101.23 / 102.52 ±0.77 / 103.55 ms │     102.40 / 104.22 ±1.40 / 105.89 ms │    no change │
│ QQuery 27 │     855.42 / 859.58 ±5.27 / 869.85 ms │     855.77 / 861.08 ±5.43 / 869.79 ms │    no change │
│ QQuery 28 │ 3284.27 / 3308.55 ±14.18 / 3325.76 ms │ 3289.54 / 3316.91 ±14.83 / 3330.08 ms │    no change │
│ QQuery 29 │        50.39 / 55.78 ±5.11 / 62.80 ms │        51.97 / 56.29 ±4.39 / 63.23 ms │    no change │
│ QQuery 30 │     357.77 / 370.56 ±7.03 / 377.46 ms │     362.16 / 368.32 ±5.55 / 378.64 ms │    no change │
│ QQuery 31 │    363.55 / 385.00 ±12.49 / 398.04 ms │     398.54 / 401.59 ±2.74 / 405.02 ms │    no change │
│ QQuery 32 │ 1034.15 / 1059.31 ±22.35 / 1100.09 ms │ 1173.83 / 1288.89 ±81.25 / 1419.40 ms │ 1.22x slower │
│ QQuery 33 │ 1472.92 / 1487.84 ±11.01 / 1499.14 ms │ 1466.40 / 1513.38 ±43.97 / 1593.67 ms │    no change │
│ QQuery 34 │ 1464.67 / 1499.90 ±31.40 / 1548.80 ms │ 1475.45 / 1491.40 ±14.74 / 1517.30 ms │    no change │
│ QQuery 35 │     390.93 / 396.99 ±5.12 / 404.97 ms │     392.12 / 396.84 ±3.54 / 401.88 ms │    no change │
│ QQuery 36 │     120.42 / 122.98 ±1.62 / 125.38 ms │     119.20 / 123.01 ±3.25 / 127.60 ms │    no change │
│ QQuery 37 │        49.66 / 50.72 ±1.27 / 53.16 ms │        47.35 / 50.08 ±1.55 / 51.79 ms │    no change │
│ QQuery 38 │        76.35 / 78.01 ±1.50 / 80.66 ms │        76.64 / 77.90 ±0.90 / 78.73 ms │    no change │
│ QQuery 39 │     208.12 / 219.98 ±6.84 / 229.12 ms │     220.84 / 223.45 ±1.88 / 225.94 ms │    no change │
│ QQuery 40 │        24.82 / 25.18 ±0.37 / 25.85 ms │        24.34 / 26.23 ±2.29 / 30.09 ms │    no change │
│ QQuery 41 │        20.47 / 21.79 ±1.17 / 23.54 ms │        20.58 / 21.45 ±0.94 / 23.06 ms │    no change │
│ QQuery 42 │        19.76 / 20.16 ±0.31 / 20.63 ms │        19.68 / 20.30 ±0.47 / 21.02 ms │    no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 22654.62ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 22958.16ms │
│ Average Time (HEAD)                             │   526.85ms │
│ Average Time (feat_reorder-row-groups-by-stats) │   533.91ms │
│ Queries Faster                                  │          0 │
│ Queries Slower                                  │          3 │
│ Queries with No Change                          │         40 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 114.5s |
| Peak memory | 42.0 GiB |
| Avg memory | 32.4 GiB |
| CPU user | 1080.8s |
| CPU sys | 84.9s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|---|---|
| Wall time | 115.9s |
| Peak memory | 37.7 GiB |
| Avg memory | 28.3 GiB |
| CPU user | 1081.1s |
| CPU sys | 96.2s |
| Peak spill | 0 B |


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃      feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.34 / 4.79 ±6.61 / 18.02 ms │          1.22 / 4.58 ±6.53 / 17.64 ms │     no change │
│ QQuery 1  │        15.01 / 15.50 ±0.43 / 16.28 ms │        14.27 / 14.85 ±0.31 / 15.10 ms │     no change │
│ QQuery 2  │        45.69 / 46.01 ±0.28 / 46.38 ms │        44.25 / 44.67 ±0.33 / 45.07 ms │     no change │
│ QQuery 3  │        45.23 / 49.08 ±3.08 / 53.24 ms │        44.46 / 47.22 ±1.51 / 48.61 ms │     no change │
│ QQuery 4  │    307.65 / 329.18 ±15.17 / 353.16 ms │    333.50 / 349.92 ±10.12 / 363.58 ms │  1.06x slower │
│ QQuery 5  │     383.18 / 390.58 ±5.85 / 398.30 ms │    375.25 / 389.29 ±13.55 / 414.94 ms │     no change │
│ QQuery 6  │          5.21 / 7.51 ±2.87 / 13.10 ms │           5.67 / 7.06 ±1.14 / 8.47 ms │ +1.06x faster │
│ QQuery 7  │        17.58 / 18.20 ±0.57 / 19.20 ms │        17.81 / 21.41 ±6.39 / 34.18 ms │  1.18x slower │
│ QQuery 8  │     467.34 / 477.25 ±9.77 / 495.30 ms │    473.21 / 490.70 ±19.41 / 524.50 ms │     no change │
│ QQuery 9  │    699.39 / 729.91 ±22.29 / 765.71 ms │    745.55 / 764.99 ±14.67 / 789.00 ms │     no change │
│ QQuery 10 │      99.31 / 102.80 ±4.68 / 111.76 ms │       95.14 / 99.07 ±3.46 / 104.88 ms │     no change │
│ QQuery 11 │     107.63 / 109.49 ±1.08 / 110.50 ms │     110.95 / 113.80 ±2.44 / 117.91 ms │     no change │
│ QQuery 12 │     389.20 / 393.99 ±3.59 / 399.55 ms │    378.13 / 397.72 ±15.36 / 416.18 ms │     no change │
│ QQuery 13 │    497.82 / 519.79 ±18.11 / 553.00 ms │    507.72 / 534.68 ±18.52 / 564.44 ms │     no change │
│ QQuery 14 │    356.26 / 382.90 ±14.21 / 394.37 ms │     378.78 / 390.21 ±8.26 / 404.19 ms │     no change │
│ QQuery 15 │    397.89 / 421.76 ±20.69 / 455.22 ms │    406.54 / 430.62 ±31.51 / 491.99 ms │     no change │
│ QQuery 16 │    817.79 / 843.15 ±20.23 / 870.39 ms │    795.58 / 834.18 ±21.31 / 858.28 ms │     no change │
│ QQuery 17 │    769.11 / 793.23 ±12.93 / 806.04 ms │    790.19 / 823.71 ±33.40 / 886.71 ms │     no change │
│ QQuery 18 │ 1592.82 / 1638.07 ±31.84 / 1675.50 ms │ 1536.04 / 1625.95 ±49.00 / 1673.65 ms │     no change │
│ QQuery 19 │        36.17 / 38.43 ±2.73 / 41.97 ms │       39.25 / 52.79 ±14.25 / 76.08 ms │  1.37x slower │
│ QQuery 20 │    742.56 / 763.72 ±21.08 / 796.51 ms │    747.36 / 771.03 ±35.37 / 841.37 ms │     no change │
│ QQuery 21 │     787.42 / 799.07 ±8.52 / 810.66 ms │     794.97 / 798.60 ±2.97 / 803.45 ms │     no change │
│ QQuery 22 │  1173.63 / 1184.40 ±7.48 / 1192.57 ms │  1187.50 / 1195.28 ±6.21 / 1202.63 ms │     no change │
│ QQuery 23 │ 3281.57 / 3306.51 ±21.44 / 3343.54 ms │ 3275.73 / 3301.49 ±20.45 / 3332.33 ms │     no change │
│ QQuery 24 │     108.95 / 111.32 ±1.91 / 114.39 ms │     107.23 / 110.08 ±3.29 / 116.30 ms │     no change │
│ QQuery 25 │     144.33 / 146.42 ±1.45 / 148.05 ms │     143.15 / 145.40 ±1.34 / 146.55 ms │     no change │
│ QQuery 26 │     107.01 / 108.68 ±1.66 / 111.45 ms │     105.69 / 108.26 ±1.82 / 110.40 ms │     no change │
│ QQuery 27 │     883.03 / 891.30 ±4.95 / 898.28 ms │    874.98 / 887.56 ±12.77 / 911.22 ms │     no change │
│ QQuery 28 │ 3386.14 / 3425.51 ±28.74 / 3464.68 ms │ 3398.98 / 3422.81 ±12.65 / 3436.83 ms │     no change │
│ QQuery 29 │        53.27 / 58.56 ±6.00 / 69.01 ms │        52.73 / 57.20 ±4.79 / 64.77 ms │     no change │
│ QQuery 30 │     405.31 / 409.26 ±4.00 / 416.32 ms │     393.87 / 407.38 ±7.40 / 414.80 ms │     no change │
│ QQuery 31 │    383.84 / 403.31 ±16.85 / 432.32 ms │    397.47 / 427.43 ±17.62 / 452.58 ms │  1.06x slower │
│ QQuery 32 │ 1072.71 / 1165.40 ±47.01 / 1203.37 ms │ 1232.95 / 1400.42 ±92.57 / 1512.21 ms │  1.20x slower │
│ QQuery 33 │ 1640.38 / 1658.01 ±11.14 / 1675.05 ms │ 1633.63 / 1658.56 ±17.62 / 1686.57 ms │     no change │
│ QQuery 34 │ 1681.98 / 1711.27 ±17.86 / 1732.02 ms │ 1664.80 / 1679.12 ±11.31 / 1694.41 ms │     no change │
│ QQuery 35 │    454.45 / 479.47 ±16.71 / 500.66 ms │    462.20 / 481.40 ±13.83 / 502.10 ms │     no change │
│ QQuery 36 │     122.21 / 128.91 ±3.68 / 132.42 ms │     124.76 / 133.75 ±5.72 / 142.63 ms │     no change │
│ QQuery 37 │        52.60 / 56.72 ±3.07 / 61.58 ms │        52.08 / 54.36 ±1.53 / 56.64 ms │     no change │
│ QQuery 38 │        77.39 / 80.30 ±1.75 / 82.02 ms │        79.32 / 81.94 ±2.16 / 84.78 ms │     no change │
│ QQuery 39 │     242.23 / 248.31 ±4.99 / 255.10 ms │     246.21 / 254.21 ±7.55 / 264.73 ms │     no change │
│ QQuery 40 │        28.18 / 30.63 ±1.34 / 32.00 ms │        24.84 / 27.65 ±1.91 / 29.46 ms │ +1.11x faster │
│ QQuery 41 │        22.58 / 23.91 ±0.89 / 25.28 ms │        22.53 / 23.86 ±1.25 / 26.03 ms │     no change │
│ QQuery 42 │        21.19 / 22.21 ±1.14 / 24.33 ms │        21.33 / 22.81 ±1.23 / 24.84 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 24524.79ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 24888.03ms │
│ Average Time (HEAD)                             │   570.34ms │
│ Average Time (feat_reorder-row-groups-by-stats) │   578.79ms │
│ Queries Faster                                  │          2 │
│ Queries Slower                                  │          5 │
│ Queries with No Change                          │         36 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 123.9s |
| Peak memory | 42.3 GiB |
| Avg memory | 31.0 GiB |
| CPU user | 1165.0s |
| CPU sys | 98.5s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|---|---|
| Wall time | 125.7s |
| Peak memory | 40.7 GiB |
| Avg memory | 29.1 GiB |
| CPU user | 1166.3s |
| CPU sys | 111.6s |
| Peak spill | 0 B |


@Dandandan
Contributor

I wonder if the ordering should be done before the files / row groups are assigned to partitions, so that they are sorted globally instead of just per partition? It seems they are currently sorted only within each partition, which should help, but perhaps not nearly as much as it would if every partition received the optimal row groups.

This would also help in the case of #21581
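To make the difference concrete, here is a minimal sketch of the two orderings for a DESC TopK. It is not the PR's implementation: the `RowGroup` struct, the contiguous-chunk assignment, and the round-robin dealing are all hypothetical simplifications chosen only to show which row groups each partition reads first.

```rust
/// Hypothetical row-group descriptor holding only the sort column's max statistic.
#[derive(Debug, Clone, PartialEq)]
struct RowGroup {
    max: i64,
}

/// Assign contiguous chunks to partitions first, then sort each partition DESC by max.
fn sort_within_partitions(rgs: &[RowGroup], n: usize) -> Vec<Vec<RowGroup>> {
    assert!(n > 0, "need at least one partition");
    let chunk = ((rgs.len() + n - 1) / n).max(1);
    let mut parts: Vec<Vec<RowGroup>> = rgs.chunks(chunk).map(|c| c.to_vec()).collect();
    parts.resize(n, Vec::new()); // pad with empty partitions if there are few row groups
    for p in &mut parts {
        p.sort_by(|a, b| b.max.cmp(&a.max));
    }
    parts
}

/// Sort all row groups DESC by max first, then deal them round-robin to partitions.
fn sort_then_assign(rgs: &[RowGroup], n: usize) -> Vec<Vec<RowGroup>> {
    assert!(n > 0, "need at least one partition");
    let mut sorted = rgs.to_vec();
    sorted.sort_by(|a, b| b.max.cmp(&a.max));
    let mut parts = vec![Vec::new(); n];
    for (i, rg) in sorted.into_iter().enumerate() {
        parts[i % n].push(rg);
    }
    parts
}

/// Max statistic of the first row group each non-empty partition would read.
fn first_reads(parts: &[Vec<RowGroup>]) -> Vec<i64> {
    parts.iter().filter_map(|p| p.first().map(|rg| rg.max)).collect()
}
```

With six row groups whose max values are 1 through 6 and two partitions, per-partition sorting starts the partitions at 3 and 6, while sorting before assignment starts them at 6 and 5. In the second case every partition begins with a near-optimal row group, so a shared TopK threshold could tighten sooner.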

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                                   HEAD ┃        feat_reorder-row-groups-by-stats ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0  │     806.64 / 824.04 ±15.06 / 844.04 ms │       818.76 / 831.86 ±9.65 / 847.07 ms │ no change │
│ QQuery 1  │      207.91 / 208.37 ±0.34 / 208.85 ms │       208.05 / 209.42 ±1.18 / 211.29 ms │ no change │
│ QQuery 2  │      493.00 / 495.57 ±2.04 / 499.15 ms │       501.52 / 504.02 ±1.65 / 505.95 ms │ no change │
│ QQuery 3  │      313.03 / 314.64 ±0.96 / 315.57 ms │       313.38 / 315.81 ±1.51 / 317.65 ms │ no change │
│ QQuery 4  │     656.64 / 674.45 ±10.93 / 686.40 ms │       663.78 / 674.03 ±8.68 / 688.82 ms │ no change │
│ QQuery 5  │ 9437.73 / 9707.73 ±166.88 / 9887.44 ms │ 9679.36 / 9939.30 ±174.56 / 10160.05 ms │ no change │
│ QQuery 6  │  1002.26 / 1011.57 ±14.99 / 1041.49 ms │     997.60 / 1006.50 ±9.67 / 1023.43 ms │ no change │
│ QQuery 7  │     773.67 / 806.98 ±35.77 / 873.62 ms │       778.19 / 786.06 ±5.20 / 792.91 ms │ no change │
│ QQuery 8  │      397.92 / 404.38 ±5.08 / 412.20 ms │       398.58 / 404.24 ±5.67 / 415.04 ms │ no change │
│ QQuery 9  │  2807.44 / 2826.33 ±16.14 / 2853.16 ms │   2754.46 / 2797.70 ±24.98 / 2824.10 ms │ no change │
│ QQuery 10 │      633.75 / 639.16 ±5.96 / 648.49 ms │      631.36 / 642.65 ±13.99 / 670.06 ms │ no change │
│ QQuery 11 │  2047.27 / 2070.44 ±19.89 / 2101.14 ms │   2049.92 / 2079.78 ±21.09 / 2115.19 ms │ no change │
│ QQuery 12 │      200.39 / 202.67 ±2.01 / 205.97 ms │       194.24 / 202.01 ±6.44 / 213.63 ms │ no change │
└───────────┴────────────────────────────────────────┴─────────────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 20186.32ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 20393.39ms │
│ Average Time (HEAD)                             │  1552.79ms │
│ Average Time (feat_reorder-row-groups-by-stats) │  1568.72ms │
│ Queries Faster                                  │          0 │
│ Queries Slower                                  │          0 │
│ Queries with No Change                          │         13 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_extended — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 101.8s |
| Peak memory | 32.6 GiB |
| Avg memory | 27.4 GiB |
| CPU user | 981.3s |
| CPU sys | 48.2s |
| Peak spill | 0 B |

clickbench_extended — branch

| Metric | Value |
|---|---|
| Wall time | 102.8s |
| Peak memory | 34.1 GiB |
| Avg memory | 29.7 GiB |
| CPU user | 986.9s |
| CPU sys | 46.1s |
| Peak spill | 0 B |


@zhuqi-lucas
Contributor Author

zhuqi-lucas commented Apr 13, 2026

> I wonder if the ordering should be done before the files / row groups are assigned to partitions? So they are more globally sorted instead of just per partition? It seems now they are sorted within each partition, which should help, but perhaps not nearly as much as it would be if all the partitions contain the optimal row groups?
>
> This would also help in the case of #21581

Great point @Dandandan — you're right that global reorder is much more effective than per-partition reorder. With global reorder + round-robin distribution, each partition's first RG is close to the global optimum, so:

  1. All partitions quickly converge to tight local TopK thresholds in parallel
  2. The SortPreservingMerge (SPM) step finishes faster because each partition's first few batches contain near-optimal values → LIMIT can be satisfied with minimal reads across partitions

The current per-partition reorder is limited because even after sorting, partition 0's "best" RG might be much worse than the global best (which may have landed in partition 2).

Moving to global reorder would require changes at the planning / EnforceDistribution layer to load RG statistics and redistribute RGs across partitions. I'd prefer to keep this PR as an incremental step (per-partition) and track global reorder as a follow-up — it would benefit both #21317 and #21581.

Does this make sense?
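
To make the proposed follow-up concrete, here is a minimal sketch of "global reorder + round-robin distribution" for a `DESC` query. `RowGroupStats` and `global_reorder_desc` are hypothetical names for illustration and are not DataFusion types; the sort key (highest min first) follows the rule stated in the PR description, and the sketch ignores how statistics would actually be loaded at planning time.

```rust
/// Hypothetical summary of one row group; not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
struct RowGroupStats {
    id: usize,
    min: i64,
}

/// Sort every row group globally for `ORDER BY col DESC` (highest min first),
/// then deal them out round-robin so each partition starts near the global optimum.
fn global_reorder_desc(
    mut rgs: Vec<RowGroupStats>,
    partitions: usize,
) -> Vec<Vec<RowGroupStats>> {
    assert!(partitions > 0, "need at least one partition");
    // Global sort across all files and row groups, not per partition.
    rgs.sort_by(|a, b| b.min.cmp(&a.min));
    let mut out: Vec<Vec<RowGroupStats>> = vec![Vec::new(); partitions];
    for (i, rg) in rgs.into_iter().enumerate() {
        // Round-robin: the best row group goes to partition 0, the second best to partition 1, ...
        out[i % partitions].push(rg);
    }
    out
}
```

With four row groups whose minimums are 0, 30, 10, 20 and two partitions, partition 0 would read the groups with minimums 30 then 10, and partition 1 the groups with minimums 20 then 0, so both partitions begin with data close to the global top.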

@Dandandan
Contributor

Sure, makes sense.

Bring back stats init with all issues fixed:
- GtEq/LtEq instead of Gt/Lt (include boundary values)
- Use df.fetch() as the limit (the TopK K value, not the scan limit).
  When K exceeds the row count of every single RG, stats init is skipped and cumulative prune handles it
- Cast threshold to column data type (parquet vs table schema mismatch)
- Null-aware filter for NULLS FIRST
- Generation check prevents overwrite by later partitions
- Restricted to sort pushdown + pure DynamicFilter (no WHERE)
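
A minimal sketch of the stats-init idea for `ORDER BY col DESC LIMIT K`, assuming integer values and exact min statistics. `RgStats`, `init_desc_threshold`, and `may_contain_top_k` are hypothetical names, not the PR's actual functions. The reasoning: if one row group holds at least K non-null rows, all of them are >= its min, so the K-th largest value overall is also >= that min and it is safe to use as an initial threshold.

```rust
/// Hypothetical per-row-group statistics; not a DataFusion type.
struct RgStats {
    min: i64,
    non_null_rows: usize,
}

/// Initial threshold for `ORDER BY col DESC LIMIT k`, or None when no single
/// row group can supply k rows on its own (stats init is skipped in that case).
fn init_desc_threshold(rgs: &[RgStats], k: usize) -> Option<i64> {
    if k == 0 {
        return None;
    }
    rgs.iter()
        .filter(|rg| rg.non_null_rows >= k)
        .map(|rg| rg.min)
        .max()
}

/// Row-group level check using GtEq: a row group whose max equals the
/// threshold may still hold a boundary value, so it must not be pruned.
fn may_contain_top_k(rg_max: i64, threshold: i64) -> bool {
    rg_max >= threshold
}
```

Using a strict `>` in `may_contain_top_k` would drop a row group whose max equals the threshold, which is wrong whenever the K-th largest value is exactly that boundary value.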

Stats init and cumulative prune are complementary:
- Stats init: updates PruningPredicate → prunes at RG statistics level
- Cumulative prune: truncates after reorder+reverse → prunes by row count
Both work together without conflict when using df.fetch().
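
The cumulative prune can be sketched as a prefix-sum truncation over row counts. This assumes the row groups are already ordered best-first (after reorder + reverse) and that their value ranges do not overlap, so later groups cannot contribute to the top K; `cumulative_prune` is a hypothetical name and the real implementation may apply further checks.

```rust
/// Number of leading row groups to keep for `LIMIT k`, given row counts
/// already ordered best-first. Everything after that prefix is dropped.
fn cumulative_prune(row_counts: &[usize], k: usize) -> usize {
    if k == 0 {
        // No usable limit: leave the list untouched rather than prune.
        return row_counts.len();
    }
    let mut seen = 0usize;
    for (i, rows) in row_counts.iter().enumerate() {
        seen += rows;
        if seen >= k {
            // The first i + 1 row groups already hold at least k rows.
            return i + 1;
        }
    }
    row_counts.len()
}
```

For three row groups of 100 rows each, `k = 150` keeps the first two groups and `k = 100` keeps only the first.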
@zhuqi-lucas
Contributor Author

run benchmark sort_pushdown_inexact

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4288570205-1674-fm6pm 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (839ab5a) to 466c3ea (merge-base) diff using: sort_pushdown_inexact
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark sort_pushdown_inexact.json
--------------------
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    7.03 / 8.24 ±0.88 / 9.60 ms │      6.50 / 7.22 ±0.75 / 8.60 ms │ +1.14x faster │
│ Q2    │    6.80 / 7.03 ±0.36 / 7.75 ms │      6.68 / 7.23 ±0.77 / 8.74 ms │     no change │
│ Q3    │ 22.18 / 22.45 ±0.27 / 22.82 ms │   21.85 / 22.38 ±0.40 / 23.04 ms │     no change │
│ Q4    │ 20.18 / 21.01 ±0.82 / 22.33 ms │   20.13 / 21.15 ±0.70 / 21.74 ms │     no change │
└───────┴────────────────────────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Benchmark Summary                               ┃         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ Total Time (HEAD)                               │ 58.74ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 57.98ms │
│ Average Time (HEAD)                             │ 14.68ms │
│ Average Time (feat_reorder-row-groups-by-stats) │ 14.49ms │
│ Queries Faster                                  │       1 │
│ Queries Slower                                  │       0 │
│ Queries with No Change                          │       3 │
│ Queries with Failure                            │       0 │
└─────────────────────────────────────────────────┴─────────┘

Resource Usage

sort_pushdown_inexact — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 5.0s |
| Peak memory | 4.4 GiB |
| Avg memory | 4.4 GiB |
| CPU user | 2.5s |
| CPU sys | 0.4s |
| Peak spill | 0 B |

sort_pushdown_inexact — branch

| Metric | Value |
|---|---|
| Wall time | 5.0s |
| Peak memory | 4.4 GiB |
| Avg memory | 4.4 GiB |
| CPU user | 2.5s |
| CPU sys | 0.3s |
| Peak spill | 0 B |


create_filter() was called before new_sort.fetch was set, so
DynamicFilterPhysicalExpr.fetch was always 0 (or None from old self).
Fix by setting fetch before creating the filter.

This was the root cause of stats init and cumulative prune not
triggering on CI — fetch=0 meant "no rows needed" → skip.
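
The ordering bug described above can be illustrated with a small sketch. `Sort`, `DynamicFilter`, `with_fetch_buggy`, and `with_fetch_fixed` are hypothetical stand-ins, not the real DataFusion types or methods; the only point is that the filter snapshots `fetch` at creation time, so it must be created after `fetch` is set.

```rust
/// Hypothetical stand-in for the dynamic filter: it copies `fetch` when created.
#[derive(Debug, Clone, PartialEq)]
struct DynamicFilter {
    fetch: Option<usize>,
}

/// Hypothetical stand-in for the sort operator.
#[derive(Debug, Clone, Default)]
struct Sort {
    fetch: Option<usize>,
}

impl Sort {
    fn create_filter(&self) -> DynamicFilter {
        DynamicFilter { fetch: self.fetch }
    }
}

/// Buggy order: the filter is created before the new fetch is assigned,
/// so it only ever sees the old value.
fn with_fetch_buggy(old: &Sort, fetch: Option<usize>) -> (Sort, DynamicFilter) {
    let mut new_sort = old.clone();
    let filter = new_sort.create_filter(); // snapshot taken too early
    new_sort.fetch = fetch;
    (new_sort, filter)
}

/// Fixed order: assign fetch first, then create the filter.
fn with_fetch_fixed(old: &Sort, fetch: Option<usize>) -> (Sort, DynamicFilter) {
    let mut new_sort = old.clone();
    new_sort.fetch = fetch;
    let filter = new_sort.create_filter();
    (new_sort, filter)
}
```

In the buggy variant the returned sort carries the new `fetch` while the filter still carries the stale one, which is why downstream code that reads the filter's `fetch` would skip the optimization.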
@zhuqi-lucas
Contributor Author

run benchmark sort_pushdown_inexact

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4289923585-1686-zszkb 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (a269ffd) to 466c3ea (merge-base) diff using: sort_pushdown_inexact
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark sort_pushdown_inexact.json
--------------------
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    7.14 / 7.68 ±0.85 / 9.34 ms │      2.77 / 3.19 ±0.54 / 4.26 ms │ +2.41x faster │
│ Q2    │    6.75 / 6.99 ±0.27 / 7.47 ms │      2.94 / 3.12 ±0.22 / 3.54 ms │ +2.24x faster │
│ Q3    │ 21.26 / 22.14 ±0.49 / 22.68 ms │      6.67 / 6.87 ±0.16 / 7.11 ms │ +3.22x faster │
│ Q4    │ 19.84 / 21.47 ±0.88 / 22.35 ms │      6.98 / 7.05 ±0.05 / 7.10 ms │ +3.05x faster │
└───────┴────────────────────────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Benchmark Summary                               ┃         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ Total Time (HEAD)                               │ 58.28ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 20.24ms │
│ Average Time (HEAD)                             │ 14.57ms │
│ Average Time (feat_reorder-row-groups-by-stats) │  5.06ms │
│ Queries Faster                                  │       4 │
│ Queries Slower                                  │       0 │
│ Queries with No Change                          │       0 │
│ Queries with Failure                            │       0 │
└─────────────────────────────────────────────────┴─────────┘
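
The Change column in the table above appears to be the ratio of the middle (mean) timings in each cell. A small sketch of that arithmetic follows; `describe_change` is a hypothetical helper, and the 5% "no change" band is an assumption inferred from the reported rows, not taken from the benchmark runner's source.

```rust
/// Classify a timing change from mean runtimes in milliseconds.
/// `threshold` is the relative band treated as noise (0.05 is an assumption).
fn describe_change(base_ms: f64, branch_ms: f64, threshold: f64) -> String {
    let speedup = base_ms / branch_ms;
    if speedup > 1.0 + threshold {
        format!("+{:.2}x faster", speedup)
    } else if 1.0 / speedup > 1.0 + threshold {
        format!("{:.2}x slower", 1.0 / speedup)
    } else {
        "no change".to_string()
    }
}
```

For Q1 this gives 7.68 / 3.19 ≈ 2.41, and for Q3 22.14 / 6.87 ≈ 3.22, matching the reported "+2.41x faster" and "+3.22x faster".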

Resource Usage

sort_pushdown_inexact — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 5.0s |
| Peak memory | 4.4 GiB |
| Avg memory | 4.4 GiB |
| CPU user | 2.5s |
| CPU sys | 0.4s |
| Peak spill | 0 B |

sort_pushdown_inexact — branch

| Metric | Value |
|---|---|
| Wall time | 5.0s |
| Peak memory | 4.4 GiB |
| Avg memory | 4.4 GiB |
| CPU user | 0.8s |
| CPU sys | 0.2s |
| Peak spill | 0 B |


@Dandandan
Contributor

run benchmark clickbench_1

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4291304401-1707-sqn9h 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing feat/reorder-row-groups-by-stats (a269ffd) to 466c3ea (merge-base) diff using: clickbench_1
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                   HEAD ┃       feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │            0.72 / 1.19 ±0.83 / 2.85 ms │            0.71 / 1.05 ±0.64 / 2.32 ms │ +1.13x faster │
│ QQuery 1  │         14.70 / 14.80 ±0.07 / 14.89 ms │         14.36 / 14.57 ±0.12 / 14.71 ms │     no change │
│ QQuery 2  │         43.06 / 43.26 ±0.18 / 43.59 ms │         42.55 / 42.80 ±0.22 / 43.06 ms │     no change │
│ QQuery 3  │         39.02 / 39.97 ±1.09 / 42.08 ms │         37.69 / 38.07 ±0.33 / 38.67 ms │     no change │
│ QQuery 4  │     269.60 / 285.23 ±10.89 / 303.20 ms │      268.19 / 276.07 ±5.62 / 284.28 ms │     no change │
│ QQuery 5  │     452.84 / 464.41 ±12.40 / 480.58 ms │     437.42 / 480.24 ±34.93 / 526.09 ms │     no change │
│ QQuery 6  │            6.39 / 6.75 ±0.30 / 7.27 ms │            6.91 / 7.21 ±0.21 / 7.50 ms │  1.07x slower │
│ QQuery 7  │         16.86 / 17.16 ±0.16 / 17.30 ms │         17.58 / 17.88 ±0.49 / 18.86 ms │     no change │
│ QQuery 8  │      379.69 / 391.13 ±7.75 / 402.33 ms │     434.77 / 448.19 ±10.33 / 464.14 ms │  1.15x slower │
│ QQuery 9  │     620.60 / 667.83 ±37.22 / 717.37 ms │      636.17 / 639.43 ±2.70 / 644.30 ms │     no change │
│ QQuery 10 │       98.79 / 100.66 ±3.08 / 106.80 ms │        94.96 / 97.27 ±4.26 / 105.78 ms │     no change │
│ QQuery 11 │      111.62 / 112.83 ±0.83 / 113.72 ms │      106.31 / 107.75 ±1.25 / 109.26 ms │     no change │
│ QQuery 12 │     457.64 / 499.44 ±30.51 / 537.14 ms │      457.67 / 463.95 ±7.64 / 478.90 ms │ +1.08x faster │
│ QQuery 13 │      506.16 / 518.33 ±7.09 / 526.32 ms │     499.82 / 518.72 ±10.83 / 532.53 ms │     no change │
│ QQuery 14 │      451.26 / 462.14 ±8.04 / 470.70 ms │     456.63 / 477.29 ±29.22 / 534.19 ms │     no change │
│ QQuery 15 │      336.44 / 340.16 ±3.31 / 344.80 ms │      397.02 / 401.58 ±5.05 / 411.32 ms │  1.18x slower │
│ QQuery 16 │     712.39 / 763.78 ±45.96 / 844.28 ms │     719.00 / 756.44 ±29.24 / 785.67 ms │     no change │
│ QQuery 17 │     733.18 / 766.64 ±18.22 / 787.43 ms │      693.23 / 699.22 ±5.01 / 707.65 ms │ +1.10x faster │
│ QQuery 18 │  1414.07 / 1447.65 ±22.54 / 1483.99 ms │  1425.75 / 1442.05 ±15.35 / 1468.64 ms │     no change │
│ QQuery 19 │       42.02 / 58.90 ±24.48 / 107.30 ms │       38.69 / 63.31 ±38.30 / 139.62 ms │  1.07x slower │
│ QQuery 20 │     620.94 / 638.97 ±28.41 / 695.28 ms │     626.44 / 641.11 ±25.82 / 692.71 ms │     no change │
│ QQuery 21 │      704.25 / 708.46 ±3.42 / 713.89 ms │      721.73 / 726.45 ±2.56 / 729.15 ms │     no change │
│ QQuery 22 │  1361.67 / 1394.23 ±20.70 / 1415.92 ms │  1363.18 / 1381.90 ±17.22 / 1412.23 ms │     no change │
│ QQuery 23 │  3812.19 / 3871.88 ±75.73 / 4018.67 ms │ 3747.74 / 3898.17 ±106.95 / 4053.49 ms │     no change │
│ QQuery 24 │      226.55 / 230.17 ±5.30 / 240.67 ms │      220.38 / 226.97 ±6.00 / 238.05 ms │     no change │
│ QQuery 25 │      190.34 / 192.41 ±1.95 / 195.71 ms │      185.99 / 188.88 ±2.40 / 191.84 ms │     no change │
│ QQuery 26 │      230.69 / 234.19 ±3.49 / 240.21 ms │      215.96 / 219.52 ±2.66 / 223.24 ms │ +1.07x faster │
│ QQuery 27 │     745.57 / 757.23 ±12.75 / 777.76 ms │      743.91 / 747.93 ±3.07 / 752.21 ms │     no change │
│ QQuery 28 │  3397.19 / 3475.39 ±75.18 / 3566.61 ms │  3439.06 / 3512.39 ±62.16 / 3618.52 ms │     no change │
│ QQuery 29 │         47.87 / 53.31 ±5.31 / 62.79 ms │       50.04 / 87.40 ±40.42 / 152.62 ms │  1.64x slower │
│ QQuery 30 │      437.04 / 452.96 ±9.82 / 467.78 ms │      454.30 / 462.68 ±8.36 / 475.58 ms │     no change │
│ QQuery 31 │      428.42 / 435.36 ±6.73 / 446.53 ms │     423.80 / 436.07 ±14.39 / 464.27 ms │     no change │
│ QQuery 32 │  1037.26 / 1059.54 ±20.26 / 1096.61 ms │  1195.72 / 1310.89 ±62.32 / 1383.94 ms │  1.24x slower │
│ QQuery 33 │  1544.01 / 1625.27 ±49.12 / 1676.99 ms │ 1554.73 / 1720.12 ±124.52 / 1863.63 ms │  1.06x slower │
│ QQuery 34 │ 1521.64 / 1615.37 ±109.34 / 1784.34 ms │  1551.66 / 1608.87 ±72.59 / 1746.03 ms │     no change │
│ QQuery 35 │      440.94 / 447.20 ±6.13 / 456.74 ms │     364.32 / 379.31 ±14.37 / 400.46 ms │ +1.18x faster │
│ QQuery 36 │      126.09 / 134.24 ±4.33 / 138.71 ms │      132.94 / 134.55 ±1.19 / 136.59 ms │     no change │
│ QQuery 37 │         59.26 / 61.90 ±2.25 / 64.91 ms │         59.61 / 61.24 ±2.06 / 65.30 ms │     no change │
│ QQuery 38 │         84.97 / 87.72 ±1.80 / 89.88 ms │         89.58 / 90.45 ±0.45 / 90.87 ms │     no change │
│ QQuery 39 │      225.99 / 241.95 ±9.88 / 255.59 ms │      247.68 / 255.39 ±6.31 / 262.91 ms │  1.06x slower │
│ QQuery 40 │         22.20 / 24.41 ±2.59 / 29.33 ms │         27.80 / 30.14 ±1.39 / 31.95 ms │  1.23x slower │
│ QQuery 41 │         20.15 / 20.63 ±0.47 / 21.48 ms │         22.06 / 23.36 ±1.47 / 26.12 ms │  1.13x slower │
│ QQuery 42 │         20.04 / 20.39 ±0.22 / 20.71 ms │         21.12 / 21.65 ±0.36 / 22.24 ms │  1.06x slower │
└───────────┴────────────────────────────────────────┴────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 24785.41ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 25158.54ms │
│ Average Time (HEAD)                             │   576.40ms │
│ Average Time (feat_reorder-row-groups-by-stats) │   585.08ms │
│ Queries Faster                                  │          5 │
│ Queries Slower                                  │         11 │
│ Queries with No Change                          │         27 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_1 — base (merge-base)

| Metric | Value |
|--------|-------|
| Wall time | 125.0s |
| Peak memory | 37.6 GiB |
| Avg memory | 27.9 GiB |
| CPU user | 1163.7s |
| CPU sys | 94.4s |
| Peak spill | 0 B |

clickbench_1 — branch

| Metric | Value |
|--------|-------|
| Wall time | 130.0s |
| Peak memory | 36.1 GiB |
| Avg memory | 25.8 GiB |
| CPU user | 1163.3s |
| CPU sys | 111.2s |
| Peak spill | 0 B |

File an issue against this benchmark runner

@zhuqi-lucas zhuqi-lucas marked this pull request as ready for review April 22, 2026 03:28
For GROUP BY + ORDER BY queries, the TopK sort column is an aggregate
output (e.g. COUNT(*)) that doesn't exist in the parquet file schema.
Previously we still created ReorderByStatistics, which tried to look
up the column in statistics; that lookup was wasted work.

Now check column existence in file schema before creating the
optimizer. This eliminates overhead for non-scan-level TopK queries
(ClickBench Q40-Q42 regression fix).
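
A minimal sketch of the guard described above, assuming a simplified model in which the file schema is just a list of column names. `ReorderPlan` and `maybe_reorder_plan` are hypothetical names used for illustration only; they are not the PR's actual types or functions.

```rust
/// Hypothetical stand-in for the reorder optimizer (not the PR's real type).
#[derive(Debug, PartialEq)]
struct ReorderPlan {
    /// Index of the sort column in the file schema.
    column_index: usize,
    /// True for `ORDER BY ... DESC`.
    descending: bool,
}

/// Build a reorder plan only if the TopK sort column exists in the file schema.
/// For GROUP BY + ORDER BY on an aggregate such as COUNT(*), the sort column is
/// not a file column, so the lookup fails and no optimizer is created.
fn maybe_reorder_plan(
    file_schema: &[&str],
    sort_column: &str,
    descending: bool,
) -> Option<ReorderPlan> {
    // `?` returns None early when the column is missing from the schema.
    let column_index = file_schema.iter().position(|name| *name == sort_column)?;
    Some(ReorderPlan { column_index, descending })
}
```

Returning `Option` keeps the "no work to do" case explicit at the call site, so the scan path can skip statistics lookups entirely when the result is `None`.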
zhuqi-lucas added a commit to zhuqi-lucas/arrow-datafusion that referenced this pull request Apr 22, 2026
…pache#21711)

## Which issue does this PR close?

Related to apache#21580

## Rationale for this change

The sort pushdown benchmark had two problems:

1. **Broken data generation**: The single-file ORDER BY approach caused
the parquet writer to merge rows from adjacent chunks at RG boundaries,
widening RG ranges to ~6M. The per-file split fix gave each file only 1
RG, so `reorder_by_statistics` (intra-file optimization) had nothing to
reorder.

2. **Missing DESC LIMIT queries**: The `sort_pushdown` benchmark only
had ASC queries (sort elimination). No queries tested the reverse scan +
TopK path (Inexact sort pushdown), which is where RG reorder, stats
init, and cumulative pruning provide 20-58x improvement.

## What changes are included in this PR?

### 1. Fix benchmark data generation

Generate **multiple files with multiple scrambled RGs each**:
- `inexact`: 3 files x ~20 RGs each
- `overlap`: 5 files x ~12 RGs each

Uses pyarrow to redistribute RGs from a sorted temp file into multiple
output files with scrambled RG order. Each RG has a narrow `l_orderkey`
range (~100K) but appears in scrambled order within its file.
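
The resulting layout can be modeled without pyarrow. The sketch below is not the benchmark's generator: it only tracks row-group key ranges, and `scramble_layout`, the round-robin file assignment, and the LCG-based shuffle are assumptions chosen to keep the example deterministic.

```rust
/// Key range (min, max) covered by one row group.
type RgRange = (u64, u64);

/// Deterministic Fisher-Yates shuffle driven by a small LCG
/// (an assumption for reproducibility, not what the benchmark script uses).
fn shuffle<T>(items: &mut [T], mut state: u64) {
    for i in (1..items.len()).rev() {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let j = (state >> 33) as usize % (i + 1);
        items.swap(i, j);
    }
}

/// Split a sorted key space into `num_rgs` narrow, non-overlapping ranges,
/// deal them round-robin into `num_files` files, then scramble the order of
/// the row groups inside each file.
fn scramble_layout(
    num_rgs: usize,
    rg_width: u64,
    num_files: usize,
    seed: u64,
) -> Vec<Vec<RgRange>> {
    let mut files: Vec<Vec<RgRange>> = vec![Vec::new(); num_files];
    for i in 0..num_rgs {
        let min = i as u64 * rg_width;
        files[i % num_files].push((min, min + rg_width - 1));
    }
    for (file_idx, rgs) in files.iter_mut().enumerate() {
        // Use a different seed per file so the files are scrambled differently.
        shuffle(rgs, seed.wrapping_add(file_idx as u64));
    }
    files
}
```

For example, `scramble_layout(60, 100_000, 3, 42)` models the `inexact` shape of 3 files with 20 row groups each, where every row group still covers a narrow, non-overlapping key range.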

### 2. Add DESC LIMIT queries to sort_pushdown benchmark

New q5-q8 for `sort_pushdown` (sorted data, `WITH ORDER`):

| Query | Description |
|-------|-------------|
| q5 | `ORDER BY l_orderkey DESC LIMIT 100` (narrow projection) |
| q6 | `ORDER BY l_orderkey DESC LIMIT 1000` (narrow projection) |
| q7 | `SELECT * ORDER BY l_orderkey DESC LIMIT 100` (wide projection) |
| q8 | `SELECT * ORDER BY l_orderkey DESC LIMIT 1000` (wide projection) |

These test the Inexact sort pushdown path: reverse scan + TopK + dynamic
filter, which benefits from the optimizations in apache#21580.
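
A simplified sketch of why that path benefits from statistics, assuming an in-memory model where each row group holds its values and its max stands in for the parquet statistic. `RowGroup` and `top_k_desc` are hypothetical names; this is not DataFusion's TopK operator or its dynamic filter implementation.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A row group modeled by its values; `max()` stands in for the max statistic.
struct RowGroup {
    values: Vec<i64>,
}

impl RowGroup {
    fn max(&self) -> i64 {
        *self.values.iter().max().expect("non-empty row group")
    }
}

/// Returns the top `k` values in descending order and the number of row
/// groups that actually had to be read.
fn top_k_desc(mut rgs: Vec<RowGroup>, k: usize) -> (Vec<i64>, usize) {
    // Reorder: read the row groups with the largest max first.
    rgs.sort_by_key(|rg| Reverse(rg.max()));

    // Min-heap holding the current top-k; its smallest element is the threshold.
    let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::new();
    let mut rgs_read = 0;

    for rg in &rgs {
        // Dynamic threshold: only defined once k values are held.
        let threshold = if heap.len() == k {
            heap.peek().map(|r| r.0)
        } else {
            None
        };
        if let Some(t) = threshold {
            if rg.max() <= t {
                continue; // No value here can improve the top-k: skip the read.
            }
        }
        rgs_read += 1;
        for &v in &rg.values {
            if heap.len() < k {
                heap.push(Reverse(v));
            } else if heap.peek().map(|r| r.0).is_some_and(|min| v > min) {
                heap.pop();
                heap.push(Reverse(v));
            }
        }
    }

    let mut out: Vec<i64> = heap.into_iter().map(|Reverse(v)| v).collect();
    out.sort_by(|a, b| b.cmp(a));
    (out, rgs_read)
}
```

With row groups `[1, 2, 3]`, `[10, 11, 12]`, `[4, 5, 6]` and `k = 2`, the row group with the largest max is read first, the threshold becomes 11, and the other two row groups are skipped on statistics alone.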

## Are these changes tested?

Benchmark changes only. Verified locally:
- Data generation produces correct multi-file multi-RG output
- DESC LIMIT queries return correct results
- q5-q8 show 20-58x improvement with apache#21580 optimizations

## Are there any user-facing changes?

No. Adds pyarrow as a dependency for generating benchmark datasets (`pip
install pyarrow`).
@zhuqi-lucas
Contributor Author

run benchmark sort_pushdown_inexact

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4293595834-1736-l7lsc 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/reorder-row-groups-by-stats (bb87ff6) to 64619a6 (merge-base) diff using: sort_pushdown_inexact
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark sort_pushdown_inexact.json
--------------------
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │   6.71 / 8.17 ±1.20 / 10.30 ms │      3.18 / 3.79 ±0.80 / 5.29 ms │ +2.15x faster │
│ Q2    │    6.92 / 7.24 ±0.39 / 7.98 ms │      3.16 / 3.39 ±0.39 / 4.17 ms │ +2.13x faster │
│ Q3    │ 22.09 / 22.87 ±0.67 / 23.75 ms │      7.01 / 7.20 ±0.26 / 7.71 ms │ +3.17x faster │
│ Q4    │ 20.72 / 21.83 ±0.80 / 22.77 ms │      7.17 / 7.21 ±0.03 / 7.24 ms │ +3.03x faster │
└───────┴────────────────────────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Benchmark Summary                               ┃         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ Total Time (HEAD)                               │ 60.11ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 21.59ms │
│ Average Time (HEAD)                             │ 15.03ms │
│ Average Time (feat_reorder-row-groups-by-stats) │  5.40ms │
│ Queries Faster                                  │       4 │
│ Queries Slower                                  │       0 │
│ Queries with No Change                          │       0 │
│ Queries with Failure                            │       0 │
└─────────────────────────────────────────────────┴─────────┘

Resource Usage

sort_pushdown_inexact — base (merge-base)

| Metric | Value |
|--------|-------|
| Wall time | 5.0s |
| Peak memory | 4.4 GiB |
| Avg memory | 4.4 GiB |
| CPU user | 2.6s |
| CPU sys | 0.3s |
| Peak spill | 0 B |

sort_pushdown_inexact — branch

| Metric | Value |
|--------|-------|
| Wall time | 5.0s |
| Peak memory | 4.4 GiB |
| Avg memory | 4.4 GiB |
| CPU user | 0.9s |
| CPU sys | 0.2s |
| Peak spill | 0 B |


@zhuqi-lucas zhuqi-lucas force-pushed the feat/reorder-row-groups-by-stats branch from bb87ff6 to 5c31674 Compare April 22, 2026 05:21
@zhuqi-lucas
Contributor Author

run benchmark clickbench_1

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4293745755-1737-k7jcp 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing feat/reorder-row-groups-by-stats (2081071) to 64619a6 (merge-base) diff using: clickbench_1
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃      feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │           0.71 / 1.06 ±0.62 / 2.29 ms │           0.72 / 1.06 ±0.64 / 2.34 ms │     no change │
│ QQuery 1  │        14.01 / 14.32 ±0.38 / 15.06 ms │        13.92 / 14.29 ±0.41 / 15.07 ms │     no change │
│ QQuery 2  │        41.81 / 42.19 ±0.32 / 42.74 ms │        42.04 / 42.16 ±0.09 / 42.31 ms │     no change │
│ QQuery 3  │        38.16 / 39.03 ±0.92 / 40.23 ms │        38.09 / 38.29 ±0.13 / 38.47 ms │     no change │
│ QQuery 4  │     254.08 / 258.22 ±3.78 / 264.69 ms │     251.30 / 254.54 ±1.76 / 256.25 ms │     no change │
│ QQuery 5  │     420.94 / 428.30 ±6.34 / 439.75 ms │     422.07 / 424.77 ±2.18 / 428.75 ms │     no change │
│ QQuery 6  │           6.26 / 6.58 ±0.30 / 7.04 ms │           6.22 / 6.53 ±0.33 / 7.04 ms │     no change │
│ QQuery 7  │        15.98 / 16.07 ±0.08 / 16.16 ms │        16.15 / 16.35 ±0.16 / 16.64 ms │     no change │
│ QQuery 8  │     352.57 / 359.77 ±4.57 / 366.75 ms │     357.49 / 363.37 ±5.07 / 372.55 ms │     no change │
│ QQuery 9  │     526.54 / 528.54 ±1.63 / 530.86 ms │    521.91 / 533.38 ±10.46 / 548.04 ms │     no change │
│ QQuery 10 │        87.84 / 90.57 ±4.13 / 98.77 ms │       88.36 / 91.46 ±4.75 / 100.91 ms │     no change │
│ QQuery 11 │     101.01 / 101.58 ±0.49 / 102.39 ms │     102.57 / 104.09 ±1.60 / 106.86 ms │     no change │
│ QQuery 12 │    429.19 / 445.98 ±15.37 / 474.36 ms │    427.82 / 441.76 ±10.28 / 459.23 ms │     no change │
│ QQuery 13 │    498.66 / 506.89 ±11.66 / 530.02 ms │    478.98 / 494.18 ±12.72 / 514.17 ms │     no change │
│ QQuery 14 │     434.91 / 439.48 ±3.41 / 443.94 ms │     431.07 / 434.42 ±2.74 / 438.70 ms │     no change │
│ QQuery 15 │     294.99 / 302.55 ±6.70 / 314.09 ms │     301.08 / 304.46 ±3.17 / 310.42 ms │     no change │
│ QQuery 16 │     631.99 / 645.31 ±7.92 / 652.37 ms │     642.57 / 650.64 ±4.97 / 657.25 ms │     no change │
│ QQuery 17 │     642.32 / 644.89 ±2.35 / 649.31 ms │     645.75 / 654.34 ±8.71 / 668.26 ms │     no change │
│ QQuery 18 │ 1272.13 / 1288.08 ±13.56 / 1303.09 ms │ 1272.54 / 1284.75 ±15.54 / 1313.68 ms │     no change │
│ QQuery 19 │        35.15 / 36.51 ±2.35 / 41.21 ms │        35.04 / 35.22 ±0.14 / 35.47 ms │     no change │
│ QQuery 20 │    622.40 / 639.57 ±11.75 / 653.31 ms │    623.51 / 646.35 ±20.78 / 681.03 ms │     no change │
│ QQuery 21 │    709.07 / 719.07 ±10.40 / 737.61 ms │    708.72 / 725.80 ±11.27 / 738.38 ms │     no change │
│ QQuery 22 │ 1380.42 / 1392.99 ±13.47 / 1418.77 ms │  1389.23 / 1395.73 ±7.34 / 1407.26 ms │     no change │
│ QQuery 23 │ 3794.98 / 3808.40 ±13.62 / 3834.12 ms │ 3762.24 / 3794.71 ±20.43 / 3824.48 ms │     no change │
│ QQuery 24 │     214.71 / 221.35 ±7.70 / 234.91 ms │     214.71 / 219.78 ±6.48 / 232.33 ms │     no change │
│ QQuery 25 │     184.80 / 187.12 ±2.01 / 190.52 ms │     182.92 / 187.87 ±5.63 / 198.71 ms │     no change │
│ QQuery 26 │     214.55 / 216.73 ±1.71 / 219.71 ms │     214.26 / 217.59 ±3.03 / 222.64 ms │     no change │
│ QQuery 27 │    750.65 / 766.51 ±13.58 / 783.15 ms │     752.21 / 762.04 ±8.56 / 775.46 ms │     no change │
│ QQuery 28 │ 3467.77 / 3489.47 ±21.57 / 3519.84 ms │ 3453.68 / 3470.58 ±12.28 / 3483.38 ms │     no change │
│ QQuery 29 │       46.44 / 54.72 ±10.81 / 73.28 ms │        46.28 / 46.74 ±0.30 / 47.22 ms │ +1.17x faster │
│ QQuery 30 │    411.70 / 425.53 ±13.42 / 448.77 ms │    416.34 / 430.10 ±13.01 / 452.11 ms │     no change │
│ QQuery 31 │    388.41 / 402.03 ±13.07 / 425.76 ms │     389.94 / 396.69 ±5.70 / 406.16 ms │     no change │
│ QQuery 32 │ 1015.39 / 1030.76 ±18.04 / 1065.01 ms │ 1025.02 / 1050.65 ±22.54 / 1087.76 ms │     no change │
│ QQuery 33 │ 1457.03 / 1477.87 ±12.07 / 1490.70 ms │ 1465.08 / 1483.09 ±23.57 / 1528.64 ms │     no change │
│ QQuery 34 │ 1476.66 / 1491.32 ±11.26 / 1508.97 ms │ 1477.56 / 1510.65 ±40.61 / 1589.72 ms │     no change │
│ QQuery 35 │    324.15 / 353.47 ±40.51 / 432.19 ms │     320.39 / 332.61 ±9.84 / 347.03 ms │ +1.06x faster │
│ QQuery 36 │     119.12 / 126.53 ±5.93 / 136.26 ms │    130.31 / 141.15 ±11.60 / 156.11 ms │  1.12x slower │
│ QQuery 37 │       54.08 / 63.48 ±10.61 / 81.14 ms │        52.48 / 56.63 ±5.69 / 67.90 ms │ +1.12x faster │
│ QQuery 38 │        84.47 / 86.47 ±1.15 / 87.93 ms │        84.86 / 86.49 ±1.59 / 88.94 ms │     no change │
│ QQuery 39 │     239.98 / 244.46 ±4.33 / 252.55 ms │     234.40 / 245.65 ±6.93 / 252.64 ms │     no change │
│ QQuery 40 │        22.27 / 22.67 ±0.34 / 23.14 ms │        22.88 / 25.58 ±3.64 / 32.58 ms │  1.13x slower │
│ QQuery 41 │        20.14 / 20.72 ±0.69 / 22.02 ms │        19.81 / 20.27 ±0.49 / 21.19 ms │     no change │
│ QQuery 42 │       19.89 / 26.41 ±12.17 / 50.73 ms │        19.62 / 19.85 ±0.15 / 20.09 ms │ +1.33x faster │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 23463.59ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 23456.64ms │
│ Average Time (HEAD)                             │   545.66ms │
│ Average Time (feat_reorder-row-groups-by-stats) │   545.50ms │
│ Queries Faster                                  │          4 │
│ Queries Slower                                  │          2 │
│ Queries with No Change                          │         37 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_1 — base (merge-base)

| Metric | Value |
|--------|-------|
| Wall time | 120.0s |
| Peak memory | 30.0 GiB |
| Avg memory | 21.9 GiB |
| CPU user | 1118.2s |
| CPU sys | 67.2s |
| Peak spill | 0 B |

clickbench_1 — branch

| Metric | Value |
|--------|-------|
| Wall time | 120.0s |
| Peak memory | 28.7 GiB |
| Avg memory | 21.9 GiB |
| CPU user | 1118.6s |
| CPU sys | 67.3s |
| Peak spill | 0 B |


@zhuqi-lucas
Contributor Author

run benchmark sort_pushdown_inexact

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4293875595-1738-krt7d 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing feat/reorder-row-groups-by-stats (2081071) to 64619a6 (merge-base) diff using: sort_pushdown_inexact
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_reorder-row-groups-by-stats
--------------------
Benchmark sort_pushdown_inexact.json
--------------------
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    6.80 / 7.59 ±1.12 / 9.76 ms │      2.90 / 3.43 ±0.84 / 5.11 ms │ +2.21x faster │
│ Q2    │    6.67 / 6.85 ±0.16 / 7.05 ms │      3.06 / 3.14 ±0.07 / 3.24 ms │ +2.19x faster │
│ Q3    │ 21.04 / 21.89 ±0.66 / 22.79 ms │      6.85 / 7.07 ±0.17 / 7.31 ms │ +3.10x faster │
│ Q4    │ 20.21 / 20.93 ±0.71 / 21.98 ms │      6.92 / 7.06 ±0.14 / 7.26 ms │ +2.96x faster │
└───────┴────────────────────────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Benchmark Summary                               ┃         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ Total Time (HEAD)                               │ 57.26ms │
│ Total Time (feat_reorder-row-groups-by-stats)   │ 20.70ms │
│ Average Time (HEAD)                             │ 14.32ms │
│ Average Time (feat_reorder-row-groups-by-stats) │  5.17ms │
│ Queries Faster                                  │       4 │
│ Queries Slower                                  │       0 │
│ Queries with No Change                          │       0 │
│ Queries with Failure                            │       0 │
└─────────────────────────────────────────────────┴─────────┘

Resource Usage

sort_pushdown_inexact — base (merge-base)

| Metric | Value |
|--------|-------|
| Wall time | 5.0s |
| Peak memory | 4.4 GiB |
| Avg memory | 4.4 GiB |
| CPU user | 2.5s |
| CPU sys | 0.3s |
| Peak spill | 0 B |

sort_pushdown_inexact — branch

| Metric | Value |
|--------|-------|
| Wall time | 5.0s |
| Peak memory | 4.4 GiB |
| Avg memory | 4.4 GiB |
| CPU user | 0.9s |
| CPU sys | 0.2s |
| Peak spill | 0 B |


@zhuqi-lucas
Contributor Author

zhuqi-lucas commented Apr 22, 2026

cc @adriangb @Dandandan @alamb

The latest CI benchmark results show no overall regression on ClickBench (total time 23463.59 ms on HEAD vs 23456.64 ms on this branch) and a 2x-3x speedup on sort_pushdown_inexact:

┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_reorder-row-groups-by-stats ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    6.80 / 7.59 ±1.12 / 9.76 ms │      2.90 / 3.43 ±0.84 / 5.11 ms │ +2.21x faster │
│ Q2    │    6.67 / 6.85 ±0.16 / 7.05 ms │      3.06 / 3.14 ±0.07 / 3.24 ms │ +2.19x faster │
│ Q3    │ 21.04 / 21.89 ±0.66 / 22.79 ms │      6.85 / 7.07 ±0.17 / 7.31 ms │ +3.10x faster │
│ Q4    │ 20.21 / 20.93 ±0.71 / 21.98 ms │      6.92 / 7.06 ±0.14 / 7.26 ms │ +2.96x faster │
└───────┴────────────────────────────────┴──────────────────────────────────┴───────────────┘

A local benchmark on a sorted single file (61 RGs, narrow ranges) runs 17-60x faster: the full optimization chain (stats init + cumulative RG prune) skips 60 of the 61 RGs without any I/O for them.
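As a rough illustration of why a sorted file with narrow ranges prunes so well, here is a minimal sketch of threshold initialization from row-group statistics for `ORDER BY col DESC LIMIT k`. The `RgStats` type and both helper functions are hypothetical stand-ins, not the PR's actual code, and the sketch assumes an integer column with no nulls.

```rust
/// Hypothetical per-row-group statistics; not DataFusion's real types.
#[derive(Debug, Clone, Copy)]
struct RgStats {
    min: i64,
    max: i64,
    num_rows: usize,
}

/// For `ORDER BY col DESC LIMIT k`: a single row group with at least `k`
/// rows guarantees that `k` values are >= its min, so its min is a valid
/// lower bound for the final top-k. Returns the largest such bound, or
/// `None` when no single row group covers `k` rows.
fn init_desc_threshold(stats: &[RgStats], k: usize) -> Option<i64> {
    stats
        .iter()
        .filter(|s| s.num_rows >= k)
        .map(|s| s.min)
        .max()
}

/// Indexes of row groups that may still hold a value >= `threshold`.
/// Uses >= (not >) so boundary values are kept.
fn surviving_row_groups(stats: &[RgStats], threshold: i64) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| s.max >= threshold)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // 61 row groups with narrow, non-overlapping ranges, as in a sorted file.
    let stats: Vec<RgStats> = (0..61)
        .map(|i| RgStats { min: i * 1000, max: i * 1000 + 999, num_rows: 1000 })
        .collect();

    let threshold = init_desc_threshold(&stats, 10).expect("one RG covers k");
    let keep = surviving_row_groups(&stats, threshold);
    println!("threshold = {threshold}, keep = {keep:?}");
}
```

With these made-up statistics only the last row group survives, which mirrors the "60 of 61 skipped" shape of the local result. Overlapping ranges would leave more row groups in play.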

Would appreciate a review when you get a chance! The PR is ready for review.

let mut cumulative = 0usize;
let mut keep_count = 0;
for &idx in rg_indexes {
    cumulative += file_metadata.row_group(idx).num_rows() as usize;
Member

The parquet opener first turns the dynamic filter into a pushed-down row_filter / row_selection, but later the cumulative cutoff still sums raw row_group.num_rows() and truncates once that raw count reaches fetch. That is unsafe when early RGs contain many rows but only a few of them survive the filter.
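The concern can be made concrete with a small sketch. The function below is a simplified, hypothetical version of the cumulative cutoff (it borrows the `cumulative` and `keep_count` names from the snippet under review, but takes plain row counts instead of parquet metadata), and the "surviving" counts are made-up numbers standing in for rows that pass a pushed-down filter.

```rust
/// Number of leading row groups kept when scanning stops as soon as the
/// running total of `rows_per_rg` reaches `fetch`.
fn cumulative_keep_count(rows_per_rg: &[usize], fetch: usize) -> usize {
    let mut cumulative = 0usize;
    let mut keep_count = 0usize;
    for &rows in rows_per_rg {
        cumulative += rows;
        keep_count += 1;
        if cumulative >= fetch {
            break;
        }
    }
    keep_count
}

fn main() {
    let fetch = 100;
    // Raw row counts as reported by row-group metadata.
    let raw = [1000, 1000, 1000];
    // Hypothetical rows per row group that actually pass a filter.
    let surviving = [10, 40, 80];

    // Cutoff based on raw counts keeps only the first row group...
    let kept_raw = cumulative_keep_count(&raw, fetch);
    // ...but that row group yields far fewer than `fetch` matching rows.
    let rows_available: usize = surviving[..kept_raw].iter().sum();
    // Counting surviving rows instead would require all three row groups.
    let kept_filtered = cumulative_keep_count(&surviving, fetch);

    println!("kept_raw={kept_raw} rows_available={rows_available} kept_filtered={kept_filtered}");
}
```

In this example the raw-count cutoff keeps one row group that supplies only 10 of the 100 requested rows, so the query would return too few rows whenever a selective filter is in play.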

Contributor Author

Good point. Currently create_filter and fetch are set in the same method (with_fetch), and we fixed the ordering so fetch is set before create_filter is called. There's no separate code path that updates fetch without recreating the filter.

But you're right that this coupling is fragile: if a future optimizer calls with_fetch independently, the filter's fetch would go stale. As a follow-up, I'll consider making fetch on DynamicFilterPhysicalExpr read directly from SortExec.fetch (via a shared reference) instead of copying the value at creation time.
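A minimal sketch of the difference between copying the value and sharing it is shown below. All types here (`SortNode`, `CopiedFetchFilter`, `SharedFetchFilter`) are hypothetical stand-ins rather than DataFusion's `SortExec` or `DynamicFilterPhysicalExpr`, and the in-place `set_fetch` is only an assumption used to simulate a later rewrite of K.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical filter that copies K once, at creation time.
struct CopiedFetchFilter {
    fetch: usize,
}

/// Hypothetical filter that shares K with the sort node.
struct SharedFetchFilter {
    fetch: Arc<AtomicUsize>,
}

impl SharedFetchFilter {
    fn fetch(&self) -> usize {
        self.fetch.load(Ordering::Acquire)
    }
}

/// Hypothetical sort node that owns the authoritative K.
struct SortNode {
    fetch: Arc<AtomicUsize>,
}

impl SortNode {
    fn new(k: usize) -> Self {
        Self { fetch: Arc::new(AtomicUsize::new(k)) }
    }

    /// Simulates a later optimizer pass changing K.
    fn set_fetch(&self, k: usize) {
        self.fetch.store(k, Ordering::Release);
    }

    fn copied_filter(&self) -> CopiedFetchFilter {
        CopiedFetchFilter { fetch: self.fetch.load(Ordering::Acquire) }
    }

    fn shared_filter(&self) -> SharedFetchFilter {
        SharedFetchFilter { fetch: Arc::clone(&self.fetch) }
    }
}

fn main() {
    let sort = SortNode::new(10);
    let copied = sort.copied_filter();
    let shared = sort.shared_filter();

    sort.set_fetch(100);

    // The copy still reports the old K; the shared handle sees the update.
    println!("copied = {}, shared = {}", copied.fetch, shared.fetch());
}
```

The shared handle removes the ordering dependency between setting fetch and creating the filter, at the cost of an atomic load on each read.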

Contributor Author

Created the follow-up issue:
#21780


/// Keep only the first `count` row groups, dropping the rest.
/// Used for TopK cumulative pruning after reorder + reverse.
pub(crate) fn truncate_row_groups(mut self, count: usize) -> Self {
Member

It drops the existing row_selection entirely after truncation, which can widen the scan back to full row groups or discard exact page-level pruning state.

Contributor Author

Good catch @xudong963. Currently truncate_row_groups drops row_selection entirely, which loses page-level pruning state for the retained RGs. I'll fix this by skipping truncation when row_selection is present, so cumulative prune will only apply when there's no page-level pruning active.

This is safe because page pruning is already reducing I/O within those RGs.
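The proposed guard could look roughly like the sketch below. `AccessPlan` is a hypothetical, simplified model (row group indexes plus an optional page-level selection stored as row ranges), not the real type that `truncate_row_groups` is defined on.

```rust
/// Hypothetical, simplified access plan: which row groups to scan, plus an
/// optional page-level selection modelled as (start_row, end_row) ranges.
#[derive(Debug, Clone, PartialEq)]
struct AccessPlan {
    row_groups: Vec<usize>,
    row_selection: Option<Vec<(usize, usize)>>,
}

impl AccessPlan {
    /// Keep only the first `count` row groups, unless a row selection is
    /// present. In that case the plan is returned unchanged so page-level
    /// pruning state is never discarded.
    fn truncate_row_groups(mut self, count: usize) -> Self {
        if self.row_selection.is_some() {
            return self;
        }
        self.row_groups.truncate(count);
        self
    }
}

fn main() {
    let plain = AccessPlan { row_groups: vec![4, 3, 2, 1, 0], row_selection: None };
    let paged = AccessPlan {
        row_groups: vec![4, 3, 2, 1, 0],
        row_selection: Some(vec![(0, 100)]),
    };

    // Without a selection the plan is truncated to the first two row groups.
    println!("{:?}", plain.truncate_row_groups(2));
    // With a selection the plan is left untouched.
    println!("{:?}", paged.truncate_row_groups(2));
}
```

Skipping truncation is the conservative choice: it gives up the cumulative prune in that case rather than trying to slice the selection to match the retained row groups.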

new_sort.filter = fetch.is_some().then(|| {
    // If we already have a filter, keep it. Otherwise, create a new one.
    // Must be called after setting fetch so DynamicFilter gets the K value.
    self.filter
Member

SortExec::with_fetch can leave the embedded dynamic filter’s fetch stale. The new parquet optimizations read df.fetch() for threshold init and cumulative prune, so if any later optimizer rewrites fetch through with_fetch, parquet may prune using the old K.

Contributor Author

This is actually safe in the current implementation because cumulative prune is guarded by is_pure_dynamic_filter: it only fires when the predicate is purely the DynamicFilterPhysicalExpr (no WHERE clause). Cumulative prune also runs before any data is read, so row_filter hasn't filtered any rows yet and num_rows() is accurate at this point.

If we extend this to WHERE queries in the future (e.g. with dynamic RG pruning at runtime via #21399), we'd need to switch from static row counting to runtime early termination.


// Get the first sort expression
// LexOrdering is guaranteed non-empty, so first() returns &PhysicalSortExpr
let first_sort_expr = sort_order.first();
Member

What about multi-key ORDER BY?

Contributor Author

Thanks for the review, @xudong963. Yes, only a single sort key is supported for now; I will add multi-key ORDER BY support as a follow-up.

Labels

core: Core DataFusion crate
datasource: Changes to the datasource crate
physical-expr: Changes to the physical-expr crates
physical-plan: Changes to the physical-plan crate
sqllogictest: SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Initialize TopK from file / rowgroup / .. statistics

7 participants