Handle canceled partitioned hash join dynamic filters lazily #21666

Merged
adriangb merged 18 commits into apache:main from pydantic:codex/hash-join-empty-partition-reporting
Apr 22, 2026

@adriangb adriangb commented Apr 16, 2026

Which issue does this PR close?

Rationale for this change

Partitioned hash join dynamic filters assumed every build-side partition would eventually report build data to the shared coordinator. That assumption breaks when an upstream partitioned operator legally short-circuits and drops a child hash-join partition before it is ever polled far enough to report.

In the original reproducer, a parent RightSemi join completes early for partitions whose own build side is empty. That causes child partitioned hash-join streams to be dropped while still waiting to build/report their dynamic-filter contribution. Sibling partitions then wait forever for reports that will never arrive.

What changes are included in this PR?

  • teach the shared partitioned dynamic-filter coordinator to distinguish terminal partition states:
    • reported build data
    • canceled before build data was known
  • mark unreported partitioned hash-join streams as canceled on Drop
  • treat canceled partitions as true in the synthesized partitioned filter so they do not block completion or incorrectly filter probe rows
  • preserve existing empty-partition behavior so known-empty partitions still contribute false
  • preserve the existing compact filter plan shapes when there are no canceled partitions, including the single-branch collapse used in hash-collision mode
  • add a regression test for the cancellation pattern that previously hung
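
The state handling described in the list above can be sketched roughly as follows. This is a simplified model and not the code in this PR: `PartitionState`, `Coordinator`, `PartitionStream`, and the min/max bounds used as a stand-in for the per-partition filter are all hypothetical names chosen for illustration. It assumes a partition is terminal once it has either reported build data or been canceled, that the first terminal state wins, and that a canceled partition contributes `true` while a known-empty partition contributes `false`.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical per-partition state tracked by the shared coordinator.
#[derive(Debug, Clone, PartialEq)]
enum PartitionState {
    /// Nothing is known yet; siblings would still be waiting on this partition.
    Pending,
    /// Build data was reported; `None` models a known-empty build side.
    Reported(Option<(i64, i64)>),
    /// The stream was dropped before its build data was known.
    Canceled,
}

/// Hypothetical coordinator shared by all partitions of one join.
struct Coordinator {
    states: Mutex<Vec<PartitionState>>,
}

impl Coordinator {
    fn new(partitions: usize) -> Arc<Self> {
        Arc::new(Self {
            states: Mutex::new(vec![PartitionState::Pending; partitions]),
        })
    }

    /// Record a terminal state; the first terminal state for a partition wins.
    fn set(&self, partition: usize, state: PartitionState) {
        let mut states = self.states.lock().unwrap();
        if states[partition] == PartitionState::Pending {
            states[partition] = state;
        }
    }

    /// The filter can be finalized once no partition is still pending.
    fn is_complete(&self) -> bool {
        let states = self.states.lock().unwrap();
        states.iter().all(|s| *s != PartitionState::Pending)
    }

    /// Evaluate the synthesized filter for a probe key routed to `partition`.
    fn keep_probe_row(&self, partition: usize, key: i64) -> bool {
        let states = self.states.lock().unwrap();
        match &states[partition] {
            // Known bounds: keep only keys that can match.
            PartitionState::Reported(Some((min, max))) => (*min..=*max).contains(&key),
            // Known-empty build side: nothing can match.
            PartitionState::Reported(None) => false,
            // Unknown build side: do not filter anything out.
            PartitionState::Canceled | PartitionState::Pending => true,
        }
    }
}

/// Hypothetical stand-in for one partitioned hash-join stream.
struct PartitionStream {
    partition: usize,
    coordinator: Arc<Coordinator>,
}

impl PartitionStream {
    fn report(&self, bounds: Option<(i64, i64)>) {
        self.coordinator
            .set(self.partition, PartitionState::Reported(bounds));
    }
}

impl Drop for PartitionStream {
    fn drop(&mut self) {
        // A no-op if the partition already reported; otherwise marks it canceled
        // so sibling partitions are not left waiting.
        self.coordinator.set(self.partition, PartitionState::Canceled);
    }
}
```

In this model, dropping a stream that never reported moves its partition to `Canceled`, so `is_complete` can become true without a report, and probe rows routed to that partition pass through unfiltered.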

Are these changes tested?

  • cargo fmt --all
  • cargo test -p datafusion-physical-plan test_partitioned_dynamic_filter_reports_empty_canceled_partitions -- --nocapture
  • cargo test -p datafusion --test core_integration physical_optimizer::filter_pushdown::test_hashjoin_dynamic_filter_pushdown_partitioned -- --nocapture
  • cargo test -p datafusion --test core_integration physical_optimizer::filter_pushdown::test_hashjoin_dynamic_filter_pushdown_partitioned --features force_hash_collisions -- --nocapture
  • verified that test_partitioned_dynamic_filter_reports_empty_canceled_partitions times out on the pre-fix revision and passes on this branch

cargo clippy --all-targets --all-features -- -D warnings still fails on an unrelated existing workspace lint in datafusion/expr/src/logical_plan/plan.rs:3773 (clippy::mutable_key_type).

Are there any user-facing changes?

No.

@github-actions github-actions bot added the physical-plan label (Changes to the physical-plan crate) Apr 16, 2026
@adriangb adriangb marked this pull request as ready for review April 16, 2026 11:55
@adriangb

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4259875560-1362-nk27n 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing codex/hash-join-empty-partition-reporting (d17d5e4) to 5c653be (merge-base) diff using: tpcds
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4259875560-1361-d7znb 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing codex/hash-join-empty-partition-reporting (d17d5e4) to 5c653be (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4259875560-1363-qrwrt 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing codex/hash-join-empty-partition-reporting (d17d5e4) to 5c653be (merge-base) diff using: tpch
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and codex_hash-join-empty-partition-reporting
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                   HEAD ┃ codex_hash-join-empty-partition-reporting ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │           1.19 / 4.37 ±6.28 / 16.93 ms │              1.19 / 4.46 ±6.37 / 17.19 ms │     no change │
│ QQuery 1  │         14.88 / 15.01 ±0.11 / 15.17 ms │            15.14 / 15.45 ±0.17 / 15.63 ms │     no change │
│ QQuery 2  │         43.43 / 43.74 ±0.26 / 44.21 ms │            43.98 / 44.39 ±0.30 / 44.90 ms │     no change │
│ QQuery 3  │         41.57 / 44.12 ±1.69 / 46.34 ms │            42.77 / 44.97 ±1.35 / 46.94 ms │     no change │
│ QQuery 4  │      282.16 / 293.79 ±9.62 / 307.92 ms │         285.44 / 296.97 ±7.26 / 308.10 ms │     no change │
│ QQuery 5  │      337.41 / 342.71 ±2.83 / 345.05 ms │         352.81 / 358.06 ±4.34 / 364.86 ms │     no change │
│ QQuery 6  │            4.98 / 5.41 ±0.22 / 5.55 ms │              5.30 / 9.08 ±3.72 / 15.49 ms │  1.68x slower │
│ QQuery 7  │         16.78 / 17.38 ±0.47 / 18.15 ms │            17.51 / 17.64 ±0.10 / 17.76 ms │     no change │
│ QQuery 8  │      415.76 / 424.23 ±8.58 / 440.27 ms │        410.23 / 429.54 ±12.05 / 446.87 ms │     no change │
│ QQuery 9  │      648.84 / 658.48 ±9.24 / 671.44 ms │         631.88 / 643.91 ±7.25 / 654.13 ms │     no change │
│ QQuery 10 │         92.78 / 95.81 ±2.44 / 99.64 ms │            92.15 / 94.38 ±1.43 / 96.26 ms │     no change │
│ QQuery 11 │      105.44 / 106.03 ±0.58 / 107.09 ms │         105.57 / 107.75 ±2.37 / 111.84 ms │     no change │
│ QQuery 12 │      342.54 / 352.42 ±9.61 / 370.28 ms │         334.21 / 339.65 ±3.70 / 345.28 ms │     no change │
│ QQuery 13 │      461.57 / 470.47 ±8.18 / 484.10 ms │        448.28 / 464.82 ±10.82 / 476.52 ms │     no change │
│ QQuery 14 │      342.47 / 343.60 ±0.93 / 345.24 ms │         343.75 / 346.33 ±2.36 / 349.96 ms │     no change │
│ QQuery 15 │     345.85 / 364.92 ±26.66 / 416.23 ms │        346.80 / 366.10 ±19.82 / 398.34 ms │     no change │
│ QQuery 16 │     698.51 / 714.81 ±12.20 / 730.96 ms │        712.57 / 727.80 ±15.51 / 752.28 ms │     no change │
│ QQuery 17 │      709.30 / 715.14 ±4.25 / 720.85 ms │         701.39 / 712.84 ±8.43 / 722.39 ms │     no change │
│ QQuery 18 │  1440.24 / 1484.07 ±37.55 / 1529.27 ms │     1411.15 / 1454.53 ±35.56 / 1512.41 ms │     no change │
│ QQuery 19 │        37.57 / 47.40 ±18.04 / 83.45 ms │          36.08 / 65.17 ±51.55 / 167.88 ms │  1.38x slower │
│ QQuery 20 │     716.24 / 769.21 ±74.56 / 914.63 ms │        716.23 / 740.68 ±36.54 / 813.18 ms │     no change │
│ QQuery 21 │     764.55 / 784.51 ±14.91 / 804.97 ms │         760.98 / 768.17 ±6.05 / 777.61 ms │     no change │
│ QQuery 22 │   1129.82 / 1135.88 ±6.00 / 1147.22 ms │      1131.04 / 1135.15 ±2.58 / 1138.99 ms │     no change │
│ QQuery 23 │ 3062.46 / 3218.46 ±160.84 / 3497.84 ms │     3097.45 / 3120.40 ±19.89 / 3147.34 ms │     no change │
│ QQuery 24 │       99.25 / 104.82 ±3.72 / 110.55 ms │         102.05 / 105.35 ±2.89 / 109.04 ms │     no change │
│ QQuery 25 │      138.23 / 140.44 ±1.84 / 142.65 ms │         139.62 / 141.81 ±2.43 / 146.55 ms │     no change │
│ QQuery 26 │      101.02 / 104.52 ±1.91 / 106.72 ms │         100.96 / 104.84 ±2.15 / 106.89 ms │     no change │
│ QQuery 27 │      846.99 / 853.43 ±8.79 / 870.68 ms │         843.80 / 850.00 ±4.96 / 858.11 ms │     no change │
│ QQuery 28 │  3256.52 / 3283.26 ±23.95 / 3320.66 ms │     3196.58 / 3236.01 ±22.69 / 3263.48 ms │     no change │
│ QQuery 29 │         50.38 / 54.22 ±4.93 / 63.77 ms │            50.65 / 57.18 ±7.12 / 70.39 ms │  1.05x slower │
│ QQuery 30 │      367.60 / 370.98 ±2.69 / 374.68 ms │        358.05 / 368.90 ±10.14 / 386.78 ms │     no change │
│ QQuery 31 │      367.20 / 382.34 ±9.63 / 396.41 ms │         370.31 / 378.77 ±5.44 / 386.87 ms │     no change │
│ QQuery 32 │  1207.62 / 1293.57 ±62.74 / 1382.69 ms │     1034.23 / 1158.96 ±97.51 / 1291.15 ms │ +1.12x faster │
│ QQuery 33 │  1495.17 / 1561.23 ±50.59 / 1642.46 ms │     1453.73 / 1464.62 ±11.31 / 1485.93 ms │ +1.07x faster │
│ QQuery 34 │  1542.70 / 1599.44 ±40.23 / 1662.87 ms │     1450.90 / 1488.12 ±22.30 / 1512.07 ms │ +1.07x faster │
│ QQuery 35 │      391.34 / 401.63 ±6.18 / 410.23 ms │         379.14 / 386.43 ±5.84 / 396.76 ms │     no change │
│ QQuery 36 │      112.72 / 119.13 ±4.25 / 124.89 ms │         118.38 / 120.30 ±1.19 / 121.57 ms │     no change │
│ QQuery 37 │         49.84 / 50.67 ±0.72 / 51.75 ms │            47.24 / 49.95 ±1.77 / 52.79 ms │     no change │
│ QQuery 38 │         74.20 / 75.96 ±1.12 / 77.60 ms │            74.12 / 75.68 ±1.42 / 78.11 ms │     no change │
│ QQuery 39 │      214.53 / 225.33 ±6.52 / 232.69 ms │         201.01 / 207.48 ±6.44 / 215.99 ms │ +1.09x faster │
│ QQuery 40 │         22.69 / 26.00 ±1.80 / 27.81 ms │            22.12 / 26.61 ±2.91 / 30.73 ms │     no change │
│ QQuery 41 │         19.88 / 20.98 ±0.71 / 21.98 ms │            18.39 / 19.77 ±1.22 / 21.79 ms │ +1.06x faster │
│ QQuery 42 │         19.78 / 20.76 ±1.19 / 23.06 ms │            19.10 / 19.96 ±0.69 / 21.19 ms │     no change │
└───────────┴────────────────────────────────────────┴───────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                        │ 23140.67ms │
│ Total Time (codex_hash-join-empty-partition-reporting)   │ 22568.97ms │
│ Average Time (HEAD)                                      │   538.16ms │
│ Average Time (codex_hash-join-empty-partition-reporting) │   524.86ms │
│ Queries Faster                                           │          5 │
│ Queries Slower                                           │          3 │
│ Queries with No Change                                   │         35 │
│ Queries with Failure                                     │          0 │
└──────────────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric      | Value
Wall time   | 116.8s
Peak memory | 39.2 GiB
Avg memory  | 28.8 GiB
CPU user    | 1082.4s
CPU sys     | 102.9s
Peak spill  | 0 B

clickbench_partitioned — branch

Metric      | Value
Wall time   | 113.9s
Peak memory | 41.0 GiB
Avg memory  | 29.4 GiB
CPU user    | 1069.7s
CPU sys     | 89.9s
Peak spill  | 0 B


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and codex_hash-join-empty-partition-reporting
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃ codex_hash-join-empty-partition-reporting ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │              6.98 / 7.37 ±0.69 / 8.75 ms │               7.09 / 7.54 ±0.69 / 8.90 ms │     no change │
│ QQuery 2  │        145.73 / 146.31 ±0.32 / 146.60 ms │         146.27 / 147.38 ±0.76 / 148.07 ms │     no change │
│ QQuery 3  │        114.06 / 114.75 ±0.58 / 115.63 ms │         115.04 / 116.01 ±0.69 / 116.89 ms │     no change │
│ QQuery 4  │    1410.96 / 1432.76 ±11.84 / 1445.23 ms │     1421.36 / 1449.73 ±18.93 / 1470.63 ms │     no change │
│ QQuery 5  │        174.22 / 175.39 ±1.01 / 177.11 ms │         175.06 / 175.90 ±0.69 / 176.96 ms │     no change │
│ QQuery 6  │       845.04 / 889.68 ±26.52 / 915.48 ms │        857.43 / 886.16 ±21.55 / 910.23 ms │     no change │
│ QQuery 7  │        348.38 / 350.63 ±2.63 / 354.78 ms │         348.24 / 350.59 ±1.51 / 352.86 ms │     no change │
│ QQuery 8  │        117.72 / 118.21 ±0.41 / 118.88 ms │         117.47 / 118.30 ±0.76 / 119.43 ms │     no change │
│ QQuery 9  │        102.35 / 105.91 ±2.84 / 109.15 ms │        102.56 / 111.85 ±11.17 / 133.78 ms │  1.06x slower │
│ QQuery 10 │        107.24 / 108.52 ±0.75 / 109.28 ms │         109.81 / 111.12 ±0.86 / 112.11 ms │     no change │
│ QQuery 11 │     1007.00 / 1021.26 ±9.23 / 1034.34 ms │      1007.07 / 1024.01 ±8.78 / 1031.00 ms │     no change │
│ QQuery 12 │           45.77 / 49.05 ±1.75 / 50.94 ms │            46.99 / 48.84 ±1.52 / 50.68 ms │     no change │
│ QQuery 13 │        413.26 / 417.25 ±5.89 / 428.86 ms │         408.54 / 411.67 ±3.27 / 417.80 ms │     no change │
│ QQuery 14 │     1009.07 / 1017.56 ±7.45 / 1029.43 ms │      999.17 / 1018.18 ±11.21 / 1031.36 ms │     no change │
│ QQuery 15 │           16.03 / 17.07 ±1.02 / 18.34 ms │            13.46 / 14.11 ±0.57 / 15.08 ms │ +1.21x faster │
│ QQuery 16 │             7.71 / 8.42 ±0.82 / 10.03 ms │              8.17 / 8.97 ±0.70 / 10.09 ms │  1.07x slower │
│ QQuery 17 │        230.76 / 232.93 ±1.84 / 235.83 ms │         264.16 / 265.73 ±1.29 / 267.81 ms │  1.14x slower │
│ QQuery 18 │        129.11 / 130.77 ±0.99 / 131.81 ms │         126.14 / 127.62 ±1.63 / 130.54 ms │     no change │
│ QQuery 19 │        157.98 / 159.68 ±1.07 / 161.01 ms │         157.97 / 160.54 ±1.39 / 162.22 ms │     no change │
│ QQuery 20 │           14.66 / 14.92 ±0.24 / 15.28 ms │            14.43 / 14.97 ±0.40 / 15.62 ms │     no change │
│ QQuery 21 │           20.07 / 20.47 ±0.23 / 20.75 ms │            19.56 / 20.43 ±0.48 / 21.02 ms │     no change │
│ QQuery 22 │        495.02 / 498.04 ±2.66 / 501.21 ms │         498.85 / 503.37 ±3.27 / 508.29 ms │     no change │
│ QQuery 23 │       906.32 / 918.96 ±14.83 / 947.35 ms │         904.95 / 917.85 ±9.09 / 931.57 ms │     no change │
│ QQuery 24 │        393.52 / 397.02 ±2.61 / 401.58 ms │         393.38 / 397.47 ±3.17 / 402.93 ms │     no change │
│ QQuery 25 │        349.38 / 351.01 ±1.44 / 352.96 ms │         319.05 / 322.76 ±2.66 / 325.97 ms │ +1.09x faster │
│ QQuery 26 │           82.33 / 84.45 ±1.31 / 86.02 ms │            82.27 / 84.29 ±1.17 / 85.90 ms │     no change │
│ QQuery 27 │              7.31 / 7.71 ±0.33 / 8.25 ms │               7.42 / 7.57 ±0.10 / 7.68 ms │     no change │
│ QQuery 28 │        150.00 / 152.07 ±2.03 / 155.86 ms │         151.41 / 154.11 ±2.28 / 157.50 ms │     no change │
│ QQuery 29 │        287.62 / 289.81 ±1.96 / 293.12 ms │         261.80 / 263.62 ±1.34 / 265.32 ms │ +1.10x faster │
│ QQuery 30 │           45.06 / 46.90 ±2.11 / 50.83 ms │            44.31 / 47.17 ±2.99 / 52.74 ms │     no change │
│ QQuery 31 │        172.82 / 174.61 ±1.44 / 176.55 ms │         173.06 / 175.43 ±1.50 / 176.95 ms │     no change │
│ QQuery 32 │           58.31 / 59.03 ±0.47 / 59.55 ms │            58.55 / 59.72 ±1.01 / 61.10 ms │     no change │
│ QQuery 33 │        142.58 / 144.46 ±1.37 / 146.65 ms │         143.26 / 144.66 ±0.84 / 145.74 ms │     no change │
│ QQuery 34 │              7.58 / 7.85 ±0.20 / 8.15 ms │               7.58 / 7.86 ±0.40 / 8.66 ms │     no change │
│ QQuery 35 │        108.25 / 110.55 ±1.38 / 112.22 ms │         112.09 / 112.68 ±0.70 / 114.00 ms │     no change │
│ QQuery 36 │              6.80 / 6.93 ±0.12 / 7.08 ms │               6.86 / 7.20 ±0.38 / 7.89 ms │     no change │
│ QQuery 37 │             8.75 / 9.46 ±0.91 / 11.20 ms │               8.72 / 9.33 ±0.36 / 9.80 ms │     no change │
│ QQuery 38 │           84.75 / 89.94 ±4.99 / 99.24 ms │            84.51 / 89.22 ±4.63 / 98.06 ms │     no change │
│ QQuery 39 │        128.40 / 130.29 ±1.38 / 131.72 ms │         130.12 / 132.66 ±2.17 / 135.25 ms │     no change │
│ QQuery 40 │        113.02 / 119.36 ±5.74 / 129.07 ms │         114.91 / 120.91 ±6.13 / 132.45 ms │     no change │
│ QQuery 41 │           14.95 / 16.04 ±1.21 / 17.60 ms │            15.47 / 16.31 ±1.03 / 18.16 ms │     no change │
│ QQuery 42 │        107.46 / 109.74 ±1.51 / 111.04 ms │         110.15 / 111.56 ±1.54 / 113.94 ms │     no change │
│ QQuery 43 │              6.28 / 6.95 ±1.06 / 9.06 ms │               6.41 / 6.47 ±0.06 / 6.56 ms │ +1.07x faster │
│ QQuery 44 │           12.29 / 12.64 ±0.27 / 13.10 ms │            12.00 / 12.77 ±0.67 / 14.01 ms │     no change │
│ QQuery 45 │           52.73 / 53.71 ±0.87 / 54.78 ms │            49.30 / 50.52 ±0.68 / 51.31 ms │ +1.06x faster │
│ QQuery 46 │              9.02 / 9.27 ±0.24 / 9.61 ms │              8.83 / 9.55 ±0.68 / 10.68 ms │     no change │
│ QQuery 47 │       765.87 / 780.37 ±11.90 / 800.33 ms │         777.23 / 782.52 ±3.26 / 786.69 ms │     no change │
│ QQuery 48 │        290.90 / 298.19 ±5.32 / 306.01 ms │         290.08 / 300.28 ±6.94 / 311.90 ms │     no change │
│ QQuery 49 │        254.22 / 256.67 ±2.54 / 259.93 ms │         253.39 / 257.06 ±2.61 / 260.94 ms │     no change │
│ QQuery 50 │        230.79 / 236.93 ±4.85 / 244.81 ms │         232.60 / 237.21 ±2.99 / 241.39 ms │     no change │
│ QQuery 51 │        181.69 / 184.70 ±3.85 / 191.91 ms │         183.36 / 185.15 ±1.25 / 186.51 ms │     no change │
│ QQuery 52 │        108.23 / 110.22 ±1.94 / 112.97 ms │         108.85 / 110.10 ±0.77 / 110.96 ms │     no change │
│ QQuery 53 │        104.62 / 105.64 ±1.10 / 107.73 ms │         104.07 / 104.74 ±0.54 / 105.69 ms │     no change │
│ QQuery 54 │        149.20 / 151.72 ±1.57 / 153.93 ms │         148.23 / 149.33 ±0.84 / 150.54 ms │     no change │
│ QQuery 55 │        107.44 / 109.58 ±1.84 / 112.25 ms │         108.66 / 109.76 ±0.67 / 110.57 ms │     no change │
│ QQuery 56 │        140.98 / 144.07 ±2.29 / 147.58 ms │         142.51 / 145.21 ±2.20 / 148.40 ms │     no change │
│ QQuery 57 │        174.73 / 177.74 ±2.30 / 181.53 ms │         176.21 / 178.75 ±1.92 / 182.09 ms │     no change │
│ QQuery 58 │        296.62 / 299.49 ±2.62 / 304.22 ms │         293.92 / 299.65 ±6.07 / 310.52 ms │     no change │
│ QQuery 59 │        201.64 / 203.67 ±1.17 / 204.95 ms │         200.42 / 202.67 ±1.68 / 204.76 ms │     no change │
│ QQuery 60 │        144.32 / 145.47 ±1.11 / 146.96 ms │         145.76 / 147.27 ±1.68 / 149.56 ms │     no change │
│ QQuery 61 │           13.40 / 13.68 ±0.32 / 14.21 ms │            13.52 / 13.80 ±0.25 / 14.21 ms │     no change │
│ QQuery 62 │      917.49 / 969.47 ±31.93 / 1012.15 ms │       909.86 / 940.20 ±34.01 / 1005.96 ms │     no change │
│ QQuery 63 │        102.76 / 105.09 ±1.59 / 107.33 ms │         105.51 / 106.84 ±1.27 / 108.79 ms │     no change │
│ QQuery 64 │        699.39 / 704.27 ±4.71 / 710.91 ms │         695.69 / 702.11 ±4.87 / 708.56 ms │     no change │
│ QQuery 65 │        267.81 / 270.51 ±2.68 / 275.65 ms │         264.08 / 266.55 ±1.84 / 269.28 ms │     no change │
│ QQuery 66 │        252.58 / 261.43 ±8.61 / 271.97 ms │         252.22 / 261.07 ±5.75 / 267.11 ms │     no change │
│ QQuery 67 │        315.54 / 327.77 ±8.16 / 336.44 ms │         324.90 / 326.54 ±1.56 / 328.45 ms │     no change │
│ QQuery 68 │            9.93 / 11.59 ±1.39 / 13.55 ms │            11.37 / 12.03 ±0.42 / 12.65 ms │     no change │
│ QQuery 69 │        103.94 / 105.06 ±1.15 / 106.88 ms │         104.48 / 107.07 ±2.06 / 109.30 ms │     no change │
│ QQuery 70 │       325.61 / 349.88 ±14.55 / 369.65 ms │         360.98 / 369.08 ±7.01 / 378.65 ms │  1.05x slower │
│ QQuery 71 │        136.25 / 139.69 ±4.27 / 148.04 ms │         134.49 / 137.33 ±2.17 / 140.13 ms │     no change │
│ QQuery 72 │       631.27 / 644.57 ±10.03 / 654.27 ms │     3224.68 / 3303.87 ±41.92 / 3343.97 ms │  5.13x slower │
│ QQuery 73 │              7.01 / 8.18 ±1.10 / 9.99 ms │              7.56 / 8.89 ±0.97 / 10.07 ms │  1.09x slower │
│ QQuery 74 │        631.33 / 635.03 ±3.65 / 640.88 ms │         638.45 / 648.92 ±8.96 / 659.56 ms │     no change │
│ QQuery 75 │        280.59 / 283.43 ±1.91 / 286.55 ms │         279.43 / 283.06 ±2.74 / 286.19 ms │     no change │
│ QQuery 76 │        134.37 / 135.35 ±1.10 / 137.16 ms │         131.74 / 135.45 ±2.15 / 137.77 ms │     no change │
│ QQuery 77 │        188.53 / 190.93 ±2.67 / 195.68 ms │         188.45 / 191.14 ±1.83 / 193.30 ms │     no change │
│ QQuery 78 │        350.04 / 353.68 ±2.48 / 357.68 ms │         351.20 / 356.89 ±3.79 / 362.64 ms │     no change │
│ QQuery 79 │        238.31 / 244.67 ±3.95 / 249.32 ms │         246.32 / 248.66 ±3.13 / 254.76 ms │     no change │
│ QQuery 80 │        325.79 / 328.11 ±2.14 / 331.55 ms │         322.59 / 328.01 ±5.48 / 336.60 ms │     no change │
│ QQuery 81 │           26.61 / 28.23 ±1.28 / 30.20 ms │            26.69 / 28.05 ±0.78 / 28.83 ms │     no change │
│ QQuery 82 │        200.20 / 201.28 ±0.95 / 203.02 ms │         201.73 / 203.23 ±2.10 / 207.35 ms │     no change │
│ QQuery 83 │           39.65 / 40.63 ±1.23 / 42.85 ms │            39.48 / 40.14 ±0.67 / 41.43 ms │     no change │
│ QQuery 84 │           50.05 / 51.18 ±0.95 / 52.45 ms │            49.80 / 50.38 ±0.47 / 51.06 ms │     no change │
│ QQuery 85 │        150.07 / 151.32 ±1.63 / 154.31 ms │         151.64 / 153.39 ±1.40 / 155.66 ms │     no change │
│ QQuery 86 │           39.46 / 40.90 ±0.92 / 42.09 ms │            39.80 / 40.70 ±0.46 / 41.15 ms │     no change │
│ QQuery 87 │           84.65 / 90.97 ±5.20 / 97.63 ms │            86.86 / 90.93 ±3.64 / 97.69 ms │     no change │
│ QQuery 88 │        101.82 / 103.50 ±1.44 / 105.76 ms │         102.38 / 103.76 ±1.32 / 105.87 ms │     no change │
│ QQuery 89 │        119.48 / 121.03 ±1.20 / 123.07 ms │         120.23 / 121.92 ±1.14 / 123.33 ms │     no change │
│ QQuery 90 │           24.24 / 24.73 ±0.31 / 24.99 ms │            24.15 / 24.97 ±0.56 / 25.71 ms │     no change │
│ QQuery 91 │           65.04 / 65.96 ±0.67 / 67.13 ms │            64.12 / 65.41 ±1.19 / 67.23 ms │     no change │
│ QQuery 92 │           58.60 / 59.91 ±1.42 / 62.09 ms │            57.83 / 58.98 ±0.69 / 59.82 ms │     no change │
│ QQuery 93 │        191.50 / 194.73 ±1.89 / 196.34 ms │         190.36 / 192.97 ±2.24 / 196.15 ms │     no change │
│ QQuery 94 │           61.71 / 63.17 ±1.37 / 65.51 ms │            62.95 / 64.33 ±1.40 / 66.98 ms │     no change │
│ QQuery 95 │        131.47 / 132.82 ±0.78 / 133.90 ms │          99.53 / 100.71 ±0.63 / 101.26 ms │ +1.32x faster │
│ QQuery 96 │           74.82 / 75.25 ±0.44 / 76.03 ms │            71.29 / 75.21 ±2.25 / 77.81 ms │     no change │
│ QQuery 97 │        127.11 / 130.63 ±3.04 / 134.26 ms │         127.13 / 129.78 ±1.91 / 131.88 ms │     no change │
│ QQuery 98 │        157.22 / 161.33 ±2.07 / 162.60 ms │         157.53 / 159.58 ±1.56 / 161.73 ms │     no change │
│ QQuery 99 │ 10864.11 / 10904.87 ±35.39 / 10961.30 ms │  10865.66 / 10930.68 ±36.39 / 10967.71 ms │     no change │
└───────────┴──────────────────────────────────────────┴───────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                        │ 32274.96ms │
│ Total Time (codex_hash-join-empty-partition-reporting)   │ 34947.05ms │
│ Average Time (HEAD)                                      │   326.01ms │
│ Average Time (codex_hash-join-empty-partition-reporting) │   353.00ms │
│ Queries Faster                                           │          6 │
│ Queries Slower                                           │          6 │
│ Queries with No Change                                   │         87 │
│ Queries with Failure                                     │          0 │
└──────────────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric Value
Wall time 161.7s
Peak memory 5.4 GiB
Avg memory 4.4 GiB
CPU user 267.0s
CPU sys 17.7s
Peak spill 0 B

tpcds — branch

Metric Value
Wall time 175.0s
Peak memory 5.6 GiB
Avg memory 4.6 GiB
CPU user 417.3s
CPU sys 20.2s
Peak spill 0 B


@adriangb
Contributor Author

run benchmark tpcds

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4260131019-1367-smhnv 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing codex/hash-join-empty-partition-reporting (d17d5e4) to 5c653be (merge-base) diff using: tpcds
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and codex_hash-join-empty-partition-reporting
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃ codex_hash-join-empty-partition-reporting ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │              6.55 / 7.03 ±0.79 / 8.60 ms │               6.62 / 7.11 ±0.87 / 8.84 ms │     no change │
│ QQuery 2  │        143.53 / 143.94 ±0.32 / 144.23 ms │         142.69 / 144.27 ±0.86 / 145.12 ms │     no change │
│ QQuery 3  │        113.45 / 113.89 ±0.36 / 114.35 ms │         113.86 / 114.43 ±0.60 / 115.53 ms │     no change │
│ QQuery 4  │    1334.29 / 1362.57 ±21.81 / 1394.94 ms │     1360.31 / 1390.10 ±16.45 / 1405.65 ms │     no change │
│ QQuery 5  │        170.59 / 172.99 ±1.76 / 175.28 ms │         172.45 / 174.47 ±1.77 / 177.23 ms │     no change │
│ QQuery 6  │       826.41 / 876.05 ±36.80 / 938.37 ms │        862.48 / 884.01 ±15.84 / 909.86 ms │     no change │
│ QQuery 7  │        339.57 / 344.15 ±2.45 / 346.94 ms │         340.62 / 345.25 ±2.84 / 348.46 ms │     no change │
│ QQuery 8  │        115.88 / 116.71 ±0.58 / 117.66 ms │         116.03 / 116.77 ±0.46 / 117.50 ms │     no change │
│ QQuery 9  │        100.96 / 103.05 ±2.10 / 106.73 ms │         100.29 / 106.99 ±5.92 / 114.42 ms │     no change │
│ QQuery 10 │        106.11 / 106.48 ±0.35 / 107.03 ms │         107.15 / 108.84 ±1.32 / 110.87 ms │     no change │
│ QQuery 11 │        930.37 / 943.58 ±8.27 / 952.46 ms │         956.74 / 960.95 ±3.40 / 966.02 ms │     no change │
│ QQuery 12 │           44.35 / 46.11 ±1.12 / 47.41 ms │            45.31 / 46.13 ±0.70 / 47.37 ms │     no change │
│ QQuery 13 │        399.86 / 401.27 ±1.29 / 402.88 ms │         400.47 / 403.82 ±2.05 / 406.38 ms │     no change │
│ QQuery 14 │     992.83 / 1007.54 ±10.23 / 1022.21 ms │        991.24 / 996.86 ±4.77 / 1002.00 ms │     no change │
│ QQuery 15 │           16.18 / 16.98 ±0.81 / 18.47 ms │            12.63 / 13.16 ±0.69 / 14.53 ms │ +1.29x faster │
│ QQuery 16 │              6.92 / 7.61 ±0.88 / 9.31 ms │               7.21 / 7.59 ±0.42 / 8.41 ms │     no change │
│ QQuery 17 │        227.27 / 230.80 ±1.97 / 233.06 ms │         260.00 / 262.30 ±1.30 / 263.95 ms │  1.14x slower │
│ QQuery 18 │        126.07 / 127.16 ±1.03 / 128.81 ms │         120.95 / 122.72 ±2.12 / 126.75 ms │     no change │
│ QQuery 19 │        153.77 / 155.14 ±0.79 / 156.14 ms │         155.05 / 156.46 ±0.77 / 157.17 ms │     no change │
│ QQuery 20 │           13.55 / 14.26 ±0.58 / 15.09 ms │            13.69 / 14.39 ±0.50 / 15.09 ms │     no change │
│ QQuery 21 │           18.98 / 19.52 ±0.48 / 20.33 ms │            19.28 / 20.07 ±0.57 / 20.97 ms │     no change │
│ QQuery 22 │        489.02 / 491.75 ±2.21 / 494.72 ms │         479.34 / 485.29 ±5.35 / 493.36 ms │     no change │
│ QQuery 23 │        874.09 / 884.62 ±7.47 / 896.56 ms │         860.00 / 872.32 ±9.91 / 885.16 ms │     no change │
│ QQuery 24 │        380.25 / 383.23 ±2.75 / 388.16 ms │         382.04 / 385.17 ±2.55 / 388.46 ms │     no change │
│ QQuery 25 │        338.10 / 340.05 ±1.53 / 341.97 ms │         310.76 / 312.97 ±2.58 / 316.31 ms │ +1.09x faster │
│ QQuery 26 │           80.05 / 82.05 ±1.91 / 85.51 ms │            80.82 / 83.45 ±1.46 / 85.20 ms │     no change │
│ QQuery 27 │              6.95 / 7.61 ±0.60 / 8.50 ms │               7.01 / 7.19 ±0.17 / 7.44 ms │ +1.06x faster │
│ QQuery 28 │        148.91 / 150.04 ±1.18 / 151.93 ms │         148.98 / 150.92 ±1.79 / 153.92 ms │     no change │
│ QQuery 29 │        280.58 / 283.06 ±1.41 / 284.44 ms │         253.40 / 255.68 ±1.39 / 257.33 ms │ +1.11x faster │
│ QQuery 30 │           41.88 / 44.45 ±1.89 / 47.71 ms │            43.93 / 44.97 ±0.98 / 46.25 ms │     no change │
│ QQuery 31 │        168.14 / 171.18 ±1.55 / 172.37 ms │         168.34 / 172.03 ±1.89 / 173.49 ms │     no change │
│ QQuery 32 │           57.14 / 57.93 ±0.64 / 58.58 ms │            57.35 / 58.01 ±0.83 / 59.45 ms │     no change │
│ QQuery 33 │        139.01 / 140.87 ±1.01 / 142.11 ms │         141.92 / 143.19 ±1.44 / 145.94 ms │     no change │
│ QQuery 34 │              6.96 / 7.11 ±0.18 / 7.44 ms │               6.80 / 7.23 ±0.35 / 7.73 ms │     no change │
│ QQuery 35 │        105.44 / 107.39 ±1.24 / 109.26 ms │         108.27 / 109.72 ±1.22 / 111.38 ms │     no change │
│ QQuery 36 │              6.65 / 6.84 ±0.16 / 7.06 ms │               6.70 / 6.83 ±0.14 / 7.06 ms │     no change │
│ QQuery 37 │              8.23 / 8.76 ±0.28 / 8.98 ms │               8.22 / 8.54 ±0.28 / 9.05 ms │     no change │
│ QQuery 38 │           84.46 / 88.06 ±4.03 / 95.63 ms │            83.88 / 86.97 ±3.18 / 92.75 ms │     no change │
│ QQuery 39 │        120.84 / 126.50 ±3.14 / 129.43 ms │         126.61 / 128.15 ±1.22 / 129.94 ms │     no change │
│ QQuery 40 │        109.43 / 114.57 ±4.81 / 121.98 ms │         109.14 / 114.41 ±4.76 / 122.94 ms │     no change │
│ QQuery 41 │           14.34 / 15.35 ±0.89 / 16.68 ms │            14.72 / 16.00 ±0.91 / 17.31 ms │     no change │
│ QQuery 42 │        106.72 / 109.59 ±1.60 / 111.52 ms │         107.04 / 108.95 ±1.25 / 110.85 ms │     no change │
│ QQuery 43 │              5.95 / 6.24 ±0.19 / 6.52 ms │               5.91 / 6.14 ±0.21 / 6.53 ms │     no change │
│ QQuery 44 │           11.92 / 12.35 ±0.45 / 13.22 ms │            11.54 / 11.92 ±0.33 / 12.43 ms │     no change │
│ QQuery 45 │           51.10 / 51.66 ±0.35 / 52.08 ms │            47.86 / 48.45 ±0.38 / 48.86 ms │ +1.07x faster │
│ QQuery 46 │              8.34 / 8.64 ±0.28 / 9.11 ms │               8.26 / 8.72 ±0.27 / 9.07 ms │     no change │
│ QQuery 47 │        689.32 / 694.93 ±3.78 / 700.37 ms │         703.02 / 717.25 ±8.67 / 727.94 ms │     no change │
│ QQuery 48 │        284.37 / 291.10 ±3.59 / 294.46 ms │         285.49 / 290.64 ±3.24 / 294.21 ms │     no change │
│ QQuery 49 │        252.09 / 254.02 ±1.30 / 256.01 ms │         251.74 / 254.61 ±2.33 / 258.33 ms │     no change │
│ QQuery 50 │        222.61 / 225.53 ±3.69 / 232.47 ms │         219.64 / 226.93 ±4.47 / 232.77 ms │     no change │
│ QQuery 51 │        177.97 / 181.66 ±3.63 / 188.50 ms │         182.49 / 184.50 ±1.53 / 186.33 ms │     no change │
│ QQuery 52 │        107.14 / 107.61 ±0.43 / 108.39 ms │         106.56 / 108.59 ±1.36 / 110.46 ms │     no change │
│ QQuery 53 │        102.58 / 103.15 ±0.61 / 104.26 ms │         102.60 / 105.01 ±1.30 / 106.57 ms │     no change │
│ QQuery 54 │        145.21 / 147.67 ±1.98 / 150.22 ms │         145.60 / 147.41 ±1.87 / 150.93 ms │     no change │
│ QQuery 55 │        106.03 / 108.38 ±1.53 / 109.99 ms │         106.12 / 108.19 ±1.59 / 110.14 ms │     no change │
│ QQuery 56 │        138.98 / 141.90 ±1.82 / 143.74 ms │         140.54 / 142.74 ±1.40 / 144.46 ms │     no change │
│ QQuery 57 │        171.93 / 174.45 ±1.86 / 177.43 ms │         172.63 / 175.75 ±2.53 / 178.75 ms │     no change │
│ QQuery 58 │        289.22 / 298.36 ±5.56 / 305.43 ms │         290.79 / 299.43 ±6.11 / 309.87 ms │     no change │
│ QQuery 59 │        200.66 / 203.01 ±1.31 / 204.31 ms │         197.14 / 199.53 ±3.23 / 205.71 ms │     no change │
│ QQuery 60 │        144.39 / 146.08 ±1.11 / 147.39 ms │        143.75 / 154.53 ±12.18 / 172.91 ms │  1.06x slower │
│ QQuery 61 │           13.22 / 13.48 ±0.21 / 13.78 ms │            13.18 / 13.70 ±0.36 / 14.15 ms │     no change │
│ QQuery 62 │    1026.94 / 1087.68 ±51.27 / 1182.26 ms │       917.43 / 941.71 ±38.20 / 1016.17 ms │ +1.16x faster │
│ QQuery 63 │        107.07 / 109.36 ±1.40 / 111.17 ms │         102.50 / 105.94 ±1.90 / 107.72 ms │     no change │
│ QQuery 64 │        701.02 / 709.35 ±8.82 / 726.37 ms │         676.43 / 681.30 ±3.51 / 685.59 ms │     no change │
│ QQuery 65 │        259.07 / 266.35 ±4.42 / 270.99 ms │         251.78 / 266.12 ±7.43 / 273.09 ms │     no change │
│ QQuery 66 │        253.83 / 260.64 ±4.46 / 266.95 ms │         251.75 / 263.38 ±6.36 / 269.18 ms │     no change │
│ QQuery 67 │        309.00 / 324.33 ±9.90 / 340.10 ms │         313.28 / 322.40 ±8.67 / 338.52 ms │     no change │
│ QQuery 68 │             9.37 / 9.98 ±0.50 / 10.69 ms │             9.49 / 10.11 ±0.69 / 11.36 ms │     no change │
│ QQuery 69 │        100.90 / 104.02 ±2.91 / 109.46 ms │         104.87 / 107.05 ±2.16 / 110.54 ms │     no change │
│ QQuery 70 │       318.57 / 341.84 ±18.33 / 372.24 ms │         337.85 / 347.83 ±9.34 / 360.11 ms │     no change │
│ QQuery 71 │        135.34 / 136.48 ±1.35 / 139.11 ms │         138.33 / 140.65 ±1.35 / 141.78 ms │     no change │
│ QQuery 72 │       609.61 / 631.81 ±18.10 / 656.17 ms │     3158.98 / 3200.01 ±30.95 / 3249.97 ms │  5.06x slower │
│ QQuery 73 │             7.45 / 9.15 ±1.31 / 11.25 ms │               6.50 / 7.32 ±0.54 / 8.02 ms │ +1.25x faster │
│ QQuery 74 │        638.77 / 647.33 ±5.31 / 653.18 ms │        587.67 / 610.11 ±17.70 / 631.91 ms │ +1.06x faster │
│ QQuery 75 │        278.30 / 282.95 ±3.00 / 286.74 ms │         273.74 / 277.11 ±2.04 / 279.36 ms │     no change │
│ QQuery 76 │        132.14 / 134.00 ±1.62 / 136.85 ms │         130.93 / 132.38 ±1.59 / 135.32 ms │     no change │
│ QQuery 77 │        188.88 / 191.01 ±1.22 / 192.51 ms │         186.77 / 190.45 ±2.17 / 193.32 ms │     no change │
│ QQuery 78 │        343.14 / 347.84 ±3.52 / 352.99 ms │         337.85 / 342.34 ±5.76 / 353.73 ms │     no change │
│ QQuery 79 │        240.90 / 243.12 ±1.40 / 245.01 ms │         229.94 / 234.76 ±3.13 / 239.09 ms │     no change │
│ QQuery 80 │        323.29 / 325.41 ±1.58 / 327.51 ms │         321.08 / 324.22 ±3.30 / 329.16 ms │     no change │
│ QQuery 81 │           26.82 / 27.71 ±0.81 / 29.12 ms │            28.77 / 29.65 ±0.81 / 30.90 ms │  1.07x slower │
│ QQuery 82 │        196.24 / 200.34 ±2.47 / 203.64 ms │         198.51 / 202.02 ±2.38 / 204.68 ms │     no change │
│ QQuery 83 │           38.59 / 39.23 ±0.66 / 40.24 ms │            38.61 / 39.22 ±0.68 / 40.50 ms │     no change │
│ QQuery 84 │           48.71 / 49.00 ±0.18 / 49.21 ms │            48.11 / 49.10 ±0.50 / 49.45 ms │     no change │
│ QQuery 85 │        147.82 / 149.07 ±1.22 / 151.32 ms │         149.45 / 150.01 ±0.32 / 150.37 ms │     no change │
│ QQuery 86 │           38.46 / 39.61 ±1.01 / 41.13 ms │            39.77 / 40.59 ±0.93 / 42.30 ms │     no change │
│ QQuery 87 │           85.43 / 87.09 ±1.43 / 89.51 ms │            86.33 / 88.41 ±2.34 / 91.86 ms │     no change │
│ QQuery 88 │         99.07 / 100.54 ±1.34 / 103.04 ms │          99.85 / 100.60 ±0.91 / 102.37 ms │     no change │
│ QQuery 89 │        119.81 / 120.85 ±0.83 / 121.94 ms │         118.38 / 120.15 ±1.25 / 121.74 ms │     no change │
│ QQuery 90 │           23.63 / 24.41 ±0.73 / 25.79 ms │            23.00 / 23.65 ±0.52 / 24.39 ms │     no change │
│ QQuery 91 │           61.29 / 63.25 ±1.75 / 65.81 ms │            61.33 / 63.97 ±1.50 / 65.77 ms │     no change │
│ QQuery 92 │           57.52 / 59.49 ±1.18 / 61.23 ms │            57.33 / 57.98 ±0.55 / 58.69 ms │     no change │
│ QQuery 93 │        187.02 / 188.50 ±1.23 / 190.50 ms │         186.69 / 188.54 ±1.05 / 189.88 ms │     no change │
│ QQuery 94 │           61.18 / 61.91 ±0.51 / 62.56 ms │            61.57 / 62.17 ±0.67 / 63.47 ms │     no change │
│ QQuery 95 │        126.74 / 129.68 ±1.77 / 131.27 ms │            96.36 / 97.12 ±0.61 / 98.21 ms │ +1.34x faster │
│ QQuery 96 │           71.55 / 73.60 ±1.32 / 75.30 ms │            73.49 / 74.96 ±1.49 / 77.72 ms │     no change │
│ QQuery 97 │        127.56 / 128.98 ±1.06 / 130.82 ms │         123.50 / 125.88 ±1.45 / 128.07 ms │     no change │
│ QQuery 98 │        153.79 / 154.99 ±0.75 / 156.09 ms │         151.88 / 154.09 ±1.87 / 157.21 ms │     no change │
│ QQuery 99 │ 10793.82 / 10824.89 ±19.01 / 10848.29 ms │  10821.63 / 10881.53 ±35.06 / 10929.49 ms │     no change │
└───────────┴──────────────────────────────────────────┴───────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                        │ 31793.82ms │
│ Total Time (codex_hash-join-empty-partition-reporting)   │ 34213.55ms │
│ Average Time (HEAD)                                      │   321.15ms │
│ Average Time (codex_hash-join-empty-partition-reporting) │   345.59ms │
│ Queries Faster                                           │          9 │
│ Queries Slower                                           │          4 │
│ Queries with No Change                                   │         86 │
│ Queries with Failure                                     │          0 │
└──────────────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric Value
Wall time 159.3s
Peak memory 5.6 GiB
Avg memory 4.7 GiB
CPU user 262.1s
CPU sys 17.2s
Peak spill 0 B

tpcds — branch

Metric Value
Wall time 171.4s
Peak memory 5.6 GiB
Avg memory 4.7 GiB
CPU user 407.4s
CPU sys 18.8s
Peak spill 0 B


@Omega359
Contributor

Omega359 commented Apr 16, 2026

query 72 took a bit of a hit here.

Query72
select  i_item_desc
      ,w_warehouse_name
      ,d1.d_week_seq
      ,sum(case when p_promo_sk is null then 1 else 0 end) no_promo
      ,sum(case when p_promo_sk is not null then 1 else 0 end) promo
      ,count(*) total_cnt
from catalog_sales
join inventory on (cs_item_sk = inv_item_sk)
join warehouse on (w_warehouse_sk=inv_warehouse_sk)
join item on (i_item_sk = cs_item_sk)
join customer_demographics on (cs_bill_cdemo_sk = cd_demo_sk)
join household_demographics on (cs_bill_hdemo_sk = hd_demo_sk)
join date_dim d1 on (cs_sold_date_sk = d1.d_date_sk)
join date_dim d2 on (inv_date_sk = d2.d_date_sk)
join date_dim d3 on (cs_ship_date_sk = d3.d_date_sk)
left outer join promotion on (cs_promo_sk=p_promo_sk)
left outer join catalog_returns on (cr_item_sk = cs_item_sk and cr_order_number = cs_order_number)
where d1.d_week_seq = d2.d_week_seq
  and inv_quantity_on_hand < cs_quantity
  and d3.d_date > (d1.d_date + INTERVAL '5 days')
  and hd_buy_potential = '1001-5000'
  and d1.d_year = 2001
  and cd_marital_status = 'M'
group by i_item_desc,w_warehouse_name,d1.d_week_seq
order by total_cnt desc, i_item_desc, w_warehouse_name, d_week_seq
limit 100;

@adriangb
Contributor Author

Yes. And I think this was another bandaid. But it's closer to the root cause than previous attempts. This has to do with cancellation when multiple joins are involved.

TLDR: I think what is happening is that when you have multiple joins you end up with a tree of operators. One of the joins higher up in the tree hits the new optimization and aborts work, dropping tasks that would have polled downstream joins. But now the downstream join is stuck waiting for all of its partition tasks to finish, even though they never will. I think we were all operating under the assumption that the issue was within a single join operator, but really it's an issue any time an upstream operator cancels on a join.

I think the real solution is to track when a join build partition task gets dropped and report that to the dynamic filter building so that it doesn't wait for that partition to report.
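
A minimal sketch of that idea, using only the standard library. The names `Coordinator`, `PartitionStream`, and `PartitionStatus` are hypothetical stand-ins for illustration, not the types in this PR: each per-partition stream marks itself canceled in `Drop` unless it has already reported, so the coordinator never waits on a partition that can no longer report.

```rust
use std::sync::{Arc, Mutex};

/// Terminal state of one build-side partition (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PartitionStatus {
    Pending,
    Reported,
    Canceled,
}

/// Hypothetical stand-in for the shared dynamic-filter coordinator.
struct Coordinator {
    statuses: Mutex<Vec<PartitionStatus>>,
}

impl Coordinator {
    fn new(num_partitions: usize) -> Arc<Self> {
        Arc::new(Self {
            statuses: Mutex::new(vec![PartitionStatus::Pending; num_partitions]),
        })
    }

    /// Record a terminal state. Only the first transition out of `Pending`
    /// sticks, so a later drop cannot overwrite an earlier report.
    fn mark(&self, partition: usize, status: PartitionStatus) {
        let mut statuses = self.statuses.lock().unwrap();
        if statuses[partition] == PartitionStatus::Pending {
            statuses[partition] = status;
        }
    }

    /// True once no partition is still pending, i.e. the filter can be finalized.
    fn all_terminal(&self) -> bool {
        let statuses = self.statuses.lock().unwrap();
        statuses.iter().all(|s| *s != PartitionStatus::Pending)
    }

    fn status(&self, partition: usize) -> PartitionStatus {
        self.statuses.lock().unwrap()[partition]
    }
}

/// Hypothetical stand-in for one partition's join stream.
struct PartitionStream {
    partition: usize,
    coordinator: Arc<Coordinator>,
}

impl PartitionStream {
    /// Called when the build side for this partition is complete.
    fn report_build_data(&self) {
        self.coordinator.mark(self.partition, PartitionStatus::Reported);
    }
}

impl Drop for PartitionStream {
    fn drop(&mut self) {
        // A no-op if this partition already reported; otherwise the
        // coordinator learns that no report will ever arrive.
        self.coordinator.mark(self.partition, PartitionStatus::Canceled);
    }
}
```

In this sketch the drop handler is synchronous and only flips a status, which keeps `Drop` cheap; waking any waiters would be a separate step.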

@adriangb
Contributor Author

run benchmark tpcds

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4260950110-1371-8wgd5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing codex/hash-join-empty-partition-reporting (0584854) to 5c653be (merge-base) diff using: tpcds
Results will be posted here when complete



@adriangb adriangb changed the title Report empty build partitions for partitioned hash join filters Handle canceled partitioned hash join dynamic filters lazily Apr 16, 2026
@adriangb adriangb requested a review from Copilot April 16, 2026 14:48
Contributor

Copilot AI left a comment


Pull request overview

Fixes a hang in partitioned hash join dynamic filter coordination by allowing partitions that are dropped early to be treated as “canceled” and not block filter finalization (DataFusion issue #21625).

Changes:

  • Add cancellation tracking for partitioned build-side reports and treat canceled partitions as true in the synthesized partitioned dynamic filter.
  • Mark partitioned HashJoinStream partitions as canceled on Drop when they never reported build data.
  • Add a regression test covering the early-completing RightSemi parent join scenario that previously hung.
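
The filter semantics described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `PartitionState`, `route`, and `probe_row_passes` are hypothetical names, the reported build data is reduced to min/max bounds on an integer key, and routing is a plain modulo rather than the join's real hash partitioning.

```rust
/// Terminal state of one build-side partition (illustrative names).
#[derive(Clone, Copy, Debug, PartialEq)]
enum PartitionState {
    /// Build data was reported; reduced here to min/max bounds on the join key.
    Reported { min: i64, max: i64 },
    /// The partition reported and is known to be empty.
    Empty,
    /// The partition was dropped before its build data was known.
    Canceled,
}

/// Hypothetical routing function: which build partition a probe key maps to.
/// Requires `num_partitions > 0`.
fn route(key: i64, num_partitions: usize) -> usize {
    key.rem_euclid(num_partitions as i64) as usize
}

/// Evaluate the synthesized per-partition filter for one probe key.
/// `states` must be non-empty.
fn probe_row_passes(states: &[PartitionState], key: i64) -> bool {
    match states[route(key, states.len())] {
        // Known bounds: keep the row only if it can possibly match.
        PartitionState::Reported { min, max } => (min..=max).contains(&key),
        // Known-empty build side: nothing can match, so filter the row out.
        PartitionState::Empty => false,
        // Unknown build side: never filter, so no valid row is dropped.
        PartitionState::Canceled => true,
    }
}
```

Treating a canceled partition as `true` is the conservative choice: the filter loses selectivity for that partition but cannot remove rows incorrectly.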

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File | Description
datafusion/physical-plan/src/joins/hash_join/stream.rs | Track whether build info was reported and report partition cancellation to the coordinator on Drop.
datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs | Replace barrier-based coordination with explicit partition status + notify-based completion; synthesize filters that handle canceled partitions.
datafusion/physical-plan/src/joins/hash_join/exec.rs | Refactor dynamic-filter accumulator initialization and add a regression test for the cancellation/hang scenario.


enum CompletionState {
Pending,
Finalizing,
Ready(std::result::Result<(), String>),

Copilot AI Apr 16, 2026


CompletionState::Ready stores errors as Result<(), String>, which forces later callers to lose the original DataFusionError variant/backtrace/context. Consider storing Result<(), Arc<DataFusionError>> (or datafusion_common::SharedResult<()>) in CompletionState instead, so you can propagate DataFusionError::Shared(...) to all waiters without stringifying.

Suggested change
Ready(std::result::Result<(), String>),
Ready(std::result::Result<(), Arc<datafusion_common::DataFusionError>>),
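
To illustrate why the `Arc`-wrapped form helps, here is a small standard-library-only sketch. `JoinError` is a hypothetical stand-in for `DataFusionError`, and the `outcome` helper is an assumption added for illustration: every waiter gets a cheap clone of the same error value, with its variant intact, instead of a flattened string.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `DataFusionError`, used only for illustration.
#[derive(Debug)]
enum JoinError {
    ModeMismatch,
    FilterUpdate(String),
}

/// Same shape as the reviewed enum, with the error shared behind an `Arc`.
enum CompletionState {
    Pending,
    Finalizing,
    Ready(Result<(), Arc<JoinError>>),
}

impl CompletionState {
    /// What a waiter observes: `None` while unresolved, otherwise a clone of
    /// the shared result. Cloning an `Arc` only bumps a reference count.
    fn outcome(&self) -> Option<Result<(), Arc<JoinError>>> {
        match self {
            CompletionState::Ready(result) => Some(result.clone()),
            CompletionState::Pending | CompletionState::Finalizing => None,
        }
    }
}
```

Because waiters share one allocation, they can still match on the original variant rather than parsing a message.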

Comment thread datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs
Comment on lines 391 to +394
/// # Returns
/// * `Result<()>` - Ok if successful, Err if filter update failed or mode mismatch
pub(crate) async fn report_build_data(&self, data: PartitionBuildData) -> Result<()> {
// Store data in the accumulator
{
let finalize_input = {

Copilot AI Apr 16, 2026


The report_build_data doc comment still says "have reported (barrier wait)", but this method no longer uses tokio::sync::Barrier (it uses Notify/CompletionState). Please update the docs to match the current synchronization mechanism so the comment doesn’t mislead future changes.

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and codex_hash-join-empty-partition-reporting
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃ codex_hash-join-empty-partition-reporting ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │              6.92 / 7.40 ±0.82 / 9.04 ms │               7.00 / 7.45 ±0.78 / 9.01 ms │     no change │
│ QQuery 2  │        144.54 / 145.39 ±0.79 / 146.62 ms │         147.27 / 148.61 ±0.75 / 149.50 ms │     no change │
│ QQuery 3  │        115.14 / 115.85 ±0.68 / 116.87 ms │         114.50 / 115.93 ±1.13 / 117.57 ms │     no change │
│ QQuery 4  │    1335.48 / 1382.00 ±36.82 / 1440.86 ms │     1402.24 / 1427.26 ±15.58 / 1447.66 ms │     no change │
│ QQuery 5  │        174.17 / 175.13 ±0.83 / 176.53 ms │         172.07 / 174.75 ±2.27 / 177.94 ms │     no change │
│ QQuery 6  │       853.20 / 880.67 ±22.38 / 913.37 ms │        863.36 / 885.51 ±22.93 / 927.68 ms │     no change │
│ QQuery 7  │        344.50 / 348.21 ±4.06 / 355.46 ms │         346.17 / 347.70 ±0.89 / 348.94 ms │     no change │
│ QQuery 8  │        115.84 / 116.59 ±0.68 / 117.69 ms │         116.73 / 118.22 ±1.22 / 119.85 ms │     no change │
│ QQuery 9  │        102.18 / 105.91 ±2.54 / 108.50 ms │         100.90 / 106.70 ±7.24 / 120.08 ms │     no change │
│ QQuery 10 │        107.43 / 108.45 ±0.84 / 109.44 ms │         106.39 / 108.37 ±1.16 / 109.59 ms │     no change │
│ QQuery 11 │        969.56 / 981.48 ±7.26 / 988.61 ms │       954.27 / 979.77 ±14.85 / 1000.34 ms │     no change │
│ QQuery 12 │           45.11 / 46.95 ±1.34 / 49.09 ms │            44.90 / 45.78 ±0.72 / 46.82 ms │     no change │
│ QQuery 13 │        403.49 / 408.01 ±3.30 / 412.85 ms │         400.55 / 404.63 ±2.29 / 407.48 ms │     no change │
│ QQuery 14 │     1006.48 / 1017.03 ±8.22 / 1025.61 ms │      1010.30 / 1015.54 ±6.32 / 1026.77 ms │     no change │
│ QQuery 15 │           16.63 / 17.24 ±0.63 / 18.26 ms │            16.87 / 17.94 ±0.77 / 18.73 ms │     no change │
│ QQuery 16 │              7.72 / 8.54 ±0.56 / 9.30 ms │               7.22 / 7.73 ±0.54 / 8.75 ms │ +1.11x faster │
│ QQuery 17 │        231.06 / 233.39 ±1.70 / 235.79 ms │         231.81 / 233.42 ±0.94 / 234.62 ms │     no change │
│ QQuery 18 │        129.42 / 130.07 ±0.45 / 130.62 ms │         131.82 / 133.23 ±1.06 / 134.94 ms │     no change │
│ QQuery 19 │        155.63 / 156.89 ±1.15 / 158.85 ms │         154.75 / 157.32 ±1.32 / 158.53 ms │     no change │
│ QQuery 20 │           14.37 / 15.27 ±1.16 / 17.56 ms │            13.87 / 14.71 ±0.69 / 15.70 ms │     no change │
│ QQuery 21 │           19.53 / 19.89 ±0.36 / 20.41 ms │            19.62 / 20.04 ±0.26 / 20.40 ms │     no change │
│ QQuery 22 │        486.19 / 493.40 ±8.86 / 510.86 ms │         492.47 / 496.82 ±2.72 / 500.59 ms │     no change │
│ QQuery 23 │        881.36 / 890.22 ±6.35 / 900.98 ms │        872.74 / 885.76 ±16.25 / 917.42 ms │     no change │
│ QQuery 24 │        388.20 / 389.79 ±2.17 / 394.08 ms │         384.80 / 388.73 ±4.76 / 398.03 ms │     no change │
│ QQuery 25 │        341.61 / 344.05 ±1.24 / 345.01 ms │         343.50 / 344.98 ±1.24 / 346.45 ms │     no change │
│ QQuery 26 │           82.30 / 82.66 ±0.48 / 83.58 ms │            83.14 / 84.50 ±0.98 / 85.88 ms │     no change │
│ QQuery 27 │              6.96 / 7.69 ±0.64 / 8.61 ms │               7.28 / 7.40 ±0.11 / 7.54 ms │     no change │
│ QQuery 28 │        150.81 / 152.63 ±1.52 / 154.95 ms │         149.78 / 150.58 ±0.85 / 152.10 ms │     no change │
│ QQuery 29 │        285.06 / 286.51 ±1.33 / 288.63 ms │         282.04 / 285.54 ±1.92 / 287.21 ms │     no change │
│ QQuery 30 │           45.50 / 46.31 ±0.59 / 47.04 ms │            43.61 / 45.80 ±1.39 / 47.98 ms │     no change │
│ QQuery 31 │        174.26 / 175.97 ±1.17 / 177.08 ms │         170.89 / 173.48 ±1.66 / 175.21 ms │     no change │
│ QQuery 32 │           58.35 / 59.11 ±0.83 / 60.36 ms │            57.92 / 58.65 ±0.54 / 59.58 ms │     no change │
│ QQuery 33 │        144.63 / 145.81 ±1.50 / 148.74 ms │         141.98 / 143.27 ±0.92 / 144.38 ms │     no change │
│ QQuery 34 │              7.41 / 7.67 ±0.23 / 7.98 ms │               7.15 / 7.52 ±0.29 / 7.88 ms │     no change │
│ QQuery 35 │        109.04 / 110.12 ±0.61 / 110.84 ms │         107.55 / 109.84 ±1.46 / 111.56 ms │     no change │
│ QQuery 36 │              6.86 / 7.09 ±0.23 / 7.47 ms │               6.57 / 6.89 ±0.39 / 7.61 ms │     no change │
│ QQuery 37 │              8.71 / 9.06 ±0.29 / 9.41 ms │               8.29 / 8.48 ±0.12 / 8.61 ms │ +1.07x faster │
│ QQuery 38 │           86.14 / 89.02 ±2.40 / 93.42 ms │            83.84 / 88.70 ±4.04 / 95.56 ms │     no change │
│ QQuery 39 │        129.89 / 132.81 ±2.41 / 136.09 ms │         127.71 / 129.86 ±1.91 / 132.57 ms │     no change │
│ QQuery 40 │        115.47 / 118.01 ±3.84 / 125.63 ms │         109.81 / 116.50 ±7.23 / 130.37 ms │     no change │
│ QQuery 41 │           15.11 / 16.82 ±1.63 / 19.73 ms │            14.14 / 15.39 ±0.86 / 16.42 ms │ +1.09x faster │
│ QQuery 42 │        109.75 / 111.47 ±1.31 / 112.93 ms │         107.93 / 109.57 ±1.38 / 111.45 ms │     no change │
│ QQuery 43 │              6.13 / 6.29 ±0.17 / 6.61 ms │               5.93 / 6.15 ±0.21 / 6.55 ms │     no change │
│ QQuery 44 │           12.58 / 12.96 ±0.26 / 13.25 ms │            12.34 / 12.56 ±0.15 / 12.80 ms │     no change │
│ QQuery 45 │           52.79 / 53.56 ±0.87 / 54.92 ms │            51.18 / 52.36 ±0.65 / 52.88 ms │     no change │
│ QQuery 46 │              8.78 / 9.15 ±0.30 / 9.60 ms │               8.74 / 8.93 ±0.16 / 9.17 ms │     no change │
│ QQuery 47 │        742.93 / 753.23 ±5.85 / 758.52 ms │         708.30 / 717.30 ±5.38 / 724.42 ms │     no change │
│ QQuery 48 │        290.05 / 295.39 ±3.74 / 301.60 ms │         285.59 / 288.81 ±2.62 / 292.36 ms │     no change │
│ QQuery 49 │        250.24 / 255.09 ±2.81 / 258.28 ms │         251.10 / 254.72 ±2.73 / 259.58 ms │     no change │
│ QQuery 50 │        223.40 / 234.27 ±6.15 / 241.49 ms │         218.87 / 224.72 ±5.18 / 233.14 ms │     no change │
│ QQuery 51 │        180.95 / 184.85 ±2.47 / 188.02 ms │         182.51 / 184.37 ±2.70 / 189.65 ms │     no change │
│ QQuery 52 │        109.08 / 110.67 ±1.28 / 112.66 ms │         108.18 / 109.57 ±1.54 / 111.78 ms │     no change │
│ QQuery 53 │        103.46 / 104.00 ±0.53 / 104.88 ms │         104.24 / 105.46 ±1.06 / 106.83 ms │     no change │
│ QQuery 54 │        148.06 / 149.13 ±0.68 / 149.96 ms │         149.61 / 150.82 ±0.71 / 151.58 ms │     no change │
│ QQuery 55 │        108.04 / 109.20 ±1.60 / 112.27 ms │         108.22 / 109.47 ±1.00 / 110.68 ms │     no change │
│ QQuery 56 │        144.70 / 145.47 ±1.23 / 147.92 ms │         142.57 / 143.39 ±0.73 / 144.48 ms │     no change │
│ QQuery 57 │        175.10 / 177.24 ±2.43 / 181.70 ms │         173.46 / 176.66 ±2.03 / 178.71 ms │     no change │
│ QQuery 58 │        295.65 / 312.83 ±9.63 / 324.54 ms │         293.44 / 299.98 ±4.50 / 305.03 ms │     no change │
│ QQuery 59 │        198.60 / 203.87 ±4.01 / 208.21 ms │         199.76 / 203.17 ±3.02 / 208.39 ms │     no change │
│ QQuery 60 │        145.57 / 147.62 ±1.30 / 149.01 ms │         144.75 / 146.38 ±1.00 / 147.76 ms │     no change │
│ QQuery 61 │           13.91 / 14.07 ±0.17 / 14.32 ms │            13.47 / 13.74 ±0.25 / 14.12 ms │     no change │
│ QQuery 62 │      910.53 / 952.12 ±38.07 / 1016.94 ms │        889.46 / 923.63 ±27.09 / 954.89 ms │     no change │
│ QQuery 63 │        105.01 / 106.27 ±1.50 / 109.21 ms │         105.84 / 107.25 ±1.09 / 109.01 ms │     no change │
│ QQuery 64 │        693.19 / 700.25 ±4.96 / 707.76 ms │         683.20 / 693.67 ±7.30 / 704.24 ms │     no change │
│ QQuery 65 │        256.32 / 258.81 ±2.14 / 261.65 ms │         251.02 / 254.74 ±5.12 / 264.08 ms │     no change │
│ QQuery 66 │       245.58 / 257.95 ±14.16 / 282.61 ms │        230.14 / 250.48 ±10.85 / 259.27 ms │     no change │
│ QQuery 67 │        320.13 / 324.04 ±3.78 / 330.13 ms │         309.64 / 318.64 ±8.28 / 332.71 ms │     no change │
│ QQuery 68 │            9.58 / 11.27 ±1.40 / 13.31 ms │              8.56 / 9.91 ±1.72 / 13.07 ms │ +1.14x faster │
│ QQuery 69 │        102.97 / 104.95 ±1.68 / 108.02 ms │          99.44 / 102.68 ±1.91 / 105.35 ms │     no change │
│ QQuery 70 │       344.34 / 357.33 ±12.58 / 380.42 ms │         340.10 / 349.58 ±7.08 / 360.43 ms │     no change │
│ QQuery 71 │        136.54 / 138.70 ±1.45 / 140.47 ms │         135.40 / 138.53 ±2.72 / 142.01 ms │     no change │
│ QQuery 72 │        612.25 / 621.71 ±6.86 / 630.78 ms │         627.79 / 634.31 ±5.88 / 644.35 ms │     no change │
│ QQuery 73 │              7.04 / 8.14 ±0.99 / 9.93 ms │               6.83 / 7.65 ±0.81 / 8.94 ms │ +1.06x faster │
│ QQuery 74 │        595.53 / 604.64 ±5.65 / 612.39 ms │        555.25 / 573.01 ±14.36 / 597.16 ms │ +1.06x faster │
│ QQuery 75 │        275.82 / 279.56 ±2.96 / 284.12 ms │         277.37 / 279.35 ±1.61 / 281.82 ms │     no change │
│ QQuery 76 │        132.50 / 134.20 ±1.70 / 137.20 ms │         134.73 / 135.28 ±0.49 / 135.88 ms │     no change │
│ QQuery 77 │        189.31 / 190.65 ±0.75 / 191.45 ms │         189.69 / 191.41 ±1.61 / 194.40 ms │     no change │
│ QQuery 78 │        343.15 / 346.69 ±2.56 / 349.66 ms │         346.51 / 350.83 ±2.82 / 353.82 ms │     no change │
│ QQuery 79 │        235.73 / 238.17 ±2.98 / 243.76 ms │         234.05 / 236.15 ±1.40 / 237.63 ms │     no change │
│ QQuery 80 │        322.76 / 325.99 ±2.78 / 331.08 ms │         323.23 / 324.45 ±0.78 / 325.24 ms │     no change │
│ QQuery 81 │           26.96 / 28.33 ±0.97 / 29.72 ms │            26.97 / 28.15 ±0.89 / 29.30 ms │     no change │
│ QQuery 82 │        201.49 / 203.50 ±2.11 / 206.09 ms │         199.24 / 201.41 ±2.23 / 205.59 ms │     no change │
│ QQuery 83 │           40.50 / 40.85 ±0.36 / 41.45 ms │            38.74 / 39.67 ±0.62 / 40.57 ms │     no change │
│ QQuery 84 │           48.94 / 50.73 ±0.95 / 51.76 ms │            48.23 / 48.85 ±0.51 / 49.79 ms │     no change │
│ QQuery 85 │        148.70 / 149.82 ±0.90 / 151.22 ms │         146.47 / 149.55 ±2.58 / 153.68 ms │     no change │
│ QQuery 86 │           39.34 / 41.24 ±1.54 / 43.37 ms │            39.48 / 39.97 ±0.47 / 40.77 ms │     no change │
│ QQuery 87 │           85.55 / 89.57 ±3.09 / 94.57 ms │            85.92 / 90.79 ±4.84 / 98.07 ms │     no change │
│ QQuery 88 │        101.46 / 103.45 ±2.06 / 106.73 ms │         100.73 / 101.37 ±0.63 / 102.18 ms │     no change │
│ QQuery 89 │        120.02 / 121.59 ±1.12 / 123.23 ms │         119.28 / 120.62 ±0.74 / 121.30 ms │     no change │
│ QQuery 90 │           24.12 / 24.79 ±0.43 / 25.43 ms │            23.89 / 24.38 ±0.63 / 25.60 ms │     no change │
│ QQuery 91 │           64.15 / 65.50 ±1.01 / 67.04 ms │            61.94 / 64.90 ±2.53 / 68.60 ms │     no change │
│ QQuery 92 │           58.17 / 60.01 ±1.14 / 61.32 ms │            57.94 / 58.96 ±0.75 / 59.84 ms │     no change │
│ QQuery 93 │        188.85 / 192.03 ±2.29 / 194.61 ms │         188.31 / 189.89 ±1.05 / 191.57 ms │     no change │
│ QQuery 94 │           63.51 / 64.16 ±0.69 / 65.42 ms │            61.83 / 63.45 ±1.36 / 65.15 ms │     no change │
│ QQuery 95 │        130.25 / 131.04 ±0.55 / 131.83 ms │         128.96 / 130.98 ±1.19 / 132.43 ms │     no change │
│ QQuery 96 │           71.56 / 75.16 ±1.87 / 76.72 ms │            74.33 / 74.93 ±0.67 / 76.23 ms │     no change │
│ QQuery 97 │        128.99 / 130.75 ±1.38 / 133.23 ms │         125.02 / 127.78 ±1.59 / 129.80 ms │     no change │
│ QQuery 98 │        156.22 / 157.55 ±0.81 / 158.67 ms │         154.80 / 157.76 ±1.98 / 160.27 ms │     no change │
│ QQuery 99 │ 10863.95 / 10885.69 ±22.13 / 10924.58 ms │  10794.69 / 10840.18 ±28.73 / 10881.09 ms │     no change │
└───────────┴──────────────────────────────────────────┴───────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                        │ 31952.71ms │
│ Total Time (codex_hash-join-empty-partition-reporting)   │ 31772.51ms │
│ Average Time (HEAD)                                      │   322.75ms │
│ Average Time (codex_hash-join-empty-partition-reporting) │   320.93ms │
│ Queries Faster                                           │          6 │
│ Queries Slower                                           │          0 │
│ Queries with No Change                                   │         93 │
│ Queries with Failure                                     │          0 │
└──────────────────────────────────────────────────────────┴────────────┘
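A note on reading the Change column: the values shown are consistent with comparing the two mean times and treating differences within roughly 5% as noise. That is an inference from the numbers in the table, not something taken from the benchmark runner's source, and the helper below (`classify`, with its 5% band) is a hypothetical sketch of that reading.

```rust
// Hypothetical helper: classify one query from its mean times in milliseconds.
// The 5% noise band is an assumption inferred from the table, not a documented value.
fn classify(base_ms: f64, branch_ms: f64) -> String {
    let ratio = branch_ms / base_ms;
    if (0.95..=1.05).contains(&ratio) {
        // Within the assumed noise band.
        "no change".to_string()
    } else if ratio < 1.0 {
        // Branch is faster: report how many times faster than the base.
        format!("+{:.2}x faster", base_ms / branch_ms)
    } else {
        // Branch is slower: report the slowdown factor.
        format!("{:.2}x slower", ratio)
    }
}
```

For example, the Query 68 means (11.27 ms and 9.91 ms) give `+1.14x faster`, while the Query 47 means (753.23 ms and 717.30 ms) differ by just under 5% and fall in the "no change" band. Because the table prints rounded means, a ratio recomputed from the displayed values can differ from the reported one in the last digit.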

Resource Usage

tpcds — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 160.1s |
| Peak memory | 5.7 GiB |
| Avg memory | 4.5 GiB |
| CPU user | 263.8s |
| CPU sys | 17.7s |
| Peak spill | 0 B |

tpcds — branch

| Metric | Value |
|---|---|
| Wall time | 159.2s |
| Peak memory | 5.4 GiB |
| Avg memory | 4.5 GiB |
| CPU user | 262.8s |
| CPU sys | 17.1s |
| Peak spill | 0 B |

File an issue against this benchmark runner

@adriangb
Contributor Author

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4261207193-1375-rv9zw 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing codex/hash-join-empty-partition-reporting (7011c5d) to 5c653be (merge-base) diff using: tpch
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4261207193-1374-6k5jz 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing codex/hash-join-empty-partition-reporting (7011c5d) to 5c653be (merge-base) diff using: tpcds
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4261207193-1373-pk7t6 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing codex/hash-join-empty-partition-reporting (7011c5d) to 5c653be (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and codex_hash-join-empty-partition-reporting
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃ codex_hash-join-empty-partition-reporting ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.21 / 4.56 ±6.51 / 17.58 ms │              1.21 / 4.48 ±6.40 / 17.28 ms │     no change │
│ QQuery 1  │        15.53 / 15.68 ±0.17 / 15.97 ms │            14.97 / 15.50 ±0.28 / 15.72 ms │     no change │
│ QQuery 2  │        44.79 / 45.06 ±0.32 / 45.59 ms │            43.95 / 44.23 ±0.20 / 44.44 ms │     no change │
│ QQuery 3  │        43.28 / 45.82 ±1.56 / 47.87 ms │            43.79 / 44.60 ±1.19 / 46.95 ms │     no change │
│ QQuery 4  │     300.16 / 305.00 ±3.93 / 311.02 ms │         291.01 / 298.90 ±6.49 / 306.78 ms │     no change │
│ QQuery 5  │     350.72 / 358.13 ±4.67 / 365.27 ms │         344.86 / 347.96 ±3.76 / 355.04 ms │     no change │
│ QQuery 6  │          6.02 / 7.78 ±2.92 / 13.60 ms │               5.30 / 6.27 ±0.60 / 6.94 ms │ +1.24x faster │
│ QQuery 7  │        16.97 / 17.52 ±0.66 / 18.53 ms │            17.14 / 17.38 ±0.18 / 17.64 ms │     no change │
│ QQuery 8  │     410.17 / 420.59 ±8.39 / 431.59 ms │         417.02 / 428.47 ±6.85 / 435.06 ms │     no change │
│ QQuery 9  │     652.25 / 661.89 ±5.75 / 668.45 ms │         644.64 / 655.84 ±7.14 / 665.75 ms │     no change │
│ QQuery 10 │        92.52 / 94.00 ±1.40 / 96.34 ms │            92.71 / 95.10 ±2.60 / 99.97 ms │     no change │
│ QQuery 11 │     105.99 / 107.24 ±1.18 / 109.13 ms │         105.92 / 107.73 ±1.33 / 109.64 ms │     no change │
│ QQuery 12 │     340.77 / 348.75 ±5.66 / 356.06 ms │         350.33 / 353.92 ±2.95 / 358.53 ms │     no change │
│ QQuery 13 │    455.88 / 472.88 ±14.96 / 498.25 ms │        476.85 / 495.23 ±14.45 / 513.87 ms │     no change │
│ QQuery 14 │     344.86 / 348.07 ±3.10 / 352.53 ms │         355.46 / 363.51 ±6.21 / 372.83 ms │     no change │
│ QQuery 15 │     348.06 / 357.64 ±7.32 / 366.36 ms │        362.13 / 385.40 ±23.59 / 430.46 ms │  1.08x slower │
│ QQuery 16 │     710.70 / 719.10 ±6.50 / 729.54 ms │        729.48 / 740.26 ±14.95 / 768.21 ms │     no change │
│ QQuery 17 │     708.45 / 712.99 ±3.91 / 719.70 ms │        730.10 / 763.36 ±33.17 / 819.83 ms │  1.07x slower │
│ QQuery 18 │ 1425.99 / 1466.24 ±23.31 / 1493.34 ms │     1466.34 / 1507.16 ±27.51 / 1542.96 ms │     no change │
│ QQuery 19 │       36.19 / 43.90 ±14.45 / 72.78 ms │            34.81 / 37.07 ±1.44 / 38.96 ms │ +1.18x faster │
│ QQuery 20 │    719.05 / 735.55 ±19.57 / 765.34 ms │        722.25 / 734.81 ±15.17 / 763.07 ms │     no change │
│ QQuery 21 │     765.26 / 767.77 ±2.08 / 770.71 ms │        771.59 / 782.17 ±13.09 / 807.23 ms │     no change │
│ QQuery 22 │  1128.42 / 1137.21 ±9.90 / 1155.46 ms │      1134.06 / 1137.51 ±2.71 / 1140.71 ms │     no change │
│ QQuery 23 │ 3078.94 / 3114.20 ±21.83 / 3147.29 ms │     3076.66 / 3104.17 ±24.78 / 3139.26 ms │     no change │
│ QQuery 24 │     100.36 / 102.87 ±2.23 / 106.11 ms │         101.64 / 103.90 ±1.81 / 106.87 ms │     no change │
│ QQuery 25 │     139.21 / 140.77 ±1.29 / 142.61 ms │         139.80 / 141.41 ±1.81 / 143.92 ms │     no change │
│ QQuery 26 │      98.30 / 102.32 ±2.42 / 105.03 ms │          98.68 / 101.50 ±1.74 / 103.28 ms │     no change │
│ QQuery 27 │     849.95 / 857.35 ±6.67 / 869.17 ms │         852.03 / 856.61 ±3.72 / 862.10 ms │     no change │
│ QQuery 28 │ 3243.78 / 3270.64 ±15.40 / 3290.54 ms │     3257.78 / 3301.90 ±28.61 / 3329.77 ms │     no change │
│ QQuery 29 │      49.88 / 77.08 ±48.53 / 173.78 ms │           50.04 / 57.73 ±10.85 / 78.66 ms │ +1.34x faster │
│ QQuery 30 │     355.29 / 359.90 ±4.76 / 368.54 ms │         360.76 / 371.94 ±7.15 / 383.19 ms │     no change │
│ QQuery 31 │    360.74 / 378.80 ±11.38 / 394.94 ms │        361.60 / 381.13 ±12.84 / 396.99 ms │     no change │
│ QQuery 32 │ 1120.91 / 1276.78 ±84.79 / 1378.17 ms │     1243.27 / 1283.99 ±51.24 / 1385.18 ms │     no change │
│ QQuery 33 │ 1467.54 / 1510.83 ±37.89 / 1578.55 ms │     1504.55 / 1559.73 ±37.66 / 1599.80 ms │     no change │
│ QQuery 34 │  1449.11 / 1465.87 ±9.25 / 1476.99 ms │     1550.56 / 1598.95 ±52.06 / 1670.22 ms │  1.09x slower │
│ QQuery 35 │     388.11 / 395.21 ±5.32 / 403.09 ms │         396.52 / 410.10 ±9.56 / 426.36 ms │     no change │
│ QQuery 36 │     113.98 / 120.84 ±3.55 / 123.83 ms │         112.49 / 119.79 ±4.86 / 125.89 ms │     no change │
│ QQuery 37 │        48.15 / 51.18 ±1.57 / 52.70 ms │            48.52 / 50.43 ±1.79 / 53.42 ms │     no change │
│ QQuery 38 │        74.24 / 76.31 ±1.79 / 78.79 ms │            77.49 / 78.49 ±0.85 / 79.80 ms │     no change │
│ QQuery 39 │     203.95 / 213.99 ±6.78 / 223.86 ms │         204.72 / 212.87 ±8.80 / 229.49 ms │     no change │
│ QQuery 40 │        24.38 / 27.00 ±2.19 / 30.63 ms │            23.04 / 25.37 ±1.51 / 27.72 ms │ +1.06x faster │
│ QQuery 41 │        20.35 / 21.47 ±1.10 / 23.16 ms │            20.66 / 21.02 ±0.28 / 21.39 ms │     no change │
│ QQuery 42 │        19.42 / 20.44 ±0.75 / 21.43 ms │            20.06 / 20.59 ±0.33 / 21.07 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                        │ 22777.20ms │
│ Total Time (codex_hash-join-empty-partition-reporting)   │ 23168.50ms │
│ Average Time (HEAD)                                      │   529.70ms │
│ Average Time (codex_hash-join-empty-partition-reporting) │   538.80ms │
│ Queries Faster                                           │          4 │
│ Queries Slower                                           │          3 │
│ Queries with No Change                                   │         36 │
│ Queries with Failure                                     │          0 │
└──────────────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric Value
Wall time 115.0s
Peak memory 42.1 GiB
Avg memory 29.3 GiB
CPU user 1076.3s
CPU sys 95.2s
Peak spill 0 B

clickbench_partitioned — branch

Metric Value
Wall time 116.9s
Peak memory 39.0 GiB
Avg memory 29.0 GiB
CPU user 1083.5s
CPU sys 104.1s
Peak spill 0 B

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

Comparing HEAD and codex_hash-join-empty-partition-reporting
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃ codex_hash-join-empty-partition-reporting ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │              7.07 / 7.57 ±0.73 / 9.00 ms │               6.97 / 7.43 ±0.74 / 8.91 ms │     no change │
│ QQuery 2  │        146.94 / 148.20 ±1.03 / 149.53 ms │         145.91 / 146.88 ±0.82 / 147.85 ms │     no change │
│ QQuery 3  │        114.73 / 115.34 ±0.69 / 116.66 ms │         113.82 / 114.05 ±0.17 / 114.26 ms │     no change │
│ QQuery 4  │    1446.30 / 1484.55 ±24.78 / 1512.49 ms │     1428.52 / 1468.10 ±25.80 / 1498.60 ms │     no change │
│ QQuery 5  │        174.45 / 176.13 ±1.14 / 177.61 ms │         175.16 / 177.00 ±1.82 / 179.68 ms │     no change │
│ QQuery 6  │       892.55 / 911.70 ±20.43 / 948.24 ms │        854.85 / 896.75 ±23.73 / 920.00 ms │     no change │
│ QQuery 7  │        348.04 / 352.12 ±2.92 / 357.04 ms │         346.46 / 348.34 ±1.90 / 351.83 ms │     no change │
│ QQuery 8  │        118.14 / 118.71 ±0.56 / 119.74 ms │         118.42 / 119.54 ±0.99 / 121.06 ms │     no change │
│ QQuery 9  │        107.55 / 110.79 ±3.53 / 117.29 ms │         103.18 / 106.62 ±2.30 / 108.76 ms │     no change │
│ QQuery 10 │        108.45 / 109.88 ±0.84 / 110.81 ms │         107.59 / 109.38 ±1.47 / 111.94 ms │     no change │
│ QQuery 11 │     1057.30 / 1064.65 ±7.61 / 1078.23 ms │      1040.15 / 1049.87 ±7.29 / 1061.85 ms │     no change │
│ QQuery 12 │           47.28 / 48.55 ±0.95 / 49.41 ms │            45.67 / 47.06 ±0.83 / 47.91 ms │     no change │
│ QQuery 13 │        415.70 / 423.17 ±4.91 / 428.64 ms │         403.64 / 410.71 ±3.64 / 413.85 ms │     no change │
│ QQuery 14 │     1026.57 / 1032.91 ±3.77 / 1037.06 ms │      1007.54 / 1017.70 ±8.62 / 1030.60 ms │     no change │
│ QQuery 15 │           16.18 / 17.07 ±0.88 / 18.54 ms │            16.34 / 17.66 ±0.84 / 18.57 ms │     no change │
│ QQuery 16 │              8.07 / 8.70 ±0.57 / 9.45 ms │               7.64 / 7.98 ±0.40 / 8.76 ms │ +1.09x faster │
│ QQuery 17 │        238.35 / 241.53 ±2.69 / 246.07 ms │         235.18 / 237.56 ±1.73 / 239.59 ms │     no change │
│ QQuery 18 │        132.90 / 133.81 ±0.60 / 134.34 ms │         131.77 / 133.55 ±1.30 / 135.47 ms │     no change │
│ QQuery 19 │        161.00 / 162.14 ±1.13 / 164.18 ms │         159.67 / 161.06 ±0.87 / 162.24 ms │     no change │
│ QQuery 20 │           14.57 / 14.83 ±0.29 / 15.40 ms │            14.49 / 14.97 ±0.32 / 15.26 ms │     no change │
│ QQuery 21 │           20.33 / 20.84 ±0.39 / 21.33 ms │            20.07 / 20.57 ±0.35 / 21.08 ms │     no change │
│ QQuery 22 │        520.91 / 525.56 ±3.25 / 528.33 ms │         517.43 / 520.58 ±2.20 / 523.72 ms │     no change │
│ QQuery 23 │        930.90 / 938.56 ±7.29 / 952.45 ms │         918.42 / 924.72 ±3.65 / 929.43 ms │     no change │
│ QQuery 24 │        396.36 / 400.85 ±3.58 / 405.99 ms │         391.89 / 394.35 ±2.22 / 396.93 ms │     no change │
│ QQuery 25 │        353.39 / 355.48 ±1.27 / 357.00 ms │         350.65 / 354.41 ±2.11 / 356.18 ms │     no change │
│ QQuery 26 │           83.34 / 84.90 ±1.92 / 88.45 ms │            82.26 / 85.01 ±1.92 / 87.07 ms │     no change │
│ QQuery 27 │              7.19 / 7.54 ±0.25 / 7.90 ms │               7.19 / 7.42 ±0.19 / 7.67 ms │     no change │
│ QQuery 28 │        154.64 / 155.46 ±0.92 / 157.09 ms │         150.66 / 152.60 ±1.60 / 154.93 ms │     no change │
│ QQuery 29 │        287.68 / 292.87 ±3.22 / 297.77 ms │         290.15 / 291.81 ±1.40 / 293.83 ms │     no change │
│ QQuery 30 │           45.11 / 46.01 ±0.86 / 47.26 ms │            44.10 / 46.57 ±1.43 / 48.02 ms │     no change │
│ QQuery 31 │        174.13 / 175.86 ±1.21 / 177.77 ms │         175.46 / 176.85 ±1.40 / 179.33 ms │     no change │
│ QQuery 32 │         57.69 / 67.84 ±16.42 / 100.47 ms │            58.21 / 59.18 ±1.35 / 61.82 ms │ +1.15x faster │
│ QQuery 33 │        144.84 / 146.64 ±1.65 / 149.66 ms │         145.12 / 146.40 ±1.19 / 148.67 ms │     no change │
│ QQuery 34 │              7.50 / 8.15 ±0.87 / 9.89 ms │               7.51 / 7.63 ±0.12 / 7.84 ms │ +1.07x faster │
│ QQuery 35 │        110.32 / 112.92 ±1.57 / 114.82 ms │         108.24 / 110.53 ±1.83 / 113.04 ms │     no change │
│ QQuery 36 │              6.89 / 7.19 ±0.21 / 7.52 ms │               6.93 / 7.16 ±0.23 / 7.56 ms │     no change │
│ QQuery 37 │             8.79 / 9.60 ±0.57 / 10.49 ms │              9.07 / 9.39 ±0.45 / 10.28 ms │     no change │
│ QQuery 38 │           87.81 / 92.44 ±4.06 / 99.17 ms │           86.98 / 91.19 ±5.36 / 101.50 ms │     no change │
│ QQuery 39 │        133.09 / 136.35 ±2.16 / 139.87 ms │         132.22 / 133.59 ±1.00 / 134.63 ms │     no change │
│ QQuery 40 │        116.24 / 121.72 ±6.69 / 134.57 ms │         110.79 / 119.84 ±7.98 / 134.43 ms │     no change │
│ QQuery 41 │           15.16 / 15.79 ±0.63 / 16.94 ms │            15.14 / 15.52 ±0.33 / 16.02 ms │     no change │
│ QQuery 42 │        110.16 / 112.18 ±1.35 / 114.22 ms │         107.95 / 109.09 ±1.19 / 110.84 ms │     no change │
│ QQuery 43 │              6.54 / 6.62 ±0.07 / 6.71 ms │               6.35 / 6.54 ±0.12 / 6.66 ms │     no change │
│ QQuery 44 │           12.74 / 13.00 ±0.24 / 13.32 ms │            12.17 / 12.63 ±0.32 / 13.03 ms │     no change │
│ QQuery 45 │           51.58 / 52.66 ±0.88 / 54.08 ms │            52.11 / 52.62 ±0.53 / 53.56 ms │     no change │
│ QQuery 46 │             8.86 / 9.58 ±0.49 / 10.12 ms │              8.88 / 9.42 ±0.54 / 10.43 ms │     no change │
│ QQuery 47 │       798.09 / 816.66 ±13.47 / 833.81 ms │         797.59 / 810.08 ±7.03 / 818.75 ms │     no change │
│ QQuery 48 │        296.70 / 301.03 ±3.27 / 305.09 ms │         291.52 / 297.98 ±3.93 / 303.10 ms │     no change │
│ QQuery 49 │        257.67 / 260.29 ±1.81 / 262.97 ms │         253.95 / 255.69 ±1.00 / 257.00 ms │     no change │
│ QQuery 50 │        232.58 / 239.49 ±4.59 / 245.23 ms │         228.78 / 237.23 ±5.54 / 244.33 ms │     no change │
│ QQuery 51 │        183.34 / 187.08 ±3.28 / 191.33 ms │         179.95 / 185.03 ±3.22 / 189.68 ms │     no change │
│ QQuery 52 │        108.86 / 111.77 ±1.91 / 114.28 ms │         109.18 / 110.68 ±1.12 / 112.57 ms │     no change │
│ QQuery 53 │        104.43 / 106.32 ±1.43 / 108.00 ms │         104.05 / 105.09 ±1.26 / 107.57 ms │     no change │
│ QQuery 54 │        151.14 / 152.99 ±1.02 / 154.17 ms │         150.89 / 153.62 ±1.66 / 155.58 ms │     no change │
│ QQuery 55 │        109.27 / 110.83 ±1.82 / 113.83 ms │         107.82 / 109.52 ±1.45 / 111.35 ms │     no change │
│ QQuery 56 │        143.61 / 145.72 ±1.33 / 147.31 ms │         143.83 / 145.03 ±0.97 / 146.64 ms │     no change │
│ QQuery 57 │        176.50 / 179.98 ±1.84 / 181.69 ms │         174.38 / 176.65 ±2.56 / 181.57 ms │     no change │
│ QQuery 58 │        306.80 / 308.48 ±1.24 / 310.44 ms │         290.48 / 301.66 ±7.15 / 311.46 ms │     no change │
│ QQuery 59 │        204.21 / 207.22 ±2.33 / 210.15 ms │         201.50 / 203.43 ±1.35 / 204.87 ms │     no change │
│ QQuery 60 │        147.42 / 147.92 ±0.60 / 148.91 ms │         145.40 / 147.62 ±1.49 / 149.88 ms │     no change │
│ QQuery 61 │           13.68 / 13.93 ±0.23 / 14.28 ms │            13.72 / 14.14 ±0.69 / 15.52 ms │     no change │
│ QQuery 62 │      945.03 / 980.43 ±21.86 / 1009.49 ms │        933.92 / 963.43 ±22.58 / 991.33 ms │     no change │
│ QQuery 63 │        106.57 / 107.60 ±0.85 / 108.93 ms │         105.58 / 106.79 ±1.17 / 108.44 ms │     no change │
│ QQuery 64 │        711.49 / 714.22 ±2.03 / 717.09 ms │         707.18 / 715.13 ±5.30 / 721.61 ms │     no change │
│ QQuery 65 │        271.03 / 272.39 ±1.25 / 273.90 ms │         271.63 / 274.99 ±2.14 / 277.99 ms │     no change │
│ QQuery 66 │       247.17 / 267.28 ±13.18 / 286.63 ms │         245.66 / 256.27 ±9.68 / 274.33 ms │     no change │
│ QQuery 67 │        327.31 / 335.79 ±4.71 / 341.09 ms │         324.25 / 332.57 ±4.99 / 339.00 ms │     no change │
│ QQuery 68 │            9.14 / 11.18 ±1.37 / 13.41 ms │            10.26 / 12.01 ±1.33 / 14.33 ms │  1.07x slower │
│ QQuery 69 │        103.98 / 106.72 ±1.85 / 108.50 ms │         103.63 / 105.31 ±1.31 / 107.50 ms │     no change │
│ QQuery 70 │       340.51 / 355.79 ±11.32 / 372.52 ms │         343.42 / 349.94 ±9.36 / 368.37 ms │     no change │
│ QQuery 71 │        138.81 / 141.08 ±1.83 / 143.34 ms │         136.84 / 138.60 ±0.94 / 139.54 ms │     no change │
│ QQuery 72 │        631.14 / 644.46 ±9.36 / 655.45 ms │         641.20 / 650.22 ±8.72 / 665.55 ms │     no change │
│ QQuery 73 │             7.12 / 8.73 ±1.00 / 10.15 ms │               7.40 / 8.40 ±0.86 / 9.75 ms │     no change │
│ QQuery 74 │        651.44 / 662.25 ±5.69 / 667.28 ms │         654.58 / 665.06 ±6.73 / 673.65 ms │     no change │
│ QQuery 75 │        281.77 / 284.64 ±3.09 / 290.39 ms │         282.98 / 285.19 ±1.84 / 287.88 ms │     no change │
│ QQuery 76 │        135.75 / 136.99 ±0.93 / 138.12 ms │         133.63 / 135.81 ±1.62 / 137.82 ms │     no change │
│ QQuery 77 │        191.29 / 193.09 ±1.76 / 196.24 ms │         191.16 / 194.25 ±2.46 / 197.55 ms │     no change │
│ QQuery 78 │        353.84 / 358.39 ±3.14 / 361.81 ms │         346.77 / 352.78 ±5.18 / 360.82 ms │     no change │
│ QQuery 79 │        252.73 / 255.79 ±2.70 / 260.62 ms │         249.60 / 251.65 ±1.74 / 253.82 ms │     no change │
│ QQuery 80 │        326.74 / 328.76 ±1.61 / 330.74 ms │         322.63 / 325.84 ±2.00 / 328.90 ms │     no change │
│ QQuery 81 │           27.52 / 28.43 ±0.67 / 29.48 ms │            27.32 / 28.10 ±0.62 / 29.21 ms │     no change │
│ QQuery 82 │        203.59 / 205.63 ±1.73 / 207.94 ms │         198.72 / 202.06 ±2.08 / 205.19 ms │     no change │
│ QQuery 83 │           40.88 / 42.30 ±1.09 / 43.62 ms │            39.87 / 40.72 ±0.99 / 42.03 ms │     no change │
│ QQuery 84 │           49.85 / 50.78 ±0.61 / 51.76 ms │            50.25 / 50.71 ±0.69 / 52.08 ms │     no change │
│ QQuery 85 │        152.30 / 155.19 ±2.06 / 158.47 ms │         151.23 / 152.68 ±1.24 / 154.38 ms │     no change │
│ QQuery 86 │           40.69 / 41.78 ±0.96 / 43.34 ms │            40.10 / 41.10 ±0.54 / 41.60 ms │     no change │
│ QQuery 87 │           93.16 / 95.13 ±1.59 / 97.63 ms │            88.19 / 91.34 ±3.55 / 98.16 ms │     no change │
│ QQuery 88 │        105.15 / 105.87 ±0.72 / 107.20 ms │         102.25 / 103.80 ±1.10 / 105.48 ms │     no change │
│ QQuery 89 │        124.67 / 125.76 ±1.25 / 127.94 ms │         119.69 / 121.22 ±1.31 / 122.85 ms │     no change │
│ QQuery 90 │           24.77 / 25.25 ±0.44 / 25.92 ms │            24.26 / 25.50 ±1.19 / 27.73 ms │     no change │
│ QQuery 91 │           63.61 / 67.15 ±2.13 / 69.35 ms │            64.09 / 66.89 ±1.43 / 68.12 ms │     no change │
│ QQuery 92 │           59.95 / 60.68 ±0.78 / 62.10 ms │            58.73 / 59.54 ±0.60 / 60.59 ms │     no change │
│ QQuery 93 │        195.21 / 195.88 ±0.75 / 196.89 ms │         191.87 / 193.52 ±1.08 / 195.21 ms │     no change │
│ QQuery 94 │           63.75 / 64.70 ±0.50 / 65.20 ms │            62.45 / 63.60 ±0.74 / 64.58 ms │     no change │
│ QQuery 95 │        132.82 / 134.65 ±1.66 / 137.70 ms │         129.96 / 130.39 ±0.24 / 130.61 ms │     no change │
│ QQuery 96 │           77.82 / 78.34 ±0.42 / 79.02 ms │            75.04 / 75.59 ±0.36 / 76.08 ms │     no change │
│ QQuery 97 │        135.03 / 136.38 ±1.12 / 138.42 ms │         129.53 / 130.64 ±0.88 / 131.90 ms │     no change │
│ QQuery 98 │        162.89 / 164.68 ±1.67 / 166.70 ms │         155.77 / 158.79 ±2.32 / 162.24 ms │     no change │
│ QQuery 99 │ 10908.05 / 10955.34 ±34.31 / 11010.59 ms │  10893.43 / 10947.81 ±46.52 / 11008.84 ms │     no change │
└───────────┴──────────────────────────────────────────┴───────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                        │ 32788.21ms │
│ Total Time (codex_hash-join-empty-partition-reporting)   │ 32519.50ms │
│ Average Time (HEAD)                                      │   331.19ms │
│ Average Time (codex_hash-join-empty-partition-reporting) │   328.48ms │
│ Queries Faster                                           │          3 │
│ Queries Slower                                           │          1 │
│ Queries with No Change                                   │         95 │
│ Queries with Failure                                     │          0 │
└──────────────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric Value
Wall time 164.3s
Peak memory 5.5 GiB
Avg memory 4.5 GiB
CPU user 271.7s
CPU sys 18.0s
Peak spill 0 B

tpcds — branch

Metric Value
Wall time 162.9s
Peak memory 6.0 GiB
Avg memory 4.8 GiB
CPU user 269.6s
CPU sys 16.9s
Peak spill 0 B

@adriangb
Contributor Author

Okay I think this addresses the root cause with no performance regression or behavior changes.

@adriangb
Contributor Author

@RatulDawar could you let me know what you think of this solution?

@Omega359
Contributor

I'm not actually a committer @adriangb ... legal at work never got back to approving me :( In any case I'm not knowledgeable enough about this area of the code to be much help I'm afraid.

@RatulDawar
Contributor

@RatulDawar does 5ad96d9 help?

Looks great now!

@adriangb
Contributor Author

Great, all we need now is a review from a committer 😄

Contributor

@gabotechs gabotechs left a comment

Looks good! I left some comments, mainly questions for my own understanding, as I'm not very familiar with this code.

Comment on lines +1319 to +1332
// Initialize build_accumulator lazily with runtime partition counts (only if enabled)
// Use RepartitionExec's random state (seeds: 0,0,0,0) for partition routing
let repartition_random_state = REPARTITION_RANDOM_STATE;
let on_right_exprs = self
    .on
    .iter()
    .map(|(_, right_expr)| Arc::clone(right_expr))
    .collect::<Vec<_>>();
let build_accumulator = enable_dynamic_filter_pushdown
    .then(|| {
        self.dynamic_filter.as_ref().map(|df| {
            let filter = Arc::clone(&df.filter);
            Some(Arc::clone(df.build_accumulator.get_or_init(|| {
                Arc::new(SharedBuildAccumulator::new_from_partition_mode(
Contributor

🤔 I'm trying to work out what this change does, but as far as I can see it's not related to the fix and is just a refactor. Is that right, or did I miss something?

self.build_waiter = Some(OnceFut::new(async move {
    acc.report_build_data(build_data).await
}));
self.build_reported = true;
Contributor

I'm also trying to understand what exactly changed here, and it looks like the only logical change is that now there's a self.build_reported = true flag set, but everything else is pretty much a refactor. Does that sound right?

Comment on lines +520 to +535
+     loop {
+         let notified = {
+             let guard = self.inner.lock();
+             match &guard.completion {
+                 CompletionState::Ready(Ok(())) => return Ok(()),
+                 CompletionState::Ready(Err(err)) => {
+                     return Err(DataFusionError::Shared(Arc::clone(err)));
+                 }
+                 CompletionState::Pending | CompletionState::Finalizing => {
+                     self.completion_notify.notified()
+                 }
+             }
- // Partitioned: CASE expression routing to per-partition filters
- AccumulatedBuildData::Partitioned { partitions } => {
-     // Collect all partition data (should all be Some at this point)
-     let partition_data: Vec<_> =
-         partitions.iter().filter_map(|p| p.as_ref()).collect();
-
-     if !partition_data.is_empty() {
-         // Build a CASE expression that combines range checks AND membership checks
-         // CASE (hash_repartition(join_keys) % num_partitions)
-         // WHEN 0 THEN (col >= min_0 AND col <= max_0 AND ...) AND membership_check_0
-         // WHEN 1 THEN (col >= min_1 AND col <= max_1 AND ...) AND membership_check_1
-         // ...
-         // ELSE false
-         // END
-
-         let num_partitions = partition_data.len();
-
-         // Create base expression: hash_repartition(join_keys) % num_partitions
-         let routing_hash_expr = Arc::new(HashExpr::new(
-             self.on_right.clone(),
-             self.repartition_random_state.clone(),
-             "hash_repartition".to_string(),
-         ))
-             as Arc<dyn PhysicalExpr>;
-
-         let modulo_expr = Arc::new(BinaryExpr::new(
-             routing_hash_expr,
-             Operator::Modulo,
-             lit(ScalarValue::UInt64(Some(num_partitions as u64))),
-         ))
-             as Arc<dyn PhysicalExpr>;
-
-         // Create WHEN branches for each partition
-         let when_then_branches: Vec<(
-             Arc<dyn PhysicalExpr>,
-             Arc<dyn PhysicalExpr>,
-         )> = partitions
-             .iter()
-             .enumerate()
-             .filter_map(|(partition_id, partition_opt)| {
-                 partition_opt.as_ref().and_then(|partition| {
-                     // Skip empty partitions - they would always return false anyway
-                     match &partition.pushdown {
-                         PushdownStrategy::Empty => None,
-                         _ => Some((partition_id, partition)),
-                     }
-                 })
-             })
-             .map(|(partition_id, partition)| -> Result<_> {
-                 // WHEN partition_id
-                 let when_expr =
-                     lit(ScalarValue::UInt64(Some(partition_id as u64)));
-
-                 // THEN: Combine bounds check AND membership predicate
-
-                 // 1. Create membership predicate (InList for small build sides, hash lookup otherwise)
-                 let membership_expr = create_membership_predicate(
-                     &self.on_right,
-                     partition.pushdown.clone(),
-                     &HASH_JOIN_SEED,
-                     self.probe_schema.as_ref(),
-                 )?;
-
-                 // 2. Create bounds check expression for this partition (if bounds available)
-                 let bounds_expr = create_bounds_predicate(
-                     &self.on_right,
-                     &partition.bounds,
-                 );
-
-                 // 3. Combine membership and bounds expressions
-                 let then_expr = match (membership_expr, bounds_expr) {
-                     (Some(membership), Some(bounds)) => {
-                         // Both available: combine with AND
-                         Arc::new(BinaryExpr::new(
-                             bounds,
-                             Operator::And,
-                             membership,
-                         ))
-                             as Arc<dyn PhysicalExpr>
-                     }
-                     (Some(membership), None) => {
-                         // Membership available but no bounds (e.g., unsupported data types)
-                         membership
-                     }
-                     (None, Some(bounds)) => {
-                         // Bounds available but no membership.
-                         // This should be unreachable in practice: we can always push down a reference
-                         // to the hash table.
-                         // But it seems safer to handle it defensively.
-                         bounds
-                     }
-                     (None, None) => {
-                         // No filter for this partition - should not happen due to filter_map above
-                         // but handle defensively by returning a "true" literal
-                         lit(true)
-                     }
-                 };
-
-                 Ok((when_expr, then_expr))
-             })
-             .collect::<Result<Vec<_>>>()?;
-
-         // Optimize for single partition: skip CASE expression entirely
-         let filter_expr = if when_then_branches.is_empty() {
-             // All partitions are empty: no rows can match
-             lit(false)
-         } else if when_then_branches.len() == 1 {
-             // Single partition: just use the condition directly
-             // since hash % 1 == 0 always, the WHEN 0 branch will always match
-             Arc::clone(&when_then_branches[0].1)
-         } else {
-             // Multiple partitions: create CASE expression
-             Arc::new(CaseExpr::try_new(
-                 Some(modulo_expr),
-                 when_then_branches,
-                 Some(lit(false)), // ELSE false
-             )?) as Arc<dyn PhysicalExpr>
-         };
+         };
+         notified.await;
+     }
+ }
Contributor

I don't fully understand the difference between the previous barrier and the current loop + Notify approach. At first sight, it looks like a convoluted way of re-implementing what the barrier did, and the key change that actually resolves the deadlock seems to be calling report_canceled_partition on HashJoinStream drop.

Is it not possible to just handle the cancellation gracefully by doing something with the Barrier on drop?
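
To make the question concrete, here is a minimal std-only sketch of the counter-plus-wakeup shape, using threads and a `Condvar` as a stand-in for the async `Notify` in the diff. The names `Coordinator`, `mark_terminal`, `wait_done`, and `run` are hypothetical and are not DataFusion's API. The point it tries to illustrate is that a barrier releases only when every participant calls `wait`, whereas a counter lets a dropped partition record that it is finished without ever waiting.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for the shared coordinator: counts partitions that
/// reached a terminal state (reported or canceled) and wakes waiters once
/// all of them have.
struct Coordinator {
    total: usize,
    done: Mutex<usize>,
    all_done: Condvar,
}

impl Coordinator {
    fn new(total: usize) -> Self {
        Self { total, done: Mutex::new(0), all_done: Condvar::new() }
    }

    /// Record that one partition is terminal. Never blocks on siblings.
    fn mark_terminal(&self) {
        let mut done = self.done.lock().unwrap();
        *done += 1;
        if *done == self.total {
            self.all_done.notify_all();
        }
    }

    /// Block until every partition is terminal, re-checking the shared state
    /// after each wakeup (the same shape as the loop + Notify in the diff).
    fn wait_done(&self) {
        let mut done = self.done.lock().unwrap();
        while *done < self.total {
            done = self.all_done.wait(done).unwrap();
        }
    }
}

/// Runs `reporting` partitions that report and then wait for their siblings,
/// plus `canceled` partitions that are dropped early. Returns the number of
/// waiting partitions that were released.
fn run(reporting: usize, canceled: usize) -> usize {
    let coord = Arc::new(Coordinator::new(reporting + canceled));
    let handles: Vec<_> = (0..reporting)
        .map(|_| {
            let c = Arc::clone(&coord);
            thread::spawn(move || {
                c.mark_terminal(); // report build data
                c.wait_done(); // wait for all siblings to become terminal
            })
        })
        .collect();
    // Canceled partitions never wait; they only record that they are finished.
    for _ in 0..canceled {
        coord.mark_terminal();
    }
    handles
        .into_iter()
        .map(|h| h.join().map(|_| 1usize).unwrap_or(0))
        .sum()
}

fn main() {
    println!("released waiters: {}", run(3, 2));
}
```

With `std::sync::Barrier` sized for five participants, the two canceled partitions would have to call `wait` themselves (and block) to release the other three, which is not something a `Drop` implementation can reasonably do.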

Contributor Author

Yes this is a refactor from #21666 (comment)

Comment on lines +513 to +517
    let mut guard = self.inner.lock();
    guard.completion = CompletionState::Ready(result);
    drop(guard);
    self.completion_notify.notify_waiters();
}
Contributor

A Mutex with shared state + a Notify for broadcasting update signals sounds essentially like a reimplementation of tokio::sync::watch.

I don't have the full context, so the current solution might actually be the best one; I just want to double-check that other synchronization primitives were considered before choosing this one.

}

#[tokio::test]
async fn test_partitioned_dynamic_filter_reports_empty_canceled_partitions()
Contributor

Double checked with main and indeed this test reproduces the error 👍

adriangb and others added 12 commits April 19, 2026 10:33
Replaces the manual PartitionBuildData construction + report_build_data
call + build_reported flag set in collect_build_side with a single
transition_after_build_collected method, making it impossible to forget
to report build data when transitioning state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Per review feedback, the hoisting of on_right_exprs above the
SharedBuildAccumulator init was an unrelated cleanup (same Arc-clone
and allocation cost) that distracted from the cancellation fix. Restore
the two separate constructions to match main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@adriangb adriangb force-pushed the codex/hash-join-empty-partition-reporting branch from 5ad96d9 to 9023c19 Compare April 19, 2026 15:46

adriangb and others added 3 commits April 19, 2026 18:26
Per review feedback, replace the hand-rolled Mutex<CompletionState> +
Notify pair with a watch::Sender<Option<SharedResult<()>>> for the
terminal completion broadcast. The mutex still guards the incremental
per-partition accumulation (which needs partial writes) and the
finalizer-election bool, but wait_for_completion collapses to a single
rx.wait_for(|v| v.is_some()).await instead of a manual re-check loop.

Net -8 lines and drops the three-state CompletionState enum.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Clarify the role of the Notify primitive so the coordination pattern
is obvious without reading the wait/finish methods: it wakes parked
partitions once the elected finalizer has published CompletionState::Ready.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@kosiew kosiew left a comment

@adriangb

Thanks for the ping.
I dug into the lifecycle around reporting and cancellation, and there is a subtle timing issue that could reintroduce a hang. I left details inline. I also added a couple of suggestions that could help keep things easier to reason about going forward.

self.build_waiter = Some(OnceFut::new(async move {
    acc.report_build_data(build_data).await
}));
self.build_reported = true;
Contributor

I think there is still a race here. build_reported gets flipped as soon as the OnceFut is created, but the future is lazy and only runs once wait_for_partition_bounds_report() actually polls it.

If a parent drops this stream after transition_after_build_collected() returns, but before the waiter is ever polled, Drop will skip report_canceled_partition() even though nothing was delivered to the coordinator. That feels like it recreates the original hang, just in a narrower timing window.

Would it make sense to only mark this as reported after the waiter completes successfully? Alternatively, maybe replace the bool with a small lifecycle state so Drop can still cancel something that was only scheduled but never observed.
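
The laziness this comment relies on can be shown in isolation. The sketch below uses only the standard library: it builds a future whose body sets a `delivered` flag, sets a `scheduled` flag at creation time (the analogue of `build_reported = true`), and then either polls the future once or drops it unpolled. `schedule_report` and `NoopWake` are hypothetical names for illustration; the real code goes through `OnceFut`, which is not modeled here.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Builds a future that "delivers" a report by setting a flag, then either
/// polls it once or drops it without polling.
/// Returns `(scheduled, delivered)`.
fn schedule_report(poll_it: bool) -> (bool, bool) {
    let delivered = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&delivered);

    // Creating the future runs none of its body.
    let mut report: Pin<Box<dyn Future<Output = ()>>> = Box::pin(async move {
        flag.store(true, Ordering::SeqCst);
    });

    // The problematic pattern: the flag flips when the future is created,
    // not when the report actually reaches the coordinator.
    let scheduled = true;

    if poll_it {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        assert!(matches!(report.as_mut().poll(&mut cx), Poll::Ready(())));
    }

    // Simulates the parent dropping the stream.
    drop(report);
    (scheduled, delivered.load(Ordering::SeqCst))
}

fn main() {
    println!("dropped unpolled: {:?}", schedule_report(false));
    println!("polled once:      {:?}", schedule_report(true));
}
```

In the unpolled case the result is `(true, false)`: the stream believes it has reported while nothing was delivered, which is the window in which `Drop` would wrongly skip the cancellation.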

let bounds_expr =
    create_bounds_predicate(&self.on_right, &partition_data.bounds);

if let Some(filter_expr) = match (membership_expr, bounds_expr) {
Contributor

I noticed the “membership + bounds gives final predicate” logic now exists in both the CollectLeft and Partitioned finalize paths, with slightly different control flow.

It might be worth pulling that into a small helper so both branches stay aligned as the dynamic filter logic evolves. That would make future changes a bit safer.
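
A rough sketch of what such a helper could look like, assuming a toy `Pred` type in place of `Arc<dyn PhysicalExpr>`; the `combine` function and `Pred` enum are hypothetical and only illustrate the shape. Returning `Option` lets each caller choose its own fallback: one path can skip the update when nothing is available, the other can substitute a `true` literal.

```rust
/// Toy predicate type standing in for `Arc<dyn PhysicalExpr>` (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    Leaf(&'static str),
    And(Box<Pred>, Box<Pred>),
    True,
}

/// Combines an optional membership predicate with an optional bounds
/// predicate. Bounds go first in the AND, matching the order in the diff.
fn combine(membership: Option<Pred>, bounds: Option<Pred>) -> Option<Pred> {
    match (membership, bounds) {
        (Some(m), Some(b)) => Some(Pred::And(Box::new(b), Box::new(m))),
        (Some(m), None) => Some(m),
        (None, Some(b)) => Some(b),
        (None, None) => None,
    }
}

fn main() {
    let m = Pred::Leaf("membership");
    let b = Pred::Leaf("bounds");

    // A caller that only updates the filter when something is available.
    if let Some(filter) = combine(Some(m.clone()), Some(b.clone())) {
        println!("update filter to {:?}", filter);
    }

    // A caller that needs a branch either way falls back to `true`,
    // so the branch never filters out probe rows.
    let branch = combine(None, None).unwrap_or(Pred::True);
    println!("branch predicate: {:?}", branch);
}
```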

/// Optional future to signal when build information has been reported by all partitions
/// and the dynamic filter has been updated
build_waiter: Option<OnceFut<()>>,
/// Tracks whether this partition has already reported build information to the coordinator.
Contributor

Once the blocking issue above is sorted out, this field might benefit from a stronger type than just bool.

Something like NotReported, ReportScheduled, and ReportDelivered would make the drop semantics much clearer and help avoid another mix-up between "scheduled" and "actually seen by the coordinator".
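
A small sketch of how a three-state lifecycle could drive the drop behavior, under the assumption that only a delivered report makes cancellation unnecessary. `ReportState`, `Stream`, and `cancels_on_drop` are hypothetical names; the cancellation is modeled as pushing to a shared log rather than calling a real coordinator.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical lifecycle of one partition's report.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReportState {
    NotReported,
    ReportScheduled,
    ReportDelivered,
}

/// Toy stream: records a cancellation on drop unless its report was
/// actually delivered.
struct Stream {
    state: ReportState,
    cancellations: Rc<RefCell<Vec<ReportState>>>,
}

impl Drop for Stream {
    fn drop(&mut self) {
        // A report that was only scheduled still needs a cancel, because
        // the coordinator may never have seen it.
        if self.state != ReportState::ReportDelivered {
            self.cancellations.borrow_mut().push(self.state);
        }
    }
}

/// Drops one stream in each state and returns the states that triggered
/// a cancellation, in drop order.
fn cancels_on_drop() -> Vec<ReportState> {
    let log = Rc::new(RefCell::new(Vec::new()));
    for state in [
        ReportState::NotReported,
        ReportState::ReportScheduled,
        ReportState::ReportDelivered,
    ] {
        drop(Stream { state, cancellations: Rc::clone(&log) });
    }
    let result = log.borrow().clone();
    result
}

fn main() {
    println!("{:?}", cancels_on_drop());
}
```

With a plain `bool`, the first two states collapse differently depending on when the flag is set, which is the mix-up the comment describes.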

adriangb and others added 2 commits April 21, 2026 15:48
…lper

Per PR apache#21666 review feedback:

* Replace `build_reported: bool` with a `BuildReportState` enum
  (NotReported / ReportScheduled / ReportDelivered). `ReportDelivered`
  is only set after the build waiter's `get_shared` returns `Ok`, so
  `Drop` still cancels a partition whose report was merely scheduled
  but never observed by the coordinator. `store_canceled_partition`
  guards on `PartitionStatus::Pending`, so a late cancel arriving
  after the coordinator already saw the report is a harmless no-op.

* Extract `combine_membership_and_bounds` so the CollectLeft and
  Partitioned arms of `SharedBuildAccumulator::build_filter` share a
  single (membership, bounds) -> predicate combinator. Partitioned
  keeps its `lit(true)` fallback via `unwrap_or_else`, preserving the
  existing CASE-branch shape optimizations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The `BuildReportState::ReportScheduled` → `Drop` → `report_canceled_partition`
path is correct only because `store_canceled_partition` no-ops when the
partition is already `Reported`. Previously this invariant was implicit;
add unit tests so a future refactor that, say, unconditionally overwrites
the partition status can't silently reintroduce the hang.

Also pin that `report_canceled_partition` is idempotent — a stray
double-drop must not double-count `completed_partitions`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
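
The two invariants named in this commit message can be sketched with a toy model. This is an assumption-laden illustration, not the actual `SharedBuildAccumulator`: the `Coordinator` struct, its `report` and `cancel` methods, and the `Canceled` variant name are hypothetical, while `PartitionStatus::Pending`, `Reported`, and `completed_partitions` follow the names used above.

```rust
/// Per-partition status in the toy model.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PartitionStatus {
    Pending,
    Reported,
    Canceled,
}

/// Toy coordinator tracking how many partitions reached a terminal state.
struct Coordinator {
    partitions: Vec<PartitionStatus>,
    completed_partitions: usize,
}

impl Coordinator {
    fn new(n: usize) -> Self {
        Self { partitions: vec![PartitionStatus::Pending; n], completed_partitions: 0 }
    }

    /// Record build data for a partition; only a pending partition counts.
    fn report(&mut self, p: usize) {
        if self.partitions[p] == PartitionStatus::Pending {
            self.partitions[p] = PartitionStatus::Reported;
            self.completed_partitions += 1;
        }
    }

    /// Record a cancellation; a no-op unless the partition is still pending,
    /// so late or repeated cancels cannot overwrite a report or double-count.
    fn cancel(&mut self, p: usize) {
        if self.partitions[p] == PartitionStatus::Pending {
            self.partitions[p] = PartitionStatus::Canceled;
            self.completed_partitions += 1;
        }
    }

    fn is_complete(&self) -> bool {
        self.completed_partitions == self.partitions.len()
    }
}

/// Exercises a late cancel after a report and a double cancel.
fn scenario() -> (usize, Vec<PartitionStatus>, bool) {
    let mut c = Coordinator::new(2);
    c.report(0);
    c.cancel(0); // late cancel after the report was seen: no-op
    c.cancel(1);
    c.cancel(1); // stray second cancel: no-op
    (c.completed_partitions, c.partitions.clone(), c.is_complete())
}

fn main() {
    println!("{:?}", scenario());
}
```

If `cancel` overwrote the status unconditionally, partition 0 would lose its reported data and the counter would reach 4 for 2 partitions, which is the kind of regression the added unit tests are meant to catch.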
Contributor

@kosiew kosiew left a comment

Looks good to me, except for a comment clean-up for Barrier.

Contributor

This should be updated since this PR replaced the Barrier-based coordination.

The synchronization-strategy comment still described a barrier even though
this PR replaced it with a per-partition report counter, a CompletionState
lifecycle finalized exactly once, and a Notify that wakes parked waiters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb adriangb added this pull request to the merge queue Apr 22, 2026
@adriangb
Contributor Author

Thanks for the review @kosiew !

@gabotechs I did try to use an existing primitive, but I couldn't quite get anything to fit the shape we need. If you or anyone else can improve it a followup is welcome, but I didn't want to hold up unblocking benchmarks.

Merged via the queue into apache:main with commit 8875956 Apr 22, 2026
35 checks passed
@adriangb adriangb deleted the codex/hash-join-empty-partition-reporting branch April 22, 2026 13:14

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: TPCH 18 query hangs

7 participants