fix: resolve HashJoin deadlock with dynamic filtering and empty partitions #21631

Closed
RatulDawar wants to merge 1 commit into apache:main from RatulDawar:fix-hash-join-deadlock-dynamic-filtering

@RatulDawar
Contributor

Summary

This PR fixes a deadlock in HashJoinExec that occurs when dynamic filtering is enabled and some partitions have an empty build side.

The Issue

When dynamic filtering is enabled, all partitions must report their build-side data to a SharedBuildAccumulator and wait on a tokio::sync::Barrier. However, a short-circuit optimization was causing partitions with empty build sides to immediately transition to the Completed state, skipping the reporting and the barrier. This left non-empty partitions waiting indefinitely for the missing partitions to reach the barrier.
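
The arithmetic behind the hang can be sketched with a small counting model. This is not DataFusion code: the function names and the row counts are made up for illustration, and the model only assumes that the barrier is sized for every partition, so it releases only when all of them arrive.

```rust
/// Number of partitions that reach the barrier.
/// `build_rows[i]` is the build-side row count of partition `i` (made-up data).
/// When `empty_skips_barrier` is true, empty partitions short-circuit and
/// never arrive, which models the behaviour described above.
fn arrivals(build_rows: &[usize], empty_skips_barrier: bool) -> usize {
    build_rows
        .iter()
        .filter(|&&rows| rows > 0 || !empty_skips_barrier)
        .count()
}

/// A barrier sized for every partition releases only if all of them arrive.
fn barrier_releases(build_rows: &[usize], empty_skips_barrier: bool) -> bool {
    arrivals(build_rows, empty_skips_barrier) == build_rows.len()
}

fn main() {
    // Four hypothetical partitions, two of them with an empty build side.
    let build_rows = [120, 0, 45, 0];

    for empty_skips_barrier in [true, false] {
        println!(
            "empty partitions skip the barrier: {} -> {} of {} arrive, barrier releases: {}",
            empty_skips_barrier,
            arrivals(&build_rows, empty_skips_barrier),
            build_rows.len(),
            barrier_releases(&build_rows, empty_skips_barrier)
        );
    }
}
```

With the short-circuit in place only two of the four modelled partitions arrive, so the two non-empty ones would wait forever; once empty partitions also arrive, the count matches and the barrier releases.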

The Fix

With the fix, whenever a SharedBuildAccumulator is present, a partition with an empty build side still proceeds to the WaitPartitionBoundsReport state instead of jumping straight to Completed. It therefore reports to the accumulator and takes part in the barrier synchronization, and only afterwards is the short-circuit to Completed allowed.
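
The decision described here can be sketched as a small state-selection function. This is a simplified illustration, not the code in this PR: `State`, `Probe`, and `state_after_build` are hypothetical names, only `WaitPartitionBoundsReport` and `Completed` come from the description, and the sketch assumes a join type for which an empty build side produces no output.

```rust
/// Hypothetical, simplified stand-in for the join stream's states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    /// Report build-side data and wait for the other partitions.
    WaitPartitionBoundsReport,
    /// Hypothetical name for "go on to probe the hash table".
    Probe,
    /// Nothing left to do for this partition.
    Completed,
}

/// Picks the state to enter once the build side has been collected.
/// `has_shared_accumulator` models "dynamic filtering is enabled".
fn state_after_build(build_is_empty: bool, has_shared_accumulator: bool) -> State {
    if has_shared_accumulator {
        // Every partition, empty or not, must report and reach the barrier
        // first; the short-circuit to Completed can only happen afterwards.
        State::WaitPartitionBoundsReport
    } else if build_is_empty {
        // Without an accumulator nobody waits on this partition,
        // so the short-circuit is safe.
        State::Completed
    } else {
        State::Probe
    }
}

fn main() {
    for build_is_empty in [true, false] {
        for has_shared_accumulator in [true, false] {
            println!(
                "empty build side: {}, accumulator present: {} -> {:?}",
                build_is_empty,
                has_shared_accumulator,
                state_after_build(build_is_empty, has_shared_accumulator)
            );
        }
    }
}
```

Checking for the accumulator before checking for emptiness is the point of the sketch: the ordering of the two tests is what keeps an empty partition from skipping the synchronization.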

Test Plan

Reproduced the issue using TPC-H Query 18 with 24 partitions (DATAFUSION_EXECUTION_TARGET_PARTITIONS=24).

  • Before fix: The query hangs indefinitely with 0% CPU usage.
  • After fix: The query completes successfully in ~0.1s.
-- Reproduction query (TPC-H Q18)
select
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity)
from
    customer,
    orders,
    lineitem
where
    o_orderkey in (
        select
            l_orderkey
        from
            lineitem
        group by
            l_orderkey
        having
            sum(l_quantity) > 300
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by
    c_name, c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate
limit 100;

Made with Cursor

@github-actions github-actions Bot added the physical-plan Changes to the physical-plan crate label Apr 14, 2026
@RatulDawar RatulDawar force-pushed the fix-hash-join-deadlock-dynamic-filtering branch 2 times, most recently from 36e5098 to d075e5f Compare April 14, 2026 21:30
…tions

Defer short-circuiting to Completed state for empty partitions when
dynamic filtering is enabled. This ensures all partitions participate
in the SharedBuildAccumulator barrier synchronization before finishing.

Made-with: Cursor
@RatulDawar RatulDawar force-pushed the fix-hash-join-deadlock-dynamic-filtering branch from d075e5f to 7bf5565 Compare April 14, 2026 21:33
@RatulDawar RatulDawar closed this Apr 14, 2026
@RatulDawar RatulDawar deleted the fix-hash-join-deadlock-dynamic-filtering branch April 14, 2026 21:38