Eliminate outer joins with empty relations via null-padded projection #21321
neilconway left a comment:
Looks good to me! Nice work.
@alamb This looks reasonable to me.
This is a nice optimization, @SubhamSinghal.
alamb left a comment:
Thanks @SubhamSinghal @neilconway and @coderfender -- looks good to me
    Expr::Cast(Cast::new(
        Box::new(Expr::Literal(ScalarValue::Null, None)),
        field.data_type().clone(),
    ))
You can write this more concisely using the fluent API, something like `lit(ScalarValue::Null).cast(field.data_type().clone())`.
Replaced with `cast(lit(ScalarValue::Null), field.data_type().clone())`.
    02)--Left Join:
    03)----TableScan: t1 projection=[t1_id]
    04)----EmptyRelation: rows=0
    01)Projection: t1.t1_id, Int32(NULL) AS t2_id
xudong963 left a comment:
It would be better to have at least one test with a more complex ON condition (e.g., ON t1.a = r.x AND t1.b > r.y) to demonstrate the filter is correctly ignored. Not blocking, but worth adding.
@xudong963 Added a unit test.
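To illustrate why the ON condition can be ignored here, the following is a minimal sketch in plain Rust. It is a toy model and not DataFusion code: `LeftRow`, `RightRow`, `left_outer_join`, and `null_padded` are hypothetical names introduced only for this example. It runs a naive nested-loop left outer join against an empty right side and checks that the result equals the null-padded left rows, whatever the predicate is.

```rust
// Toy model only: rows are plain tuples, not DataFusion record batches.
type LeftRow = (i32, i32); // (a, b)
type RightRow = (i32, i32); // (x, y)

/// Naive nested-loop LEFT OUTER JOIN with an arbitrary ON predicate.
fn left_outer_join<F>(
    left: &[LeftRow],
    right: &[RightRow],
    on: F,
) -> Vec<(LeftRow, Option<RightRow>)>
where
    F: Fn(&LeftRow, &RightRow) -> bool,
{
    let mut out = Vec::new();
    for l in left {
        let mut matched = false;
        for r in right {
            if on(l, r) {
                out.push((*l, Some(*r)));
                matched = true;
            }
        }
        // Unmatched left rows are kept, with the right side padded by NULL.
        if !matched {
            out.push((*l, None));
        }
    }
    out
}

/// What the rewrite would produce: every left row with a NULL right side.
fn null_padded(left: &[LeftRow]) -> Vec<(LeftRow, Option<RightRow>)> {
    left.iter().map(|l| (*l, None)).collect()
}

fn main() {
    let left = vec![(1, 10), (2, 20), (3, 30)];
    let empty_right: Vec<RightRow> = Vec::new();

    // Complex condition in the spirit of `t1.a = r.x AND t1.b > r.y`.
    let joined = left_outer_join(&left, &empty_right, |l, r| l.0 == r.0 && l.1 > r.1);

    // With no right rows the predicate is never evaluated, so it cannot matter.
    assert_eq!(joined, null_padded(&left));
    println!("{:?}", joined);
}
```

Because the inner loop never runs when the right side is empty, the predicate closure is never called, which is the property a test with a more complex ON condition would exercise.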
Thank you @SubhamSinghal @xudong963 @neilconway and @coderfender -- what a dream team!
…apache#21321)

## Which issue does this PR close?

- Closes apache#21320

### Rationale for this change

When one side of a LEFT/RIGHT/FULL outer join is an EmptyRelation, the current PropagateEmptyRelation optimizer rule leaves the join untouched. This means the engine still builds a hash table for the empty side, probes every row from the non-empty side, finds zero matches, and pads NULLs, all wasted work.

The TODO at lines 76-80 of propagate_empty_relation.rs explicitly called out this gap:

```
// TODO: For LeftOut/Full Join, if the right side is empty, the Join can be eliminated
// with a Projection with left side columns + right side columns replaced with null values.
// For RightOut/Full Join, if the left side is empty, the Join can be eliminated
// with a Projection with right side columns + left side columns replaced with null values.
```

### What changes are included in this PR?

Extends the PropagateEmptyRelation rule to handle 4 previously unoptimized cases by replacing the join with a Projection that null-pads the empty side's columns:

### Are these changes tested?

Yes. 4 new unit tests added:

### Are there any user-facing changes?

No API changes.

---------

Co-authored-by: Subham Singhal <subhamsinghal@Subhams-MacBook-Air.local>
Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
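The rewrite described in the TODO can be modelled with a small, self-contained sketch. Everything below is a hypothetical stand-in written for illustration: `Plan`, `Expr`, `JoinType`, `Field`, `output_fields`, and `eliminate_outer_join` are not DataFusion's types or functions, and the sketch covers only the outer-join cases named in the TODO, leaving every other plan untouched.

```rust
// Toy model only: illustrative stand-ins, not DataFusion's
// `LogicalPlan`, `Expr`, or `JoinType`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType { Inner, Left, Right, Full }

type Field = (String, String); // (column name, type name)

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(Field),
    TypedNull(Field), // NULL cast to the field's type, aliased to its name
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String, fields: Vec<Field> },
    Empty { fields: Vec<Field> },
    Join { left: Box<Plan>, right: Box<Plan>, join_type: JoinType },
    Projection { exprs: Vec<Expr>, input: Box<Plan> },
}

/// Output columns of a plan, left side first for joins.
fn output_fields(plan: &Plan) -> Vec<Field> {
    match plan {
        Plan::Scan { fields, .. } | Plan::Empty { fields } => fields.clone(),
        Plan::Join { left, right, .. } => {
            let mut all = output_fields(left);
            all.extend(output_fields(right));
            all
        }
        Plan::Projection { exprs, .. } => exprs
            .iter()
            .map(|e| match e {
                Expr::Column(f) | Expr::TypedNull(f) => f.clone(),
            })
            .collect(),
    }
}

/// Replace an outer join whose non-preserved side is empty with a projection
/// over the other side, keeping the original left-then-right column order.
fn eliminate_outer_join(plan: Plan) -> Plan {
    match plan {
        Plan::Join { left, right, join_type } => {
            let left_empty = matches!(&*left, Plan::Empty { .. });
            let right_empty = matches!(&*right, Plan::Empty { .. });
            match join_type {
                JoinType::Left | JoinType::Full if right_empty => {
                    let exprs: Vec<Expr> = output_fields(&left)
                        .into_iter()
                        .map(Expr::Column)
                        .chain(output_fields(&right).into_iter().map(Expr::TypedNull))
                        .collect();
                    Plan::Projection { exprs, input: left }
                }
                JoinType::Right | JoinType::Full if left_empty => {
                    let exprs: Vec<Expr> = output_fields(&left)
                        .into_iter()
                        .map(Expr::TypedNull)
                        .chain(output_fields(&right).into_iter().map(Expr::Column))
                        .collect();
                    Plan::Projection { exprs, input: right }
                }
                // Anything else is out of scope for this sketch.
                _ => Plan::Join { left, right, join_type },
            }
        }
        other => other,
    }
}

fn main() {
    let int32 = |name: &str| (name.to_string(), "Int32".to_string());
    let plan = Plan::Join {
        left: Box::new(Plan::Scan { table: "t1".into(), fields: vec![int32("t1_id")] }),
        right: Box::new(Plan::Empty { fields: vec![int32("t2_id")] }),
        join_type: JoinType::Left,
    };
    println!("{:#?}", eliminate_outer_join(plan));
}
```

The `main` function builds the toy equivalent of a left join between a scan of `t1` and an empty relation; the result is a projection over the scan with `t2_id` replaced by a typed NULL, which mirrors the shape of the `Projection: t1.t1_id, Int32(NULL) AS t2_id` plan quoted in the review.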