You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
## Which issue does this PR close?
- Closes#21199.
## Rationale for this change
This PR adds support for LEFT join semantics for lateral joins. This is
a bit tricky because of how it interacts with compensation for the
"count bug". This might be easiest to illustrate with an example;
consider this query (Q1):
```sql
SELECT t1.id, sub.cnt
FROM t1 LEFT JOIN LATERAL (
SELECT count(*) AS cnt FROM t2 WHERE t2.t1_id = t1.id
) sub ON sub.cnt > 0
ORDER BY t1.id;
```
The initial decorrelation (Q2) is
```sql
SELECT t1.id, sub.cnt
FROM t1 LEFT JOIN (
SELECT count(*) AS cnt, t2.t1_id, TRUE AS __always_true
FROM t2 GROUP BY t2.t1_id
) sub ON t1.id = sub.t1_id
```
Ignoring the user's original `ON` clause for now. This initial query is wrong, because `t1` rows that don't have a match in `t2` will get all-`NULL` values, not `0` for `count(*)` of an empty set. This is the "count bug", and we compensate for that by checking for rows when `__always_true` is `NULL`, and replacing the agg value with the default for that agg (Q3):
```sql
SELECT t1.id,
CASE WHEN sub.__always_true IS NULL THEN 0
ELSE sub.cnt END AS cnt
FROM ( /* Q2 */ )
```
Now we just need to handle the user's original `ON` clause. We can't add this to the rewritten `ON` clause in Q1, because we *don't* want the count-bug compensation to fire. But we also can't just add it to the `WHERE` clause, because we need left join semantics. So we can instead wrap another `CASE` that re-checks the `ON` condition and substitutes `NULL` for every right-side column:
```sql
SELECT t1.id,
CASE WHEN (cnt > 0) IS NOT TRUE THEN NULL
ELSE cnt END AS cnt
FROM ( /* Q3 */ )
```
## What changes are included in this PR?
* Implement lateral left join rewrite as described above
* Update expected tests and add more test cases
* Update documentation
## Are these changes tested?
Yes. All new test queries were also run against DuckDB to confirm that
both systems produce the same results.
## Are there any user-facing changes?
Support for a new feature.
physical_plan_error This feature is not implemented: Physical plan does not support logical expression OuterReferenceColumn(Field { name: "j1_id", data_type: Int32, nullable: true }, Column { relation: Some(Bare { table: "j1" }), name: "j1_id" })
0 commit comments