Commit 1bb588e
perf: Implement physical execution of uncorrelated scalar subqueries (#21240)
## Which issue does this PR close?
- Closes #3781.
- Closes #18181.
## Rationale for this change
Previously, DataFusion evaluated uncorrelated scalar subqueries by
transforming them into joins. This has three shortcomings:
1. Scalar subqueries that return > 1 row were allowed, producing
incorrect query results. Such queries should instead result in a runtime
error.
2. Performance. Evaluating scalar subqueries as a join requires going
through the join machinery. More importantly, it means that UDFs that
have specialized handling of scalar inputs cannot use those code paths
for scalar subqueries, which often results in significantly slower query
execution (e.g., #18181). It also makes filter pushdown for scalar
subquery filters more difficult (#21324)
3. Uncorrelated scalar subqueries previously did not work in `ORDER BY`
or `JOIN ON`, or as arguments to an aggregate function. Those cases are
now supported.
This PR introduces physical execution of uncorrelated scalar subqueries:
* Uncorrelated subqueries are left in the plan by the optimizer, not
rewritten into joins
* The physical planner collects uncorrelated scalar subqueries and plans
them recursively (supporting nested subqueries). We add a
`ScalarSubqueryExec` plan node to the top of any physical plan with
uncorrelated subqueries: it has N+1 children, N subqueries and its
"main" input, which is the rest of the query plan. The subquery
expression in the parent plan is replaced with a `ScalarSubqueryExpr`.
* `ScalarSubqueryExec` manages the execution of the subqueries. Subquery
evaluation is done in parallel (for a given query level), but at present
it happens strictly before evaluation of the parent query. This might be
improved in the future (#21591).
* `ScalarSubqueryExpr` reads its value from a shared slot that
`ScalarSubqueryExec` populates when the subquery finishes; the physical
planner assigns each subquery its slot index via `ExecutionProps`.
This architecture makes it easy to avoid the shortcomings described
above. Performance seems roughly unchanged (benchmarks added in this
PR), but in situations like #18181, we can now leverage scalar
fast-paths; in the case of #18181 specifically, this improves
performance from ~800 ms to ~30 ms.
## What changes are included in this PR?
* Modify subquery rewriter to not transform subqueries -> joins
* Collect and plan uncorrelated scalar subqueries in the physical
planner, and wire up `ScalarSubqueryExpr`
* Support for subqueries in physical plan serialization/deserialization
using `PhysicalProtoConverterExtension` to wire up `ScalarSubqueryExpr`
correctly
* Support for subqueries in logical plan serialization/deserialization
* Add various SLT tests and update expected plan shapes for some tests
## Are these changes tested?
Yes. New SLT coverage for cardinality errors, `ORDER BY` / `JOIN ON` /
aggregate-arg contexts, nested uncorrelated subqueries,
duplicate-subquery deduplication, and partition-pruning filters; new
roundtrip tests for logical and physical plan serialization.
## Are there any user-facing changes?
SQL:
* Uncorrelated scalar subqueries that return more than one row now
result in a runtime error, instead of silently producing incorrect
results.
* Uncorrelated scalar subqueries now work in `ORDER BY`, `JOIN ON`, and
as aggregate function arguments.
Rust APIs:
* In `datafusion-proto`, breaking changes to
`Serializeable::from_bytes_with_registry` (renamed to
`from_bytes_with_ctx`), `parse_expr` / `parse_sorts` / `parse_exprs`,
and the `PhysicalProtoConverterExtension` trait.
Plan shape:
* `LogicalPlan::Subquery` nodes will now be preserved in the logical
plan
* Physical plans can now contain `ScalarSubqueryExec` plan node and
`ScalarSubqueryExpr` expressions
The wire format has also changed to include scalar subqueries.
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>1 parent ca1d39d commit 1bb588e
41 files changed
Lines changed: 3559 additions & 1110 deletions
File tree
- .github/workflows
- datafusion-examples/examples
- custom_data_source
- proto
- datafusion
- core
- src
- expr/src
- logical_plan
- optimizer
- src
- optimize_projections
- tests
- physical-expr/src
- physical-plan/src
- proto
- proto
- src
- bytes
- generated
- logical_plan
- physical_plan
- tests/cases
- sqllogictest/test_files
- tpch/plans
- docs/source/library-user-guide/upgrading
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
37 | | - | |
| 37 | + | |
38 | 38 | | |
39 | | - | |
| 39 | + | |
40 | 40 | | |
41 | 41 | | |
42 | 42 | | |
| |||
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
Lines changed: 17 additions & 21 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
20 | | - | |
21 | | - | |
22 | | - | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | | - | |
32 | | - | |
33 | | - | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
34 | 35 | | |
35 | 36 | | |
36 | 37 | | |
| |||
61 | 62 | | |
62 | 63 | | |
63 | 64 | | |
64 | | - | |
| 65 | + | |
65 | 66 | | |
66 | 67 | | |
67 | 68 | | |
| |||
177 | 178 | | |
178 | 179 | | |
179 | 180 | | |
180 | | - | |
| 181 | + | |
181 | 182 | | |
182 | 183 | | |
183 | 184 | | |
| |||
303 | 304 | | |
304 | 305 | | |
305 | 306 | | |
306 | | - | |
| 307 | + | |
| 308 | + | |
307 | 309 | | |
308 | | - | |
| 310 | + | |
309 | 311 | | |
310 | 312 | | |
311 | 313 | | |
| |||
371 | 373 | | |
372 | 374 | | |
373 | 375 | | |
374 | | - | |
375 | | - | |
376 | 376 | | |
| 377 | + | |
377 | 378 | | |
378 | 379 | | |
379 | 380 | | |
| |||
395 | 396 | | |
396 | 397 | | |
397 | 398 | | |
398 | | - | |
399 | | - | |
400 | | - | |
401 | | - | |
402 | | - | |
| 399 | + | |
403 | 400 | | |
404 | 401 | | |
405 | 402 | | |
| |||
409 | 406 | | |
410 | 407 | | |
411 | 408 | | |
412 | | - | |
| 409 | + | |
413 | 410 | | |
414 | 411 | | |
415 | 412 | | |
416 | 413 | | |
417 | 414 | | |
418 | | - | |
419 | 415 | | |
420 | | - | |
| 416 | + | |
421 | 417 | | |
422 | | - | |
| 418 | + | |
423 | 419 | | |
424 | 420 | | |
425 | 421 | | |
| |||
Lines changed: 11 additions & 12 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
20 | | - | |
21 | | - | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
22 | 23 | | |
23 | 24 | | |
24 | 25 | | |
| |||
29 | 30 | | |
30 | 31 | | |
31 | 32 | | |
32 | | - | |
33 | | - | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
34 | 36 | | |
35 | 37 | | |
36 | 38 | | |
| |||
49 | 51 | | |
50 | 52 | | |
51 | 53 | | |
52 | | - | |
| 54 | + | |
53 | 55 | | |
54 | 56 | | |
55 | 57 | | |
| |||
202 | 204 | | |
203 | 205 | | |
204 | 206 | | |
205 | | - | |
206 | | - | |
207 | 207 | | |
| 208 | + | |
208 | 209 | | |
209 | | - | |
| 210 | + | |
210 | 211 | | |
211 | 212 | | |
212 | 213 | | |
| |||
225 | 226 | | |
226 | 227 | | |
227 | 228 | | |
228 | | - | |
229 | 229 | | |
230 | | - | |
| 230 | + | |
231 | 231 | | |
232 | 232 | | |
233 | 233 | | |
| |||
249 | 249 | | |
250 | 250 | | |
251 | 251 | | |
252 | | - | |
253 | | - | |
| 252 | + | |
254 | 253 | | |
255 | 254 | | |
256 | 255 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
144 | 144 | | |
145 | 145 | | |
146 | 146 | | |
| 147 | + | |
147 | 148 | | |
148 | 149 | | |
149 | 150 | | |
| |||
0 commit comments