Commit 948cd09
proto: serialize and dedupe dynamic filters v2 (#21807)
## Which issue does this PR close?
Informs: datafusion-contrib/datafusion-distributed#180
Closes: #20418
## Rationale for this change
Consider a plan with a `HashJoinExec` and a `DataSourceExec`:
```
HashJoinExec(dynamic_filter_1 on a@0)
  (...left side of join)
  ProjectionExec(a := Column("a", source_index))
    DataSourceExec
      ParquetSource(predicate = dynamic_filter_2)
```
Suppose you serialize the plan, deserialize it, and execute it. The dynamic filter should then "work", meaning:
1. When you deserialize the plan, both the `HashJoinExec` and the `DataSourceExec` should hold pointers to the same `DynamicFilterPhysicalExpr`
2. During execution, the `HashJoinExec` should update the `DynamicFilterPhysicalExpr`, and the `DataSourceExec` should use it to filter out rows
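To make these two requirements concrete, here is a minimal, std-only Rust sketch of what "pointing to the same filter" buys you. `SharedFilter` is a hypothetical stand-in, not DataFusion's `DynamicFilterPhysicalExpr`: the current predicate is modeled as an inclusive `(min, max)` range on column `a`, kept behind an `RwLock`. The join side narrows the range once its build side is known, and the scan side sees the update only because both hold the same `Arc`.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a dynamic filter: the current predicate is
/// modeled as an inclusive `(min, max)` range on column `a`.
struct SharedFilter {
    range: RwLock<(i64, i64)>,
}

impl SharedFilter {
    fn new() -> Self {
        // Start permissive: every value passes until the join narrows it.
        SharedFilter { range: RwLock::new((i64::MIN, i64::MAX)) }
    }

    /// Called by the producer (the join) once its build side is complete.
    fn update(&self, min: i64, max: i64) {
        *self.range.write().unwrap() = (min, max);
    }

    /// Called by the consumer (the scan) for each value it reads.
    fn matches(&self, value: i64) -> bool {
        let (min, max) = *self.range.read().unwrap();
        min <= value && value <= max
    }
}

/// Simulates the scan: keep only the rows the filter currently admits.
fn scan(filter: &SharedFilter, rows: &[i64]) -> Vec<i64> {
    rows.iter().copied().filter(|v| filter.matches(*v)).collect()
}

fn main() {
    let join_side = Arc::new(SharedFilter::new());
    // The scan receives a clone of the same Arc, not a copy of the filter.
    let scan_side = Arc::clone(&join_side);
    assert!(Arc::ptr_eq(&join_side, &scan_side));

    let rows = [1, 5, 10, 50];
    assert_eq!(scan(&scan_side, &rows), vec![1, 5, 10, 50]);

    // The join learns that its build-side keys fall in [5, 10].
    join_side.update(5, 10);
    assert_eq!(scan(&scan_side, &rows), vec![5, 10]);
    println!("filtered: {:?}", scan(&scan_side, &rows));
}
```

If deserialization instead handed the scan its own `SharedFilter`, the `update` call would never reach it and the scan would keep reading every row.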
This does not happen today for a few reasons, a couple of which this PR aims to address:
1. `DynamicFilterPhysicalExpr` does not survive round-tripping. Because serialization goes through the `PhysicalExpr::snapshot()` API, its internal expressions get inlined (e.g. it may be serialized as a `Literal`)
2. Even if `DynamicFilterPhysicalExpr` survives round-tripping, the copy pushed down to the `DataSourceExec` often has different children. In that case there are two distinct `DynamicFilterPhysicalExpr` instances, which are not deduplicated, so referential integrity is lost
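The second problem can be reproduced with a small model. The sketch below is hypothetical and std-only; it assumes that pushdown re-targets the filter by building a *new* outer expression with different children that still shares the underlying mutable state (`FilterExpr`, `FilterProto`, `with_new_children` and the column labels `a@0`/`a@1` are illustrative names, not DataFusion's API). Deduplicating by `Arc` pointer then sees two different pointers, writes two protos, and deserialization creates two independent states.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical dynamic filter: `children` are the columns it refers to in
/// its current schema; `inner` is the shared, mutable predicate state.
struct FilterExpr {
    children: Vec<String>,
    inner: Arc<RwLock<(i64, i64)>>,
}

impl FilterExpr {
    /// Pushdown re-targets the filter: a *new* outer Arc, same `inner`.
    fn with_new_children(&self, children: Vec<String>) -> Arc<FilterExpr> {
        Arc::new(FilterExpr { children, inner: Arc::clone(&self.inner) })
    }
}

/// Serialized form: the children plus a copy of the current range.
struct FilterProto {
    children: Vec<String>,
    range: (i64, i64),
}

/// Pointer-keyed dedup: two references share a proto only if they are the
/// very same outer Arc. Returns the protos and, per reference, its slot.
fn serialize_by_ptr(refs: &[Arc<FilterExpr>]) -> (Vec<FilterProto>, Vec<usize>) {
    let mut seen: HashMap<*const FilterExpr, usize> = HashMap::new();
    let mut protos = Vec::new();
    let mut slots = Vec::new();
    for r in refs {
        let idx = *seen.entry(Arc::as_ptr(r)).or_insert_with(|| {
            protos.push(FilterProto {
                children: r.children.clone(),
                range: *r.inner.read().unwrap(),
            });
            protos.len() - 1
        });
        slots.push(idx);
    }
    (protos, slots)
}

/// Every proto becomes its own filter with its own fresh `inner` state.
fn deserialize(protos: &[FilterProto], slots: &[usize]) -> Vec<Arc<FilterExpr>> {
    let built: Vec<Arc<FilterExpr>> = protos
        .iter()
        .map(|p| {
            Arc::new(FilterExpr {
                children: p.children.clone(),
                inner: Arc::new(RwLock::new(p.range)),
            })
        })
        .collect();
    slots.iter().map(|&i| Arc::clone(&built[i])).collect()
}

fn main() {
    let join_filter = Arc::new(FilterExpr {
        children: vec!["a@0".to_string()],
        inner: Arc::new(RwLock::new((i64::MIN, i64::MAX))),
    });
    let scan_filter = join_filter.with_new_children(vec!["a@1".to_string()]);
    // Before serialization the two filters share state...
    assert!(Arc::ptr_eq(&join_filter.inner, &scan_filter.inner));

    // ...but they are different outer Arcs, so they get two protos.
    let (protos, slots) = serialize_by_ptr(&[join_filter, scan_filter]);
    assert_eq!(protos.len(), 2);

    let restored = deserialize(&protos, &slots);
    *restored[0].inner.write().unwrap() = (5, 10); // the join updates
    // The scan's copy never sees the update: the link is lost.
    assert_eq!(*restored[1].inner.read().unwrap(), (i64::MIN, i64::MAX));
    assert!(!Arc::ptr_eq(&restored[0].inner, &restored[1].inner));
}
```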
## What changes are included in this PR?
This PR aims to fix those problems by:
1. Removing the `snapshot()` call from the serialization process
2. Adding protos for `DynamicFilterPhysicalExpr` so it can be serialized and deserialized
3. Removing `Arc`-based deduplication. Expressions are now deduplicated only on `expression_id`, and only if the `PhysicalExpr` reports an `expression_id`. After this change, `DynamicFilterPhysicalExpr` is the only expression that reports one
4. Making `expression_id` a random `u64`. Since a given query likely has only a few `DynamicFilterPhysicalExpr` instances, the odds of a collision are very low
5. Dropping the `DedupingSerializer`, which is no longer needed because the `expression_id` is already stored in the dynamic filter proto itself
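The id-based approach in items 3 to 5 can be sketched the same way. This is a hypothetical, std-only model, not the code in this PR: each serialized filter carries its own `expression_id`, and the deserializer keeps a map from id to shared state, so two protos with the same id (but different children) end up sharing one state again. `random_expression_id` is a std-only stand-in for a proper RNG (it hashes nothing with a randomly keyed `RandomState`), and `collision_upper_bound` is the standard birthday bound, n(n-1)/2 divided by 2^64, behind item 4's claim.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};
use std::sync::{Arc, RwLock};

/// Stand-in for "a random u64": a randomly keyed SipHash of empty input.
/// A real implementation would use a proper random number generator.
fn random_expression_id() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Hypothetical filter: `expression_id` identifies the shared state.
struct FilterExpr {
    expression_id: u64,
    children: Vec<String>,
    inner: Arc<RwLock<(i64, i64)>>,
}

/// The proto carries the id itself, so no separate dedup table is written.
struct FilterProto {
    expression_id: u64,
    children: Vec<String>,
    range: (i64, i64),
}

fn serialize(f: &FilterExpr) -> FilterProto {
    FilterProto {
        expression_id: f.expression_id,
        children: f.children.clone(),
        range: *f.inner.read().unwrap(),
    }
}

/// Deserializer-side cache: the first proto seen for an id creates the
/// shared state; later protos with that id reuse it but keep their children.
#[derive(Default)]
struct Deduper {
    by_id: HashMap<u64, Arc<RwLock<(i64, i64)>>>,
}

impl Deduper {
    fn deserialize(&mut self, p: &FilterProto) -> Arc<FilterExpr> {
        let inner = self
            .by_id
            .entry(p.expression_id)
            .or_insert_with(|| Arc::new(RwLock::new(p.range)));
        Arc::new(FilterExpr {
            expression_id: p.expression_id,
            children: p.children.clone(),
            inner: Arc::clone(inner),
        })
    }
}

/// Upper bound on P(any two of `n` uniformly random u64 ids collide).
fn collision_upper_bound(n: u64) -> f64 {
    let n = n as f64;
    n * (n - 1.0).max(0.0) / 2.0 / 2f64.powi(64)
}

fn main() {
    let id = random_expression_id();
    let shared = Arc::new(RwLock::new((i64::MIN, i64::MAX)));
    let join_filter = FilterExpr { expression_id: id, children: vec!["a@0".into()], inner: Arc::clone(&shared) };
    let scan_filter = FilterExpr { expression_id: id, children: vec!["a@1".into()], inner: shared };

    let protos = [serialize(&join_filter), serialize(&scan_filter)];
    let mut deduper = Deduper::default();
    let restored: Vec<_> = protos.iter().map(|p| deduper.deserialize(p)).collect();

    // Same id => same state after the round trip; children stay distinct.
    assert!(Arc::ptr_eq(&restored[0].inner, &restored[1].inner));
    assert_ne!(restored[0].children, restored[1].children);
    *restored[0].inner.write().unwrap() = (5, 10);
    assert_eq!(*restored[1].inner.read().unwrap(), (5, 10));

    // With, say, 10 filters in a query: 45 / 2^64, roughly 2.4e-18.
    println!("p(collision) <= {:e}", collision_upper_bound(10));
}
```

Keying on an id carried in the proto, rather than on pointer identity, is what lets the pushed-down copy keep its own children while still sharing state with the join.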
Future work:
1. Serialize dynamic filters in `HashJoinExec`, `AggregateExec` and
`SortExec`
2. Add tests which actually execute plans after deserialization and
assert that dynamic filtering is functional
3. Add proto converters to the `PhysicalExtensionCodec` trait so implementors can use the deduplication logic
## Are these changes tested?
- adds tests which round-trip dynamic filters and assert that referential integrity is maintained
- removes tests for `Arc`-based deduplication and session id rotation, since these are no longer supported
## Are there any user-facing changes?
- The default codec no longer calls `snapshot()` on `PhysicalExpr` during serialization. As a result, `DynamicFilterPhysicalExpr` instances are now serialized and deserialized as-is, without snapshotting.
- `PhysicalExpr`s are no longer deduplicated in general; only `DynamicFilterPhysicalExpr` is
---------
Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>

Parent commit: bb86364
12 files changed
Lines changed: 860 additions & 522 deletions
File tree:
- datafusion
  - physical-expr-common/src
  - physical-expr/src/expressions
  - proto
    - proto
    - src
      - generated
      - physical_plan
    - tests/cases
One of the changed files: 249 additions and 5 deletions.