Commit 7c3ea05
feat: add AggregateMode::PartialReduce for tree-reduce aggregation (#20019)
DataFusion's current `AggregateMode` enum has five variants covering
three of the four cells in the input/output matrix:
| | Input: raw data | Input: partial state |
| - | - | - |
| Output: final values | `Single` / `SinglePartitioned` | `Final` / `FinalPartitioned` |
| Output: partial state | `Partial` | ??? |
This PR adds `AggregateMode::PartialReduce` to fill in the missing cell:
it takes partially-reduced values as input, and reduces them further,
but without finalizing.
This is useful because it's the key component needed to implement
distributed tree-reduction (as seen in e.g. the Scuba or Honeycomb
papers): a set of worker nodes each perform multithreaded `Partial`
aggregations, feed those into a `PartialReduce` to reduce all of this
node's values into a single row, and then a head node collects the
outputs from all nodes' `PartialReduce` to feed into a `Final`
reduction.
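The flow described above can be sketched in plain Rust. This is a standalone illustration, not DataFusion code: it assumes an `AVG`-style aggregate whose partial state is a `(sum, count)` pair, and the names `AvgState`, `partial`, `partial_reduce`, `finalize`, and `tree_reduce_avg` are hypothetical.

```rust
/// Hypothetical partial state for an AVG-style aggregate: running sum and row count.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AvgState {
    pub sum: f64,
    pub count: u64,
}

/// `Partial` stage: raw values in, partial state out.
pub fn partial(values: &[f64]) -> AvgState {
    AvgState {
        sum: values.iter().sum(),
        count: values.len() as u64,
    }
}

/// `PartialReduce` stage: partial states in, one merged partial state out.
/// Nothing is finalized here, so the result can be merged again later.
pub fn partial_reduce(states: &[AvgState]) -> AvgState {
    states
        .iter()
        .fold(AvgState { sum: 0.0, count: 0 }, |acc, s| AvgState {
            sum: acc.sum + s.sum,
            count: acc.count + s.count,
        })
}

/// `Final` stage: partial states in, final value out (`None` if there were no rows).
pub fn finalize(states: &[AvgState]) -> Option<f64> {
    let merged = partial_reduce(states);
    (merged.count > 0).then(|| merged.sum / merged.count as f64)
}

/// Tree reduction: each outer element is one worker node, each inner `Vec<f64>`
/// is the data seen by one thread on that node.
pub fn tree_reduce_avg(nodes: &[Vec<Vec<f64>>]) -> Option<f64> {
    let per_node: Vec<AvgState> = nodes
        .iter()
        .map(|threads| {
            // Multithreaded `Partial` aggregations on the worker...
            let partials: Vec<AvgState> = threads.iter().map(|p| partial(p)).collect();
            // ...collapsed into a single row by `PartialReduce`.
            partial_reduce(&partials)
        })
        .collect();
    // The head node feeds one row per worker into `Final`.
    finalize(&per_node)
}
```

Because the output of `partial_reduce` has the same shape as its input, the head node only ever sees one state per worker, regardless of how many threads each worker ran.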
This PR can be reviewed commit by commit. The first commit is a pure
refactor/simplification: in most places where we were matching on
`AggregateMode`, we were really just checking which row of the above
table we were in, or which column. So now we have `is_first_stage`
(tells you which column) and `is_last_stage` (tells you which row) and
we use them everywhere.
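As a rough sketch of how two predicates can stand in for most `match` statements, here is a standalone enum mirroring the table. The enum `Mode` is hypothetical and is not DataFusion's actual definition. The method bodies are an assumption based on the table: first stage means the input is raw data, last stage means the output is final values. The `PartialReduce` variant is included to show where the new mode would land.

```rust
/// Hypothetical stand-in for `AggregateMode`, covering all four table cells.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Partial,
    PartialReduce,
    Final,
    FinalPartitioned,
    Single,
    SinglePartitioned,
}

impl Mode {
    /// Column of the table: true when the input is raw data
    /// rather than partial state.
    pub fn is_first_stage(self) -> bool {
        matches!(self, Mode::Partial | Mode::Single | Mode::SinglePartitioned)
    }

    /// Row of the table: true when the output is final values
    /// rather than partial state.
    pub fn is_last_stage(self) -> bool {
        matches!(
            self,
            Mode::Final | Mode::FinalPartitioned | Mode::Single | Mode::SinglePartitioned
        )
    }
}
```

With this shape, a mode that is neither first nor last stage (partial state in, partial state out) needs no special handling at call sites that only ask one of the two questions.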
Second commit adds `PartialReduce`, and is pretty small because
`is_first_stage`/`is_last_stage` do most of the heavy lifting. It also
adds a test demonstrating a minimal Partial -> PartialReduce -> Final
tree-reduction.

1 parent f997169 commit 7c3ea05
9 files changed
Lines changed: 304 additions & 108 deletions
File tree
- datafusion
  - physical-optimizer/src
  - physical-plan/src/aggregates
  - proto
    - proto
    - src
      - generated
      - physical_plan