Commit 69cd666
fix: skip empty metadata in intersect_metadata_for_union to prevent s… (#21127)
## Which issue does this PR close?
- Closes #19049.
## Rationale for this change
We're building a SQL engine on top of DataFusion and hit this while
running benchmarks. The failing query is a `UNION ALL` over Parquet
files that carry field metadata (like `PARQUET:field_id` or InfluxDB's
`iox::column::type`). When one branch of the union has a NULL literal,
`intersect_metadata_for_union` intersects the metadata from the data
source with the empty metadata from the NULL. Since intersecting
anything with an empty set gives the empty set, all metadata gets wiped out.
Later, when `optimize_projections` prunes columns and `recompute_schema`
rebuilds the Union schema, the logical schema has `{}` while the
physical schema still has the original metadata from Parquet. This
causes:
```
Internal error: Physical input schema should be the same as the one
converted from logical input schema.
Differences:
- field metadata at index 0 [usage_idle]: (physical) {"iox::column::type": "..."} vs (logical) {}
```
As @erratic-pattern and @alamb discussed in the issue, empty metadata
from NULL literals isn't saying "this field has no metadata" — it's
saying "I don't know." It shouldn't erase metadata from branches that
actually have it.
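To make the failure mode concrete, here is a minimal sketch of a plain key/value intersection over `HashMap<String, String>`. It is not DataFusion's code: the `Metadata` alias, the `naive_intersect` function, and the `"example"` value are assumptions made for illustration. It only shows why a single empty branch erases everything.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for Arrow field metadata (string keys and values).
type Metadata = HashMap<String, String>;

/// Plain intersection: keep only the key/value pairs present in every input.
/// Illustrative only; not the function from the DataFusion code base.
fn naive_intersect(inputs: &[Metadata]) -> Metadata {
    let mut iter = inputs.iter();
    let Some(first) = iter.next() else {
        return Metadata::new();
    };
    let mut acc = first.clone();
    for m in iter {
        // Drop any pair that this input does not carry with the same value.
        acc.retain(|k, v| m.get(k) == Some(&*v));
    }
    acc
}

fn main() {
    // A branch backed by a data source with field metadata (placeholder value).
    let from_source =
        Metadata::from([("iox::column::type".to_string(), "example".to_string())]);
    // A branch that is a NULL literal carries no metadata at all.
    let from_null = Metadata::new();

    let merged = naive_intersect(&[from_source, from_null]);
    // The empty branch wipes out the metadata of the other branch.
    assert!(merged.is_empty());
}
```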
I fixed this in `intersect_metadata_for_union` directly rather than
patching `optimize_projections` or `recompute_schema`, since that's
where the bad intersection happens and it covers all code paths that
derive Union schemas.
## What changes are included in this PR?
One change to `intersect_metadata_for_union` in
`datafusion/expr/src/expr.rs`: branches with empty metadata are skipped
during intersection instead of participating. Non-empty branches still
intersect normally (conflicting values still get dropped). If every
branch is empty, the result is empty — same as before.
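The rule described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual patch: `Metadata`, `intersect_skipping_empty`, and the `meta` helper are hypothetical names, and the real function operates on DataFusion's own types rather than bare hash maps.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for Arrow field metadata.
type Metadata = HashMap<String, String>;

/// Intersect metadata across union branches, ignoring branches whose
/// metadata is empty. Illustrative sketch only.
fn intersect_skipping_empty<'a>(
    inputs: impl IntoIterator<Item = &'a Metadata>,
) -> Metadata {
    // Empty metadata is treated as "unknown", so it does not participate.
    let mut non_empty = inputs.into_iter().filter(|m| !m.is_empty());
    let Some(first) = non_empty.next() else {
        // No inputs, or every branch was empty.
        return Metadata::new();
    };
    let mut acc = first.clone();
    for m in non_empty {
        // Non-empty branches intersect normally: conflicting values are dropped.
        acc.retain(|k, v| m.get(k) == Some(&*v));
    }
    acc
}

/// Small helper to build a metadata map from string pairs (hypothetical).
fn meta(pairs: &[(&str, &str)]) -> Metadata {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

fn main() {
    let with_meta = meta(&[("PARQUET:field_id", "1")]);
    let empty = Metadata::new();

    // One branch has metadata, the other is empty: metadata is preserved,
    // regardless of which branch comes first.
    assert_eq!(intersect_skipping_empty([&with_meta, &empty]), with_meta);
    assert_eq!(intersect_skipping_empty([&empty, &with_meta]), with_meta);

    // Conflicting non-empty values are still dropped.
    let other = meta(&[("PARQUET:field_id", "2")]);
    assert!(intersect_skipping_empty([&with_meta, &empty, &other]).is_empty());

    // All branches empty: empty result.
    assert!(intersect_skipping_empty([&empty, &empty]).is_empty());
}
```

Filtering before the fold keeps the result independent of branch order, which matches the "empty branch comes first" case in the test list.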
## Are these changes tested?
Added 7 unit tests for `intersect_metadata_for_union`:
- Same metadata across branches — preserved
- Conflicting non-empty values — dropped (existing behavior, unchanged)
- One branch has metadata, other is empty — metadata preserved (the fix)
- Empty branch comes first — still works
- All branches empty — empty result
- Mix of empty and conflicting non-empty — intersects only the non-empty
ones
- No inputs — empty result
The full end-to-end reproduction needs Parquet files with field metadata
as described in the issue. The unit tests cover the intersection logic
directly.
## Are there any user-facing changes?
No API changes. `UNION ALL` queries combining metadata-carrying sources
with NULL literals will stop failing with schema mismatch errors.
1 parent c4562dc commit 69cd666
2 files changed
Lines changed: 96 additions & 7 deletions