Commit 83c2c01
fix: rebind RecursiveQueryExec batches to the declared output schema (#21770)
## Which issue does this PR close?
- Closes #.
<!-- Happy to file a tracking issue if the project prefers. -->
## Rationale for this change
A recursive CTE whose anchor aliases a computed column (e.g. `upper(val)
AS val`) and whose recursive term leaves the same expression un-aliased
(`upper(r.val)`) currently returns the wrong column name — but only when
the outer query has both `ORDER BY` and `LIMIT`. The plan-level schema
is correct (taken from the anchor), but `RecursiveQueryExec` forwards
recursive-term `RecordBatch`es with their native schemas intact.
Downstream consumers that key on `batch.schema().field(i).name()` —
`SortExec`'s TopK path, CSV/JSON writers, user-code `collect`ors — then
observe the leaked recursive-branch name instead of the anchor's.
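To make the failure mode concrete, here is a minimal sketch of a consumer that trusts each batch's own schema rather than the plan-level one. It is a toy model using only the standard library: `Schema`, `Batch`, `schema` and `to_csv` are hypothetical names for illustration, not Arrow or DataFusion types.

```rust
use std::sync::Arc;

/// Toy stand-in for a schema: an ordered list of column names.
struct Schema {
    fields: Vec<String>,
}

/// Toy stand-in for a record batch: a schema handle plus row data.
struct Batch {
    schema: Arc<Schema>,
    rows: Vec<Vec<String>>,
}

/// Helper to build a shared schema from column names.
fn schema(names: &[&str]) -> Arc<Schema> {
    Arc::new(Schema {
        fields: names.iter().map(|s| s.to_string()).collect(),
    })
}

/// A headered writer that keys on the batch schema: the header line is
/// taken from the first batch it sees, not from any plan-level schema.
fn to_csv(batches: &[Batch]) -> String {
    let mut out = String::new();
    if let Some(first) = batches.first() {
        out.push_str(&first.schema.fields.join(","));
        out.push('\n');
    }
    for batch in batches {
        for row in &batch.rows {
            out.push_str(&row.join(","));
            out.push('\n');
        }
    }
    out
}
```

If the first batch such a writer receives still carries the recursive term's schema, the header reads `upper(r.val)` even though the plan declares `val`.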
MRE (fails on `datafusion-cli` pre-fix):
```sql
CREATE TABLE records (id VARCHAR NOT NULL, parent_id VARCHAR,
                      ts TIMESTAMP NOT NULL, val VARCHAR);
INSERT INTO records VALUES
  ('a00', NULL, TIMESTAMP '2025-01-01 00:00:00', 'v_span'),
  ('a01', 'a00', TIMESTAMP '2025-01-01 00:00:01', 'v_log_1'),
  ('a02', 'a01', TIMESTAMP '2025-01-01 00:00:02', 'v_log_2'),
  ('a03', 'a02', TIMESTAMP '2025-01-01 00:00:03', 'v_log_3');
WITH RECURSIVE descendants AS (
  SELECT id, parent_id, ts, upper(val) AS val
  FROM records WHERE id = 'a00'
  UNION ALL
  SELECT r.id, r.parent_id, r.ts, upper(r.val)
  FROM records r INNER JOIN descendants d ON r.parent_id = d.id
)
SELECT id, parent_id, ts, val
FROM descendants ORDER BY ts ASC LIMIT 10;
```
Pre-fix header column reads `upper(r.val)`; expected `val`.
Only `ORDER BY + LIMIT` triggers it because:
- `SortExec` without fetch re-materialises batches via `ExternalSorter`
(stable schema).
- `LimitExec` without sort sits above `RecursiveQueryExec`, never mixing
branches.
- `SortExec` with fetch uses the TopK path, which emits
`interleave_record_batch` output that carries whichever input batch's
schema was used last.
## What changes are included in this PR?
In `RecursiveQueryStream::push_batch`, rebind each emitted batch to the
declared output schema (taken from the anchor term). Logical-plan
coercion in `LogicalPlanBuilder::to_recursive_query` already guarantees
matching column types, so this is a zero-copy field rebind. 14 lines of
production code + comment.
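The patch body is not reproduced on this page, so the following is only a sketch of what a zero-copy rebind looks like in principle, assuming columns are held behind shared pointers. `Field`, `Batch`, `field` and `rebind` are hypothetical names in a standard-library toy model; they are not the Arrow or DataFusion API.

```rust
use std::sync::Arc;

/// Toy field: a column name plus a type tag.
#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    data_type: &'static str,
}

type Schema = Vec<Field>;
/// Shared, immutable column buffer; cloning the Arc does not copy the data.
type Column = Arc<Vec<String>>;

struct Batch {
    schema: Arc<Schema>,
    columns: Vec<Column>,
}

fn field(name: &str, data_type: &'static str) -> Field {
    Field { name: name.to_string(), data_type }
}

/// Rebind `batch` to `declared`: the column data is shared, only the schema
/// handle changes. Fails if the column count or any column type differs,
/// since in that case a plain rename would not be sound.
fn rebind(batch: &Batch, declared: &Arc<Schema>) -> Result<Batch, String> {
    if batch.schema.len() != declared.len() {
        return Err(format!(
            "column count mismatch: {} vs {}",
            batch.schema.len(),
            declared.len()
        ));
    }
    for (have, want) in batch.schema.iter().zip(declared.iter()) {
        if have.data_type != want.data_type {
            return Err(format!(
                "type mismatch for column {}: {} vs {}",
                want.name, have.data_type, want.data_type
            ));
        }
    }
    Ok(Batch {
        schema: Arc::clone(declared),
        // Clones the Arc handles only; the underlying buffers are untouched.
        columns: batch.columns.clone(),
    })
}
```

The type check mirrors the precondition stated above: because coercion already guarantees matching column types, the rebind only ever changes names, so it can stay a pointer swap rather than a cast.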
## Are these changes tested?
Yes.
- `datafusion/core/tests/sql/select.rs::test_recursive_cte_batch_schema_stable_with_order_by_limit`
— runs the MRE and asserts every collected `RecordBatch`'s schema field
names equal `["id", "parent_id", "ts", "val"]`. Fails pre-fix with
`left: ["id", "parent_id", "ts", "upper(r.val)"]`.
- `datafusion/sqllogictest/test_files/cte.slt` — round-trips the buggy
query through a headered CSV file (whose header is written from each
batch's schema) and re-reads it as headerless CSV so the header row is
compared as a data row. SLT otherwise cannot assert column names
directly, so this is the only way to surface the leak inside SLT.
Both regression tests were verified to fail on the base branch before
the fix was applied and pass after.
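The idea behind the SLT round-trip can be sketched in a few lines: if a headered CSV is re-read with no header handling, the column names show up as the first data row and can be compared like any other value. This is a toy parser with hypothetical helper names (`read_headerless`, `leaked_header`), not the sqllogictest machinery.

```rust
/// Parse CSV text with no header handling: every non-empty line, including
/// the first, becomes a data row. (Toy parser: no quoting or escaping.)
fn read_headerless(csv: &str) -> Vec<Vec<String>> {
    csv.lines()
        .filter(|line| !line.is_empty())
        .map(|line| line.split(',').map(str::to_string).collect())
        .collect()
}

/// The names a headered writer emitted, recovered as the first data row.
fn leaked_header(csv: &str) -> Option<Vec<String>> {
    read_headerless(csv).into_iter().next()
}
```

Given hand-written sample text whose first line is `id,parent_id,ts,upper(r.val)`, `leaked_header` returns those four strings, which is the kind of value a row-level comparison can then reject in favour of `id,parent_id,ts,val`.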
## Are there any user-facing changes?
Recursive CTEs with mismatched anchor/recursive column names will now
emit batches with the anchor-declared names consistently, regardless of
downstream operators. No API changes.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent 4fbdfd0 commit 83c2c01
3 files changed
Lines changed: 153 additions & 0 deletions
File tree
- datafusion
- core/tests/sql
- physical-plan/src
- sqllogictest/test_files
| Changed file location (in file-tree order) | Added lines (new numbering) | Additions |
|---|---|---|
| `datafusion/core/tests/sql` | 434-495 | 62 |
| `datafusion/physical-plan/src` | 320-333 | 14 |
| `datafusion/sqllogictest/test_files` | 1226-1302 | 77 |

(Diff bodies were not captured.)