Skip to content

Commit d75fcb8

Browse files
authored
Fix physical expr adapter to resolve physical fields by name, not column index (#20485)
## Which issue does this PR close? * [Comment](#20202 (comment)) on #20202 ## Rationale for this change When adapting physical expressions across differing logical/physical schemas, relying on `Column::index()` can be incorrect if the physical schema column ordering differs from the logical plan (or if a `Column` is constructed with an index that doesn’t match the current physical schema). This can lead to looking up the wrong physical field, causing incorrect casts, type mismatches, or runtime failures. This change ensures the adapter always resolves the physical field using the column **name** against the physical file schema, making expression rewriting robust to schema reordering and avoiding subtle bugs where an index points at an unrelated column. ## What changes are included in this PR? * Updated `create_cast_column_expr` to resolve the physical field via `physical_file_schema.index_of(column.name())` instead of `column.index()`. * Added a regression test that deliberately supplies a mismatched `Column` index and asserts the rewriter still selects the correct physical field by name and produces the expected `CastColumnExpr`. ## Are these changes tested? Yes. * Added `test_create_cast_column_expr_uses_name_lookup_not_column_index` which covers the scenario where physical and logical schemas have different column orders and the provided `Column` index is incorrect. ## Are there any user-facing changes? No direct user-facing changes. This is an internal correctness fix that improves robustness of physical expression adaptation when schema ordering differs between logical and physical plans. <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
1 parent 2347306 commit d75fcb8

1 file changed

Lines changed: 42 additions & 1 deletion

File tree

datafusion/physical-expr-adapter/src/schema_rewriter.rs

Lines changed: 42 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -468,7 +468,10 @@ impl DefaultPhysicalExprAdapterRewriter {
468468
column: Column,
469469
logical_field: &Field,
470470
) -> Result<Transformed<Arc<dyn PhysicalExpr>>> {
471-
let actual_physical_field = self.physical_file_schema.field(column.index());
471+
// Look up the column index in the physical schema by name to ensure correctness.
472+
let physical_column_index = self.physical_file_schema.index_of(column.name())?;
473+
let actual_physical_field =
474+
self.physical_file_schema.field(physical_column_index);
472475

473476
// For struct types, use validate_struct_compatibility which handles:
474477
// - Missing fields in source (filled with nulls)
@@ -1492,4 +1495,42 @@ mod tests {
14921495
DataType::Int64
14931496
);
14941497
}
1498+
1499+
#[test]
1500+
fn test_create_cast_column_expr_uses_name_lookup_not_column_index() {
1501+
// Physical schema has column `a` at index 1; index 0 is an incompatible type.
1502+
let physical_schema = Arc::new(Schema::new(vec![
1503+
Field::new("b", DataType::Binary, true),
1504+
Field::new("a", DataType::Int32, false),
1505+
]));
1506+
1507+
let logical_schema = Arc::new(Schema::new(vec![
1508+
Field::new("a", DataType::Int64, false),
1509+
Field::new("b", DataType::Binary, true),
1510+
]));
1511+
1512+
let rewriter = DefaultPhysicalExprAdapterRewriter {
1513+
logical_file_schema: Arc::clone(&logical_schema),
1514+
physical_file_schema: Arc::clone(&physical_schema),
1515+
};
1516+
1517+
// Deliberately provide the wrong index for column `a`.
1518+
// Regression: this must still resolve against physical field `a` by name.
1519+
let transformed = rewriter
1520+
.create_cast_column_expr(
1521+
Column::new("a", 0),
1522+
logical_schema.field_with_name("a").unwrap(),
1523+
)
1524+
.unwrap();
1525+
1526+
let cast_expr = transformed
1527+
.data
1528+
.as_any()
1529+
.downcast_ref::<CastColumnExpr>()
1530+
.expect("Expected CastColumnExpr");
1531+
1532+
assert_eq!(cast_expr.input_field().name(), "a");
1533+
assert_eq!(cast_expr.input_field().data_type(), &DataType::Int32);
1534+
assert_eq!(cast_expr.target_field().data_type(), &DataType::Int64);
1535+
}
14951536
}

0 commit comments

Comments
 (0)