DataFusion uses name-based field mapping when coercing struct types across different operations. This document explains how struct coercion works, when it applies, and how to handle NULL fields.
When combining structs from different sources (e.g., in UNION, array construction, or JOINs), DataFusion matches struct fields by name rather than by position. This provides more robust and predictable behavior compared to positional matching.
-- These two structs have the same fields in different order
SELECT [{a: 1, b: 2}, {b: 3, a: 4}];
-- Result: Field names matched, values unified
-- [{"a": 1, "b": 2}, {"a": 4, "b": 3}]The following query operations use name-based field mapping for struct coercion:
When creating array literals with struct elements that have different field orders:
-- Structs with reordered fields in array literal
SELECT [{x: 1, y: 2}, {y: 3, x: 4}];
-- Unified type: List(Struct("x": Int32, "y": Int32))
-- Values: [{"x": 1, "y": 2}, {"x": 4, "y": 3}]When it applies:
- Array literals with struct elements:
[{...}, {...}] - Nested arrays with structs:
[[{x: 1}, {x: 2}]]
When constructing arrays from table columns with different struct schemas:
CREATE TABLE t_left (s struct(x int, y int)) AS VALUES ({x: 1, y: 2});
CREATE TABLE t_right (s struct(y int, x int)) AS VALUES ({y: 3, x: 4});
-- Dynamically constructs unified array schema
SELECT [t_left.s, t_right.s] FROM t_left JOIN t_right;
-- Result: [{"x": 1, "y": 2}, {"x": 4, "y": 3}]When it applies:
- Array construction with column references:
[col1, col2] - Array construction in joins with matching field names
When combining query results with different struct field orders:
SELECT {a: 1, b: 2} as s
UNION ALL
SELECT {b: 3, a: 4} as s;
-- Result: {"a": 1, "b": 2} and {"a": 4, "b": 3}When it applies:
- UNION ALL with structs: field names matched across branches
- UNION (deduplicated) with structs
When multiple CTEs produce structs with different field orders that are combined:
WITH
t1 AS (SELECT {a: 1, b: 2} as s),
t2 AS (SELECT {b: 3, a: 4} as s)
SELECT s FROM t1
UNION ALL
SELECT s FROM t2;
-- Result: Field names matched across CTEsWhen creating tables or temporary results with struct values in different field orders:
CREATE TABLE t AS VALUES ({a: 1, b: 2}), ({b: 3, a: 4});
-- Table schema unified: struct(a: int, b: int)
-- Values: {a: 1, b: 2} and {a: 4, b: 3}When joining tables where the JOIN condition involves structs with different field orders:
CREATE TABLE orders (customer struct(name varchar, id int));
CREATE TABLE customers (info struct(id int, name varchar));
-- Join matches struct fields by name
SELECT * FROM orders
JOIN customers ON orders.customer = customers.info;When collecting structs with different field orders using aggregate functions like array_agg:
SELECT array_agg(s) FROM (
SELECT {x: 1, y: 2} as s
UNION ALL
SELECT {y: 3, x: 4} as s
) t
GROUP BY category;
-- Result: Array of structs with unified field orderWhen using window functions with struct expressions having different field orders:
SELECT
id,
row_number() over (partition by s order by id) as rn
FROM (
SELECT {category: 1, value: 10} as s, 1 as id
UNION ALL
SELECT {value: 20, category: 1} as s, 2 as id
);
-- Fields matched by name in PARTITION BY clauseWhen structs have different field sets, missing fields are filled with NULL values during coercion.
-- Struct in first position has fields: a, b
-- Struct in second position has fields: b, c
-- Unified schema includes all fields: a, b, c
SELECT [
CAST({a: 1, b: 2} AS STRUCT(a INT, b INT, c INT)),
CAST({b: 3, c: 4} AS STRUCT(a INT, b INT, c INT))
];
-- Result:
-- [
-- {"a": 1, "b": 2, "c": NULL},
-- {"a": NULL, "b": 3, "c": 4}
-- ]Field count must match exactly. If structs have different numbers of fields and their field names don't completely overlap, the query will fail:
-- This fails because field sets don't match:
-- t_left has {x, y} but t_right has {x, y, z}
SELECT [t_left.s, t_right.s] FROM t_left JOIN t_right;
-- Error: Cannot coerce struct with mismatched field countsWorkaround: Use explicit CAST
To handle partial field overlap, explicitly cast structs to a unified schema:
SELECT [
CAST(t_left.s AS STRUCT(x INT, y INT, z INT)),
CAST(t_right.s AS STRUCT(x INT, y INT, z INT))
] FROM t_left JOIN t_right;DataFusion supports comparing STRUCT values with standard comparison operators
(=, !=, <, <=, >, >=). Ordering comparisons are lexicographical and
follow DataFusion's default ascending comparison behavior, where NULL sorts
before non-NULL values.
SELECT {x: 1, y: 2} < {x: 1, y: 3};
-- true
SELECT {x: 1, y: NULL} < {x: 1, y: 2};
-- true
SELECT {x: 1, y: NULL} = {x: 1, y: NULL};
--trueIf you have existing code that relied on positional struct field matching, you may need to update it.
Old behavior (positional):
-- These would have been positionally mapped (left-to-right)
SELECT [{x: 1, y: 2}, {y: 3, x: 4}];
-- Old result (positional): [{"x": 1, "y": 2}, {"y": 3, "x": 4}]New behavior (name-based):
-- Now uses name-based matching
SELECT [{x: 1, y: 2}, {y: 3, x: 4}];
-- New result (by name): [{"x": 1, "y": 2}, {"x": 4, "y": 3}]- Review struct operations - Look for queries that combine structs from different sources
- Check field names - Verify that field names match as expected (not positions)
- Test with new coercion - Run queries and verify the results match your expectations
- Handle field reordering - If you need specific field orders, use explicit CAST operations
If you need precise control over struct field order and types, use explicit CAST:
-- Guarantee specific field order and types
SELECT CAST({b: 3, a: 4} AS STRUCT(a INT, b INT));
-- Result: {"a": 4, "b": 3}When joining or combining structs, define target schemas explicitly:
-- Good: explicit schema definition
SELECT CAST(data AS STRUCT(id INT, name VARCHAR, active BOOLEAN))
FROM external_source;Prefer named struct constructors for clarity:
-- Good: field names are explicit
SELECT named_struct('id', 1, 'name', 'Alice', 'active', true);
-- Or using struct literal syntax
SELECT {id: 1, name: 'Alice', active: true};Always verify that field mappings work as expected:
-- Use arrow_typeof to verify unified schema
SELECT arrow_typeof([{x: 1, y: 2}, {y: 3, x: 4}]);
-- Result: List(Struct("x": Int32, "y": Int32))When combining structs with partial field overlap, use explicit CAST:
-- Instead of relying on implicit coercion
SELECT [
CAST(left_struct AS STRUCT(x INT, y INT, z INT)),
CAST(right_struct AS STRUCT(x INT, y INT, z INT))
];In complex queries, document the expected struct schemas:
-- Expected schema: {customer_id: INT, name: VARCHAR, age: INT}
SELECT {
customer_id: c.id,
name: c.name,
age: c.age
} as customer_info
FROM customers c;Cause: Trying to combine structs with different numbers of fields.
Solution:
-- Use explicit CAST to handle missing fields
SELECT [
CAST(struct1 AS STRUCT(a INT, b INT, c INT)),
CAST(struct2 AS STRUCT(a INT, b INT, c INT))
];Cause: Referencing a field name that doesn't exist in the struct.
Solution:
-- Verify field names match exactly (case-sensitive)
SELECT s['field_name'] FROM my_table; -- Use bracket notation for access
-- Or use get_field function
SELECT get_field(s, 'field_name') FROM my_table;Cause: Struct coercion added NULL for missing fields.
Solution: Check that all structs have the required fields, or explicitly handle NULLs:
SELECT COALESCE(s['field'], default_value) FROM my_table;arrow_typeof()- Returns the Arrow type of an expressionstruct()/named_struct()- Creates struct valuesget_field()- Extracts field values from structsCAST()- Explicitly casts structs to specific schemas