
Commit 1e68674

doc: Add documentation explaining the behavior of null values in struct comparisons (#21226)
## Which issue does this PR close?

- Closes #.

## Rationale for this change

DataFusion already supports tuple-like and `STRUCT` comparisons, but the user-facing documentation did not clearly describe the current comparison semantics, especially around lexicographical ordering and `NULL` handling. The `IN` documentation also did not explain tuple-like values.

## What changes are included in this PR?

- Added a short `STRUCT` comparison note to [struct_coercion.md](/Users/jensen/code/datafusion/docs/source/user-guide/sql/struct_coercion.md), documenting that:
  - `STRUCT` values support standard comparison operators
  - comparisons are lexicographical by field order
  - `NULL` is ordered before non-`NULL`
- Added minimal examples for `STRUCT` comparisons, including a `NULL` example
- Added a short note to [subqueries.md](/Users/jensen/code/datafusion/docs/source/user-guide/sql/subqueries.md) explaining that tuple-like `IN` uses DataFusion's struct equality semantics
- Added a concrete example:
  - `SELECT (7521, 30) IN ((7521, NULL));`
  - result: `false`

## Are these changes tested?

No new tests were added. This PR only updates documentation for existing behavior, and the documented behavior is already covered by existing tuple/struct comparison and `IN` tests.

## Are there any user-facing changes?

Yes. This PR updates the SQL user documentation to clarify the current semantics of `STRUCT` comparisons and tuple-like `IN` expressions.
1 parent bc2b36c commit 1e68674

5 files changed

Lines changed: 42 additions & 2 deletions

datafusion/functions/src/core/named_struct.rs

Lines changed: 3 additions & 1 deletion
@@ -27,7 +27,9 @@ use std::sync::Arc;
 
 #[user_doc(
     doc_section(label = "Struct Functions"),
-    description = "Returns an Arrow struct using the specified name and input expressions pairs.",
+    description = "Returns an Arrow struct using the specified name and input expressions pairs.
+For information on comparing and ordering struct values (including `NULL` handling),
+see [Comparison and Ordering](struct_coercion.md#comparison-and-ordering).",
     syntax_example = "named_struct(expression1_name, expression1_input[, ..., expression_n_name, expression_n_input])",
     sql_example = r#"
 For example, this query converts two columns `a` and `b` to a single column with

datafusion/functions/src/core/struct.rs

Lines changed: 3 additions & 1 deletion
@@ -27,7 +27,9 @@ use std::sync::Arc;
     doc_section(label = "Struct Functions"),
     description = "Returns an Arrow struct using the specified input expressions optionally named.
 Fields in the returned struct use the optional name or the `cN` naming convention.
-For example: `c0`, `c1`, `c2`, etc.",
+For example: `c0`, `c1`, `c2`, etc.
+For information on comparing and ordering struct values (including `NULL` handling),
+see [Comparison and Ordering](struct_coercion.md#comparison-and-ordering).",
     syntax_example = "struct(expression1[, ..., expression_n])",
     sql_example = r#"For example, this query converts two columns `a` and `b` to a single column with
 a struct type of fields `field_a` and `c1`:

docs/source/user-guide/sql/scalar_functions.md

Lines changed: 4 additions & 0 deletions
@@ -4709,6 +4709,8 @@ _Alias of [string_to_array](#string_to_array)._
 ### `named_struct`
 
 Returns an Arrow struct using the specified name and input expressions pairs.
+For information on comparing and ordering struct values (including `NULL` handling),
+see [Comparison and Ordering](struct_coercion.md#comparison-and-ordering).
 
 ```sql
 named_struct(expression1_name, expression1_input[, ..., expression_n_name, expression_n_input])
@@ -4750,6 +4752,8 @@ _Alias of [struct](#struct)._
 Returns an Arrow struct using the specified input expressions optionally named.
 Fields in the returned struct use the optional name or the `cN` naming convention.
 For example: `c0`, `c1`, `c2`, etc.
+For information on comparing and ordering struct values (including `NULL` handling),
+see [Comparison and Ordering](struct_coercion.md#comparison-and-ordering).
 
 ```sql
 struct(expression1[, ..., expression_n])

docs/source/user-guide/sql/struct_coercion.md

Lines changed: 20 additions & 0 deletions
@@ -208,6 +208,26 @@ SELECT [
 ] FROM t_left JOIN t_right;
 ```
 
+## Comparison and Ordering
+
+DataFusion supports comparing `STRUCT` values with standard comparison operators
+(`=`, `!=`, `<`, `<=`, `>`, `>=`). Ordering comparisons are lexicographical and
+follow DataFusion's default ascending comparison behavior, where `NULL` sorts
+before non-`NULL` values.
+
+### Examples
+
+```sql
+SELECT {x: 1, y: 2} < {x: 1, y: 3};
+-- true
+
+SELECT {x: 1, y: NULL} < {x: 1, y: 2};
+-- true
+
+SELECT {x: 1, y: NULL} = {x: 1, y: NULL};
+--true
+```
+
 ## Migration Guide: From Positional to Name-Based Matching
 
 If you have existing code that relied on **positional** struct field matching, you may need to update it.
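
The comparison rules added in the `struct_coercion.md` hunk above (field-by-field lexicographical order, with `NULL` sorting before non-`NULL`) can be modeled with a small sketch. This is a toy model of the documented semantics only, not DataFusion's implementation: the names `field_key`, `struct_key`, `struct_lt`, and `struct_eq` are hypothetical, `None` stands in for SQL `NULL`, and both structs are assumed to have the same number of fields with mutually comparable types.

```python
from typing import Optional, Sequence, Tuple

Field = Optional[int]  # None stands in for SQL NULL (assumption of this sketch)


def field_key(value: Field) -> Tuple:
    # Tag each field so that NULL sorts before every non-NULL value:
    # (0,) compares lower than (1, anything).
    return (0,) if value is None else (1, value)


def struct_key(fields: Sequence[Field]) -> Tuple:
    # Python compares tuples lexicographically, which mirrors the
    # field-by-field ordering described in the documentation.
    return tuple(field_key(v) for v in fields)


def struct_lt(a: Sequence[Field], b: Sequence[Field]) -> bool:
    return struct_key(a) < struct_key(b)


def struct_eq(a: Sequence[Field], b: Sequence[Field]) -> bool:
    return struct_key(a) == struct_key(b)


# The three documented examples, with fields given in (x, y) order.
print(struct_lt((1, 2), (1, 3)))        # True
print(struct_lt((1, None), (1, 2)))     # True
print(struct_eq((1, None), (1, None)))  # True
```

Tagging each field keeps the `NULL` rule in one place, so the ordering and equality helpers stay consistent with each other.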

docs/source/user-guide/sql/subqueries.md

Lines changed: 12 additions & 0 deletions
@@ -102,6 +102,18 @@ SELECT * FROM x WHERE column_1 NOT IN (1,3);
 +----------+----------+
 ```
 
+#### `IN` with tuple-like values and `NULL`
+
+For tuple-like values, `IN` uses DataFusion's struct equality semantics:
+
+```sql
+SELECT (1, 1) IN ((1, NULL));
+-- false
+
+SELECT (1, NULL) IN ((1, NULL));
+-- true
+```
+
 ## SELECT clause subqueries
 
 `SELECT` clause subqueries use values returned from the inner query as part
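
The tuple-like `IN` examples in the `subqueries.md` hunk above can be read as a membership test built on struct equality. The sketch below models only the two documented results and is not DataFusion's implementation. It assumes that a `NULL` field equals another `NULL` field and never equals a non-`NULL` field; `fields_equal` and `tuple_in` are hypothetical names, and `None` stands in for SQL `NULL`.

```python
from typing import Optional, Sequence, Tuple

Row = Tuple[Optional[int], ...]  # None stands in for SQL NULL (assumption)


def fields_equal(a: Optional[int], b: Optional[int]) -> bool:
    # Assumed struct-equality rule: NULL matches NULL, and NULL never
    # matches a non-NULL value.
    if a is None or b is None:
        return a is None and b is None
    return a == b


def tuple_in(needle: Row, candidates: Sequence[Row]) -> bool:
    # The needle is "IN" the list when it equals some candidate field by field.
    return any(
        len(needle) == len(candidate)
        and all(fields_equal(x, y) for x, y in zip(needle, candidate))
        for candidate in candidates
    )


print(tuple_in((1, 1), [(1, None)]))     # False
print(tuple_in((1, None), [(1, None)]))  # True
```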
