fix: Fix Spark `slice` function Null type to GenericListArray casting issue #20469
Conversation
Does Spark return null (void) or an array of null (void)? I tested on PySpark 4.1.1:

```
>>> spark.sql("select slice(NULL, 1, 2) as a").printSchema()
root
 |-- a: array (nullable = true)
 |    |-- element: void (containsNull = true)
```
Spark returns Case 2: input array has null element (

This seems to suggest the return type should be a list of nulls instead of just null.
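The distinction being debated above (whole-result `NULL` vs. a list of nulls) shows up at return-type resolution time. A minimal self-contained sketch of that logic, using a simplified stand-in `DataType` enum rather than the actual arrow/DataFusion types, might look like this:

```rust
// Simplified stand-in for arrow's DataType; NOT the real DataFusion code.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int32,
    List(Box<DataType>),
}

// Hypothetical return-type resolution for slice: a Null input resolves to
// Null (the whole result is NULL) instead of failing on a list downcast.
fn slice_return_type(input: &DataType) -> Result<DataType, String> {
    match input {
        DataType::Null => Ok(DataType::Null),
        DataType::List(inner) => Ok(DataType::List(inner.clone())),
        other => Err(format!("slice expects an array argument, got {other:?}")),
    }
}

fn main() {
    assert_eq!(slice_return_type(&DataType::Null), Ok(DataType::Null));
    assert!(slice_return_type(&DataType::Int32).is_err());
    println!("ok");
}
```

The key design choice is that an untyped `NULL` literal short-circuits before any list handling, which matches the `slice(NULL, 1, 2) -> NULL` behavior observed in Spark.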
Yes, the latest fix aims to address this by returning
Sorry, I don't see how this relates? I mean, it seems to suggest that a
@Jefffrey Sorry for the delay, and thanks again for the review. I have just submitted the latest commit, which returns
The last review comment has been addressed and the latest CI run was successful. Thanks for the reviews.
Thanks @erenavsarogullari & @martin-g |
fix: Fix Spark `slice` function Null type to GenericListArray casting issue (apache#20469)

## Which issue does this PR close?

- Closes apache#20466.

## Rationale for this change

Currently, the Spark `slice` function accepts Null arrays and returns `NULL` for such queries. The DataFusion-Spark `slice` function also needs to return `NULL` when a Null array is passed.

**Spark behavior** (tested with latest Spark master):

```
> SELECT slice(NULL, 1, 2);
+-----------------+
|slice(NULL, 1, 2)|
+-----------------+
|             null|
+-----------------+
```

**DataFusion behavior:**

Current:

```
query error SELECT slice(NULL, 1, 2);
----
DataFusion error: Internal error: could not cast array of type Null to arrow_array::array::list_array::GenericListArray<i32>.
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues
```

New:

```
query ?
SELECT slice(NULL, 1, 2);
----
NULL
```

## What changes are included in this PR?

Explained under the first section.

## Are these changes tested?

Added new UT cases for both `slice.rs` and `slice.slt`.

## Are there any user-facing changes?

Yes. Currently, the `slice` function returns an error message for `Null` array inputs; the expected behavior is to return `NULL`, so the end user gets the expected result instead of an error message.

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
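The fix described above boils down to short-circuiting on a Null-typed argument before the list downcast that produced the internal error. A hedged, self-contained sketch of that shape, using simplified stand-in types (`Arg`, `SliceResult`) instead of the real arrow arrays, and supporting only positive 1-based starts (Spark's `slice` also accepts negative starts, omitted here):

```rust
// Simplified stand-ins for arrow arrays; NOT the actual DataFusion code.
#[derive(Debug, PartialEq)]
enum Arg {
    NullArray(usize),                  // an array of DataType::Null
    ListArray(Vec<Option<Vec<i32>>>),  // a list-of-int32 array
}

#[derive(Debug, PartialEq)]
enum SliceResult {
    Null,                              // SELECT slice(NULL, 1, 2) -> NULL
    List(Vec<Option<Vec<i32>>>),
}

// Hypothetical slice kernel: a Null-typed input returns NULL directly
// instead of hitting the GenericListArray downcast and erroring.
fn spark_slice(arg: &Arg, start: i64, length: i64) -> SliceResult {
    match arg {
        // The fix: short-circuit before any list handling.
        Arg::NullArray(_) => SliceResult::Null,
        Arg::ListArray(rows) => SliceResult::List(
            rows.iter()
                .map(|row| {
                    row.as_ref().map(|v| {
                        // 1-based start, clamped to the row's bounds.
                        let s = ((start.max(1) - 1) as usize).min(v.len());
                        let e = (s + length.max(0) as usize).min(v.len());
                        v[s..e].to_vec()
                    })
                })
                .collect(),
        ),
    }
}

fn main() {
    assert_eq!(spark_slice(&Arg::NullArray(1), 1, 2), SliceResult::Null);
    assert_eq!(
        spark_slice(&Arg::ListArray(vec![Some(vec![1, 2, 3, 4])]), 2, 2),
        SliceResult::List(vec![Some(vec![2, 3])])
    );
    println!("ok");
}
```

This mirrors the behavior change in the PR: the Null branch is new, while typed list inputs are sliced as before.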