Commit dabfa4d
fix: string_to_array('', delim) returns empty array for PostgreSQL compatibility (apache#21104)
## Problem
`string_to_array` returned a one-element array containing an empty
string for empty string input, both when the delimiter is non-empty and
when the delimiter is itself an empty string. This diverges from
PostgreSQL, which returns an empty array.
| Query | DataFusion (before) | PostgreSQL (expected) |
|---|---|---|
| `string_to_array('', ',')` | `['']` | `{}` |
| `string_to_array('', '')` | `['']` | `{}` |
| `string_to_array('', ',', 'x')` | `['']` | `{}` |
| `string_to_array('', '', 'x')` | `['']` | `{}` |
Results from datafusion-cli
<img width="1435" height="104" alt="Screenshot 2026-03-23 at 9 14 08 AM"
src="https://github.com/user-attachments/assets/2eaae366-7f8a-4220-87d2-f0b311c26712"
/>
**Root cause:** Rust's `str::split()` on an empty string always yields
one empty-string element, so `"".split(",")` produces `[""]`.
Additionally, the empty-delimiter branch unconditionally appended the
(empty) string value. The bug is easy to miss because Arrow's text
display format appears not to quote strings, so `[""]` renders as `[]`,
which is indistinguishable from an actual empty array. Using
`cardinality()` reveals that the length before the fix is 1, not 0.
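The root cause can be reproduced outside DataFusion with a few lines of plain Rust. The sketch below is illustrative only: `naive_split` and `render_unquoted` are hypothetical helpers, not functions from DataFusion or Arrow, and `render_unquoted` merely imitates an unquoted list display. The sketch also assumes a non-empty delimiter.

```rust
/// Hypothetical helper: splits without any empty-input guard,
/// mirroring the pre-fix behavior for a non-empty delimiter.
fn naive_split(input: &str, delimiter: &str) -> Vec<String> {
    // `str::split` yields one empty piece when `input` is empty.
    input.split(delimiter).map(String::from).collect()
}

/// Hypothetical unquoted renderer (an assumption for illustration,
/// not Arrow's actual formatter).
fn render_unquoted(items: &[String]) -> String {
    format!("[{}]", items.join(", "))
}
```

With these helpers, `naive_split("", ",")` has length 1, yet `render_unquoted` prints it exactly like a truly empty list, which is why checking the length is the more reliable test.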
**PostgreSQL reference:**
[db-fiddle](https://www.db-fiddle.com/f/oCF8EPaZFkDNKSg28rVVWy/3)
## Fix
In `datafusion/functions-nested/src/string.rs`:
- **Non-empty delimiter** `(Some(string), Some(delimiter))`: added `if
!string.is_empty()` guard to skip splitting when input is empty.
- **Empty delimiter** `(Some(string), Some(""))`: added `if
!string.is_empty()` guard so the string value is only appended when
non-empty.
Both the plain variant and the `null_value` variant are fixed (4 checks
total).
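The guards described above can be sketched on plain Rust collections. This is a minimal sketch, not the code in `string.rs`: `string_to_array_row` and `to_element` are hypothetical names, the real implementation works on Arrow builders, and the way `null_value` is applied here (a piece equal to `null_value` becomes `None`) is an assumption made for illustration.

```rust
/// Hypothetical helper: maps a piece to NULL (`None`) when it matches `null_value`.
fn to_element(piece: &str, null_value: Option<&str>) -> Option<String> {
    if null_value == Some(piece) {
        None
    } else {
        Some(piece.to_string())
    }
}

/// Hypothetical single-row stand-in for `string_to_array` with a non-NULL
/// string and a non-NULL delimiter. Not DataFusion's actual code.
fn string_to_array_row(
    input: &str,
    delimiter: &str,
    null_value: Option<&str>,
) -> Vec<Option<String>> {
    let mut elements = Vec::new();
    if delimiter.is_empty() {
        // Empty delimiter: the whole string is one element,
        // appended only when the input is non-empty.
        if !input.is_empty() {
            elements.push(to_element(input, null_value));
        }
    } else if !input.is_empty() {
        // Non-empty delimiter: skip splitting when the input is empty,
        // because `"".split(..)` would yield one empty piece.
        elements.extend(input.split(delimiter).map(|p| to_element(p, null_value)));
    }
    elements
}
```

Passing `None` for `null_value` corresponds to the plain variant and `Some(..)` to the `null_value` variant, so the two `is_empty` checks in this sketch stand in for the four checks in the actual change.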
## Tests
Added sqllogictest cases in
`datafusion/sqllogictest/test_files/array.slt` using `cardinality()` to
unambiguously verify the arrays are truly empty (not just displaying as
empty):
```sql
SELECT cardinality(string_to_array('', ',')); -- 0
SELECT cardinality(string_to_array('', '')); -- 0
SELECT cardinality(string_to_array('', ',', 'x')); -- 0
SELECT cardinality(string_to_array('', '', 'x')); -- 0
```
Each test covers one of the four `is_empty` guard checks. All four fail
without the fix (returning 1 instead of 0).
(cherry picked from commit cdaecf0)

1 parent 9723464 commit dabfa4d
2 files changed, +59 -35 lines changed