
Add arrow_try_cast UDF #21130

Merged
adriangb merged 3 commits into apache:main from pydantic:arrow-try-cast on Mar 24, 2026


@adriangb
Contributor

Which issue does this PR close?

N/A - new feature

Rationale for this change

arrow_cast(expr, 'DataType') casts to Arrow data types specified as strings but errors on failure. try_cast(expr AS type) returns NULL on failure but only works with SQL types. There's currently no way to attempt a cast to a specific Arrow type and get NULL on failure instead of an error.

What changes are included in this PR?

Adds a new arrow_try_cast(expression, datatype) scalar function that combines the behavior of arrow_cast and try_cast:

  • Accepts Arrow data type strings (like arrow_cast)
  • Returns NULL on cast failure instead of erroring (like try_cast)
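The difference between the two behaviors can be modeled for a single value in std-only Rust. This is a rough mental model, not DataFusion's code. `Result` stands in for arrow_cast raising an error, and `Option` stands in for a nullable result. Both function names are hypothetical, and only a Utf8-to-Int64 conversion is modeled.

```rust
/// Hypothetical stand-in for `arrow_cast(value, 'Int64')` on a Utf8 input:
/// a NULL input stays NULL, but an unparseable string is an error.
fn cast_to_int64(value: Option<&str>) -> Result<Option<i64>, String> {
    match value {
        None => Ok(None),
        Some(s) => s
            .parse::<i64>()
            .map(Some)
            .map_err(|e| format!("Cannot cast string '{s}' to Int64: {e}")),
    }
}

/// Hypothetical stand-in for `arrow_try_cast(value, 'Int64')`:
/// the same conversion, except that a failure becomes NULL (`None`).
fn try_cast_to_int64(value: Option<&str>) -> Option<i64> {
    cast_to_int64(value).ok().flatten()
}
```

NULL input produces NULL under both variants. Only a value that fails to convert distinguishes them.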

Implementation details:

  • Reuses arrow_cast's data_type_from_args helper (made pub(crate))
  • Simplifies to Expr::TryCast during optimization (vs Expr::Cast for arrow_cast)
  • Registered alongside existing core functions
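The rewrite during optimization can be sketched as a toy model in std-only Rust. Every type and function here is hypothetical and far simpler than DataFusion's real `Expr`, `DataType`, or `data_type_from_args`. It only shows the shape of the idea. Both functions share one type-string parser, and the sole difference is whether the call becomes a `Cast` or a `TryCast` node. In this model, an unparseable type string is rejected during the rewrite even for `arrow_try_cast`. That is consistent with the PR testing that invalid type strings produce errors.

```rust
/// Hypothetical, heavily simplified stand-ins for DataFusion's types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
    LargeUtf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    /// String literal (the only literal kind this toy model needs).
    Literal(String),
    /// A call to a named scalar function.
    ScalarFunction { name: String, args: Vec<Expr> },
    Cast { expr: Box<Expr>, data_type: DataType },
    TryCast { expr: Box<Expr>, data_type: DataType },
}

/// Hypothetical shared helper, loosely analogous to `data_type_from_args`:
/// the second argument must be a string literal naming a supported type.
fn data_type_from_args(args: &[Expr]) -> Result<DataType, String> {
    match args.get(1) {
        Some(Expr::Literal(s)) => match s.as_str() {
            "Int64" => Ok(DataType::Int64),
            "Float64" => Ok(DataType::Float64),
            "LargeUtf8" => Ok(DataType::LargeUtf8),
            other => Err(format!("Unsupported type '{other}'")),
        },
        _ => Err("second argument must be a string literal data type".to_string()),
    }
}

/// Rewrites `arrow_cast` into `Cast` and `arrow_try_cast` into `TryCast`;
/// any other expression is returned unchanged.
fn simplify(expr: Expr) -> Result<Expr, String> {
    match expr {
        Expr::ScalarFunction { name, args }
            if name == "arrow_cast" || name == "arrow_try_cast" =>
        {
            // Parsing succeeded only if at least two args exist, so args[0] is safe.
            let data_type = data_type_from_args(&args)?;
            let inner = Box::new(args[0].clone());
            Ok(if name == "arrow_try_cast" {
                Expr::TryCast { expr: inner, data_type }
            } else {
                Expr::Cast { expr: inner, data_type }
            })
        }
        other => Ok(other),
    }
}
```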

Are these changes tested?

Yes — new sqllogictest file arrow_try_cast.slt covering:

  • Successful casts (Int64, Float64, LargeUtf8, Dictionary)
  • Failed cast returning NULL
  • Same-type passthrough
  • NULL input
  • Invalid type string errors
  • Multiple casts in one query

Are there any user-facing changes?

New arrow_try_cast SQL function available.

🤖 Generated with Claude Code

@github-actions (bot) added the labels "sqllogictest" (SQL Logic Tests, .slt) and "functions" (Changes to functions implementation) Mar 23, 2026
@adriangb adriangb requested a review from jonahgao March 23, 2026 21:08
Adds a new `arrow_try_cast(expr, 'DataType')` function that casts to
Arrow data types specified as strings (like `arrow_cast`) but returns
NULL on cast failure instead of erroring (like `try_cast`).

The implementation reuses `arrow_cast`'s `data_type_from_args` helper
and simplifies to `Expr::TryCast` during optimization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions (bot) added the label "documentation" (Improvements or additions to documentation) Mar 23, 2026
Member

@jonahgao left a comment


LGTM, I think this is a very practical UDF.

A minor improvement might be to add more tests. The current test cases are all evaluated at compile time via constant folding. Maybe we need a test like:

select arrow_try_cast(a, 'Int64') from values('100'), (NULL), ('foo') t(a);

This would evaluate the cast during physical execution.
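The query above runs the cast over three rows: a parseable string, a NULL, and an unparseable string. If `arrow_try_cast` behaves as the PR describes, the result should be 100, NULL, NULL. That per-row behavior can be modeled in std-only Rust as follows. The function is hypothetical, and `Vec<Option<i64>>` stands in for a nullable Int64 array. Arrow's actual cast kernels operate on whole arrays rather than row by row.

```rust
/// Hypothetical row-at-a-time model of `arrow_try_cast(a, 'Int64')`
/// over a nullable Utf8 column.
fn try_cast_column_to_int64(column: &[Option<&str>]) -> Vec<Option<i64>> {
    column
        .iter()
        // NULL stays NULL; a string that fails to parse also becomes NULL.
        .map(|&value| value.and_then(|s| s.parse::<i64>().ok()))
        .collect()
}
```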

Add tests using VALUES clauses so arrow_try_cast is evaluated at runtime
rather than being constant-folded at compile time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@adriangb
Contributor Author

Thanks @jonahgao!

@adriangb adriangb added this pull request to the merge queue Mar 24, 2026
Merged via the queue into apache:main with commit 7f29cb0 Mar 24, 2026
31 checks passed
@adriangb adriangb deleted the arrow-try-cast branch March 24, 2026 13:38
de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request Mar 24, 2026