[datafusion-spark] Add Spark-compatible ceil function#20593
[datafusion-spark] Add Spark-compatible ceil function#20593comphead merged 33 commits intoapache:mainfrom
Conversation
comphead
left a comment
There was a problem hiding this comment.
Thanks @shivbhatia10 it would be nice having Spark .slt tests covering edge cases and datatypes as well as commenting in file body what is the difference between Spark and current DF implementation
|
Hey @comphead, I've added some |
|
Thanks @shivbhatia10 would you mind providing an example where DF ceil and Spark ceil return diff result? I'm still struggling to understand the difference between them |
|
Taking a look @shivbhatia10 |
|
Hi @coderfender, thank you very much for the comprehensive review here, I've tried to address your comments, please let me know if there's anything else I should add. |
| Ok(DataType::Decimal128(new_p, 0)) | ||
| } else { | ||
| Ok(DataType::Decimal128(*p, *s)) | ||
| } |
There was a problem hiding this comment.
this makes it clear :)
| @@ -81,7 +81,8 @@ impl ScalarUDFImpl for SparkCeil { | |||
| } | |||
| } | |||
| dt if dt.is_integer() => Ok(dt.clone()), | |||
There was a problem hiding this comment.
Shouldnt all integer inputs produce a long / int64 output ?
There was a problem hiding this comment.
Yes that's my bad, I checked the Spark source code and I think they always cast any non-decimal type to int64, I've set it up that way now: 5ad7ef6
| "Returns the smallest integer not less than expr.", | ||
| arg1 | ||
| )); | ||
| export_functions!((ceil, "Returns the ceiling of expr.", arg1)); |
There was a problem hiding this comment.
nice way to keep brief and concise
| ScalarValue::Decimal128(v, p, s) if *s > 0 => { | ||
| let div = 10_i128.pow(*s as u32); | ||
| let div = 10_i128.pow_wrapping(*s as u32); | ||
| let new_p = ((*p as i64) - (*s as i64) + 1).clamp(1, 38) as u8; |
There was a problem hiding this comment.
There is a lot of duplication in the code between scalar and array inputs. Can we create simple functions and inline them to not repeat ourselves ?
There was a problem hiding this comment.
Have tried to extract out a few methods between both paths: 9b23318
coderfender
left a comment
There was a problem hiding this comment.
Almost there . Left some minor comments to improve quality and a potential datatype mismatch
| query I | ||
| SELECT ceil(5::int); | ||
| ---- | ||
| 5 |
There was a problem hiding this comment.
@shivbhatia10 , we might need to test the output data type along with the value in the SLT files ?
|
Hey @coderfender sorry for the delay here, I've tried to respond to your comments, let me know if there's anything else that would be good to change |
coderfender
left a comment
There was a problem hiding this comment.
LGTM with minor suggestion . Thank you @shivbhatia10
| } | ||
| } | ||
| DataType::Float32 | DataType::Float64 => Ok(DataType::Int64), | ||
| dt if dt.is_integer() => Ok(DataType::Int64), |
There was a problem hiding this comment.
Can we simplify line 79 and 80? Both float and int return Int64 so why two match arms ?
There was a problem hiding this comment.
I had this earlier but thought using the matches! macro was less clean, but happy to unify these arms: 174833b
comphead
left a comment
There was a problem hiding this comment.
Thanks @shivbhatia10 I took the liberty to refer to a ticket to support optional scale
I think we super close, WDYT of adding
fn aliases(&self)
? I think we can also cover https://spark.apache.org/docs/latest/api/sql/#ceiling
As per SparkCode it is an alias
expressionBuilder("ceil", CeilExpressionBuilder),
expressionBuilder("ceiling", CeilExpressionBuilder, true),
…afusion into sb/datafusion-math-ceil
|
Thank you @coderfender and @comphead - I agree that we should add "ceiling" as an alias, I've fixed that |
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> Part of apache#15914 and apache/datafusion-comet#1704 I also just noticed apache#15916 butI believe work has been stale on this one. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Helping to continue adding Spark compatible expressions to datafusion-spark. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Add new `ceil` function. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes, unit tests. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> No. <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Shiv Bhatia <sbhatia@palantir.com> Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Which issue does this PR close?
Part of #15914 and apache/datafusion-comet#1704
I also just noticed #15916 butI believe work has been stale on this one.
Rationale for this change
Helping to continue adding Spark compatible expressions to datafusion-spark.
What changes are included in this PR?
Add new
ceilfunction.Are these changes tested?
Yes, unit tests.
Are there any user-facing changes?
No.