Skip to content

[datafusion-spark] Add Spark-compatible ceil function#20593

Merged
comphead merged 33 commits intoapache:mainfrom
shivbhatia10:sb/datafusion-math-ceil
Apr 13, 2026
Merged

[datafusion-spark] Add Spark-compatible ceil function#20593
comphead merged 33 commits intoapache:mainfrom
shivbhatia10:sb/datafusion-math-ceil

Conversation

@shivbhatia10
Copy link
Copy Markdown
Contributor

@shivbhatia10 shivbhatia10 commented Feb 27, 2026

Which issue does this PR close?

Part of #15914 and apache/datafusion-comet#1704

I also just noticed #15916 butI believe work has been stale on this one.

Rationale for this change

Helping to continue adding Spark compatible expressions to datafusion-spark.

What changes are included in this PR?

Add new ceil function.

Are these changes tested?

Yes, unit tests.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the spark label Feb 27, 2026
@shivbhatia10 shivbhatia10 changed the title Add ceil [datafusion-spark] Add Spark-compatible ceil function Feb 27, 2026
@shivbhatia10 shivbhatia10 marked this pull request as ready for review February 27, 2026 13:35
Copy link
Copy Markdown
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shivbhatia10 it would be nice having Spark .slt tests covering edge cases and datatypes as well as commenting in file body what is the difference between Spark and current DF implementation

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 28, 2026
@shivbhatia10
Copy link
Copy Markdown
Contributor Author

Hey @comphead, I've added some .slt tests and uncommented a few existing ones that seem have been there as placeholders for when we got this implementation, and I expanded on the comment explaining the differences between this and the DataFusion ceil function.

@comphead
Copy link
Copy Markdown
Contributor

comphead commented Mar 2, 2026

Thanks @shivbhatia10 would you mind providing an example where DF ceil and Spark ceil return diff result? I'm still struggling to understand the difference between them

Comment thread datafusion/spark/src/function/math/ceil.rs
@coderfender
Copy link
Copy Markdown
Contributor

Taking a look @shivbhatia10

Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
Comment thread datafusion/spark/src/function/math/ceil.rs
Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
@shivbhatia10
Copy link
Copy Markdown
Contributor Author

Hi @coderfender, thank you very much for the comprehensive review here, I've tried to address your comments, please let me know if there's anything else I should add.

Ok(DataType::Decimal128(new_p, 0))
} else {
Ok(DataType::Decimal128(*p, *s))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes it clear :)

@@ -81,7 +81,8 @@ impl ScalarUDFImpl for SparkCeil {
}
}
dt if dt.is_integer() => Ok(dt.clone()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt all integer inputs produce a long / int64 output ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's my bad, I checked the Spark source code and I think they always cast any non-decimal type to int64, I've set it up that way now: 5ad7ef6

"Returns the smallest integer not less than expr.",
arg1
));
export_functions!((ceil, "Returns the ceiling of expr.", arg1));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice way to keep brief and concise

ScalarValue::Decimal128(v, p, s) if *s > 0 => {
let div = 10_i128.pow(*s as u32);
let div = 10_i128.pow_wrapping(*s as u32);
let new_p = ((*p as i64) - (*s as i64) + 1).clamp(1, 38) as u8;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of duplication in the code between scalar and array inputs. Can we create simple functions and inline them to not repeat ourselves ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have tried to extract out a few methods between both paths: 9b23318

Copy link
Copy Markdown
Contributor

@coderfender coderfender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there . Left some minor comments to improve quality and a potential datatype mismatch

query I
SELECT ceil(5::int);
----
5
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shivbhatia10 , we might need to test the output data type along with the value in the SLT files ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shivbhatia10
Copy link
Copy Markdown
Contributor Author

Hey @coderfender sorry for the delay here, I've tried to respond to your comments, let me know if there's anything else that would be good to change

Copy link
Copy Markdown
Contributor

@coderfender coderfender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor suggestion . Thank you @shivbhatia10

}
}
DataType::Float32 | DataType::Float64 => Ok(DataType::Int64),
dt if dt.is_integer() => Ok(DataType::Int64),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simplify line 79 and 80? Both float and int return Int64 so why two match arms ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this earlier but thought using the matches! macro was less clean, but happy to unify these arms: 174833b

Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
Copy link
Copy Markdown
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shivbhatia10 I took the liberty to refer to a ticket to support optional scale

I think we super close, WDYT of adding

fn aliases(&self)

? I think we can also cover https://spark.apache.org/docs/latest/api/sql/#ceiling

As per SparkCode it is an alias

    expressionBuilder("ceil", CeilExpressionBuilder),
    expressionBuilder("ceiling", CeilExpressionBuilder, true),

Comment thread datafusion/spark/src/function/math/ceil.rs Outdated
@shivbhatia10
Copy link
Copy Markdown
Contributor Author

Thank you @coderfender and @comphead - I agree that we should add "ceiling" as an alias, I've fixed that

Copy link
Copy Markdown
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shivbhatia10

@comphead comphead added this pull request to the merge queue Apr 13, 2026
Merged via the queue into apache:main with commit 29c5dd5 Apr 13, 2026
31 checks passed
coderfender pushed a commit to coderfender/datafusion that referenced this pull request Apr 14, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

Part of apache#15914 and
apache/datafusion-comet#1704

I also just noticed apache#15916
butI believe work has been stale on this one.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Helping to continue adding Spark compatible expressions to
datafusion-spark.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Add new `ceil` function.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Yes, unit tests.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Shiv Bhatia <sbhatia@palantir.com>
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

spark sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants