Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions datafusion/sql/src/expr/function.rs
Original file line number Diff line number Diff line change
Expand Up @@ -542,6 +542,12 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
// accept a WITHIN GROUP clause.
let supports_within_group = fm.supports_within_group_clause();

if supports_within_group && !order_by.is_empty() {
return plan_err!(
"ORDER BY must be specified using WITHIN GROUP for ordered-set aggregate functions"
);
}

if !within_group.is_empty() && !supports_within_group {
return plan_err!(
"WITHIN GROUP is only supported for ordered-set aggregate functions"
Expand Down
8 changes: 8 additions & 0 deletions datafusion/sqllogictest/test_files/aggregate.slt
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,14 @@ CREATE TABLE group_median_table_nullable (
# Error tests
#######

statement error ORDER BY must be specified using WITHIN GROUP
select quantile_cont(col0, 0.75 order by col0)
from values (1, 3), (2, 2), (3, 1) t(col0, col1);

statement error ORDER BY must be specified using WITHIN GROUP
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that these queries fail on main too:

> select quantile_cont(0.75 order by col0)
from values (1, 3), (2, 2), (3, 1) t(col0, col1);
Error during planning: Function 'percentile_cont' expects 2 arguments but received 1 No function matches the given name and argument types 'percentile_cont(Float64)'. You might need to add explicit type casts.
	Candidate functions:
	percentile_cont(expr: Coercion(TypeSignatureClass::Float, implicit_coercion=ImplicitCoercion([Numeric], default_type=Float64), percentile: Coercion(TypeSignatureClass::Native(LogicalType(Native(Float64), Float64)), implicit_coercion=ImplicitCoercion([Numeric], default_type=Float64))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query above currently succeeds on main

select quantile_cont(col0, 0.75 order by col0)
from values (1, 3), (2, 2), (3, 1) t(col0, col1);

I was thinking this could be a breaking change, but I wonder how often these (ordered set aggregates) are currently used especially with an order by in the argument list like this; if we want to be fully explicit we can add an entry to the upgrade guide

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked and I agree with you

andrewlamb@Andrews-MacBook-Pro-3:~/Software/influxdb_pro/ent$ datafusion-cli
DataFusion CLI v52.1.0
> select quantile_cont(col0, 0.75 order by col0)
from values (1, 3), (2, 2), (3, 1) t(col0, col1);
+-------------------------------------+
| quantile_cont(t.col0,Float64(0.75)) |
+-------------------------------------+
| 2.5                                 |
+-------------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

In general, rather than disallow this syntax, is there some way we can support both syntaxes? I also have no idea how often they are used but if it is currently supported and produces the correct result, then I suspect at least some users would treat them starting to error as a regression 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that this PR does introduce a breaking change. This query works on main:

select quantile_cont(col0, 0.75 order by col0)
from values (1, 3), (2, 2), (3, 1) t(col0, col1);

but now it returns error: ORDER BY must be specified using WITHIN GROUP

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this suggestion - I started working on supporting both syntaxes instead of returning an error.

After digging into it more deeply, it turns out the change is more involved than I initially expected. It’s starting to drift away from the scope of the original refactor.

Given that, I’m wondering if it would make sense to keep this PR focused and open a separate PR dedicated specifically to supporting both syntaxes. I’m happy to continue working on that and share progress once I have a clean and well-tested solution.

Would that approach make sense?

select quantile_cont(0.75 order by col0)
from values (1, 3), (2, 2), (3, 1) t(col0, col1);

statement error DataFusion error: Error during planning: WITHIN GROUP is only supported for ordered-set aggregate functions
SELECT SUM(c2) WITHIN GROUP (ORDER BY c2) FROM aggregate_test_100

Expand Down
Loading