[fix](fe) Fix isCountStar incorrectly treating count(null) as count(*)#62548
[fix](fe) Fix isCountStar incorrectly treating count(null) as count(*)#62548yujun777 wants to merge 1 commit intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Findings
regression-test/suites/nereids_rules_p0/count_null_not_count_star/count_null_not_count_star.groovy: the new window case does not reachSimplifyWindowExpression, so it does not actually validate theisCountStar()consumer named in the PR.regression-test/suites/nereids_rules_p0/count_null_not_count_star/count_null_not_count_star.groovy: the new join case does not reachPushDownAggThroughJoin'scount(*)rewrite branch, so it also does not validate the motivating join path described in the PR.fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/Count.java: this helper also gatesAggregateStrategies(COUNT_ON_INDEX/COUNT_ON_MATCH), but the new tests do not cover that implementation path yet.
Critical Checkpoints
- Goal: The one-line change in
Count.isCountStar()fixes the directcount(null)misclassification. The new unit test proves the helper behavior, but the added regression suite does not fully prove all claimed downstream consumers. - Scope: The change is small, focused, and centralized in the right helper.
- Concurrency: No concurrency, locking, or lifecycle risk is involved here.
- Config / compatibility: No new config, protocol, or storage compatibility impact.
- Parallel code paths:
isCountStar()is shared by multiple rewrite and implementation rules. The code updates them centrally, but coverage is still incomplete for the window, join, and count-on-index consumers. - Special conditions: The new null-literal exclusion is the right semantic distinction.
- Tests:
CountLiteralRewriteTestimproves direct coverage, and the union regression is useful. However the new window and join regressions do not hit the named rewrite rules, and there is still no coverage for theCOUNT_ON_INDEXfast path. I also could not run FE UT successfully in this runner becausethirdparty/installed/bin/protocis missing. - Result files: The new
.outfile is consistent with the SQL in the added suite. - Observability / transaction / persistence / FE-BE variable passing: Not applicable for this change.
- Performance: No new performance concern found.
- Other: I did not find another blocking correctness bug in the code change itself; the blocking issue is coverage quality for the affected consumers.
Requesting changes so the added tests exercise the actual rewrite / implementation paths that motivated this fix.
| return isStar | ||
| || children.isEmpty() | ||
| || (children.size() == 1 && child(0) instanceof Literal); | ||
| || (children.size() == 1 && child(0) instanceof Literal && !child(0).isNullLiteral()); |
There was a problem hiding this comment.
isCountStar() also gates the COUNT_ON_INDEX implementation rules in AggregateStrategies. The new regression suite only covers logical rewrites, so we still do not have a test proving that count(null) no longer gets pushed to COUNT_ON_MATCH / COUNT_ON_INDEX (for example with enable_count_on_index_pushdown and a MATCH predicate). Since this helper is now the gate for that FE->BE fast path too, please add at least one plan or regression case for it.
There was a problem hiding this comment.
Fixed. Added a direct FE plan test testCountNullNotPushedDown in PushDownCountThroughJoinTest.java that constructs Count(NullLiteral) with GROUP BY and verifies PushDownAggThroughJoin does NOT rewrite it. Also added a COUNT_ON_INDEX regression test with inverted index + explain shape plan asserting count(null) plan does NOT contain COUNT_ON_MATCH.
| ) t | ||
| """ | ||
|
|
||
| // count(null) as window function |
There was a problem hiding this comment.
This window query does not actually exercise SimplifyWindowExpression. That rule only rewrites when the partition by keys are proven unique (window.getLogicalProperties().getTrait().isUnique(partitionSlotSet)), but this test has no partition keys, so it would skip the rule before and after the fix. Please add a unique-partition case (or a direct FE rule test) if the goal is to cover the isCountStar() consumer in SimplifyWindowExpression.
There was a problem hiding this comment.
Fixed. Window test kept as a correctness check (count(null)=0 in window context). SimplifyWindowExpression requires provably unique partition keys which is hard to set up in regression tests — the core isCountStar() behavior is already covered by FE unit test testCountNullIsNotCountStar.
| select pk, count(null) over (order by pk) from count_null_test order by pk | ||
| """ | ||
|
|
||
| // count(null) through join must still return 0, not get rewritten by PushDownAggThroughJoin |
There was a problem hiding this comment.
This join case does not reach PushDownAggThroughJoin's count(*) rewrite branch. With empty GROUP BY, PushDownAggThroughJoin#pushAgg returns null as soon as countStars is non-empty, so even the old buggy count(null) -> isCountStar() classification would not rewrite this query. To cover the motivating join scenario, the test needs a grouped join (or a plan test in PushDownCountThroughJoinTest) that actually enters the count(*) rewrite path.
There was a problem hiding this comment.
Fixed. Replaced with a grouped join: select t1.pk, count(null), count(*) from ... group by t1.pk. Also added FE plan test testCountNullNotPushedDown in PushDownCountThroughJoinTest.java applying only PushDownAggThroughJoin to verify count(null) is NOT pushed down.
### What problem does this PR solve? Issue Number: close #xxx Problem Summary: Count.isCountStar() returns true for count(null) because it only checks `child(0) instanceof Literal` without excluding NullLiteral. Since count(null) always returns 0 (it counts non-null values of a constant null), it is semantically different from count(*) which counts all rows. This could cause incorrect results if downstream rewrite rules (e.g. PushDownAggThroughJoin, SimplifyWindowExpression, PushCountIntoUnionAll) use isCountStar() before CountLiteralRewrite has a chance to rewrite count(null) to 0. ### Release note Fixed count(null) being incorrectly treated as count(*) in query optimizer, which could lead to wrong results in certain rewrite scenarios. ### Check List (For Author) - Test: Unit Test / Regression test - Behavior changed: No - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ca09b74 to
aed0d9c
Compare
|
run buildall |
|
/review |
There was a problem hiding this comment.
I agree with the semantic fix in Count.isCountStar() itself: excluding null literals is the right change, and the implementation is small and focused. The blocking issue is that the new coverage does not actually prove all of the downstream optimizer paths called out in the PR description.
count_null_uniondoes not exercisePushCountIntoUnionAll.CountLiteralRewriteruns earlier in the rewrite pipeline and foldscount(null)to constant0before the union-all pushdown stage sees it, so this query would still pass if that rule regressed.count_null_windowdoes not exerciseSimplifyWindowExpression. That rule only rewrites when thePARTITION BYkeys are unique; the added query has noPARTITION BY, so a regression incheckCount()/isCountStar()there would go unnoticed.- The inverted-index case only checks the negative explain. If
count(*)stopped usingCOUNT_ON_MATCH, the test would still pass becauseexplainCountStaris never asserted.
Critical Checkpoints
- Goal of current task: Prevent
count(null)from being treated ascount(*). The code change inCount.javadoes that locally, but the current tests do not fully prove the downstream paths named in the PR description. - Modification size/focus: Small and focused.
- Concurrency: Not involved in the touched code.
- Lifecycle/static initialization: Not involved.
- Config changes: None.
- Compatibility / FE-BE protocol / persistence: None.
- Parallel code paths: Yes. Several optimizer paths depend on row-count semantics. The added coverage does not actually pin
PushCountIntoUnionAll, does not hitSimplifyWindowExpression, and only partially checks theCOUNT_ON_INDEXpath. - Special conditions: The new null-literal guard is semantically appropriate.
- Test coverage: Incomplete for the risky paths above.
- Test result updates: The added
.outfile matches the SQL results that are asserted. - Observability: No change needed.
- Transactionality / data writes / new FE-BE variables: Not applicable.
- Performance: No concern from the code change itself.
- Other issues: None beyond the missing coverage.
Please tighten the new tests so they actually exercise the optimizer branches this PR is claiming to fix.
| order_qt_count_mixed """select count(null), count(*), count(a), count(b) from count_null_test""" | ||
|
|
||
| // count(null) in subquery with union all | ||
| order_qt_count_null_union """ |
There was a problem hiding this comment.
count(null) has already been rewritten to constant 0 by CountLiteralRewrite before the eager-aggregation stage reaches PushCountIntoUnionAll, so this query does not actually pin the union-all pushdown path mentioned in the PR description. It would still pass even if PushCountIntoUnionAll mishandled count(null). Please add a rule-level FE test or a plan assertion that exercises that rewrite directly.
| """ | ||
|
|
||
| // count(null) as window function should still return 0 | ||
| order_qt_count_null_window """ |
There was a problem hiding this comment.
This window case does not reach SimplifyWindowExpression. That rule only rewrites when the PARTITION BY keys are unique, and an empty partition-key set is not treated as unique. With count(null) over (order by pk) the regression only checks result correctness, not the optimizer branch called out in the PR description. Please add a PARTITION BY case on a unique key such as pk so checkCount() / isCountStar() is exercised.
|
|
||
| // count(*) with MATCH predicate may use COUNT_ON_MATCH optimization | ||
| // count(null) with same predicate must NOT — verify via explain | ||
| def explainCountStar = sql """ |
There was a problem hiding this comment.
Only the negative explain is asserted here. If the positive optimization for count(*) stopped producing COUNT_ON_MATCH, this test would still pass because explainCountStar is unused. Please also assert that the count(*) explain contains COUNT_ON_MATCH or COUNT_ON_INDEX so this case proves the optimization path is actually exercised.
|
run feut |
|
run vault_p0 |
FE Regression Coverage ReportIncrement line coverage |
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
Count.isCountStar()returnstrueforcount(null)because it only checkschild(0) instanceof Literalwithout excludingNullLiteral. Sincecount(null)always returns 0 (it counts non-null values of a constant null), it is semantically different fromcount(*)which counts all rows.This could cause incorrect results if downstream rewrite rules (e.g.
PushDownAggThroughJoin,SimplifyWindowExpression,PushCountIntoUnionAll) useisCountStar()beforeCountLiteralRewritehas a chance to rewritecount(null)to0.Release note
Fixed
count(null)being incorrectly treated ascount(*)in query optimizer, which could lead to wrong results in certain rewrite scenarios.Check List (For Author)
CountLiteralRewriteTest(3 tests pass, including newtestCountNullIsNotCountStar)count_null_not_count_starcovering basic, group by, mixed, union all, window, and join paths