Skip to content

Commit 9eefb7c

Browse files
lyne7-scJefffrey
andauthored
fix: median retract logic for sliding window frames (#21300)
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #21299. ## Rationale for this change `median` could produce incorrect results for sliding window frames because `MedianAccumulator::retract_batch` could skip values after `swap_remove(i)`. When an element was removed, the last element was moved into index `i`, but the loop still incremented `i`. That meant the swapped-in value was never checked, so not all requested values were retracted. <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? - fix `retract_batch` so the index is only advanced when no value is removed - add a slt case for `median` over a sliding `range` window <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? Yes. <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? No. <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
1 parent dbe3395 commit 9eefb7c

2 files changed

Lines changed: 18 additions & 1 deletion

File tree

datafusion/functions-aggregate/src/median.rs

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -314,8 +314,9 @@ impl<T: ArrowNumericType> Accumulator for MedianAccumulator<T> {
314314
break;
315315
}
316316
}
317+
} else {
318+
i += 1;
317319
}
318-
i += 1;
319320
}
320321
Ok(())
321322
}

datafusion/sqllogictest/test_files/aggregate.slt

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1223,6 +1223,22 @@ ORDER BY tags, timestamp;
12231223
4 tag2 90 75 80 95
12241224
5 tag2 100 80 80 100
12251225

1226+
# Test median on a sliding RANGE window with repeated values.
1227+
query RR
1228+
SELECT
1229+
x,
1230+
median(x) OVER (
1231+
ORDER BY x
1232+
RANGE BETWEEN 1 PRECEDING AND CURRENT ROW
1233+
) AS median_val
1234+
FROM (VALUES (1.0), (1.0), (2.0), (5.0)) AS t(x)
1235+
ORDER BY x;
1236+
----
1237+
1 1
1238+
1 1
1239+
2 1
1240+
5 5
1241+
12261242
###########
12271243
# Issue #19612: Test that percentile_cont produces correct results
12281244
# in window frame queries. Previously percentile_cont consumed its internal state

0 commit comments

Comments
 (0)