perf : Optimize count distinct using bitmaps instead of hashsets for smaller datatypes#21456
Conversation
|
benchmark results : It seems like we are 25x faster for u16 bitmap based accumulators (or I am sleepy :) ) |
|
I think we can do the same for 16 bit types, it is just 65_536 bytes 8192 if we use a bitmap. |
|
Oh wait, you're already doing that :) |
|
Query 0 in clickbench_extended dataset (which uses (Other queries are faster but I believe that is more around variance ) |
|
cc : @neilconway , @alamb , @martin-g . Please take a look whenever you get a chance |
This comment has been minimized.
This comment has been minimized.
alamb
left a comment
There was a problem hiding this comment.
This looks like a great idea. Thank you @coderfender
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
93acd98 to
7e67e2e
Compare
|
run benchmark count_distinct |
|
run benchmark clickbench |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize_count_distinct (a13bcaa) to fbdf770 (merge-base) diff File an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize_count_distinct (a13bcaa) to fbdf770 (merge-base) diff File an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
…inct' into optimize_count_distinct
|
Hi @coderfender, thanks for the request (#21456 (comment)). Only whitelisted users can trigger benchmarks. Allowed users: Dandandan, Fokko, Jefffrey, Omega359, adriangb, alamb, asubiotto, brunal, buraksenn, cetra3, codephage2020, comphead, erenavsarogullari, etseidl, friendlymatthew, gabotechs, geoffreyclaude, grtlr, haohuaijin, jonathanc-n, kevinjqliu, klion26, kosiew, kumarUjjawal, kunalsinghdadhwal, liamzwbao, mbutrovich, mzabaluev, neilconway, rluvaton, sdf-jkl, timsaucer, xudong963, zhuqi-lucas. File an issue against this benchmark runner |
|
run benchmark count_distinct |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize_count_distinct (554f60c) to eaf0a41 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
My suspicion is that the match arm's bloat is causing regression . Trying out options to reduce the CPU cache pressure to potentially reduce code bloat / match arm bloat and see if that removes the regression |
|
Pushed a commit to move out bitmap based accumulators to a separate function with a |
|
run benchmark count_distinct |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize_count_distinct (87a9af8) to eaf0a41 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
@alamb, @Dandandan the regression with i64 now seems to be fixed (and slightly faster) with moving bitmap accumulations inside function call with compiler hint. Please take a look whenever you get a chance |
|
@martin-g , the null checks pushed up seem to be providing mixed results . Let me try and see if there are are opportunities to avoid super minor regressions |
|
I get this on the latest version of this PR (local run on my mac)- |
|
Thank you for the approval @Dandandan :) |
|
Thank you for the approval @alamb |
|
Thank you for the contribution @coderfender -- this is a nice one |
…smaller datatypes (apache#21456) ## Which issue does this PR close? Remove hashset based accumulators for smaller int data types and use bitmaps. Follow up of : apache#21453 <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes apache#21488 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
Remove hashset based accumulators for smaller int data types and use bitmaps. Follow up of : #21453
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?