
Commit f304134

Merge branch 'main' into feat/spark-encode-function

2 parents bcdabd4 + 587f4c0

9 files changed

Lines changed: 838 additions & 325 deletions

benchmarks/queries/clickbench/README.md

Lines changed: 16 additions & 0 deletions
````diff
@@ -228,6 +228,22 @@ Results look like
 Elapsed 30.195 seconds.
 ```
 
+
+### Q9-Q12: FIRST_VALUE Aggregation Performance
+
+These queries test the performance of the `FIRST_VALUE` aggregation function with different data types and grouping cardinalities.
+
+| Query | `FIRST_VALUE` Column | Column Type | Group By Column | Group By Type | Number of Groups |
+|-------|----------------------|-------------|-----------------|---------------|------------------|
+| Q9 | `URL` | `Utf8` | `UserID` | `Int64` | 17,630,976 |
+| Q10 | `URL` | `Utf8` | `OS` | `Int16` | 91 |
+| Q11 | `WatchID` | `Int64` | `UserID` | `Int64` | 17,630,976 |
+| Q12 | `WatchID` | `Int64` | `OS` | `Int16` | 91 |
+
+
+
+
+
 ## Data Notes
 
 Here are some interesting statistics about the data used in the queries
````
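The README table distinguishes the queries by grouping cardinality: 91 `OS` groups versus 17,630,976 `UserID` groups. The following is a minimal standalone Rust sketch of what Q11 computes (`MAX` over `FIRST_VALUE("WatchID" ORDER BY "EventTime")` per `"UserID"`), included to illustrate why that cardinality matters. A hash-based grouping keeps one state entry per group, holding the earliest `EventTime` seen so far and its `WatchID`. The sketch is not DataFusion's `FIRST_VALUE` accumulator. The `Row` struct and the `first_value_by_group` and `max_first_value` functions are hypothetical. Values are assumed to be non-null, and ties on `EventTime` keep the row seen first, because SQL leaves the order of ties unspecified.

```rust
use std::collections::HashMap;

/// Hypothetical row type covering only the columns Q11 touches (assumed non-null).
struct Row {
    user_id: i64,    // group-by key ("UserID"); Q12 would group by "OS" instead
    event_time: i64, // ordering key ("EventTime")
    watch_id: i64,   // value column ("WatchID")
}

/// Per-group state: (earliest EventTime seen, WatchID of that row).
/// The map holds one entry per distinct group, so memory grows with cardinality.
fn first_value_by_group(rows: &[Row]) -> HashMap<i64, (i64, i64)> {
    let mut state: HashMap<i64, (i64, i64)> = HashMap::new();
    for r in rows {
        state
            .entry(r.user_id)
            .and_modify(|(best_time, value)| {
                // Only a strictly earlier timestamp replaces the current value;
                // on ties the row seen first is kept (an assumption of this sketch).
                if r.event_time < *best_time {
                    *best_time = r.event_time;
                    *value = r.watch_id;
                }
            })
            .or_insert((r.event_time, r.watch_id));
    }
    state
}

/// Outer `SELECT MAX(fv)`: the largest per-group first value, or None for no rows.
fn max_first_value(rows: &[Row]) -> Option<i64> {
    first_value_by_group(rows).values().map(|&(_, v)| v).max()
}

fn main() {
    let rows = vec![
        Row { user_id: 1, event_time: 30, watch_id: 100 },
        Row { user_id: 1, event_time: 10, watch_id: 200 },
        Row { user_id: 2, event_time: 20, watch_id: 150 },
        Row { user_id: 2, event_time: 25, watch_id: 999 },
    ];
    // Group 1's earliest row has WatchID 200, and group 2's has 150.
    println!("{:?}", max_first_value(&rows)); // prints Some(200)
}
```

Keying the map on an `i16` OS value instead gives the Q12 shape. For this dataset, the map would then hold at most 91 entries instead of roughly 17.6 million.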
Lines changed: 8 additions & 0 deletions (new file)

```sql
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT MAX(len) FROM (
SELECT LENGTH(FIRST_VALUE("URL" ORDER BY "EventTime")) as len
FROM hits
GROUP BY "OS"
);
```
Lines changed: 8 additions & 0 deletions (new file)

```sql
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT MAX(fv) FROM (
SELECT FIRST_VALUE("WatchID" ORDER BY "EventTime") as fv
FROM hits
GROUP BY "UserID"
);
```
Lines changed: 8 additions & 0 deletions (new file)

```sql
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT MAX(fv) FROM (
SELECT FIRST_VALUE("WatchID" ORDER BY "EventTime") as fv
FROM hits
GROUP BY "OS"
);
```
Lines changed: 4 additions & 0 deletions (new file)

```sql
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT "RegionID", "UserAgent", "OS", AVG(to_timestamp("ResponseEndTiming")-to_timestamp("ResponseStartTiming")) as avg_response_time, AVG(to_timestamp("ResponseEndTiming")-to_timestamp("ConnectTiming")) as avg_latency FROM hits GROUP BY "RegionID", "UserAgent", "OS" ORDER BY avg_latency DESC limit 10;
```
Lines changed: 8 additions & 0 deletions (new file)

```sql
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT MAX(len) FROM (
SELECT LENGTH(FIRST_VALUE("URL" ORDER BY "EventTime")) as len
FROM hits
GROUP BY "UserID"
);
```
