Add benchmark_runner for sql_benchmarks with help and list commands #22001

Open

Omega359 wants to merge 3 commits into apache:main from Omega359:benchmark-runner-help-list

Conversation

Contributor

Omega359 commented May 3, 2026

Which issue does this PR close?

Rationale for this change

This adds a standalone benchmark runner CLI for discovering benchmark suites and inspecting available commands. The runner provides a foundation for invoking benchmark workflows through a structured interface instead of relying only on shell scripts.

This is the first of three PRs planned to complete this feature.

What changes are included in this PR?

  • Adds the benchmark_runner binary and supporting module structure.
  • Adds CLI support for help and suite listing commands.
  • Adds suite discovery/loading support and output formatting helpers (a rough sketch of the discovery step follows this list).
  • Adds a TPCH suite manifest for benchmark runner discovery.
  • Wires the new benchmark runner code into the benchmarks crate.
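
To make the discovery/loading bullet concrete, here is a minimal sketch of how suite discovery could work. It is a sketch only, assuming suites are .suite files under benchmarks/sql_benchmarks; the helper name discover_suites is illustrative and not the PR's actual code:

use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every `*.suite` manifest under `root`.
/// Hypothetical helper; the PR may structure this differently.
fn discover_suites(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut suites = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else if path.extension().map_or(false, |ext| ext == "suite") {
                suites.push(path);
            }
        }
    }
    // Sort so that `list` output is deterministic across runs.
    suites.sort();
    Ok(suites)
}

fn main() -> io::Result<()> {
    for suite in discover_suites(Path::new("benchmarks/sql_benchmarks"))? {
        println!("{}", suite.display());
    }
    Ok(())
}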

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. This adds a new benchmark_runner CLI with help and list functionality.

Use of AI

This code was partially written by Codex but edited, reviewed, and tested by @Omega359.

@Omega359 Omega359 changed the title Add benchmark_runner with help and list commands Add benchmark_runner for sql_benchmarks with help and list commands May 3, 2026
@Omega359 Omega359 marked this pull request as ready for review May 3, 2026 17:35
Contributor

alamb left a comment


Thank you @Omega359

I think I am missing some of the larger plan for this code. My hope was for a binary that could run the same benchmark setup as the criterion runner (e.g. the contents of benchmarks/sql_benchmarks) rather than a new set of definitions.

So I was expecting some of the logic in the criterion benchmark runner, for example load_benchmarks, to be ported out and reused:

/// Loads all benchmark files in the `sql_benchmarks` directory.
/// For each file ending with `.benchmark` it creates a new
/// `SqlBenchmark` instance.
async fn load_benchmarks(
    args: &EnvParser,
    ctx: &SessionContext,
    path: &str,
) -> Result<BTreeMap<String, Vec<SqlBenchmark>>> {
    let mut benches = BTreeMap::new();
    let mut paths = Vec::new();
    list_files(path, &mut |path: &str| {
        if path.ends_with(".benchmark") {
            paths.push(path.to_string());
        }
    });

However, this PR seems to make a separate parallel path for discovering benchmarks 🤔

Documentation

I think we should document how this works, for example in https://github.com/apache/datafusion/tree/main/benchmarks#running-the-benchmarks

Local Testing

I tested it out locally and it works as advertised

$ cargo run --profile=profiling --bin benchmark_runner -- --help
    Finished `profiling` profile [optimized + debuginfo] target(s) in 0.55s
     Running `target/profiling/benchmark_runner --help`
Inspect DataFusion SQL benchmark suites.

Usage: benchmark_runner [COMMAND]

Commands:
  help  Print help
  list  List SQL benchmark suites

Options:
  -h, --help  Print help
$ cargo run --profile=profiling --bin benchmark_runner -- list
    Finished `profiling` profile [optimized + debuginfo] target(s) in 0.20s
     Running `target/profiling/benchmark_runner list`
tpch
  description: TPC-H SQL benchmarks
  query ids: 1-22 discovered under benchmarks/sql_benchmarks/tpch as qNN.benchmark
  options:
    -f, --format <VALUE>  parquet (default), csv, mem
    -sf, --scale-factor <VALUE>  1 (default), 10

Comment thread on benchmarks/Cargo.toml
mimalloc_extended = ["libmimalloc-sys/extended"]

[dependencies]
anstream = "1.0"
Contributor


I double checked and it seems like anstream is already a dependency of clap_builder, so this is not a (net) new dependency.

Contributor Author


Coloured output ftw :) At least locally, depends on your terminal I think.

@@ -0,0 +1,16 @@
name = "tpch"
Contributor


Why is this needed? It seems somewhat redundant with SqlBenchmark, yes?

pub struct SqlBenchmark {
    /// Human‑readable name of the benchmark.
    name: String,
    /// Top‑level group name (derived from the file path or defined in a benchmark).
    group: String,
    /// Subgroup name, often a logical grouping.
    subgroup: String,
    /// Full path to the benchmark file.
    benchmark_path: PathBuf,
    /// Mapping of placeholder keys to concrete values (e.g. `"BENCHMARK_DIR"`).
    replacement_mapping: HashMap<String, String>,
    /// Expected string that must appear in the physical plan of the queries.
    expect: Vec<String>,
    /// All SQL queries grouped by directive (`load`, `run`, etc.).
    queries: HashMap<QueryDirective, Vec<String>>,
    /// Queries whose results are persisted to disk for later comparison.
    result_queries: Vec<BenchmarkQuery>,
    /// Queries whose results are asserted against an expected table.
    assert_queries: Vec<BenchmarkQuery>,
    /// Flag indicating whether the benchmark has been fully loaded
    is_loaded: bool,
    /// Stores the last run results if needed so they can be compared or persisted.
    last_results: Option<Vec<RecordBatch>>,
    /// echo statements
    echo: Vec<String>,
}

Contributor Author


The name doesn't really need to be there. The suite file, though, is needed for later PRs. The SqlBenchmark is per file; the suite is ... per suite.
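
To make that distinction concrete, a per-suite shape might look roughly like this. This is a hedged sketch: the type names (Suite, SuiteOption), fields, and print_suite helper are illustrative assumptions, with the example contents taken from the list output shown earlier in this thread:

/// One CLI option a suite exposes (illustrative type, not from the PR).
struct SuiteOption {
    short: &'static str,
    long: &'static str,
    values: &'static str,
}

/// Per-suite metadata. A suite groups many per-file `SqlBenchmark`s
/// (e.g. q01.benchmark .. q22.benchmark for tpch).
struct Suite {
    name: &'static str,
    description: &'static str,
    options: Vec<SuiteOption>,
}

/// Format a suite roughly the way `benchmark_runner list` prints it.
fn print_suite(suite: &Suite) {
    println!("{}", suite.name);
    println!("  description: {}", suite.description);
    println!("  options:");
    for opt in &suite.options {
        println!("    {}, {} <VALUE>  {}", opt.short, opt.long, opt.values);
    }
}

fn main() {
    let tpch = Suite {
        name: "tpch",
        description: "TPC-H SQL benchmarks",
        options: vec![
            SuiteOption { short: "-f", long: "--format", values: "parquet (default), csv, mem" },
            SuiteOption { short: "-sf", long: "--scale-factor", values: "1 (default), 10" },
        ],
    };
    print_suite(&tpch);
}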

Contributor Author

Omega359 commented May 4, 2026

I think I am missing some of the larger plan for this code. My hope was for a binary that could run the same benchmark setup as the criterion runner (e.g. the content of benchmarks/sql_benchmarks ) rather than a new set of definitions

Ah, yes, I really do need to work on explaining what I'm planning and/or doing. I tend to just go ahead and do things and tell others why later :/

Since just using environment variables to pass things into the sql benchmarks isn't exactly simple to figure out (you need to read the docs, avoid typos, etc.), and the issue this PR partially resolves requested a better approach, I went ahead and wrote a whole runner that can dynamically discover sql benchmarks (via suite files) and their options, and provide help, required options, etc. I've just added support for native benchmarks locally as well, so benchmark_runner list would show all of the benchmarks, not just the sql ones.

My plan for this is a benchmark runner that replaces both dfbench and the criterion benchmark runner, so DataFusion has just a single benchmark runner that can run either the sql benchmarks or the native ones, with built-in discovery, help, and information on suites and individual benchmarks. I've got a full working solution locally, but I need to tighten it up (I need to add criterion baseline support, for example). I suppose we could just rename the benchmark runner to dfbench at that point.

In any case, I split the work into 3 separate branches to make the review easier. This is the first of the 3; the other two add support for the info and query subcommands (query only applies to sql benchmarks) and, finally, the run command. The last one comes with documentation updates but doesn't touch bench.sh or dfbench. run supports both the dfbench-style '# of iterations' approach and criterion. Per-suite arguments are supported, coming either from the .suite file or from the RunOpts of a native benchmark.
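
For orientation only, the subcommand surface described above (list in this PR, with info, query, and run planned in the follow-up PRs) could be declared with clap roughly like the sketch below. It assumes clap's derive feature and invents the struct and field names; it is not the code in this PR:

use clap::{CommandFactory, Parser, Subcommand};

#[derive(Parser)]
#[command(name = "benchmark_runner", about = "Inspect DataFusion SQL benchmark suites.")]
struct Cli {
    #[command(subcommand)]
    command: Option<Command>,
}

#[derive(Subcommand)]
enum Command {
    /// List SQL benchmark suites
    List,
    /// Show details for a single suite (planned follow-up)
    Info { suite: String },
    /// Print a query from a sql benchmark suite (planned follow-up)
    Query { suite: String, query: String },
    /// Run a suite, via simple iterations or criterion (planned follow-up)
    Run { suite: String },
}

fn main() {
    match Cli::parse().command {
        Some(Command::List) => { /* discover suites and print them (see the earlier sketch) */ }
        Some(_) => eprintln!("planned for the follow-up PRs"),
        None => {
            // No subcommand given: fall back to printing help.
            let _ = Cli::command().print_help();
        }
    }
}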

If all of these are approved I plan on submitting PRs to update benchmarks/README.md, update bench.sh to use the new runner, remove or update sql.rs (for cargo bench; an update would have it just delegate to the new runner, though I haven't yet looked into that), and remove dfbench.rs.

After all that I plan on submitting PRs to migrate each native benchmark to be sql based (tpcds, h2o, etc). After that, I think the only native benchmark left would be the cancellation one.

I hope that clarifies things a bit.

Contributor

alamb commented May 4, 2026

If all of these are approved I plan on submitting PRs to update benchmarks/README.md, update bench.sh to use the new runner, remove or update sql.rs (for cargo bench; an update would have it just delegate to the new runner, though I haven't yet looked into that), and remove dfbench.rs.

Let's do it!
