Add benchmark_runner for sql_benchmarks with help and list commands #22001

Open

Omega359 wants to merge 3 commits into apache:main from Omega359:benchmark-runner-help-list

Conversation

Contributor

Omega359 commented May 3, 2026

Which issue does this PR close?

Rationale for this change

This adds a standalone benchmark runner CLI for discovering benchmark suites and inspecting available commands. The runner provides a foundation for invoking benchmark workflows through a structured interface instead of relying only on shell scripts.

This is the first of three PRs planned to complete this feature.

What changes are included in this PR?

  • Adds the benchmark_runner binary and supporting module structure.
  • Adds CLI support for help and suite listing commands.
  • Adds suite discovery/loading support and output formatting helpers (a rough sketch of the discovery step follows this list).
  • Adds a TPCH suite manifest for benchmark runner discovery.
  • Wires the new benchmark runner code into the benchmarks crate.
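
To make the discovery/loading bullet concrete, here is a minimal sketch of how suite discovery could work. It is a sketch only, assuming suites are .suite files under benchmarks/sql_benchmarks; the helper name discover_suites is illustrative and not the PR's actual code:

use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every `*.suite` manifest under `root`.
/// Hypothetical helper; the PR may structure this differently.
fn discover_suites(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut suites = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else if path.extension().map_or(false, |ext| ext == "suite") {
                suites.push(path);
            }
        }
    }
    // Sort so that `list` output is deterministic across runs.
    suites.sort();
    Ok(suites)
}

fn main() -> io::Result<()> {
    for suite in discover_suites(Path::new("benchmarks/sql_benchmarks"))? {
        println!("{}", suite.display());
    }
    Ok(())
}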

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. This adds a new benchmark_runner CLI with help and list functionality.

Use of AI

This code was partially written by Codex but edited, reviewed, and tested by @Omega359.

@Omega359 Omega359 changed the title Add benchmark_runner with help and list commands Add benchmark_runner for sql_benchmarks with help and list commands May 3, 2026
@Omega359 Omega359 marked this pull request as ready for review May 3, 2026 17:35
Contributor

alamb left a comment


Thank you @Omega359

I think I am missing some of the larger plan for this code. My hope was for a binary that could run the same benchmark setup as the criterion runner (e.g. the contents of benchmarks/sql_benchmarks) rather than a new set of definitions.

So I was expecting some of the logic in the criterion benchmark runner, for example load_benchmarks, to be ported out and reused:

/// Loads all benchmark files in the `sql_benchmarks` directory.
/// For each file ending with `.benchmark` it creates a new
/// `SqlBenchmark` instance.
async fn load_benchmarks(
    args: &EnvParser,
    ctx: &SessionContext,
    path: &str,
) -> Result<BTreeMap<String, Vec<SqlBenchmark>>> {
    let mut benches = BTreeMap::new();
    let mut paths = Vec::new();
    list_files(path, &mut |path: &str| {
        if path.ends_with(".benchmark") {
            paths.push(path.to_string());
        }
    });

However, this PR seems to make a separate parallel path for discovering benchmarks 🤔

Documentation

I think we should document how this works, for example in https://github.com/apache/datafusion/tree/main/benchmarks#running-the-benchmarks

Local Testing

I tested it out locally and it works as advertised

$ cargo run --profile=profiling --bin benchmark_runner -- --help
    Finished `profiling` profile [optimized + debuginfo] target(s) in 0.55s
     Running `target/profiling/benchmark_runner --help`
Inspect DataFusion SQL benchmark suites.

Usage: benchmark_runner [COMMAND]

Commands:
  help  Print help
  list  List SQL benchmark suites

Options:
  -h, --help  Print help
$ cargo run --profile=profiling --bin benchmark_runner -- list
    Finished `profiling` profile [optimized + debuginfo] target(s) in 0.20s
     Running `target/profiling/benchmark_runner list`
tpch
  description: TPC-H SQL benchmarks
  query ids: 1-22 discovered under benchmarks/sql_benchmarks/tpch as qNN.benchmark
  options:
    -f, --format <VALUE>  parquet (default), csv, mem
    -sf, --scale-factor <VALUE>  1 (default), 10

Comment thread on benchmarks/Cargo.toml
mimalloc_extended = ["libmimalloc-sys/extended"]

[dependencies]
anstream = "1.0"
Contributor


I double checked and it seems like anstream is already a dependency of clap_builder, so this is not a (net) new dependency.

Contributor Author


Coloured output ftw :) At least locally, depends on your terminal I think.

@@ -0,0 +1,16 @@
name = "tpch"
Contributor


Why is this needed? It seems somewhat redundant with SqlBenchmark, yes?

pub struct SqlBenchmark {
    /// Human‑readable name of the benchmark.
    name: String,
    /// Top‑level group name (derived from the file path or defined in a benchmark).
    group: String,
    /// Subgroup name, often a logical grouping.
    subgroup: String,
    /// Full path to the benchmark file.
    benchmark_path: PathBuf,
    /// Mapping of placeholder keys to concrete values (e.g. `"BENCHMARK_DIR"`).
    replacement_mapping: HashMap<String, String>,
    /// Expected string that must appear in the physical plan of the queries.
    expect: Vec<String>,
    /// All SQL queries grouped by directive (`load`, `run`, etc.).
    queries: HashMap<QueryDirective, Vec<String>>,
    /// Queries whose results are persisted to disk for later comparison.
    result_queries: Vec<BenchmarkQuery>,
    /// Queries whose results are asserted against an expected table.
    assert_queries: Vec<BenchmarkQuery>,
    /// Flag indicating whether the benchmark has been fully loaded
    is_loaded: bool,
    /// Stores the last run results if needed so they can be compared or persisted.
    last_results: Option<Vec<RecordBatch>>,
    /// echo statements
    echo: Vec<String>,
}

Contributor Author


The name doesn't really need to be there. The suite file, though, is needed for later PRs. The SqlBenchmark is per file; the suite is ... per suite.
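
To make that distinction concrete, a per-suite shape might look roughly like this. This is a hedged sketch: the type names (Suite, SuiteOption), fields, and print_suite helper are illustrative assumptions, with the example contents taken from the list output shown earlier in this thread:

/// One CLI option a suite exposes (illustrative type, not from the PR).
struct SuiteOption {
    short: &'static str,
    long: &'static str,
    values: &'static str,
}

/// Per-suite metadata. A suite groups many per-file `SqlBenchmark`s
/// (e.g. q01.benchmark .. q22.benchmark for tpch).
struct Suite {
    name: &'static str,
    description: &'static str,
    options: Vec<SuiteOption>,
}

/// Format a suite roughly the way `benchmark_runner list` prints it.
fn print_suite(suite: &Suite) {
    println!("{}", suite.name);
    println!("  description: {}", suite.description);
    println!("  options:");
    for opt in &suite.options {
        println!("    {}, {} <VALUE>  {}", opt.short, opt.long, opt.values);
    }
}

fn main() {
    let tpch = Suite {
        name: "tpch",
        description: "TPC-H SQL benchmarks",
        options: vec![
            SuiteOption { short: "-f", long: "--format", values: "parquet (default), csv, mem" },
            SuiteOption { short: "-sf", long: "--scale-factor", values: "1 (default), 10" },
        ],
    };
    print_suite(&tpch);
}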

Contributor Author

Omega359 commented May 4, 2026

I think I am missing some of the larger plan for this code. My hope was for a binary that could run the same benchmark setup as the criterion runner (e.g. the content of benchmarks/sql_benchmarks ) rather than a new set of definitions

Ah, yes, I really do need to work on explaining what I'm planning and/or doing. I tend to just go ahead and do things and tell others why later :/

Since just using environment variables to pass things into the sql benchmarks isn't exactly simple to figure out (you need to read the docs, avoid typos, etc.), and the issue this PR partially resolves requested a better approach, I went ahead and wrote a whole runner that can dynamically discover sql benchmarks (via suite files) and their options, and provide help, required options, etc. I've just added support for native benchmarks locally as well, so benchmark_runner list would show all of the benchmarks, not just the sql ones.

My plan for this is a benchmark runner that replaces both dfbench and the criterion benchmark runner, so DataFusion has just a single benchmark runner that can run either the sql benchmarks or the native ones, with built-in discovery, help, and information on suites and individual benchmarks. I've got a full working solution locally, but I need to tighten it up (I need to add criterion baseline support, for example). I suppose we could just rename the benchmark runner to dfbench at that point.

In any case, I split the work into 3 separate branches to make the review easier. This is the first of the 3; the other two add support for the info and query subcommands (query only applies to sql benchmarks) and, finally, the run command. The last one comes with documentation updates but doesn't touch bench.sh or dfbench. run supports both the dfbench-style '# of iterations' approach and criterion. Per-suite arguments are supported, coming either from the .suite file or from the RunOpts of a native benchmark.
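
For orientation only, the subcommand surface described above (list in this PR, with info, query, and run planned in the follow-up PRs) could be declared with clap roughly like the sketch below. It assumes clap's derive feature and invents the struct and field names; it is not the code in this PR:

use clap::{CommandFactory, Parser, Subcommand};

#[derive(Parser)]
#[command(name = "benchmark_runner", about = "Inspect DataFusion SQL benchmark suites.")]
struct Cli {
    #[command(subcommand)]
    command: Option<Command>,
}

#[derive(Subcommand)]
enum Command {
    /// List SQL benchmark suites
    List,
    /// Show details for a single suite (planned follow-up)
    Info { suite: String },
    /// Print a query from a sql benchmark suite (planned follow-up)
    Query { suite: String, query: String },
    /// Run a suite, via simple iterations or criterion (planned follow-up)
    Run { suite: String },
}

fn main() {
    match Cli::parse().command {
        Some(Command::List) => { /* discover suites and print them (see the earlier sketch) */ }
        Some(_) => eprintln!("planned for the follow-up PRs"),
        None => {
            // No subcommand given: fall back to printing help.
            let _ = Cli::command().print_help();
        }
    }
}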

If all of these are approved I plan on submitting PRs to update benchmarks/README.md, update bench.sh to use the new runner, remove or update sql.rs (for cargo bench; an update would have it just delegate to the new runner, though I haven't yet looked into that), and remove dfbench.rs.

After all that I plan on submitting PRs to migrate each native benchmark to be sql based (tpcds, h2o, etc). After that, I think the only native benchmark left would be the cancellation one.

I hope that clarifies things a bit.

Contributor

alamb commented May 4, 2026

If all of these are approved I plan on submitting PRs to update benchmarks/README.md, update bench.sh to use the new runner, remove or update sql.rs (for cargo bench; an update would have it just delegate to the new runner, though I haven't yet looked into that), and remove dfbench.rs.

Let's do it!
