Skip to content

Commit 1f59d32

Browse files
adriangbclaude
andauthored
fix: dfbench respects DATAFUSION_RUNTIME_MEMORY_LIMIT env var (#20631)
## Which issue does this PR close? N/A — discovered while running benchmarks with `bench.sh`. ## Rationale for this change When running benchmarks via `bench.sh` / `dfbench`, setting `DATAFUSION_RUNTIME_MEMORY_LIMIT=2G` is ignored for memory pool enforcement. Most `DATAFUSION_*` env vars work because `SessionConfig::from_env()` picks them up, but the memory limit is a special case — it requires constructing a `MemoryPool` in the `RuntimeEnv`, which `dfbench` only did when `--memory-limit` was passed as a CLI flag. ## What changes are included in this PR? In `runtime_env_builder()`, when `self.memory_limit` (CLI flag) is `None`, fall back to reading the `DATAFUSION_RUNTIME_MEMORY_LIMIT` env var using the existing `parse_memory_limit()` function. The CLI flag still takes precedence when provided. ## Are these changes tested? Yes — added `test_runtime_env_builder_reads_env_var` which sets the env var, constructs a `CommonOpt` with no CLI memory limit, and verifies the resulting `RuntimeEnv` has a `Finite(2GB)` memory pool. ## Are there any user-facing changes? `dfbench` now honors the `DATAFUSION_RUNTIME_MEMORY_LIMIT` environment variable as a fallback when `--memory-limit` is not passed on the command line. No breaking changes — existing CLI flag behavior is unchanged. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 538a201 commit 1f59d32

1 file changed

Lines changed: 42 additions & 1 deletion

File tree

benchmarks/src/util/options.rs

Lines changed: 42 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -91,7 +91,15 @@ impl CommonOpt {
9191
pub fn runtime_env_builder(&self) -> Result<RuntimeEnvBuilder> {
9292
let mut rt_builder = RuntimeEnvBuilder::new();
9393
const NUM_TRACKED_CONSUMERS: usize = 5;
94-
if let Some(memory_limit) = self.memory_limit {
94+
// Use CLI --memory-limit if provided, otherwise fall back to
95+
// DATAFUSION_RUNTIME_MEMORY_LIMIT env var
96+
let memory_limit = self.memory_limit.or_else(|| {
97+
std::env::var("DATAFUSION_RUNTIME_MEMORY_LIMIT")
98+
.ok()
99+
.and_then(|val| parse_capacity_limit(&val).ok())
100+
});
101+
102+
if let Some(memory_limit) = memory_limit {
95103
let pool: Arc<dyn MemoryPool> = match self.mem_pool_type.as_str() {
96104
"fair" => Arc::new(TrackConsumersPool::new(
97105
FairSpillPool::new(memory_limit),
@@ -144,6 +152,39 @@ fn parse_capacity_limit(limit: &str) -> Result<usize, String> {
144152
mod tests {
145153
use super::*;
146154

155+
#[test]
156+
fn test_runtime_env_builder_reads_env_var() {
157+
// Set the env var and verify runtime_env_builder picks it up
158+
// when no CLI --memory-limit is provided
159+
let opt = CommonOpt {
160+
iterations: 3,
161+
partitions: None,
162+
batch_size: None,
163+
mem_pool_type: "fair".to_string(),
164+
memory_limit: None,
165+
sort_spill_reservation_bytes: None,
166+
debug: false,
167+
};
168+
169+
// With env var set, builder should succeed and have a memory pool
170+
// SAFETY: This test is single-threaded and the env var is restored after use
171+
unsafe {
172+
std::env::set_var("DATAFUSION_RUNTIME_MEMORY_LIMIT", "2G");
173+
}
174+
let builder = opt.runtime_env_builder().unwrap();
175+
let runtime = builder.build().unwrap();
176+
unsafe {
177+
std::env::remove_var("DATAFUSION_RUNTIME_MEMORY_LIMIT");
178+
}
179+
// A 2G memory pool should be present — verify it reports the correct limit
180+
match runtime.memory_pool.memory_limit() {
181+
datafusion::execution::memory_pool::MemoryLimit::Finite(limit) => {
182+
assert_eq!(limit, 2 * 1024 * 1024 * 1024);
183+
}
184+
_ => panic!("Expected Finite memory limit"),
185+
}
186+
}
187+
147188
#[test]
148189
fn test_parse_capacity_limit_all() {
149190
// Test valid inputs

0 commit comments

Comments
 (0)