This repository was archived by the owner on May 6, 2026. It is now read-only.

CUBE Standard — Roadmap

This roadmap reflects current priorities and is updated as the project evolves. Items are roughly ordered by priority within each phase. For detailed proposals, see the design/ folder and the RFC process in CONTRIBUTING.md.

Phase 1 — Alpha Stabilization (current)

Goal: stable core protocol, first wave of cubes, compliance tooling.

  • Core protocol: Tool, Task, Benchmark, Observation, Action
  • cube init / cube test CLI
  • Reference implementation: counter-cube
  • Container backends (Docker, Modal, Daytona)
  • First cubes landing:
    • Web agents: MiniWob ✅, WebArena-Verified ✅ (cube-harness#214), WorkArena ✅
    • Computer use (CUA): OSWorld ✅
    • SWE: SWE-bench Verified + Live ✅, TerminalBench 2 ✅, LiveCodeBench ✅
  • Benchmark metadata schema — BenchmarkMetadata fields: homepage, citation, license, task count, modality (benchmark.py)
  • CUBE Stress Test: compliance checks and latency suite (`cube test cube-name`), nearly complete, see PR #22
  • Unified resource backend — VMBackend / VM abstraction for cloud and local VM provisioning (design/vm_backend.md)
  • Stable v0.1 API — freeze core interfaces, tag release
  • PyPI publication (cube-standard)
  • Published documentation site
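
To make the benchmark metadata item above more concrete, the listed fields could be modeled roughly as shown here. This is a hypothetical sketch only: the class name `BenchmarkMetadataSketch`, the field names, their types, and the validation rule are assumptions inferred from the roadmap bullet, not the actual definition in benchmark.py. All values are placeholders.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BenchmarkMetadataSketch:
    """Hypothetical stand-in for BenchmarkMetadata; names and types are assumed."""

    homepage: str    # URL of the benchmark's project page
    citation: str    # e.g. a BibTeX entry
    license: str     # license identifier for the benchmark
    task_count: int  # number of tasks the benchmark exposes
    modality: str    # e.g. "web", "desktop", "code" (assumed vocabulary)

    def __post_init__(self) -> None:
        # Assumed sanity check; the real schema may validate differently.
        if self.task_count < 0:
            raise ValueError("task_count must be non-negative")


# Placeholder values for illustration only.
meta = BenchmarkMetadataSketch(
    homepage="https://example.org/my-benchmark",
    citation="@misc{example, title={Example Benchmark}}",
    license="MIT",
    task_count=125,
    modality="web",
)

# A plain dict is convenient for serializing into a registry or index.
meta_dict = asdict(meta)
```

A frozen dataclass is used here so that metadata cannot be mutated after a cube is loaded; the real schema may make a different choice.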

Phase 2 — Platform Integrations & Cube Growth

Goal: integrate with major agent frameworks, grow to ~50 cubes.

Phase 3 — Broad Ecosystem

Goal: CUBE becomes the default interoperability layer for agent benchmarks. Exact scope TBD — to be discussed with the community.

  • Large-scale cube registry — community-maintained index of CUBE-compatible benchmarks
  • Cube discovery and install (cube add <benchmark>)
  • Broader platform integrations (beyond Phase 2)
  • Number of cubes: open-ended, driven by community adoption

Phase 3 priorities will be shaped by what the community builds in Phase 2. Join the discussions to help define it.

RFC Process

Have an idea that changes the core protocol? Open a GitHub Discussion or file a PR against design/. See CONTRIBUTING.md for the full process.

How to Influence the Roadmap