Skip to content

v0.3.0: Step-level assertions and step-to-step data flow testing

Latest

Choose a tag to compare

@simukappu simukappu released this 25 Apr 16:23
· 3 commits to main since this release

agentverify 0.3.0 extends deterministic testing from flat tool-call sequences to the full step structure of multi-step agents: ReAct, Plan-and-Execute, probabilistic refinement loops, and workflow-style agents with cache / validation / conditional branches.

Highlights

  • Step-level assertions: assert_step / assert_step_output / assert_step_uses_result_from verify tool calls, intermediate outputs, and step-to-step data flow on the new Step data model.
  • Step-to-step data flow verification: assert_step_uses_result_from(step=N, depends_on=M) catches the "agent regenerated but ignored the previous step" bug. Works on cassette replay; tolerates numeric/string type coercion and multi-line string serialization.
  • step_probe context manager: mark logical step boundaries in agent code (cache hits, validation, conditional branches) so tests can assert on non-LLM steps. Zero-cost no-op outside of test contexts. Safe to leave in production code.
  • MATCHES(pattern) regex matcher: verify string tool-call arguments against a regex, with the same semantics as ANY.
  • MockLLM + mock_response(...): replay predefined LLM responses in-memory. Test agent routing without a cassette or a real LLM call.
  • assert_latency(result, max_ms=...): response-time SLAs. ExecutionResult.duration_ms is captured automatically by the cassette fixture and MockLLM.
  • Three new example suites exercising step-level and data-flow testing on different agent shapes:

Cassette adapter improvements

  • OpenAI cassette adapter now also intercepts AsyncCompletions.create, so agent frameworks that drive the SDK through AsyncOpenAI internally (including the OpenAI Agents SDK) are recorded and replayed transparently.
  • OpenAI cassette adapter strips openai.omit / openai.NOT_GIVEN sentinels from tools, per-message dicts, and extra parameters before they reach the cassette YAML.
  • OpenAI cassette adapter handles the with_raw_response.create code path used by langchain-openai v1.x.
  • Anthropic cassette adapter flattens SDK content-block objects to plain dicts at record time, so cassettes from ReAct-style agents load cleanly regardless of the installed Anthropic SDK version.

Dependency

  • Minimum pytest>=7 is now declared in the runtime dependency. CI tests against pytest 7, 8, and 9 majors on Python 3.10-3.14.

Breaking changes

  • ExecutionResult.to_dict() now emits steps: [...] instead of tool_calls: [...]. Read-side assertions and the ExecutionResult(tool_calls=[...]) constructor remain backward compatible; from_dict() still accepts the legacy tool_calls key on input.

Install:

pip install agentverify==0.3.0

Full changelog: CHANGELOG.md