agentverify 0.3.0 extends deterministic testing from flat tool-call sequences to the full step structure of multi-step agents: ReAct, Plan-and-Execute, probabilistic refinement loops, and workflow-style agents with cache / validation / conditional branches.
Highlights
- Step-level assertions:
assert_step/assert_step_output/assert_step_uses_result_fromverify tool calls, intermediate outputs, and step-to-step data flow on the newStepdata model. - Step-to-step data flow verification:
assert_step_uses_result_from(step=N, depends_on=M)catches the "agent regenerated but ignored the previous step" bug. Works on cassette replay; tolerates numeric/string type coercion and multi-line string serialization. step_probecontext manager: mark logical step boundaries in agent code (cache hits, validation, conditional branches) so tests can assert on non-LLM steps. Zero-cost no-op outside of test contexts. Safe to leave in production code.MATCHES(pattern)regex matcher: verify string tool-call arguments against a regex, with the same semantics asANY.MockLLM+mock_response(...): replay predefined LLM responses in-memory. Test agent routing without a cassette or a real LLM call.assert_latency(result, max_ms=...): response-time SLAs.ExecutionResult.duration_msis captured automatically by the cassette fixture andMockLLM.- Three new example suites exercising step-level and data-flow testing on different agent shapes:
openai-agents-llm-as-a-judge: OpenAI Agents SDK LLM-as-a-Judge (probabilistic refinement loop with feedback-chain data flow)langgraph-multi-agent-supervisor: LangGraph research + math handoff with numeric running-total data flowcustom-converter-python-agent: pure-Python Anthropic SDK ReAct with an ~80-line converter reference
Cassette adapter improvements
- OpenAI cassette adapter now also intercepts
AsyncCompletions.create, so agent frameworks that drive the SDK throughAsyncOpenAIinternally (including the OpenAI Agents SDK) are recorded and replayed transparently. - OpenAI cassette adapter strips
openai.omit/openai.NOT_GIVENsentinels fromtools, per-message dicts, and extra parameters before they reach the cassette YAML. - OpenAI cassette adapter handles the
with_raw_response.createcode path used by langchain-openai v1.x. - Anthropic cassette adapter flattens SDK content-block objects to plain dicts at record time, so cassettes from ReAct-style agents load cleanly regardless of the installed Anthropic SDK version.
Dependency
- Minimum
pytest>=7is now declared in the runtime dependency. CI tests against pytest 7, 8, and 9 majors on Python 3.10-3.14.
Breaking changes
ExecutionResult.to_dict()now emitssteps: [...]instead oftool_calls: [...]. Read-side assertions and theExecutionResult(tool_calls=[...])constructor remain backward compatible;from_dict()still accepts the legacytool_callskey on input.
Install:
pip install agentverify==0.3.0
Full changelog: CHANGELOG.md