- Step-level assertions for agents that make multiple LLM calls per run. New `assert_step`, `assert_step_output`, and `assert_step_uses_result_from` verify tool calls, intermediate outputs, and step-to-step data flow. See README "Step-Level Assertions".
- `step_probe` context manager to mark logical step boundaries in agent code, including LLM-free steps such as cache hits, state management, and validation. It is a zero-cost no-op outside recorder/`MockLLM` contexts, so it is safe to leave in production code.
- Data-flow matching in `assert_step_uses_result_from` tolerates common serialization differences: numeric tool results encoded as strings match their int/float consumers, and multi-line produced strings match consumers that hold them inside a container.
- `MATCHES(pattern)` regex matcher for verifying string tool-call arguments against a regex, with the same semantics as `ANY`.
- `MockLLM` + `mock_response(...)` replay predefined LLM responses in memory. Test agent routing without a cassette or a real LLM call.
- `assert_latency(result, max_ms=...)` enforces response-time SLAs. `ExecutionResult.duration_ms` is captured automatically by the `cassette` fixture and `MockLLM`.
- Three new example suites exercising step-level and data-flow testing on different agent shapes: OpenAI Agents SDK LLM-as-a-Judge, LangGraph multi-agent supervisor, and custom-converter-python-agent. See `examples/`.
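As a rough illustration of the data-flow tolerance described above, the sketch below checks whether a value produced by one step reaches a later consumer. `flows_into` and its helpers are hypothetical and are not agentverify's matcher; the sketch assumes values are plain Python objects and reads "inside a container" as "an element of a nested dict, list, or tuple equal to the produced string".

```python
from typing import Any

# Hypothetical sketch of tolerant data-flow matching; not agentverify's implementation.


def _numeric_equal(produced: Any, consumed: Any) -> bool:
    """True if one side is a numeric string and the other an int/float of the same value."""
    for text, number in ((produced, consumed), (consumed, produced)):
        if isinstance(text, str) and isinstance(number, (int, float)) and not isinstance(number, bool):
            try:
                return float(text) == float(number)
            except (ValueError, OverflowError):
                return False
    return False


def _holds(container: Any, needle: str) -> bool:
    """True if a nested dict/list/tuple contains an element equal to needle."""
    if isinstance(container, dict):
        return any(_holds(v, needle) for v in container.values())
    if isinstance(container, (list, tuple)):
        return any(_holds(v, needle) for v in container)
    return container == needle


def flows_into(produced: Any, consumed: Any) -> bool:
    """Did a produced tool result reach a later step's consumed value?"""
    if produced == consumed:
        return True
    if _numeric_equal(produced, consumed):
        return True
    # Assumption: only multi-line strings get the container tolerance.
    if isinstance(produced, str) and "\n" in produced:
        return _holds(consumed, produced)
    return False
```

Restricting the container rule to multi-line strings keeps short values like `"1"` from matching by accident wherever they happen to appear.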
- OpenAI cassette adapter now also intercepts `AsyncCompletions.create`, so agent frameworks that drive the SDK through `AsyncOpenAI` internally (including the OpenAI Agents SDK) are recorded and replayed transparently.
- OpenAI cassette adapter strips `openai.omit` / `openai.NOT_GIVEN` sentinels from `tools`, per-message dicts, and extra parameters before they reach the cassette YAML.
- OpenAI cassette adapter handles the `with_raw_response.create` code path used by langchain-openai v1.x, unwrapping `LegacyAPIResponse` on record and re-wrapping the synthesised `ChatCompletion` on replay.
- Anthropic cassette adapter flattens SDK content-block objects to plain dicts at record time, so cassettes recorded from ReAct-style agents load cleanly regardless of the installed Anthropic SDK version.
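The sentinel stripping described above amounts to a recursive filter over the request payload. In this minimal sketch, `NOT_GIVEN` is a local stand-in object, not the real `openai.NOT_GIVEN`, and `strip_sentinels` is a hypothetical helper; the adapter's actual traversal may differ.

```python
from typing import Any


# Hypothetical stand-in for an SDK "not given" sentinel such as openai.NOT_GIVEN.
class _NotGiven:
    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()
SENTINELS = (NOT_GIVEN,)


def _is_sentinel(value: Any) -> bool:
    # Identity check: sentinels are singletons, and their equality may be overridden.
    return any(value is s for s in SENTINELS)


def strip_sentinels(value: Any) -> Any:
    """Recursively drop sentinel entries from dicts and lists before serialization."""
    if isinstance(value, dict):
        return {k: strip_sentinels(v) for k, v in value.items() if not _is_sentinel(v)}
    if isinstance(value, list):
        return [strip_sentinels(v) for v in value if not _is_sentinel(v)]
    return value
```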
- Minimum `pytest>=7` is now declared as a runtime dependency, so pip refuses to install on older pytest. CI tests against pytest major versions 7, 8, and 9 on Python 3.10-3.14.
- `ExecutionResult.to_dict()` now emits `steps: [...]` instead of `tool_calls: [...]`. Read-side assertions and the `ExecutionResult(tool_calls=[...])` constructor are backward compatible; `from_dict()` still accepts the legacy `tool_calls` key on input.
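The backward-compatible read path can be pictured as "prefer `steps`, fall back to `tool_calls`". This sketch uses a hypothetical `Result` dataclass reduced to its step list; it is not the real `ExecutionResult`, which carries more fields.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Result:
    """Hypothetical stand-in for ExecutionResult, reduced to its step list."""

    steps: list[dict[str, Any]] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        # New format: serialize under "steps".
        return {"steps": list(self.steps)}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Result":
        # Prefer the new key; fall back to the legacy "tool_calls" key.
        raw = data.get("steps", data.get("tool_calls", []))
        return cls(steps=list(raw))
```

A legacy dict read through `from_dict()` and written back out with `to_dict()` is thereby migrated to the new key.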
- Built-in framework adapters for Strands Agents, LangChain, LangGraph, and OpenAI Agents SDK. Extract `ExecutionResult` directly from agent framework outputs without writing a custom converter (`from_strands`, `from_langchain`, `from_langgraph`, `from_openai_agents`).
- Cassette request matching detects stale cassettes by verifying the model name and tool names during replay. Enabled by default; raises `CassetteRequestMismatchError` with a clear diff on mismatch.
- Cassette sanitization automatically redacts API keys and sensitive data when recording. Built-in patterns cover OpenAI, Anthropic, AWS, and Bearer tokens; extendable with custom `SanitizePattern` objects.
- Strands Weather Forecaster example: End-to-end example testing the official Strands sample with a pre-recorded Bedrock cassette.
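Regex-based redaction of the kind described above can be sketched as an ordered tuple of substitution rules applied to recorded text. `RedactRule`, `RULES`, and `sanitize` are hypothetical names, not agentverify's `SanitizePattern` API, and the patterns are illustrative assumptions about token shapes rather than the library's built-in patterns.

```python
import re
from dataclasses import dataclass

# Hypothetical redaction rules; not agentverify's SanitizePattern or its built-in patterns.


@dataclass(frozen=True)
class RedactRule:
    name: str
    pattern: re.Pattern
    replacement: str


RULES = (
    # Bearer tokens in Authorization headers: keep the scheme, redact the token.
    RedactRule("bearer", re.compile(r"(Bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1<REDACTED>"),
    # Assumed shape of OpenAI-style secret keys.
    RedactRule("openai", re.compile(r"sk-[A-Za-z0-9_-]{20,}"), "<REDACTED_OPENAI_KEY>"),
    # AWS access key IDs (long-term AKIA..., temporary ASIA...).
    RedactRule("aws", re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"), "<REDACTED_AWS_KEY>"),
)


def sanitize(text: str, rules: tuple[RedactRule, ...] = RULES) -> str:
    """Apply each rule in order and return the redacted text."""
    for rule in rules:
        text = rule.pattern.sub(rule.replacement, text)
    return text
```

Rule order matters: the Bearer rule runs first, so a key sent in an `Authorization` header is redacted as a whole token. Extra rules can be appended with `RULES + (RedactRule(...),)`.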
Initial release.
- Tool call assertions: `assert_tool_calls` with `EXACT`, `IN_ORDER`, and `ANY_ORDER` modes; `ANY` wildcard and `partial_args` for flexible argument matching.
- Cost budget assertions: `assert_cost` enforces `max_tokens` and `max_cost_usd` limits.
- Safety guardrails: `assert_no_tool_call` detects forbidden tool invocations.
- Final output assertions: `assert_final_output` with `contains`, `equals`, and `matches` (regex).
- Batch assertions: `assert_all` collects all failures without stopping at the first.
- LLM Cassette Record & Replay: VCR-style recording of LLM API calls for deterministic CI testing. Human-readable YAML / JSON cassettes that you commit to git.
- 5 LLM provider adapters: OpenAI, Amazon Bedrock, Google Gemini, Anthropic, LiteLLM.
- pytest plugin: Auto-registers on install, provides the `cassette` fixture and the `@pytest.mark.agentverify` marker.
- Structured error messages: Clear diffs with mismatch position highlighting.
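The three matching modes can be sketched under assumed semantics: `EXACT` requires the same calls in the same order, `IN_ORDER` requires the expected calls to appear as a subsequence of the actual ones, and `ANY_ORDER` pairs each expected call with a distinct actual call regardless of order. `match_tool_calls` and `call_matches` are hypothetical helpers, not `assert_tool_calls` itself, and the wildcard here is `unittest.mock.ANY`, standing in for agentverify's `ANY`.

```python
from typing import Any, Sequence
from unittest.mock import ANY  # compares equal to any value

# Hypothetical sketch; assumed semantics, not agentverify's assert_tool_calls.
Call = tuple[str, dict[str, Any]]  # (tool name, arguments)


def call_matches(expected: Call, actual: Call, partial_args: bool = False) -> bool:
    exp_name, exp_args = expected
    act_name, act_args = actual
    if exp_name != act_name:
        return False
    if partial_args:
        # Only the keys listed in the expectation must be present and equal.
        return all(k in act_args and v == act_args[k] for k, v in exp_args.items())
    return exp_args == act_args  # ANY values compare equal to anything


def _assign(expected: list[Call], actual: list[Call], used: frozenset, partial_args: bool) -> bool:
    """Backtracking search pairing each expected call with a distinct actual call."""
    if not expected:
        return True
    head, rest = expected[0], expected[1:]
    return any(
        _assign(rest, actual, used | {i}, partial_args)
        for i, call in enumerate(actual)
        if i not in used and call_matches(head, call, partial_args)
    )


def match_tool_calls(expected: Sequence[Call], actual: Sequence[Call],
                     mode: str = "EXACT", partial_args: bool = False) -> bool:
    if mode == "EXACT":
        return len(expected) == len(actual) and all(
            call_matches(e, a, partial_args) for e, a in zip(expected, actual)
        )
    if mode == "IN_ORDER":
        it = iter(actual)
        # Subsequence check: each search resumes where the previous match stopped.
        return all(any(call_matches(e, a, partial_args) for a in it) for e in expected)
    if mode == "ANY_ORDER":
        return _assign(list(expected), list(actual), frozenset(), partial_args)
    raise ValueError(f"unknown mode: {mode}")
```

Backtracking is used for `ANY_ORDER` because greedy pairing can fail when a wildcard expectation claims a call that a stricter expectation needed.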