fix(agents): preserve text streamed before tool calls in output_key #5616
Open

vipin-v-nair wants to merge 1 commit into google:main
`LlmAgent.__maybe_save_output_to_state` only writes `state_delta[output_key]` on events where `Event.is_final_response()` returns True. That gate is False for any event carrying a `function_call` or `function_response` part, so under `StreamingMode.SSE` the text on text+function_call events was silently dropped from `output_key`; only the final tool-free event's text was saved. Production agents using `output_key` with tools were losing 60–70% of model text, including intros and progress narration.

Add a private helper that accumulates non-partial text across the model turn into a local string (per-invocation, scoped to the caller) and writes the running value to `state_delta[output_key]` on every text-bearing event from this agent. Wire it into `_run_async_impl` and `_run_live_impl` alongside the existing `__maybe_save_output_to_state` call. `__maybe_save_output_to_state` is unchanged; `output_schema` behavior is preserved (validate one complete final document, never accumulate partial JSON).

Fixes google#5590
Fixes #5590
Problem

When an `LlmAgent` is configured with `output_key` and runs in `StreamingMode.SSE` while making tool calls, only the model text on the very last tool-free event is written to `state[output_key]`. Any text the model emits before or between tool calls is silently dropped, even though it streams to the user. Reporters measure 60–70% retention loss in production. A live reproduction against `gemini-2.5-pro` on Vertex shows the intro and progress narration discarded, with only the conclusion text reaching `output_key`.

Root cause
`LlmAgent.__maybe_save_output_to_state` only writes `state_delta[output_key]` on events where `Event.is_final_response()` returns True. That gate returns False for any event carrying a `function_call` or `function_response` part. Under streaming, Gemini emits non-partial events that bundle text with a `function_call`; those events fail the gate, so their text is skipped. The post-aggregation events that follow the tool calls are also rejected (they contain `function_response` parts). Only the final tool-free event passes, and `state_delta[output_key]` gets overwritten with just that text. Every prior text segment from the same model turn is silently discarded.
Fix

Add a new private helper `LlmAgent.__maybe_accumulate_streaming_output` that receives the current event and a running accumulator string. It returns the new accumulator value, and on applicable events also writes the running value to `state_delta[output_key]`. The helper is invoked from the call sites in `_run_async_impl` and `_run_live_impl`, alongside (and after) the existing `__maybe_save_output_to_state` call. Each call site holds a local `output_accumulator: str` that lives for the duration of the invocation; no new instance state on the agent.

`__maybe_save_output_to_state` is byte-identical to main. The schema path is preserved unchanged: when `output_schema` is set, the helper short-circuits and the existing single-final-document validation path runs as before. The function-response-only callback path is also preserved (the helper short-circuits on empty text).
Tests added

`tests/unittests/agents/test_llm_agent_streaming_output.py::test_run_async_accumulates_text_around_tool_calls`

Regression test for the text-accumulation issue for `output_key` (#5590). Drives `_run_async_impl` with a stubbed `_llm_flow` that yields the canned event sequence the streaming flow produces (interleaved partial text, text+function_call, function_response, more text, final tool-free text), merges each event's `state_delta` into session state via `session_service.append_event`, and asserts on `session.state["final_output"]`, the value named in the bug report. Fails on main with `AssertionError: 'Conclusion one. Conclusion two.' == 'Intro one. Intro two.Progress.Conclusion one. Conclusion two.'`; passes with this fix.
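The reason asserting on merged session state catches the bug can be sketched as follows. This is an illustrative model of the merge semantics, not the actual test file: each event's `state_delta` is folded into session state, so the last write wins.

```python
def merge(deltas):
  # Models what appending events does to session state: each
  # state_delta is folded in, so the last write to a key wins.
  state = {}
  for delta in deltas:
    state.update(delta)
  return state


# On main, only the final tool-free event writes output_key:
broken = merge([{"final_output": "Conclusion one. Conclusion two."}])

# With the fix, every text-bearing event writes the running accumulator,
# and the last write carries the whole turn:
fixed = merge([
    {"final_output": "Intro one. Intro two."},
    {"final_output": "Intro one. Intro two.Progress."},
    {"final_output": (
        "Intro one. Intro two.Progress.Conclusion one. Conclusion two."
    )},
])

print(broken["final_output"])  # conclusion only
print(fixed["final_output"])   # full accumulated text
```

The assertion on `session.state["final_output"]` therefore distinguishes the two behaviors with a single comparison.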
Verification

- `pre-commit run --all-files`: clean
- `pyink --check --diff --config pyproject.toml src/ tests/`: clean
- `isort --check src/ tests/`: clean
- `pylint` clean on touched files (no new warnings; pre-existing warnings unchanged)
- `pytest tests/unittests --ignore=tests/unittests/artifacts/test_artifact_service.py --ignore=tests/unittests/tools/google_api_tool/test_googleapi_to_openapi_converter.py`: 5608 passed, 2266 warnings, 6 subtests passed in 436.97s
- A live run with `gemini-2.5-pro` on Vertex confirms all three text sections (intro + progress + conclusion) now reach `state[output_key]`; the pre-fix run had only the conclusion.

Design notes
I considered an alternative that adds a `PrivateAttr` accumulator to `LlmAgent` and keeps the fix inside `__maybe_save_output_to_state`. I went with the call-site approach because:

- It leaves `LlmAgent`'s field set unchanged.
- A `PrivateAttr` on `LlmAgent` would be the first in the agent class hierarchy; this bug didn't feel like the right place to introduce that pattern, even though `PrivateAttr` is already used elsewhere in the codebase (`agents/invocation_context.py:221`, `sessions/session.py:53`).

The trade-off: the regression test constructs an `InvocationContext` directly, so it's mildly brittle to future changes in `InvocationContext`'s required fields. The 14 existing tests in `tests/unittests/agents/test_llm_agent_output_save.py` are byte-identical to main and continue to pass unchanged.
Out of scope

The existing tests in `test_llm_agent_output_save.py` use name-mangled access to internal state (`agent._LlmAgent__maybe_save_output_to_state(event)`). The new test deliberately asserts at a higher level (`_run_async_impl` + `session.state[output_key]`), but the existing tests are untouched.