gen.forward(llm, input) (and llm.chat(...)) hangs forever when using any Gemini 3 family model on Vertex AI with the default streaming mode. The Vertex endpoint responds with HTTP 200 and produces bytes, but axllm's streaming parser never completes, so the returned promise never resolves. Disabling streaming via config: { stream: false } fixes it immediately.
Gemini 2.x models on the same Vertex provider, project, region, auth, and axllm instance work fine with streaming.
Environment
- @ax-llm/ax: 19.0.45
- Runtime: Node.js 24
- Provider: ai({ name: 'google-gemini', projectId, region: 'global', apiKey: () => googleAuthToken })
- Auth: google-auth-library via ADC, scope https://www.googleapis.com/auth/cloud-platform
- Affected models (all hang):
- AxAIGoogleGeminiModel.Gemini3Flash ("gemini-3-flash-preview")
- AxAIGoogleGeminiModel.Gemini3FlashLite ("gemini-3.1-flash-lite-preview")
- AxAIGoogleGeminiModel.Gemini3Pro / Gemini31Pro ("gemini-3.1-pro-preview")
- Working controls (same auth, same code path): gemini-2.0-flash-001, gemini-2.5-pro, gemini-2.5-flash
Reproduction
import { ai, ax } from '@ax-llm/ax';

const llm = ai({
  name: 'google-gemini',
  apiKey: async () => (await googleAuthClient.getAccessToken()).token!,
  projectId: process.env.VERTEX_PROJECT!,
  region: 'global',
  config: { model: 'gemini-3-flash-preview' },
  // options: { debug: true, timeout: 30_000 }, // forces a visible timeout
});

const gen = ax('question:string -> answer:string');
const result = await gen.forward(llm, { question: 'Say hi in one word.' });
console.log(result);
// → hangs forever; never resolves, never throws
Switching 'gemini-3-flash-preview' to 'gemini-2.5-pro' (everything else identical) returns within seconds.
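While debugging, the silent hang can be turned into a loud failure by racing the forward call against a deadline. This is a generic helper, not an axllm API (axllm's own options.timeout is noted in the reproduction above):

```typescript
// Generic promise-timeout helper (not part of axllm): turns a silent
// hang into a visible rejection after `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms}ms`)),
      ms,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Usage against the hanging call:
// const result = await withTimeout(
//   gen.forward(llm, { question: 'Say hi in one word.' }),
//   30_000,
// );
```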
Workaround
config: { model: 'gemini-3-flash-preview', stream: false },
Works immediately. Output shape is identical; only streaming is disabled.
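In context, this is the same ai(...) call from the reproduction with one extra config key (a config fragment; everything except stream: false is unchanged from above):

```typescript
const llm = ai({
  name: 'google-gemini',
  apiKey: async () => (await googleAuthClient.getAccessToken()).token!,
  projectId: process.env.VERTEX_PROJECT!,
  region: 'global',
  // stream: false forces the non-streaming generateContent path,
  // bypassing the streaming parser that never terminates on Gemini 3.
  config: { model: 'gemini-3-flash-preview', stream: false },
});
```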
What's not the issue
Ruled out by testing outside axllm:
- Not auth / credentials. Same GoogleAuth token, same service account works with 2.x models in streaming mode.
- Not model availability. A direct curl to https://aiplatform.googleapis.com/v1/projects/$PROJECT/locations/global/publishers/google/models/gemini-3-flash-preview:generateContent returns 200 with a valid response. Same for the v1beta1 path.
- Not region routing. global region works for 2.x on axllm; same configuration hangs for 3.x.
- Not a network/socket hang at the connection level. With debug: true + timeout: 30_000, the request does send and receive bytes; the timeout fires inside the streaming parse loop, not at the HTTP layer.
Suspected cause
SSE terminator or chunk-framing mismatch between axllm's streaming response parser and the Gemini 3 preview endpoint's SSE format. The streaming parser likely awaits a delimiter (e.g. data: [DONE], a specific content-block-stop event, or a final empty chunk) that the 3.x preview endpoint either omits or formats differently from 2.x.
Suggested areas to check in the Google Gemini provider implementation:
- src/ax/ai/google-gemini/ — streaming decoder / chunk reassembly
- Any hardcoded end-of-stream sentinel specific to 2.x response shape
- Handling of new Gemini 3 "thinking" blocks (3.x introduced extended thinking content parts); if the parser waits for a thinking-close that the API doesn't emit in stream mode, the promise never resolves
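If the root cause really is a sentinel the 3.x preview endpoint never sends, one defensive fix is to treat transport-level end-of-stream as terminal rather than waiting for a specific delimiter. A minimal sketch of that shape (hypothetical, not axllm's actual decoder):

```typescript
// Minimal SSE event splitter: yields each `data:` payload and, crucially,
// flushes any buffered event when the byte stream ends, instead of
// waiting for a sentinel like `data: [DONE]` that may never arrive.
async function* sseEvents(
  stream: AsyncIterable<string>,
): AsyncGenerator<string> {
  let buffer = '';
  for await (const chunk of stream) {
    buffer += chunk;
    // Per the SSE format, events are separated by a blank line.
    let sep: number;
    while ((sep = buffer.indexOf('\n\n')) !== -1) {
      const event = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      for (const line of event.split('\n')) {
        if (line.startsWith('data:')) yield line.slice(5).trim();
      }
    }
  }
  // End of stream: flush any trailing event even without the blank-line
  // terminator -- this is what keeps the consumer from hanging forever.
  for (const line of buffer.split('\n')) {
    if (line.startsWith('data:')) yield line.slice(5).trim();
  }
}
```

A consumer driven by this generator completes when the HTTP body closes, so a missing 2.x-style terminator degrades to a clean end-of-stream instead of an unresolved promise.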
Happy to help