Streaming parser hangs indefinitely on Gemini 3.x preview models (Vertex AI) #511

@harmoney-franck

Description

gen.forward(llm, input) (and llm.chat(...)) hangs forever when using any Gemini 3 family model on Vertex AI in the default streaming mode. The Vertex endpoint responds with HTTP 200 and sends bytes, but axllm's streaming parser never completes, so the returned promise neither resolves nor rejects. Disabling streaming via config: { stream: false } fixes it immediately.

Gemini 2.x models on the same Vertex provider, project, region, auth, and axllm instance work fine with streaming.

Environment

  • @ax-llm/ax: 19.0.45
  • Runtime: Node.js 24
  • Provider: ai({ name: 'google-gemini', projectId, region: 'global', apiKey: () => googleAuthToken })
  • Auth: google-auth-library via ADC, scope https://www.googleapis.com/auth/cloud-platform
  • Affected models (all hang):
    • AxAIGoogleGeminiModel.Gemini3Flash ("gemini-3-flash-preview")
    • AxAIGoogleGeminiModel.Gemini3FlashLite ("gemini-3.1-flash-lite-preview")
    • AxAIGoogleGeminiModel.Gemini3Pro / Gemini31Pro ("gemini-3.1-pro-preview")
  • Working controls (same auth, same code path): gemini-2.0-flash-001, gemini-2.5-pro, gemini-2.5-flash

Reproduction

  import { ai, ax } from '@ax-llm/ax';

  const llm = ai({
    name: 'google-gemini',
    apiKey: async () => (await googleAuthClient.getAccessToken()).token!,
    projectId: process.env.VERTEX_PROJECT!,
    region: 'global',
    config: { model: 'gemini-3-flash-preview' },
    // options: { debug: true, timeout: 30_000 },  // forces a visible timeout
  });

  const gen = ax('question:string -> answer:string');
  const result = await gen.forward(llm, { question: 'Say hi in one word.' });
  console.log(result);
  // → hangs forever; never resolves, never throws

With 'gemini-2.5-pro' in place of 'gemini-3-flash-preview' (everything else identical), the same call returns within seconds.

Workaround

  config: { model: 'gemini-3-flash-preview', stream: false },

Works immediately. Output shape is identical; only streaming is disabled.

What's not the issue

All of the following were ruled out independently of axllm:

  • Not auth / credentials. Same GoogleAuth token, same service account works with 2.x models in streaming mode.
  • Not model availability. Direct curl to
    https://aiplatform.googleapis.com/v1/projects/$PROJECT/locations/global/publishers/google/models/gemini-3-flash-preview:generateContent returns 200 with a valid response. Same for the v1beta1 path.
  • Not region routing. global region works for 2.x on axllm; same configuration hangs for 3.x.
  • Not a network/socket hang at the connection level. With debug: true + timeout: 30_000, the request does send and receive bytes; the timeout fires inside the streaming parse loop, not at the HTTP layer.
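To compare the raw wire formats directly, a streaming curl against a 2.x and a 3.x model should show whether the final SSE frames differ. This is a diagnostic sketch only: streamGenerateContent with ?alt=sse is Vertex AI's SSE streaming method, $PROJECT is a placeholder for the project id, and it requires gcloud credentials.

```shell
# Dump the tail of the raw SSE stream from each model family and diff them.
# $PROJECT is a placeholder; requires ADC credentials via gcloud.
TOKEN=$(gcloud auth print-access-token)
for MODEL in gemini-2.5-flash gemini-3-flash-preview; do
  curl -sN \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"contents":[{"role":"user","parts":[{"text":"Say hi in one word."}]}]}' \
    "https://aiplatform.googleapis.com/v1/projects/$PROJECT/locations/global/publishers/google/models/$MODEL:streamGenerateContent?alt=sse" \
    | tail -c 400 > "/tmp/sse-tail-$MODEL.txt"
done
diff /tmp/sse-tail-gemini-2.5-flash.txt /tmp/sse-tail-gemini-3-flash-preview.txt
```

If the 3.x tail is missing a terminator frame that the 2.x tail has, that would confirm the framing mismatch suspected below.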

Suspected cause

SSE terminator or chunk-framing mismatch between axllm's streaming response parser and the Gemini 3 preview endpoint's SSE format. The streaming parser likely awaits a delimiter (e.g. data: [DONE], a specific content-block-stop event, or a final empty chunk) that the 3.x preview endpoint either omits or formats differently from 2.x.

Suggested areas to check in the Google Gemini provider implementation:

  • src/ax/ai/google-gemini/ — streaming decoder / chunk reassembly
  • Any hardcoded end-of-stream sentinel specific to 2.x response shape
  • Handling of new Gemini 3 "thinking" blocks (3.x introduced extended thinking content parts); if the parser waits for a thinking-close that the API doesn't emit in stream mode, the promise never resolves
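The sentinel theory can be sketched as follows. This is illustrative only, not axllm's actual parser; the data: [DONE] sentinel and the frame contents are invented for the example.

```typescript
// Sketch of how a reader that resolves only on a hardcoded `data: [DONE]`
// sentinel hangs when the server instead just closes the stream.
// NOT axllm's real code; frames and sentinel are invented for illustration.

// Fragile variant: resolves only when the sentinel frame arrives.
function collectUntilSentinel(frames: string[]): string[] | 'never-resolves' {
  const chunks: string[] = [];
  for (const frame of frames) {
    const data = frame.replace(/^data: /, '').trim();
    if (data === '[DONE]') return chunks; // 2.x-style terminator
    chunks.push(data);
  }
  // Stream ended without the sentinel: a real async reader would keep
  // awaiting the next chunk here, i.e. hang forever.
  return 'never-resolves';
}

// Robust variant: end-of-stream is always terminal, sentinel or not.
function collectUntilEnd(frames: string[]): string[] {
  const chunks: string[] = [];
  for (const frame of frames) {
    const data = frame.replace(/^data: /, '').trim();
    if (data === '[DONE]') break;
    chunks.push(data);
  }
  return chunks;
}

const v2Style = ['data: {"text":"Hi"}', 'data: [DONE]'];
const v3Style = ['data: {"text":"Hi"}']; // hypothetical: no sentinel, just EOF

console.log(collectUntilSentinel(v2Style)); // [ '{"text":"Hi"}' ]
console.log(collectUntilSentinel(v3Style)); // 'never-resolves' → the hang
console.log(collectUntilEnd(v3Style));      // [ '{"text":"Hi"}' ]
```

If the parser treats end-of-stream (reader done) as terminal in addition to any sentinel, the hang disappears regardless of which terminator the 3.x endpoint emits.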

Happy to help debug, test a fix, or capture raw SSE traces.
