
[SDK] Compaction does not trigger for BYOK Anthropic — model token budget not resolved #1072

@KennyTurtle

Description


Summary

Two related problems with infinite sessions compaction and Anthropic models:

  1. BYOK providers: Compaction never triggers — the CLI can't resolve max_context_window_tokens for BYOK providers, so compaction is silently skipped.
  2. Catalog models: The token budget is hardcoded to 128K for all models (test/harness/replayingCapiProxy.ts:1088), which is wrong for Anthropic Claude models with their 200K context windows.

Repro (confirmed 2026-04-13)

SDK v0.2.2, model: claude-sonnet-4-20250514 via BYOK (CAPI relay)

Thresholds: 5% background, 30% blocking
Expected trigger: ~6,400 tokens (5% of 128K) or ~10,000 (5% of 200K)

  Message 1:  input=62       Message 6:  input=6,354
  Message 2:  input=1,047    Message 7:  input=7,817
  Message 3:  input=2,198    Message 8:  input=9,361
  Message 4:  input=3,516    Message 9:  input=10,870
  Message 5:  input=4,847    Message 10: input=12,390

  Total input: 58,462 tokens
  Compaction events: 0  ← NEVER triggered

No session.compaction_start or session.compaction_complete events received, even at 12K+ input tokens per call (far exceeding any 5% threshold).
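For reference, the expected trigger points fall out of simple arithmetic. The helper below is illustrative only (the function name and return shape are not the SDK's API); it just shows that message 10's input size exceeds the background threshold for either candidate context window:

```typescript
// Illustrative helper (not part of the SDK): compute the token counts at
// which background and blocking compaction should trigger for a given
// context window, using the thresholds from this repro (5% / 30%).
function compactionTriggers(
  contextWindowTokens: number,
  backgroundThreshold = 0.05, // 5% background
  blockingThreshold = 0.3, // 30% blocking
): { background: number; blocking: number } {
  return {
    background: Math.floor(contextWindowTokens * backgroundThreshold),
    blocking: Math.floor(contextWindowTokens * blockingThreshold),
  };
}

// With the hardcoded 128K budget, background compaction should start at
// 6,400 input tokens; with Claude's real 200K window, at 10,000. Message
// 10's 12,390 input tokens exceeds both, so zero compaction events means
// the budget was never resolved at all, not that a threshold was miscomputed.
console.log(compactionTriggers(128_000)); // { background: 6400, blocking: 38400 }
console.log(compactionTriggers(200_000)); // { background: 10000, blocking: 60000 }
```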

SDK code references

Where the budget is hardcoded (CLI layer)

| File | Line | What |
| --- | --- | --- |
| test/harness/replayingCapiProxy.ts | 1088 | `max_context_window_tokens: 128000` for ALL models |

SDK type flow (passes through, no provider logic)

| File | Line | What |
| --- | --- | --- |
| python/copilot/generated/rpc.py | 182 | `max_context_window_tokens: float` |
| python/copilot/client.py | 332 | `max_context_window_tokens: int` |
| go/types.go | 823 | `MaxContextWindowTokens int` |
| nodejs/src/types.ts | 1577 | `max_context_window_tokens: number` |

Compaction threshold config

| File | Line | What |
| --- | --- | --- |
| go/types.go | 463 | `BackgroundCompactionThreshold *float64` |
| python/copilot/session.py | 785 | `background_compaction_threshold` field |

Impact on Dracarys

  • Long Anthropic sessions will eventually hit 200K context limit and stall — compaction should prevent this
  • For catalog-routed models, compaction wastes ~58K tokens of usable context (triggers at 80% of 128K instead of 80% of 200K)
  • Our PR Expose cwd/Repo information of a session #413 (infinite sessions support) is blocked on this for Anthropic providers

Suggested fix

  1. BYOK: Accept max_context_window_tokens on ProviderConfig. Use provider-type defaults:
    • anthropic → 200,000
    • openai → 128,000
  2. Catalog: Populate from actual model metadata (branch mackinnonbuck/byok-provider-token-limits partially addresses this)
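A minimal sketch of the BYOK half of the fix. The `ProviderConfig` shape, field names, and `resolveContextWindowTokens` helper below are assumptions based on the suggested fix, not the SDK's real types:

```typescript
// Illustrative sketch only: ProviderConfig shape and the defaults table are
// assumptions, not the SDK's actual API.
interface ProviderConfig {
  type: "anthropic" | "openai" | string;
  // Optional explicit override, as proposed in fix (1).
  maxContextWindowTokens?: number;
}

// Provider-type defaults, consulted only when the config omits a value.
const DEFAULT_CONTEXT_WINDOW_TOKENS: Record<string, number> = {
  anthropic: 200_000,
  openai: 128_000,
};

function resolveContextWindowTokens(config: ProviderConfig): number {
  // An explicit config value always wins.
  if (config.maxContextWindowTokens !== undefined) {
    return config.maxContextWindowTokens;
  }
  const fallback = DEFAULT_CONTEXT_WINDOW_TOKENS[config.type];
  if (fallback === undefined) {
    // Unknown provider type: fail loudly rather than silently skipping
    // compaction, which is the current (broken) behavior.
    throw new Error(`No context window default for provider "${config.type}"`);
  }
  return fallback;
}
```

Under these assumptions, claude-sonnet-4 via a BYOK Anthropic relay would resolve to 200,000 tokens instead of leaving the budget unresolved and compaction silently disabled.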
