
fix(extraction): bump EXTRACTION_MAX_TOKENS 4096 → 8192 #4

Draft
moralespanitz wants to merge 1 commit into main from fix/extraction-max-tokens

Conversation

@moralespanitz
Contributor

Summary

Extraction LLM was truncating JSON output at ~14 KB during BEAM Sprint 2 CR mini-slice runs on dense 10-turn chunks. Bumping the max_tokens budget from 4096 → 8192 prevents the truncation.
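
For reference, a minimal sketch of how the change can look in extraction.ts. The model id and the surrounding wiring are assumptions, not this repo's exact code; EXTRACTION_MAX_TOKENS and extractFacts are the names used by this PR and its logs.

  import Anthropic from '@anthropic-ai/sdk';

  // Was 4096; dense 10-turn chunks were hitting the output ceiling mid-JSON.
  const EXTRACTION_MAX_TOKENS = 8192;

  const anthropic = new Anthropic();

  async function extractFacts(chunkText: string): Promise<string> {
    const response = await anthropic.messages.create({
      model: 'claude-sonnet-4-5',        // placeholder model id
      max_tokens: EXTRACTION_MAX_TOKENS, // the only value this PR changes
      messages: [{ role: 'user', content: chunkText }],
    });
    // stop_reason 'max_tokens' means the model hit the budget mid-generation,
    // which is what produced the unterminated-string JSON in the logs below.
    if (response.stop_reason === 'max_tokens') {
      console.warn('[extractFacts] output truncated at max_tokens');
    }
    const block = response.content[0];
    return block.type === 'text' ? block.text : '';
  }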

Evidence

Server log during iter 7 (first attempt) before this fix:

[extractFacts] JSON parse failed (Unterminated string in JSON at position 14152 (line 339 column 8)); attempting repair
[extractFacts] JSON parse failed (Unterminated string in JSON at position 14172 ...)
[extractFacts] JSON parse failed (Unterminated string in JSON at position 14284 ...)
[extractFacts] JSON parse failed (Unterminated string in JSON at position 14301 ...)
[extractFacts] JSON parse failed (Unterminated string in JSON at position 14172 ...)
[extractFacts] JSON parse failed (Unterminated string in JSON at position 15084 ...)

Six truncations on one ingest pass. Conv-3 crashed.

After bumping to 8192: zero truncation across iter 7 v3 N=3 full-ingest reruns.

Risk

Marginal cost increase: Anthropic bills only for output tokens actually generated, and only the dense chunks that previously truncated will use more tokens. Most extractions stay well under 4096.

Companion changes (separate PRs)

  • atomicmemory-benchmarks PR #8 lowers the harness chunk size from 10 to 5 turn-pairs, reducing the chance of hitting the limit at all; this server-side bump is defense-in-depth.

Test plan

  • Server starts and runs with the change
  • iter 7 v3 N=3 full-ingest reruns: no JSON parse failures
  • npx tsc --noEmit clean
  • npm test
  • fallow --no-cache

Extraction LLM was truncating JSON output at ~14 KB during BEAM Sprint 2
CR mini-slice runs on dense 10-turn chunks. Server log showed:

  [extractFacts] JSON parse failed (Unterminated string in JSON at
  position 14152 ...); attempting repair

across 6 chunks of one ingest, causing iter 7 (first attempt) to crash
on conv-3.

The Anthropic max_tokens budget defaults to 4096 in extraction.ts. Going
to 8192 doubles the headroom for JSON output without changing any other
behavior. Cost impact is marginal (Anthropic bills only for tokens
actually generated; rare for extraction to use the full 8192).

Validation: server is running with this change locally; iter 7 v3 N=3
full-ingest reruns succeed without truncation.

Companion harness mitigation lowered chunk size from 10 to 5 turn-pairs
(in atomicmemory-benchmarks PR #8) to reduce the chance of hitting the
limit at all. This server-side bump is defense-in-depth.
@moralespanitz moralespanitz requested a review from ethanj as a code owner April 29, 2026 17:56
@moralespanitz moralespanitz marked this pull request as draft April 30, 2026 05:19
moralespanitz added a commit that referenced this pull request May 6, 2026
Hard-deletes (e.g. resetBySource in repository-write) silently nulled
out predecessor_memory_id pointers, breaking backward chain traversal
and leaving half-broken chains where some events still reference a
deleted ancestor.

Change the predecessor FK to ON DELETE CASCADE so the dependent chain
node gets removed cleanly when its predecessor goes. Matches the
memory_id FK policy on the same table.

Schema migration runs an idempotent DROP CONSTRAINT / ADD CONSTRAINT in
a DO block because CREATE TABLE IF NOT EXISTS won't update column-level
FK definitions on an existing table.
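
A minimal sketch of that migration shape, assuming node-postgres; the table, constraint, and referenced-column names here are placeholders, not the repo's actual identifiers.

  import { Pool } from 'pg';

  // Idempotent DO block: drop the existing FK if present, then recreate it with
  // ON DELETE CASCADE. CREATE TABLE IF NOT EXISTS alone would leave the old FK in place.
  const MIGRATION_SQL = `
    DO $$
    BEGIN
      ALTER TABLE chain_events
        DROP CONSTRAINT IF EXISTS chain_events_predecessor_memory_id_fkey;
      ALTER TABLE chain_events
        ADD CONSTRAINT chain_events_predecessor_memory_id_fkey
        FOREIGN KEY (predecessor_memory_id)
        REFERENCES memories (id)   -- assumption: predecessor pointers target memories(id)
        ON DELETE CASCADE;         -- previously the pointer was silently nulled instead
    END
    $$;
  `;

  export async function migratePredecessorCascade(pool: Pool): Promise<void> {
    await pool.query(MIGRATION_SQL);
  }
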
ethanj added a commit that referenced this pull request May 6, 2026
Squashed across four review rounds on PR #18, all of which surfaced after
the initial wave of fixes (#1–#11). Each item below maps to a finding in
the PR's review threads.

Search path

- hydrateChainMemories now returns fully-shaped SearchResults via SELECT *
  + normalizeMemoryRow. The previous projection set similarity: null and
  omitted source_site / score / summary / observed_at, crashing the
  buildInjection formatter (`memory.similarity.toFixed(2)`) the first
  time TLL augmentation actually fired against a populated chain. The
  `as unknown as SearchResult` cast was hiding it from tsc. (review v2 #1)
- Hydration query now uses `unnest($2::uuid[]) WITH ORDINALITY ... ORDER
  BY req.ord` so the chronological order that chainsFor returns survives
  through the augmentation pipeline; see the combined sketch after this
  list. (review v4 #2)
- Workspace isolation: hydrateChainMemories filters `m.workspace_id IS NULL`
  to match the gate behavior performSearch's postProcessResults already
  applies. Without it, a workspace memory chained from a global memory's
  entity could surface in a global response. (review v4 #1)
- Defensive `relevance: 1.0` on hydrated rows locks in the chain-membership
  bypass invariant against future filter drift. The augmented rows are
  appended after applySearchRelevanceFilter today, but `similarity: 0`
  + `score: 0` would make them load-bearing on `relevance` if any future
  filter past appendTllAugmentation checked `memory.relevance >= threshold`.
  Regression test drives performSearch with a high
  retrievalOptions.relevanceThreshold and confirms the augmented row
  survives. (review v5 #2)
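
Pulling the four items above together, a hedged sketch of the hydration path; the table name, parameter position, import paths, and the retrieval_signal value are assumptions, while the ORDINALITY ordering, the workspace_id IS NULL gate, and relevance: 1.0 are the review items themselves.

  import { Pool } from 'pg';
  // normalizeMemoryRow and SearchResult come from the repo's existing modules
  // (exact import paths omitted here).

  async function hydrateChainMemories(
    pool: Pool,
    chainMemoryIds: string[],
  ): Promise<SearchResult[]> {
    const { rows } = await pool.query(
      `SELECT m.*, req.ord
         FROM unnest($1::uuid[]) WITH ORDINALITY AS req(id, ord)
         JOIN memories m ON m.id = req.id
        WHERE m.workspace_id IS NULL  -- global-only, mirrors the postProcessResults gate
        ORDER BY req.ord              -- preserve chainsFor's chronological order
      `,
      [chainMemoryIds],
    );
    return rows.map((row) => ({
      ...normalizeMemoryRow(row),     // fully-shaped SearchResult, no nulled similarity
      relevance: 1.0,                 // chain-membership bypass invariant
      retrieval_signal: 'tll_chain',  // tag augmented rows (value is illustrative)
    }));
  }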

Repository

- TLL chain reads (chain, chainEventsForEntities) now derive chronological
  position and predecessor via `ROW_NUMBER()` and `LAG()` window functions
  ordered by observation_date ASC (with stored position_in_chain as a
  deterministic tiebreaker for events sharing an observation_date). The
  stored predecessor_memory_id and position_in_chain columns become
  insertion-order audit metadata; the API surface returns chronological
  ordering. Backfilled out-of-order events surface in their true position
  with chronologically-correct predecessors. (review v3 #1)
- chainEventsForEntities adds `m.workspace_id IS NULL` for the same
  reason as the search-path fix above — the global event-chains HTTP
  endpoint must not surface workspace memories. (review v4 #1)
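
A hedged SQL sketch of that read path, wrapped in a TypeScript constant to match the repository code; the table and key column names are placeholders, while the window-function derivation and the workspace gate are the items above.

  // Chronological chain read for chain() / chainEventsForEntities().
  // ROW_NUMBER() and LAG() derive position and predecessor at read time; the stored
  // position_in_chain only breaks ties between events sharing an observation_date.
  const CHAIN_EVENTS_SQL = `
    SELECT m.*,
           ROW_NUMBER() OVER w     AS chronological_position,
           LAG(m.memory_id) OVER w AS chronological_predecessor_id
      FROM chain_events m                 -- placeholder table name
     WHERE m.entity_id = ANY($1::uuid[])  -- assumption: chains keyed by entity
       AND m.workspace_id IS NULL         -- global endpoint must not surface workspace rows
    WINDOW w AS (PARTITION BY m.entity_id
                 ORDER BY m.observation_date ASC, m.position_in_chain ASC)
     ORDER BY m.entity_id, m.observation_date ASC, m.position_in_chain ASC
  `;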

Schema

- FirstMentionsExtractBodySchema validates memory_ids_by_turn_id values
  as UUIDs so a non-UUID returns 400 (schema layer) instead of leaking a
  Postgres "invalid input syntax for type uuid" as 500 from the route.
  (review v3 #2)
- New SearchResult.retrieval_signal optional field tags chain-augmented
  rows so observability and any future ranker can distinguish them from
  similarity-ranked candidates. (review v2 #1, plumbed through v4)
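
Relating to the two items above, a hedged sketch; a zod-style schema is assumed for the validation layer, and the exact body shape of memory_ids_by_turn_id is an assumption.

  import { z } from 'zod';

  // UUID check at the schema layer: a malformed id fails validation and returns 400
  // from the route instead of surfacing a Postgres uuid-cast error as a 500.
  const FirstMentionsExtractBodySchema = z.object({
    memory_ids_by_turn_id: z.record(z.string(), z.array(z.string().uuid())),
  });

  // The new optional SearchResult field, sketched standalone: it lets observability
  // and any future ranker tell chain-augmented rows from similarity-ranked ones.
  interface ChainAugmentedFields {
    retrieval_signal?: string;
  }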

Refactor

- Extracted maybeExpandViaTLL, hydrateChainMemories, appendTllAugmentation
  out of memory-search.ts into a sibling tll-augmentation.ts module. The
  search file dropped from 551 → 385 LOC, back under the 400-LOC project
  cap. Shared internal types (PostProcessedSearch, RelevanceFilterSummary)
  pulled into memory-search-types.ts so the two consumers don't duplicate.
  (review v5 #4)

Test coverage

- New integration test (services/__tests__/tll-augmentation-integration.test.ts)
  drives performSearch end-to-end through appendTllAugmentation, with
  cases for: rendering augmented rows through buildInjection without
  crashing, the SQL contract (unnest ORDINALITY + ORDER BY req.ord),
  workspace-leak prevention, the relevance-1.0 bypass invariant, and
  no-augmentation for non-TLL queries.
- New repository tests for backfill chronological ordering of chain()
  and chainEventsForEntities() and for chainEventsForEntities workspace
  isolation.
- New route test asserts 400 (not 500) on non-UUID memory_ids_by_turn_id.

Verification: `npx tsc --noEmit` clean · 1286/1286 vitest pass against the
test DB · `npx fallow audit --no-cache` exit 0.

Deferrals (parked durably in the research repo's tech-debt log at
Atomicmemory-research/docs/core-repo/tech-debt.md):
- predecessor_memory_id ON DELETE CASCADE vs SET NULL — design call,
  contested between two reviewers; current default kept.
- process.env.ALLOWED_ORIGINS direct read in src/routes/memories.ts —
  pre-existing CLAUDE.md violation, out of scope for this PR.
- shouldUseTLL non-adjacent "after did" — low operational risk;
  tightening the regex was the explicit goal of review #9 and
  re-broadening risks reintroducing false positives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>