feat(core): productize first-mention events + TLL EO read-path #17
Draft
moralespanitz wants to merge 7 commits into experiment/phase2-combined-stack
Swaps EXTRACTION_PROMPT for CONCEPT_FAITHFUL_EXTRACTION_PROMPT as
the active extraction prompt at runtime. Toggle back via env var
EXTRACTION_PROMPT_VARIANT=atomic if needed.
Hypothesis (from H1 judge-sanity): BEAM rubrics score on concept-
level vocabulary ('Transaction error handling', 'Security and
deployment'). Our atomic extractor explicitly strips these phrases
('AGGRESSIVE TECHNOLOGY SPLITTING'). Mem0's 15-80 word memories
preserve them — that's why their published BEAM SUM is 0.635 vs
our 0.10.
This is the most upstream lever in our stack. Lower layer = higher
leverage. tsc clean, 1213/1213 tests pass.
…rb-form keyword expansion
Three architectural changes that align AtomicMemory with the Mem0-pattern
playbook (April 2026 algorithm).
1. AUDN_LLM_DISABLED config flag (Phase 1)
When true, memory-audn short-circuits the slow LLM mutation-decision
path: every fact that's not a fast-AUDN near-duplicate NOOP is stored as
ADD. Defers state-change semantics to retrieval-time. Per Mem0's own
description: "the new algorithm collapses extraction into a single LLM
call that only adds. Every extracted fact becomes an independent record."
Effect: halves ingest latency, eliminates the AUDN-LLM hang vector (no
LLM call to time out), preserves history (no UPDATE/DELETE that destroys
information).
2. First-class agent-fact extraction (Phase 1b)
EXTRACTION_PROMPT now treats assistant-generated facts (confirmations,
recommendations, computed results) as first-class extractable content
with explicit prefixes: "Assistant confirmed:", "Assistant recommended:",
"Assistant computed:". Per Mem0: +53.6 on single-session-assistant.
Closes a real coverage gap on questions like "what did we decide?" that
depend on retrievable agent-stated facts.
3. Verb-form keyword expansion (Phase 1c)
findKeywordCandidates() now expands each keyword into its verb-form
variants (-ing, -ed, -es, -s) before the ILIKE substring search. Mem0
reports measurable lift: "what meetings did I attend?" now matches stored
"attending a meeting". Minimal English suffix table, no NLP dep.
Pre-commit:
- npx tsc --noEmit clean
- npm test: 1214/1214 vitest tests pass (+1 new test for agent-facts)
Hypothesis (to be validated next): together these unlock Mem0-level
performance on BEAM-100K Anthropic stack, while preserving our
distinctive primitives (observation network, lessons, claim slots).
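The verb-form expansion in item 3 could look roughly like the sketch below. The suffix list (-ing, -ed, -es, -s) is taken from the commit message; the function names, the base-form stripping step, and the minimum base length are assumptions made for illustration and are not taken from findKeywordCandidates().

```typescript
// Hypothetical sketch: only the suffix list comes from the commit message.
const VERB_SUFFIXES = ["ing", "ed", "es", "s"] as const;

/** Strip one known suffix to get a rough base form ("attending" -> "attend"). */
function roughBase(word: string): string {
  for (const suffix of VERB_SUFFIXES) {
    // Assumption: require a base of at least 3 chars so short words stay intact.
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, word.length - suffix.length);
    }
  }
  return word;
}

/** Expand a keyword into itself, its rough base, and the suffixed variants. */
export function expandVerbForms(keyword: string): string[] {
  const lower = keyword.toLowerCase();
  const base = roughBase(lower);
  const variants = new Set<string>([lower, base]);
  for (const suffix of VERB_SUFFIXES) variants.add(base + suffix);
  return Array.from(variants);
}

/** Build one ILIKE substring pattern per variant (no wildcard escaping here). */
export function toIlikePatterns(keyword: string): string[] {
  return expandVerbForms(keyword).map((variant) => `%${variant}%`);
}
```

A naive suffix table like this produces some non-words ("attendes"), which is harmless for a substring search but would need escaping of `%` and `_` before being passed to a real query.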
… primitive
H2b finding (2026-05-03): concept-faithful prompt produced raw text
copies on Haiku, not synthesized memories. KU regressed 2/2 -> 0/2.
The Mem0-pattern win comes from ADD-only architecture, not from changing
extraction granularity. Atomic facts + ADD-only + observation network
synthesis is the right combination.
Concept-faithful prompt remains available via
EXTRACTION_PROMPT_VARIANT=concept-faithful for controlled experiments.
See 2026-05-03-deep-state-analysis.md for the full reasoning chain.
Adds the TLL primitive — a per-entity sparse graph of event nodes with
predecessor/successor edges. Each new memory referencing an entity
appends an event node to that entity's chain; the predecessor pointer
allows traversal of the chain backward at query time.
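As an illustration of the data structure described above, here is a minimal in-memory sketch of a per-entity chain with predecessor pointers. The field names follow the commit message (predecessor pointer, position_in_chain); the class, its method signatures, and the zero-based position are assumptions, and the real repository-tll.ts is backed by Postgres rather than a Map.

```typescript
// In-memory stand-in for the temporal_linkage_list table (hypothetical sketch).
interface TllEvent {
  userId: string;
  entityId: string;
  memoryId: string;
  predecessorMemoryId: string | null; // null for the first event in a chain
  positionInChain: number; // assumption: zero-based
}

export class InMemoryTll {
  // Keyed by `${userId}:${entityId}`; each array is one entity's chain in order.
  private chains = new Map<string, TllEvent[]>();

  /** Append an event for this entity; re-appending the same memory is a no-op. */
  append(userId: string, entityId: string, memoryId: string): TllEvent {
    const key = `${userId}:${entityId}`;
    const chain = this.chains.get(key) ?? [];
    const existing = chain.find((e) => e.memoryId === memoryId);
    if (existing) return existing;
    const tail = chain[chain.length - 1];
    const event: TllEvent = {
      userId,
      entityId,
      memoryId,
      predecessorMemoryId: tail ? tail.memoryId : null,
      positionInChain: chain.length,
    };
    chain.push(event);
    this.chains.set(key, chain);
    return event;
  }

  /** Walk predecessor pointers backward from a memory, newest first. */
  walkBack(userId: string, entityId: string, fromMemoryId: string): string[] {
    const chain = this.chains.get(`${userId}:${entityId}`) ?? [];
    const byId = new Map<string, TllEvent>();
    for (const e of chain) byId.set(e.memoryId, e);
    const path: string[] = [];
    let cursor: string | null = fromMemoryId;
    while (cursor !== null) {
      const event = byId.get(cursor);
      if (!event) break;
      path.push(event.memoryId);
      cursor = event.predecessorMemoryId;
    }
    return path;
  }
}
```

Keeping the predecessor pointer alongside the position means a reader can either sort by position or follow pointers; the sketch follows pointers to mirror the backward traversal the commit message describes.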
Targets the abilities Mem0 explicitly admits their architecture
doesn't crack at 10M (per their April 2026 blog):
- temporal reasoning (TR)
- event ordering (EO)
- multi-session reasoning (MSR)
These are exactly the abilities still at 0/2 single-run after the
Phase 1 ADD-only architectural shift. They require higher-order
representations of how events relate across time — fact-level and
entity-level matching are insufficient.
Implementation (Karpathy-minimal):
- schema.sql: new table temporal_linkage_list with composite PK
(user_id, entity_id, memory_id), predecessor pointer, position_in_chain
- repository-tll.ts: append() and chain()/chainsFor() ~120 LOC
- memory-storage.ts: append after entity link in resolveAndLinkEntities
(best-effort, fire-and-forget — keeps ingest hot path fast)
- runtime-container.ts: instantiate TllRepository when entity-graph enabled
- memory-service-types.ts + memory-service.ts: thread through deps
This is the unique architectural primitive: Hindsight gestures at it
via Tempr; the CROME proposal formalizes it; nobody has shipped it
publicly. Retrieval-time traversal is not yet wired — that's Phase 4b
(adding TLL traversal as a retrieval signal alongside semantic/keyword/
entity).
Pre-commit: tsc clean, 1214/1214 vitest tests pass.
…ion -0.56)
Diagnostic 1 finding: per-ability multirun aggregate showed Phase 1's
agent-fact extraction prompt regressed ABS from 0.83 -> 0.27 (-0.56).
Mechanism: over-surfacing the assistant's explanatory or hypothetical
content as facts. ABS questions ask about things the user never raised;
if the assistant explained a similar topic, our extractor stored that
explanation as a 'fact' that retrieval surfaces, defeating abstention.
Fix: keep the original 'Skip generic assistant chatter' rule but
explicitly forbid extracting hypothetical/explanatory content. Preserve
the narrow factual extraction (named entities, recommendations, data
tables, schedules) that lifted IF/PF.
Predicted: ABS recovers to ~0.83, SUM/PF/KU/IF retain their lifts.
Composite rises from 0.510 to ~0.57.
Phase 4b: TLL chain-traversal as a deterministic retrieval-time signal
for EO/MSR/TR queries.
When the query matches an ordering/temporal pattern (regex over 'order',
'before/after', 'evolution', 'when did', etc.), we:
1. Take the top-10 initial retrieval candidates' memory_ids
2. Find which entities they link to (memory_entities)
3. For those entities, traverse the TLL chain (chronological event
   sequence per entity) and collect all chain memory_ids
4. Hydrate any chain memories not already in the candidate set
5. Append to the result pool; downstream filtering/reranking applies
Fails open: chain expansion errors don't block primary retrieval.
Skipped for non-ordering queries (factual lookups don't benefit).
This is the unique architectural primitive: per-entity event-chain
traversal at retrieval time. Hindsight gestures at it via Tempr; CROME
formalizes it; Mem0 admits their architecture lacks higher-order
temporal representations entirely.
Pre-commit: tsc clean, 1214/1214 vitest tests pass.
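The five steps and the fail-open rule can be sketched as follows. The regex is a guess built from the keywords the commit message lists; the function names, the ChainLookups interface, and its two methods are hypothetical stand-ins for the memory_entities and TLL lookups, not the actual tll-retrieval.ts signatures.

```typescript
// Hypothetical pattern: the real regex is not shown in the PR.
const ORDERING_PATTERN = /\b(order(ed|ing)?|before|after|evolution|when did)\b/i;

export function looksLikeOrderingQuery(query: string): boolean {
  return ORDERING_PATTERN.test(query);
}

// Injected lookups stand in for the memory_entities and TLL repositories.
interface ChainLookups {
  entitiesForMemories(userId: string, memoryIds: string[]): Promise<string[]>;
  chainMemoryIds(userId: string, entityIds: string[]): Promise<string[]>;
}

/** Returns chain memory ids not already among the candidates; [] on any failure. */
export async function expandWithChains(
  userId: string,
  query: string,
  candidateIds: string[],
  lookups: ChainLookups,
): Promise<string[]> {
  if (!looksLikeOrderingQuery(query)) return []; // factual lookups skip expansion
  try {
    const seeds = candidateIds.slice(0, 10); // step 1: top-10 initial candidates
    const entityIds = await lookups.entitiesForMemories(userId, seeds); // step 2
    if (entityIds.length === 0) return [];
    const chainIds = await lookups.chainMemoryIds(userId, entityIds); // step 3
    const known = new Set(candidateIds);
    // Step 4: keep only ids the caller still has to hydrate and append (step 5).
    return Array.from(new Set(chainIds)).filter((id) => !known.has(id));
  } catch {
    return []; // fail open: primary retrieval results are unaffected
  }
}
```

Returning only the missing ids leaves hydration and reranking to the caller, which keeps the expansion a pure additive signal.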
Two architectural primitives previously prototyped harness-side now ship as
first-class atomicmemory-core features.
## TLL EO read-path (B2)
Previously: tll-retrieval helpers existed in core but were dynamic-imported
inside memory-search.ts and not exposed via any public API.
Now:
- `memory-search.ts` uses static `import { shouldUseTLL, expandViaTLL }
from './tll-retrieval.js'` (line 22). Cleaner deps, no runtime import.
- `repository-tll.ts` adds `chainEventsForEntities(userId, entityIds)`
that returns enriched events joined with memory content (memoryId,
content, observationDate, positionInChain, predecessorMemoryId). Used
by the new HTTP endpoint and by EO-shaped read paths that need content
alongside chain position.
- `memory-service.ts` adds public `getEventChains(userId, entityIds)`
wrapper around the new repo method.
- `routes/memories.ts` registers `GET /v1/memories/event-chains?
user_id=X&entity_ids=Y,Z`. Comma-separated UUIDs, deduped, validated.
Returns `{ chains: [{ entity_id, events: [...] }] }`.
- `EventChainsQuerySchema` + `EventChainsResponseSchema` + response-map
entry. Behavior: TLL stays read-only augmentation; AUDN flow unaffected;
NULL-predecessor handling preserved.
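For illustration, the query handling and response shape of the event-chains endpoint might be sketched like this. The comma-separated, deduped, validated `entity_ids` and the `{ chains: [{ entity_id, events }] }` envelope come from the commit message; the helper names, the lowercasing step, the error messages, and the camelCase event fields in the response are assumptions (the PR itself uses `EventChainsQuerySchema`, whose definition is not shown).

```typescript
// Hypothetical helpers; the real validation lives in EventChainsQuerySchema.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

/** Parse "id1,id2,..." into a deduplicated list of UUIDs, or throw. */
export function parseEntityIds(raw: string): string[] {
  const ids = raw
    .split(",")
    .map((part) => part.trim().toLowerCase()) // assumption: compare case-insensitively
    .filter((part) => part.length > 0);
  const unique = Array.from(new Set(ids));
  if (unique.length === 0) {
    throw new Error("entity_ids must contain at least one UUID");
  }
  for (const id of unique) {
    if (!UUID_PATTERN.test(id)) {
      throw new Error(`entity_ids contains a non-UUID value: ${id}`);
    }
  }
  return unique;
}

// Enriched event fields as listed for chainEventsForEntities().
interface ChainEvent {
  memoryId: string;
  content: string;
  observationDate: string | null;
  positionInChain: number;
  predecessorMemoryId: string | null;
}

interface EntityChain {
  entity_id: string;
  events: ChainEvent[];
}

/** Group flat enriched rows into one ordered chain per entity. */
export function groupIntoChains(
  rows: Array<ChainEvent & { entityId: string }>,
): { chains: EntityChain[] } {
  const byEntity = new Map<string, ChainEvent[]>();
  for (const { entityId, ...event } of rows) {
    const events = byEntity.get(entityId) ?? [];
    events.push(event);
    byEntity.set(entityId, events);
  }
  const chains = Array.from(byEntity, ([entity_id, events]) => ({
    entity_id,
    events: events.sort((a, b) => a.positionInChain - b.positionInChain),
  }));
  return { chains };
}
```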
Tests:
- `services/__tests__/tll-retrieval.test.ts` — 26 cases covering
`shouldUseTLL` regex coverage (positive + negative + case insensitivity),
`entitiesForMemories` SQL-shape verification, and `expandViaTLL` call
ordering / 10-id slice / userId pass-through.
- `db/__tests__/repository-tll.test.ts` — 13 integration tests against
test Postgres covering `append` idempotency + predecessor wiring,
`chain` and `chainsFor` ordering, and `chainEventsForEntities`
enriched-join + soft-delete filtering.
## First-mention events productization (B1)
Previously: `extractFirstMentions` lived in the BEAM harness only; chained
through a single LLM call returning JSON; no persistence.
Now:
- New table `first_mention_events` (schema.sql) with `(user_id, memory_id)`
unique constraint for idempotent re-extraction. Indexed on
`(user_id, position_in_conversation)` and on `topic` via GIN.
- `repository-first-mentions.ts` — `FirstMentionRepository` with
`store()`, `getByMemoryId()`, `list()`. Mirrors `TllRepository` pattern.
- `services/first-mention-service.ts` — `FirstMentionService` with
`extractAndStore(userId, conversationText, sourceSite,
memoryIdsByTurnId)`. Ports `FIRST_MENTIONS_SYSTEM` prompt and salvage
parser verbatim from the harness; runs single LLM call via injected
`ChatFn`; maps loose LLM output to strict `FirstMentionEvent` schema.
- `routes/memories.ts` registers `POST /v1/memories/first-mentions/extract`.
Body: `{ user_id, conversation_text, source_site, memory_ids_by_turn_id }`
where `memory_ids_by_turn_id` is `{ "0": "uuid", "5": "uuid", ... }`
(object form because JSON has no Map). Returns `{ events: [...] }`.
- `app/runtime-container.ts` instantiates the repository + service. The
service's `ChatFn` adapter wraps the configured `llm.chat` singleton
from `services/llm.ts`; per-call cost is tracked inside `llm.chat`.
- `MemoryService` constructor accepts a 9th optional parameter
(`firstMentionService`) and exposes `extractFirstMentions()`.
- `MemoryServiceDeps` adds `firstMentionService: FirstMentionService | null`.
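Because the request body carries `memory_ids_by_turn_id` as a plain object, something has to turn it into a turn-indexed map before extraction. A minimal sketch of that conversion is below; the function name and the rule that keys must be non-negative integers are assumptions, and the actual transform lives in `FirstMentionsExtractBodySchema`.

```typescript
/**
 * Convert the JSON object form { "0": "uuid", "5": "uuid" } into a Map keyed
 * by numeric turn id. Hypothetical helper, not the schema's real transform.
 */
export function toTurnMap(raw: Record<string, string>): Map<number, string> {
  const map = new Map<number, string>();
  for (const key of Object.keys(raw)) {
    // Assumption: turn ids are non-negative integers written in decimal.
    if (!/^\d+$/.test(key)) {
      throw new Error(`turn id must be a non-negative integer: ${key}`);
    }
    map.set(Number(key), raw[key]);
  }
  return map;
}
```

Rejecting non-numeric keys up front avoids silently producing a `NaN` key, which a Map would accept but no turn lookup would ever hit.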
Tests:
- `services/__tests__/first-mention-service.test.ts` — 9 unit tests
covering happy path, salvage of truncated JSON, garbage-text fallback,
non-array JSON, chatFn throw, missing `memoryId` mapping drop, schema
validation drop, anchor_date parsing (valid/invalid/null), ascending
sort. No DB required.
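One plausible shape for a salvage parser that tolerates truncated LLM output is sketched below: try a normal parse first, then recover every complete top-level object inside the array. This is an illustrative stand-in, not the parser ported from the harness, and returning an empty list for non-array JSON and garbage text is an assumption about the intended fallback.

```typescript
/** Best-effort recovery of array items from possibly truncated JSON text. */
export function salvageJsonArray(text: string): unknown[] {
  try {
    const parsed: unknown = JSON.parse(text);
    return Array.isArray(parsed) ? parsed : []; // non-array JSON yields nothing
  } catch {
    // Fall through to object-by-object recovery.
  }
  const start = text.indexOf("[");
  if (start === -1) return []; // garbage text: nothing to salvage
  const items: unknown[] = [];
  let depth = 0;
  let inString = false;
  let escaped = false;
  let objectStart = -1;
  for (let i = start + 1; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Track escapes so braces and quotes inside strings are ignored.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      if (depth === 0) objectStart = i;
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--;
      if (depth === 0) {
        try {
          items.push(JSON.parse(text.slice(objectStart, i + 1)));
        } catch {
          // Balanced braces but invalid JSON: skip this object.
        }
      }
    }
  }
  return items;
}
```

An object cut off mid-way never returns to depth zero, so it is dropped while the complete objects before it are kept.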
## Why caller-driven extraction (no in-core ingest hook)
The in-core ingest pipeline does not retain turn structure (it extracts
atomic facts from chunks, not turns). The BEAM harness — which knows the
turn structure — supplies the turn-id-to-memory-id mapping in the request
body. This keeps the extraction path explicit and the core ingest pipeline
unchanged. Adding an automatic post-write hook is deferred to a follow-up
once a core-side notion of "turn" exists.
## Verification
- `npx tsc --noEmit` clean across all changes
- New tests deliberately not run in this commit (`repository-tll.test.ts`
needs `dotenv -e .env.test` Postgres set up); `npm test` to be run
before merging the PR
## Files
NEW:
- src/db/repository-first-mentions.ts
- src/services/first-mention-service.ts
- src/services/__tests__/first-mention-service.test.ts
- src/services/__tests__/tll-retrieval.test.ts
- src/db/__tests__/repository-tll.test.ts
MODIFIED:
- src/db/schema.sql (first_mention_events table)
- src/db/repository-tll.ts (chainEventsForEntities)
- src/services/memory-search.ts (static tll-retrieval import)
- src/services/memory-service.ts (getEventChains, extractFirstMentions)
- src/services/memory-service-types.ts (firstMentionService dep)
- src/routes/memories.ts (event-chains + first-mentions/extract routes)
- src/routes/response-schema-map.ts (new schema entries)
- src/schemas/memories.ts (EventChainsQuerySchema,
FirstMentionsExtractBodySchema)
- src/schemas/responses.ts (EventChainsResponseSchema,
FirstMentionsExtractResponseSchema)
- src/services/__tests__/cross-workspace-coupling-fence.test.ts
(firstMentionService: null in test deps)
- src/app/runtime-container.ts (repository + service instantiation,
9th MemoryService constructor argument)
Summary
Productizes two architectural primitives previously prototyped in the BEAM benchmark harness:
- TLL EO read-path
- First-mention events
Both surface as new HTTP endpoints; both are gated and best-effort so existing pipelines are untouched. Type-check clean; new repo + integration tests pending DB run.
Branch base note: stacks on `experiment/phase2-combined-stack` (the active core experiment branch) because that branch and `main` have no common ancestor in the current repo state. Once `experiment/phase2-combined-stack` lands on `main`, this PR can be re-targeted.
Companion: `atomicmemory-benchmarks` PR #20, where these primitives were prototyped harness-side. Today's cross-backbone gate experiment showed the +0.157 architecture lift on Haiku does not generalize to Sonnet 4.6 (regressed to 0.350); productizing the primitives makes them shippable independent of BEAM scoring while keeping cross-backbone variance as the headline finding.
What's new
TLL EO read-path
- `src/services/memory-search.ts`: static import replaces `await import('./tll-retrieval.js')`
- `src/db/repository-tll.ts`: `chainEventsForEntities()`
- `src/services/memory-service.ts`: `getEventChains()`
- `src/schemas/memories.ts`: `EventChainsQuerySchema`
- `src/schemas/responses.ts`: `EventChainsResponseSchema`
- `src/routes/memories.ts`: `GET /v1/memories/event-chains?user_id=X&entity_ids=Y,Z`
- Tests: `src/services/__tests__/tll-retrieval.test.ts` (26 cases) + `src/db/__tests__/repository-tll.test.ts` (13 tests)
Risk: LOW. TLL stays read-only augmentation; AUDN flow unaffected; NULL-predecessor handling preserved.
First-mention events productization
- `src/db/schema.sql`: `first_mention_events` table + 2 indexes (position, GIN(topic))
- `src/db/repository-first-mentions.ts` (NEW, 119 lines): `FirstMentionRepository` with `store()`, `getByMemoryId()`, `list()`
- `src/services/first-mention-service.ts` (NEW, 209 lines): `FirstMentionService.extractAndStore()`. Single LLM call scans full transcript; salvage parser for truncated JSON; best-effort error handling
- `src/app/runtime-container.ts`: `ChatFn` adapter wraps `llm.chat` singleton
- `src/services/memory-service.ts`: accepts `firstMentionService`; new public `extractFirstMentions()` method
- `src/services/memory-service-types.ts`: `firstMentionService: FirstMentionService | null`
- `src/schemas/memories.ts`: `FirstMentionsExtractBodySchema` accepts `memory_ids_by_turn_id` as `Record<string, string>` and transforms to `Map<number, string>`
- `src/schemas/responses.ts`: `FirstMentionsExtractResponseSchema`
- `src/routes/memories.ts`: `POST /v1/memories/first-mentions/extract`
- `src/services/__tests__/cross-workspace-coupling-fence.test.ts`: `firstMentionService: null`
- `src/services/__tests__/first-mention-service.test.ts`
Why caller-driven extraction (no in-core ingest hook)
The in-core ingest pipeline does not retain turn structure (it extracts atomic facts from chunks, not turns). The BEAM harness — which knows turn structure — supplies the turn-id-to-memory-id mapping in the request body. Adding an automatic post-write hook is deferred to a follow-up once a core-side notion of "turn" exists. This keeps the extraction path explicit and the core ingest pipeline unchanged.
API additions
Verification
- `npx tsc --noEmit` clean across all changes
- `npm test` (vitest with `fileParallelism: false`): required pre-merge; needs `dotenv -e .env.test` Postgres set up
- `fallow --no-cache` if installed (skip with note otherwise)
- `npm run migrate:test` then `npm run migrate` on real DB before deploy
Test plan
- `npm test` against test Postgres; expect new tests to pass
- `first_mention_events` table created with `(user_id, memory_id) UNIQUE` + both indexes
- `GET /v1/memories/event-chains` with a known UID + entity_ids; expects ordered enriched events
- `POST /v1/memories/first-mentions/extract` with a sample conversation + `memory_ids_by_turn_id`; expects events JSON; verifies row in `first_mention_events`
Out of scope (separate PRs)
- Update `atomicmemory-benchmarks/data/exp-stage7-beam-dryrun/lib.ts` to call the new core endpoint instead of local `extractFirstMentions()`
Companion
- `atomicmemory-benchmarks` PR #20 (feat(infra): LiteLLM unified gateway for multi-provider LLM routing): https://github.com/atomicmemory/atomicmemory-benchmarks/pull/20
Merge readiness (target: `main`)
This section captures the state of the PR specifically as a candidate for merge into `main`, layered on top of the existing description above.
Pre-merge blockers
1. Base branch is `experiment/phase2-combined-stack`, not `main`. Re-targeting the base to `main` is a hard requirement before merging to `main` makes sense. See blocker (2) for why a simple re-target is not sufficient on its own.
2. `HEAD` and `origin/main` share no common ancestor. Confirmed with `git merge-base feature/first-mention-and-tll-productization origin/main` (exit 1, no output). The two histories have separate root commits (`754d9ac` "Initial public release" vs `919752f` "Initial commit"). A normal merge or rebase onto `main` is not possible without an explicit `--allow-unrelated-histories` merge or a history reconciliation. This is a repo-wide situation, not specific to this PR: six other open PRs (#3–#8) target `main` from branches that descend from the same `754d9ac` lineage and would face the same constraint.
3. `gh pr checks 17` returns "no checks reported on the 'feature/first-mention-and-tll-productization' branch". `.github/workflows/ci.yml` is configured to trigger on `pull_request: branches: [main]`, so PRs targeting any non-`main` branch (including this one targeting `experiment/phase2-combined-stack`) do not trigger CI. Re-targeting to `main` would also unblock CI.
4. `npm test` not yet run against test Postgres. The PR body's verification checklist explicitly leaves this unchecked; the new `repository-tll.test.ts` (13 cases) needs `.env.test` Postgres. CI will cover this once (3) is unblocked. Local pre-commit verification was deliberately skipped in this audit; the agent was instructed not to touch the running core server.
Behavior changes for callers (vs `main`)
The PR body documents the delta vs the immediate base (`phase2-combined-stack`). The full delta vs `main` is larger because `phase2-combined-stack` itself has not landed on `main`. Net surface diff `main..HEAD` (across the full stack):
- New endpoints: `GET /v1/memories/event-chains`, `POST /v1/memories/first-mentions/extract`.
- `AUDN_LLM_DISABLED` (commit `c84d7b8` in the stack): when `true`, the AUDN flow short-circuits to ADD-only. Defaults to `false`; backward-compatible.
- `EXTRACTION_MAX_TOKENS` (commit `709242f`): default raised 4096 → 8192. Backward-compatible.
- `ANTHROPIC_LLM_TIMEOUT_MS` (commit `edcbe0b`): default `30000`.
- New tables: `first_mention_events` table + 2 indexes; `temporal_linkage_list` table (commit `d6bd5f8`). Migrations should be applied in order if/when this stack lands.
- `EXTRACTION_PROMPT_VARIANT=concept-faithful` is available as an opt-in variant. No breaking change for existing callers.
- AUDN flow is unchanged when `AUDN_LLM_DISABLED` is not set (default).
Test coverage for new code (this PR's commit only)
- `src/services/__tests__/tll-retrieval.test.ts`
- `src/db/__tests__/repository-tll.test.ts`
- `src/services/__tests__/first-mention-service.test.ts`
- `src/services/__tests__/cross-workspace-coupling-fence.test.ts`
Recommended path to merge
1. Land `experiment/phase2-combined-stack` on `main` first (with whatever history-reconciliation strategy the team chooses for the unrelated-histories situation), OR re-target this PR to `main` and merge with `--allow-unrelated-histories` semantics.
2. Re-target the base from `experiment/phase2-combined-stack` to `main` once (1) is resolved.
3. `build-and-test`.
4. `npm run migrate` on production after merge (two new tables: `temporal_linkage_list` if not already migrated, `first_mention_events`).
Status
`AGENTS.md` left alone (project-context doc; mirrors the embedded `CLAUDE.md` content).