feat(core): productize first-mention events + TLL EO read-path #17

Draft
moralespanitz wants to merge 7 commits into experiment/phase2-combined-stack from feature/first-mention-and-tll-productization

@moralespanitz moralespanitz commented May 4, 2026

Summary

Productizes two architectural primitives previously prototyped in the BEAM benchmark harness:

  1. First-mention events — chronological topic-introduction list as first-class memory objects (new table, repository, service, HTTP route)
  2. TLL EO read-path — public API on the existing Temporal Linkage List (write-side already shipped); enriched event-chain queries with content joined

Both surface as new HTTP endpoints; both are gated and best-effort so existing pipelines are untouched. Type-check clean; new repo + integration tests pending DB run.

Branch base note: stacks on experiment/phase2-combined-stack (the active core experiment branch) because that branch and main have no common ancestor in the current repo state. Once experiment/phase2-combined-stack lands on main, this PR can be re-targeted.

Companion: atomicmemory-benchmarks PR #20 (link) where these primitives were prototyped harness-side. Today's cross-backbone gate experiment showed the +0.157 architecture lift on Haiku does not generalize to Sonnet 4.6 (regressed to 0.350); productizing the primitives makes them shippable independent of BEAM scoring while keeping cross-backbone variance as the headline finding.

What's new

TLL EO read-path

| Change | File | Purpose |
| --- | --- | --- |
| Static import | `src/services/memory-search.ts` | replaces dynamic `await import('./tll-retrieval.js')` |
| `chainEventsForEntities()` | `src/db/repository-tll.ts` | enriched events: memoryId, content, observationDate, positionInChain, predecessorMemoryId joined from memories table |
| `getEventChains()` | `src/services/memory-service.ts` | public service method; null-safe when TLL disabled |
| `EventChainsQuerySchema` | `src/schemas/memories.ts` | comma-separated UUID list, deduped, validated |
| `EventChainsResponseSchema` | `src/schemas/responses.ts` | shape contract |
| `GET /v1/memories/event-chains?user_id=X&entity_ids=Y,Z` | `src/routes/memories.ts` | new endpoint |
| Tests | `src/services/__tests__/tll-retrieval.test.ts` (26 cases) + `src/db/__tests__/repository-tll.test.ts` (13 tests) | unit tests for retrieval helpers + integration tests against test Postgres |

Risk: LOW. TLL stays read-only augmentation; AUDN flow unaffected; NULL-predecessor handling preserved.
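
As an illustration of the "comma-separated UUID list, deduped, validated" handling listed for EventChainsQuerySchema, here is a minimal sketch. The names `parseEntityIds` and `UUID_RE` are hypothetical, and lowercasing before dedupe is an assumption; the actual schema in src/schemas/memories.ts may behave differently.

```typescript
// Hypothetical helper, not the PR's schema: parse "uuid1,uuid2" into a
// deduped, validated list while preserving first-seen order.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function parseEntityIds(raw: string): string[] {
  const seen = new Set<string>();
  for (const part of raw.split(",")) {
    const id = part.trim().toLowerCase(); // assumption: compare case-insensitively
    if (id === "") continue; // tolerate stray or trailing commas
    if (!UUID_RE.test(id)) {
      throw new Error(`entity_ids contains a non-UUID value: ${part}`);
    }
    seen.add(id); // a Set drops duplicates and keeps insertion order
  }
  if (seen.size === 0) {
    throw new Error("entity_ids must contain at least one UUID");
  }
  return Array.from(seen);
}
```

Rejecting the whole request on a single malformed ID is one possible design choice; silently dropping bad entries would hide caller bugs.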

First-mention events productization

| Change | File | Purpose |
| --- | --- | --- |
| Schema migration | `src/db/schema.sql` | new first_mention_events table + 2 indexes (position, GIN(topic)) |
| Repository | `src/db/repository-first-mentions.ts` (NEW, 119 lines) | FirstMentionRepository with store(), getByMemoryId(), list() |
| Service | `src/services/first-mention-service.ts` (NEW, 209 lines) | FirstMentionService.extractAndStore(). Single LLM call scans full transcript; salvage parser for truncated JSON; best-effort error handling |
| Composition root | `src/app/runtime-container.ts` | instantiate repo + service; ChatFn adapter wraps llm.chat singleton |
| MemoryService 9th param | `src/services/memory-service.ts` | optional firstMentionService; new public extractFirstMentions() method |
| Deps type | `src/services/memory-service-types.ts` | firstMentionService: FirstMentionService \| null |
| Body schema | `src/schemas/memories.ts` | FirstMentionsExtractBodySchema accepts memory_ids_by_turn_id as Record<string, string> and transforms to Map<number, string> |
| Response schema | `src/schemas/responses.ts` | FirstMentionsExtractResponseSchema |
| Route | `src/routes/memories.ts` | POST /v1/memories/first-mentions/extract |
| Test fixture | `src/services/__tests__/cross-workspace-coupling-fence.test.ts` | adds firstMentionService: null |
| Tests | `src/services/__tests__/first-mention-service.test.ts` | 9 unit tests covering happy path, salvage of truncated JSON, garbage fallback, schema validation, anchor-date parsing, sort |
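
To make "salvage parser for truncated JSON" and "garbage fallback" concrete, the sketch below shows one way such a parser could work. It is not the parser ported from the harness: `salvageJsonArray` is a hypothetical name, and the strategy (cut back to the last complete object and close the array) is an assumption about what "salvage" means here.

```typescript
// Hypothetical salvage parser: recover as many complete array elements as
// possible from LLM output that may be truncated or wrapped in extra text.
function salvageJsonArray(text: string): unknown[] {
  const start = text.indexOf("[");
  if (start === -1) return []; // garbage fallback: no array in the output
  const body = text.slice(start);

  // Fast path: the output from "[" onward is already valid JSON.
  try {
    const parsed: unknown = JSON.parse(body);
    return Array.isArray(parsed) ? parsed : [];
  } catch (_err) {
    // fall through to salvage
  }

  // Salvage path: walk backward over each "}" and try closing the array there.
  for (
    let end = body.lastIndexOf("}");
    end > 0;
    end = body.lastIndexOf("}", end - 1)
  ) {
    try {
      const parsed: unknown = JSON.parse(body.slice(0, end + 1) + "]");
      if (Array.isArray(parsed)) return parsed;
    } catch (_err) {
      // this cut point was inside an element; keep walking back
    }
  }
  return []; // nothing recoverable
}
```

Returning an empty array rather than throwing matches the best-effort error handling described for the service: a bad LLM response yields no events instead of a failed request.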

Why caller-driven extraction (no in-core ingest hook)

The in-core ingest pipeline does not retain turn structure (it extracts atomic facts from chunks, not turns). The BEAM harness — which knows turn structure — supplies the turn-id-to-memory-id mapping in the request body. Adding an automatic post-write hook is deferred to a follow-up once a core-side notion of "turn" exists. This keeps the extraction path explicit and the core ingest pipeline unchanged.
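
Because JSON has no Map type, the caller-supplied mapping arrives as an object with string keys and is turned into a `Map<number, string>`. A minimal sketch of that transform follows; `toTurnMap` is a hypothetical name and the strict digits-only key check is an assumption, not necessarily what FirstMentionsExtractBodySchema enforces.

```typescript
// Hypothetical transform: { "0": "uuid", "5": "uuid" } -> Map<number, string>.
function toTurnMap(
  memoryIdsByTurnId: Record<string, string>
): Map<number, string> {
  const byTurn = new Map<number, string>();
  for (const key of Object.keys(memoryIdsByTurnId)) {
    // Assumption: turn ids are non-negative integers written in decimal.
    if (!/^\d+$/.test(key)) {
      throw new Error(`turn id must be a non-negative integer, got: ${key}`);
    }
    byTurn.set(Number(key), memoryIdsByTurnId[key]);
  }
  return byTurn;
}
```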

API additions

GET /v1/memories/event-chains?user_id=<uid>&entity_ids=<uuid1>,<uuid2>
→ { chains: [{ entity_id, events: [{ memory_id, content, observation_date, position_in_chain, predecessor_memory_id }, ...] }] }

POST /v1/memories/first-mentions/extract
Body: {
  user_id: string,
  conversation_text: string,
  source_site: string,
  memory_ids_by_turn_id: { [turnId: string]: string }
}
→ { events: [{ topic, turn_id, memory_id, anchor_date, position_in_conversation }, ...] }
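
A client-side sketch of the two shapes above, for callers that want typed requests. The field names come from this description; the field types (for example `string | null` for `predecessor_memory_id` and ISO-string dates) and the helper names `eventChainsPath`, `firstMentionsBody`, and `orderedMemoryIds` are assumptions for illustration.

```typescript
// Response and body shapes as described above; types are assumed.
interface ChainEvent {
  memory_id: string;
  content: string;
  observation_date: string; // assumption: serialized as an ISO date string
  position_in_chain: number;
  predecessor_memory_id: string | null; // null for the first event in a chain
}

interface EventChainsResponse {
  chains: { entity_id: string; events: ChainEvent[] }[];
}

interface FirstMentionsExtractBody {
  user_id: string;
  conversation_text: string;
  source_site: string;
  memory_ids_by_turn_id: Record<string, string>;
}

// Build the GET path; entity ids are joined with commas as the route expects.
function eventChainsPath(userId: string, entityIds: string[]): string {
  const ids = entityIds.map((id) => encodeURIComponent(id)).join(",");
  return `/v1/memories/event-chains?user_id=${encodeURIComponent(userId)}&entity_ids=${ids}`;
}

// Build the POST body; the Map is flattened to an object for JSON transport.
function firstMentionsBody(
  userId: string,
  conversationText: string,
  sourceSite: string,
  byTurn: Map<number, string>
): FirstMentionsExtractBody {
  const memoryIds: Record<string, string> = {};
  byTurn.forEach((memoryId, turnId) => {
    memoryIds[String(turnId)] = memoryId;
  });
  return {
    user_id: userId,
    conversation_text: conversationText,
    source_site: sourceSite,
    memory_ids_by_turn_id: memoryIds,
  };
}

// Read one entity's chain back in chain order.
function orderedMemoryIds(resp: EventChainsResponse, entityId: string): string[] {
  const chain = resp.chains.find((c) => c.entity_id === entityId);
  if (!chain) return [];
  return chain.events
    .slice()
    .sort((x, y) => x.position_in_chain - y.position_in_chain)
    .map((e) => e.memory_id);
}
```

The path and body can be handed to whatever HTTP client the caller already uses; no transport is assumed here.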

Verification

  • npx tsc --noEmit clean across all changes
  • npm test (vitest with fileParallelism: false) — required pre-merge; needs dotenv -e .env.test Postgres set up
  • fallow --no-cache if installed (skip with note otherwise)
  • npm run migrate:test then npm run migrate on real DB before deploy

Test plan

  • Reviewer runs npm test against test Postgres — expect new tests pass
  • Reviewer runs migration on a clean test DB; confirms first_mention_events table created with (user_id, memory_id) UNIQUE + both indexes
  • Reviewer hits GET /v1/memories/event-chains with a known UID + entity_ids; expects ordered enriched events
  • Reviewer hits POST /v1/memories/first-mentions/extract with a sample conversation + memory_ids_by_turn_id; expects events JSON; verifies row in first_mention_events
  • Reviewer confirms existing routes still pass schema validation

Out of scope (separate PRs)

  • Harness refactor in atomicmemory-benchmarks/data/exp-stage7-beam-dryrun/lib.ts to call new core endpoint instead of local extractFirstMentions()
  • In-core ingest-post-write hook (requires turn-structure abstraction)
  • BEAM 1M / 10M evaluation (pre-committed kill after cross-backbone gate)
  • Sonnet/Opus 3-conv multirun

Companion


Merge readiness (target: main)

This section captures the state of the PR specifically as a candidate for merge into main, layered on top of the existing description above.

Pre-merge blockers

  1. PR base is experiment/phase2-combined-stack, not main. Re-targeting the base to main is a hard requirement before a merge button to main makes sense. See blocker (2) for why a simple re-target is not sufficient on its own.
  2. HEAD and origin/main share no common ancestor. Confirmed with git merge-base feature/first-mention-and-tll-productization origin/main (exit 1, no output). The two histories have separate root commits (754d9ac "Initial public release" vs 919752f "Initial commit"). A normal merge or rebase to main is not possible without an explicit --allow-unrelated-histories merge or a history reconciliation. This is a repo-wide situation, not specific to this PR: six other open PRs (#3 through #8) target main from branches that descend from the same 754d9ac lineage and would face the same constraint.
  3. CI has not run on this PR. gh pr checks 17 returns "no checks reported on the 'feature/first-mention-and-tll-productization' branch". .github/workflows/ci.yml is configured to trigger on pull_request: branches: [main], so PRs targeting any non-main branch (including this one targeting experiment/phase2-combined-stack) do not trigger CI. Re-targeting to main would also unblock CI.
  4. npm test not yet run against test Postgres. The PR body's verification checklist explicitly leaves this unchecked; the new repository-tll.test.ts (13 cases) needs .env.test Postgres. CI will cover this once (3) is unblocked. Local pre-commit verification was deliberately skipped in this audit — agent has been instructed not to touch the running core server.

Behavior changes for callers (vs main)

The PR body documents the delta vs the immediate base (phase2-combined-stack). The full delta vs main is larger because phase2-combined-stack itself has not landed on main. Net surface diff main..HEAD (across the full stack):

  • New HTTP endpoints (this PR): GET /v1/memories/event-chains, POST /v1/memories/first-mentions/extract.
  • New env flag (from earlier commit c84d7b8 in the stack): AUDN_LLM_DISABLED — when true, the AUDN flow short-circuits to ADD-only. Defaults false; backward-compatible.
  • New env flag (from earlier commit 709242f): EXTRACTION_MAX_TOKENS raised 4096 → 8192 default. Backward-compatible.
  • New env flag (from earlier commit edcbe0b): ANTHROPIC_LLM_TIMEOUT_MS (default 30000).
  • New schema migration (this PR): first_mention_events table + 2 indexes.
  • Other schema changes (from earlier commits in the stack): temporal_linkage_list table (commit d6bd5f8); migrations should be applied in order if/when this stack lands.
  • Extraction prompt: atomic remains the default. Concept-faithful is opt-in via EXTRACTION_PROMPT_VARIANT=concept-faithful. No breaking change for existing callers.
  • AUDN behavior: unchanged when AUDN_LLM_DISABLED is not set (default).

Test coverage for new code (this PR's commit only)

  • 26 unit cases in src/services/__tests__/tll-retrieval.test.ts
  • 13 integration cases in src/db/__tests__/repository-tll.test.ts
  • 9 unit cases in src/services/__tests__/first-mention-service.test.ts
  • New deps fixture in src/services/__tests__/cross-workspace-coupling-fence.test.ts

Recommended path to merge

  1. Land experiment/phase2-combined-stack on main first (with whatever history-reconciliation strategy the team chooses for the unrelated-histories situation), OR re-target this PR to main and merge with --allow-unrelated-histories semantics.
  2. Re-target PR #17 base from experiment/phase2-combined-stack to main once (1) is resolved.
  3. CI will auto-trigger on the re-target; gate on green build-and-test.
  4. Run npm run migrate on production after merge (two new tables: temporal_linkage_list if not already migrated, first_mention_events).

Status

  • PR remains DRAFT. Not flipped to Ready-for-Review by this audit.
  • No code changes made by this audit; only this "Merge readiness" section was appended to the PR body.
  • Untracked AGENTS.md left alone (project-context doc; mirrors the embedded CLAUDE.md content).

Swaps EXTRACTION_PROMPT for CONCEPT_FAITHFUL_EXTRACTION_PROMPT as
the active extraction prompt at runtime. Toggle back via env var
EXTRACTION_PROMPT_VARIANT=atomic if needed.

Hypothesis (from H1 judge-sanity): BEAM rubrics score on concept-
level vocabulary ('Transaction error handling', 'Security and
deployment'). Our atomic extractor explicitly strips these phrases
('AGGRESSIVE TECHNOLOGY SPLITTING'). Mem0's 15-80 word memories
preserve them — that's why their published BEAM SUM is 0.635 vs
our 0.10.

This is the upstream-est lever in our stack. Lower layer = higher
leverage. tsc clean, 1213/1213 tests pass.
…rb-form keyword expansion

Three architectural changes that align AtomicMemory with the
Mem0-pattern playbook (April 2026 algorithm).

1. AUDN_LLM_DISABLED config flag (Phase 1)
   When true, memory-audn short-circuits the slow LLM mutation-decision
   path: every fact that's not a fast-AUDN near-duplicate NOOP is stored
   as ADD. Defers state-change semantics to retrieval-time.
   Per Mem0's own description: "the new algorithm collapses extraction
   into a single LLM call that only adds. Every extracted fact becomes
   an independent record."
   Effect: halves ingest latency, eliminates the AUDN-LLM hang vector
   (no LLM call to time out), preserves history (no UPDATE/DELETE that
   destroys information).

2. First-class agent-fact extraction (Phase 1b)
   EXTRACTION_PROMPT now treats assistant-generated facts (confirmations,
   recommendations, computed results) as first-class extractable content
   with explicit prefixes: "Assistant confirmed:", "Assistant recommended:",
   "Assistant computed:". Per Mem0: +53.6 on single-session-assistant.
   Closes a real coverage gap on questions like "what did we decide?"
   that depend on retrievable agent-stated facts.

3. Verb-form keyword expansion (Phase 1c)
   findKeywordCandidates() now expands each keyword into its verb-form
   variants (-ing, -ed, -es, -s) before the ILIKE substring search.
   Mem0 reports measurable lift: "what meetings did I attend?" now
   matches stored "attending a meeting". Minimal English suffix table,
   no NLP dep.
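
A minimal sketch of the verb-form expansion described in item 3, assuming plain suffix appending with no stemming. `expandVerbForms` and `VERB_SUFFIXES` are hypothetical names; the actual suffix table inside findKeywordCandidates() may handle more cases.

```typescript
// Hypothetical suffix table mirroring the variants named above.
const VERB_SUFFIXES = ["ing", "ed", "es", "s"];

// Expand one keyword into itself plus naive verb-form variants.
// Some variants are not real words ("attendes"); that is harmless for a
// substring search because they simply match nothing.
function expandVerbForms(keyword: string): string[] {
  const base = keyword.trim().toLowerCase();
  if (base === "") return [];
  const variants = new Set<string>([base]);
  for (const suffix of VERB_SUFFIXES) {
    variants.add(base + suffix);
  }
  return Array.from(variants);
}
```

With this sketch, the keyword "attend" yields "attending" as one variant, which is a substring of the stored text "attending a meeting".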

Pre-commit:
- npx tsc --noEmit clean
- npm test: 1214/1214 vitest tests pass (+1 new test for agent-facts)

Hypothesis (to be validated next): together these unlock Mem0-level
performance on BEAM-100K Anthropic stack, while preserving our
distinctive primitives (observation network, lessons, claim slots).
… primitive

H2b finding (2026-05-03): concept-faithful prompt produced raw text
copies on Haiku, not synthesized memories. KU regressed 2/2 -> 0/2.
The Mem0-pattern win comes from ADD-only architecture, not from
changing extraction granularity. Atomic facts + ADD-only + observation
network synthesis is the right combination.

Concept-faithful prompt remains available via
EXTRACTION_PROMPT_VARIANT=concept-faithful for controlled experiments.

See 2026-05-03-deep-state-analysis.md for the full reasoning chain.
Adds the TLL primitive — a per-entity sparse graph of event nodes with
predecessor/successor edges. Each new memory referencing an entity
appends an event node to that entity's chain; the predecessor pointer
allows traversal of the chain backward at query time.

Targets the abilities Mem0 explicitly admits their architecture
doesn't crack at 10M (per their April 2026 blog):
  - temporal reasoning (TR)
  - event ordering (EO)
  - multi-session reasoning (MSR)

These are exactly the abilities still at 0/2 single-run after the
Phase 1 ADD-only architectural shift. They require higher-order
representations of how events relate across time — fact-level and
entity-level matching are insufficient.

Implementation (Karpathy-minimal):
  - schema.sql: new table temporal_linkage_list with composite PK
    (user_id, entity_id, memory_id), predecessor pointer, position_in_chain
  - repository-tll.ts: append() and chain()/chainsFor() ~120 LOC
  - memory-storage.ts: append after entity link in resolveAndLinkEntities
    (best-effort, fire-and-forget — keeps ingest hot path fast)
  - runtime-container.ts: instantiate TllRepository when entity-graph enabled
  - memory-service-types.ts + memory-service.ts: thread through deps

This is the unique architectural primitive: Hindsight gestures at it
via Tempr; the CROME proposal formalizes it; nobody has shipped it
publicly. Retrieval-time traversal is not yet wired — that's Phase 4b
(adding TLL traversal as a retrieval signal alongside semantic/keyword/
entity).
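
The append semantics described above (one chain per user and entity, a predecessor pointer, a position in the chain) can be sketched in memory as follows. This is not repository-tll.ts: the class `InMemoryTll`, zero-based positions, and idempotent re-append are assumptions made for illustration, and the real repository is backed by Postgres.

```typescript
interface TllNode {
  userId: string;
  entityId: string;
  memoryId: string;
  predecessorMemoryId: string | null; // null for the first node in a chain
  positionInChain: number; // assumption: zero-based
}

// Hypothetical in-memory stand-in for the temporal_linkage_list table.
class InMemoryTll {
  private chains = new Map<string, TllNode[]>();

  // Append an event node for (user, entity); re-appending the same memory
  // returns the existing node, mirroring the composite primary key.
  append(userId: string, entityId: string, memoryId: string): TllNode {
    const key = `${userId}:${entityId}`;
    const chain = this.chains.get(key) ?? [];
    const existing = chain.find((n) => n.memoryId === memoryId);
    if (existing) return existing;

    const tail: TllNode | undefined = chain[chain.length - 1];
    const node: TllNode = {
      userId,
      entityId,
      memoryId,
      predecessorMemoryId: tail ? tail.memoryId : null,
      positionInChain: chain.length,
    };
    chain.push(node);
    this.chains.set(key, chain);
    return node;
  }

  // Return a copy of the chain in append order.
  chain(userId: string, entityId: string): TllNode[] {
    return (this.chains.get(`${userId}:${entityId}`) ?? []).slice();
  }
}
```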

Pre-commit: tsc clean, 1214/1214 vitest tests pass.
…ion -0.56)

Diagnostic 1 finding: per-ability multirun aggregate showed Phase 1's
agent-fact extraction prompt regressed ABS from 0.83 -> 0.27 (-0.56).
Mechanism: over-surfacing assistant explanatory hypothetical content
as facts. ABS questions ask about things the user never raised; if
the assistant explained a similar topic, our extractor stored that
explanation as a 'fact' that retrieval surfaces, defeating abstention.

Fix: keep the original 'Skip generic assistant chatter' rule but add
explicit forbidding of hypothetical/explanatory content extraction.
Preserve narrow factual extraction (named entities, recommendations,
data tables, schedules) which lifted IF/PF.

Predicted: ABS recovers to ~0.83, SUM/PF/KU/IF retain their lifts.
Composite jumps from 0.510 to ~0.57.
Phase 4b: TLL chain-traversal as a deterministic retrieval-time signal
for EO/MSR/TR queries. When the query matches an ordering/temporal
pattern (regex over 'order', 'before/after', 'evolution', 'when did',
etc), we:

1. Take the top-10 initial retrieval candidates' memory_ids
2. Find which entities they link to (memory_entities)
3. For those entities, traverse the TLL chain (chronological event
   sequence per entity) and collect all chain memory_ids
4. Hydrate any chain memories not already in the candidate set
5. Append to the result pool — downstream filtering/reranking applies

Fails open: chain expansion errors don't block primary retrieval.
Skipped for non-ordering queries (factual lookups don't benefit).
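
The steps above can be sketched as follows. The names `shouldUseTLL`, `expandViaTLL`, and `entitiesForMemories` appear elsewhere in this PR, but the signatures, the `TllLookups` interface, the `chainMemoryIds` lookup, and the regex itself are assumptions; hydration of memory rows (step 4) is left out, so the sketch works on memory ids only.

```typescript
// Assumed pattern; the real regex in tll-retrieval covers more phrasings.
const ORDERING_PATTERN = /\b(order\w*|before|after|evolution|when did)\b/i;

function shouldUseTLL(query: string): boolean {
  return ORDERING_PATTERN.test(query);
}

// Hypothetical lookups standing in for the database-backed repositories.
interface TllLookups {
  entitiesForMemories(userId: string, memoryIds: string[]): Promise<string[]>;
  chainMemoryIds(userId: string, entityIds: string[]): Promise<string[]>;
}

async function expandViaTLL(
  userId: string,
  query: string,
  candidateIds: string[],
  lookups: TllLookups
): Promise<string[]> {
  // Skipped for non-ordering queries.
  if (!shouldUseTLL(query)) return candidateIds;
  try {
    // Steps 1-2: entities linked to the top-10 candidates.
    const entityIds = await lookups.entitiesForMemories(
      userId,
      candidateIds.slice(0, 10)
    );
    // Step 3: every memory id on those entities' chains.
    const chainIds = await lookups.chainMemoryIds(userId, entityIds);
    // Steps 4-5: append only ids not already in the candidate set.
    const seen = new Set(candidateIds);
    const merged = candidateIds.slice();
    for (const id of chainIds) {
      if (!seen.has(id)) {
        seen.add(id);
        merged.push(id);
      }
    }
    return merged;
  } catch (_err) {
    // Fails open: chain expansion errors do not block primary retrieval.
    return candidateIds;
  }
}
```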

This is the unique architectural primitive: per-entity event-chain
traversal at retrieval time. Hindsight gestures at it via Tempr;
CROME formalizes it; Mem0 admits their architecture lacks higher-
order temporal representations entirely.

Pre-commit: tsc clean, 1214/1214 vitest tests pass.
Two architectural primitives previously prototyped harness-side now ship as
first-class atomicmemory-core features.

## TLL EO read-path (B2)

Previously: tll-retrieval helpers existed in core but were dynamic-imported
inside memory-search.ts and not exposed via any public API.

Now:
- `memory-search.ts` uses static `import { shouldUseTLL, expandViaTLL }
  from './tll-retrieval.js'` (line 22). Cleaner deps, no runtime import.
- `repository-tll.ts` adds `chainEventsForEntities(userId, entityIds)`
  that returns enriched events joined with memory content (memoryId,
  content, observationDate, positionInChain, predecessorMemoryId). Used
  by the new HTTP endpoint and by EO-shaped read paths that need content
  alongside chain position.
- `memory-service.ts` adds public `getEventChains(userId, entityIds)`
  wrapper around the new repo method.
- `routes/memories.ts` registers `GET /v1/memories/event-chains?
  user_id=X&entity_ids=Y,Z`. Comma-separated UUIDs, deduped, validated.
  Returns `{ chains: [{ entity_id, events: [...] }] }`.
- `EventChainsQuerySchema` + `EventChainsResponseSchema` + response-map
  entry. Behavior: TLL stays read-only augmentation; AUDN flow unaffected;
  NULL-predecessor handling preserved.

Tests:
- `services/__tests__/tll-retrieval.test.ts` — 26 cases covering
  `shouldUseTLL` regex coverage (positive + negative + case insensitivity),
  `entitiesForMemories` SQL-shape verification, and `expandViaTLL` call
  ordering / 10-id slice / userId pass-through.
- `db/__tests__/repository-tll.test.ts` — 13 integration tests against
  test Postgres covering `append` idempotency + predecessor wiring,
  `chain` and `chainsFor` ordering, and `chainEventsForEntities`
  enriched-join + soft-delete filtering.

## First-mention events productization (B1)

Previously: `extractFirstMentions` lived in the BEAM harness only; chained
through a single LLM call returning JSON; no persistence.

Now:
- New table `first_mention_events` (schema.sql) with `(user_id, memory_id)`
  unique constraint for idempotent re-extraction. Indexed on
  `(user_id, position_in_conversation)` and on `topic` via GIN.
- `repository-first-mentions.ts` — `FirstMentionRepository` with
  `store()`, `getByMemoryId()`, `list()`. Mirrors `TllRepository` pattern.
- `services/first-mention-service.ts` — `FirstMentionService` with
  `extractAndStore(userId, conversationText, sourceSite,
  memoryIdsByTurnId)`. Ports `FIRST_MENTIONS_SYSTEM` prompt and salvage
  parser verbatim from the harness; runs single LLM call via injected
  `ChatFn`; maps loose LLM output to strict `FirstMentionEvent` schema.
- `routes/memories.ts` registers `POST /v1/memories/first-mentions/extract`.
  Body: `{ user_id, conversation_text, source_site, memory_ids_by_turn_id }`
  where `memory_ids_by_turn_id` is `{ "0": "uuid", "5": "uuid", ... }`
  (object form because JSON has no Map). Returns `{ events: [...] }`.
- `app/runtime-container.ts` instantiates the repository + service. The
  service's `ChatFn` adapter wraps the configured `llm.chat` singleton
  from `services/llm.ts`; per-call cost is tracked inside `llm.chat`.
- `MemoryService` constructor accepts a 9th optional parameter
  (`firstMentionService`) and exposes `extractFirstMentions()`.
- `MemoryServiceDeps` adds `firstMentionService: FirstMentionService | null`.

Tests:
- `services/__tests__/first-mention-service.test.ts` — 9 unit tests
  covering happy path, salvage of truncated JSON, garbage-text fallback,
  non-array JSON, chatFn throw, missing `memoryId` mapping drop, schema
  validation drop, anchor_date parsing (valid/invalid/null), ascending
  sort. No DB required.

## Why caller-driven extraction (no in-core ingest hook)

The in-core ingest pipeline does not retain turn structure (it extracts
atomic facts from chunks, not turns). The BEAM harness — which knows the
turn structure — supplies the turn-id-to-memory-id mapping in the request
body. This keeps the extraction path explicit and the core ingest pipeline
unchanged. Adding an automatic post-write hook is deferred to a follow-up
once a core-side notion of "turn" exists.

## Verification

- `npx tsc --noEmit` clean across all changes
- New tests deliberately not run in this commit (`repository-tll.test.ts`
  needs `dotenv -e .env.test` Postgres set up); `npm test` to be run
  before merging the PR

## Files

NEW:
- src/db/repository-first-mentions.ts
- src/services/first-mention-service.ts
- src/services/__tests__/first-mention-service.test.ts
- src/services/__tests__/tll-retrieval.test.ts
- src/db/__tests__/repository-tll.test.ts

MODIFIED:
- src/db/schema.sql (first_mention_events table)
- src/db/repository-tll.ts (chainEventsForEntities)
- src/services/memory-search.ts (static tll-retrieval import)
- src/services/memory-service.ts (getEventChains, extractFirstMentions)
- src/services/memory-service-types.ts (firstMentionService dep)
- src/routes/memories.ts (event-chains + first-mentions/extract routes)
- src/routes/response-schema-map.ts (new schema entries)
- src/schemas/memories.ts (EventChainsQuerySchema,
  FirstMentionsExtractBodySchema)
- src/schemas/responses.ts (EventChainsResponseSchema,
  FirstMentionsExtractResponseSchema)
- src/services/__tests__/cross-workspace-coupling-fence.test.ts
  (firstMentionService: null in test deps)
- src/app/runtime-container.ts (repository + service instantiation,
  9th MemoryService constructor argument)