
feat(core): productize first-mention events + TLL EO read-path#18

Merged
ethanj merged 16 commits into main from
feat/first-mention-tll-on-main
May 6, 2026

Conversation

@moralespanitz
Contributor

Summary

Productizes two memory primitives — first-mention events and Temporal
Linkage List (TLL) read-path retrieval — that were previously only
available in the eval harness.

This branch is a clean port cut from main (unlike the existing draft
PR #17, which targets experiment/phase2-combined-stack and shares no
history with main).

What's added

  • GET /v1/memories/event-chains?user_id=X&entity_ids=Y,Z — returns
    ordered event chains for the given entities (TLL read-path).
  • POST /v1/memories/first-mentions/extract — extracts the
    chronological topic-introduction list from a conversation.
  • Schema migration: first_mention_events table + 2 indexes.
  • TLL infrastructure: temporal_linkage_list table + repository +
    read-path service.

Behavior changes for callers

  • Two new HTTP endpoints (additive; no breaking changes).
  • Schema migration runs on the next npm run migrate.
  • TLL chain expansion is a fail-open augmentation in memory-search:
    when an ordering/temporal query is detected and TLL is wired,
    candidates from the entity event chain are appended; errors don't
    block primary retrieval.
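The fail-open contract described above can be sketched as a small wrapper. This is a hypothetical helper (the names `withTllAugmentation` and `Candidate` are illustrative, not the actual memory-search internals):

```typescript
// Illustrative sketch of the fail-open contract: TLL expansion may add
// candidates, but its errors must never block primary retrieval.
type Candidate = { memoryId: string };

async function withTllAugmentation(
  primary: Candidate[],
  expand: () => Promise<Candidate[]>,
): Promise<Candidate[]> {
  try {
    const extra = await expand();
    // Append chain candidates, deduplicating against the primary set.
    const seen = new Set(primary.map((c) => c.memoryId));
    return [...primary, ...extra.filter((c) => !seen.has(c.memoryId))];
  } catch (err) {
    // Fail-open: log the failure and return the primary results untouched.
    console.error("[tll-expansion-failed]", err);
    return primary;
  }
}
```

A throwing `expand` leaves the primary result set exactly as it was; a successful one only ever appends.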

Test coverage

  • 26 unit cases in tll-retrieval.test.ts (shouldUseTLL regex,
    entitiesForMemories SQL shape, expandViaTLL call ordering /
    10-id slice / userId pass-through).
  • 13 integration cases in repository-tll.test.ts (append
    idempotency + predecessor wiring, chain and chainsFor ordering,
    chainEventsForEntities enriched-join + soft-delete filtering).
  • 9 unit cases in first-mention-service.test.ts (happy path, salvage
    of truncated JSON, garbage-text fallback, non-array JSON, chatFn
    throw, missing memoryId mapping drop, schema validation drop,
    anchor_date parsing, ascending sort).

Test plan

  • npm run migrate:test
  • npm test (full vitest suite)
  • Manual smoke: ingest a conversation, hit GET /v1/memories/event-chains and POST /v1/memories/first-mentions/extract via curl.

Verification done locally

  • npx tsc --noEmit clean.
  • npm test — 1253/1253 vitest tests pass against .env.test Postgres
    on this branch.
  • npx fallow audit clean (no fatal issues; one duplication warning in
    the new test file's small repeated setup blocks; audit gate exits 0).

Adds the TLL primitive — a per-entity sparse graph of event nodes with
predecessor/successor edges. Each new memory referencing an entity
appends an event node to that entity's chain; the predecessor pointer
allows traversal of the chain backward at query time.
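The backward traversal enabled by the predecessor pointer amounts to a pointer walk. A minimal in-memory sketch (hypothetical shapes; the real nodes live in the temporal_linkage_list table):

```typescript
// Each event node stores its predecessor's memory_id (null at the chain root).
type TllNode = { memoryId: string; predecessorMemoryId: string | null };

// Walk a chain backward from its tip, newest to oldest.
function traverseBackward(nodes: TllNode[], tipMemoryId: string): string[] {
  const byId = new Map(nodes.map((n) => [n.memoryId, n]));
  const path: string[] = [];
  let cur: TllNode | undefined = byId.get(tipMemoryId);
  while (cur) {
    path.push(cur.memoryId);
    cur = cur.predecessorMemoryId ? byId.get(cur.predecessorMemoryId) : undefined;
  }
  return path;
}
```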

Targets the abilities current SOTA architectures admit they don't crack
at scale: temporal reasoning (TR), event ordering (EO), and multi-
session reasoning (MSR). These require higher-order representations of
how events relate across time — fact-level and entity-level matching
are insufficient on their own.

Implementation:
  - schema.sql: new table temporal_linkage_list with composite uniqueness
    on (user_id, entity_id, memory_id), predecessor pointer,
    position_in_chain, and supporting indexes on (user_id, entity_id,
    position_in_chain) and (memory_id).
  - db/repository-tll.ts: TllRepository with append() (idempotent batch
    insert with predecessor wiring) and chain()/chainsFor() readers.
  - services/memory-storage.ts: append after entity link in
    resolveAndLinkEntities (best-effort, fire-and-forget — keeps the
    ingest hot path fast).
  - services/tll-retrieval.ts: shouldUseTLL() regex gate over ordering/
    temporal query phrasing, plus expandViaTLL() helper that takes the
    top-N retrieval candidates, finds their linked entities, and pulls
    the entities' chain memory_ids.
  - services/memory-search.ts: when the gate fires, expand the candidate
    set with chain members (deterministic chain-traversal augmentation,
    not a similarity search). Fail-open — chain expansion errors don't
    block primary retrieval.
  - app/runtime-container.ts: instantiate TllRepository when entity
    graph is enabled.
  - services/memory-service-types.ts + memory-service.ts: thread the
    repository through MemoryServiceDeps as an optional null-able field.

Read-only retrieval augmentation; AUDN/ingest behaviour unchanged.
…rimitive

Introduces a new memory primitive — first-mention events — distinct from
both atomic facts (claims) and memories (ingested chunks). For a given
conversation, captures the first turn at which each topic is brought up.
The grain matches event-ordering rubrics that ask "in what order did the
user bring up these aspects."

Caller-driven extraction (no in-core ingest hook): the in-core ingest
pipeline does not retain turn structure (it extracts atomic facts from
chunks, not turns). External callers that know the turn structure
supply a turn-id-to-memory-id mapping via the public API. This keeps
extraction explicit and the core ingest pipeline unchanged. An
automatic post-write hook is deferred until a core-side notion of "turn"
exists.
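The grain can be illustrated without the LLM: given (turn, topic) pairs, keep only the earliest turn per topic. A rough sketch with hypothetical names (the real extraction is LLM-driven inside first-mention-service.ts):

```typescript
type Turn = { turnId: number; topic: string };

// Keep only the first turn at which each topic appears, in turn order.
function firstMentions(turns: Turn[]): Turn[] {
  const seen = new Set<string>();
  return [...turns]
    .sort((a, b) => a.turnId - b.turnId)
    .filter((t) => (seen.has(t.topic) ? false : (seen.add(t.topic), true)));
}
```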

Implementation:
  - schema.sql: new table first_mention_events with `(user_id, memory_id)`
    unique constraint for idempotent re-extraction. Indexed on
    `(user_id, position_in_conversation)` and on `topic` via GIN.
  - db/repository-first-mentions.ts: FirstMentionRepository with
    `store()`, `getByMemoryId()`, `list()`. Mirrors the TllRepository
    pattern.
  - services/first-mention-service.ts: FirstMentionService with
    `extractAndStore(userId, conversationText, sourceSite,
    memoryIdsByTurnId)`. One LLM call via an injected ChatFn; salvage
    parser tolerates truncated JSON; loose LLM output is mapped to the
    strict FirstMentionEvent schema.
  - app/runtime-container.ts: instantiate the repository + service. The
    ChatFn adapter wraps the configured `llm.chat` singleton (per-call
    cost is tracked inside `llm.chat`).
  - services/memory-service-types.ts + memory-service.ts: thread the
    service through MemoryServiceDeps as an optional null-able field.

The HTTP endpoint that exposes this primitive is added in a follow-up
commit alongside the TLL read endpoint.
Exposes the first-mention-events and Temporal Linkage List primitives
through public HTTP endpoints. Both primitives existed inside the runtime
container (previous commits) but had no callable surface; this commit
adds the routes, schemas, public service methods, and tests.

## TLL EO read endpoint

  - `GET /v1/memories/event-chains?user_id=X&entity_ids=Y,Z` returns
    `{ chains: [{ entity_id, events: [...] }] }`. `entity_ids` is
    comma-separated, trimmed, deduplicated, and validated as UUIDs.
  - `db/repository-tll.ts` adds `chainEventsForEntities(userId,
    entityIds)` — enriched events joined with memory content
    (memoryId, content, observationDate, positionInChain,
    predecessorMemoryId). Soft-deleted memories are filtered out;
    entities with no events are dropped.
  - `services/memory-service.ts` adds public `getEventChains()` wrapper.
  - `schemas/memories.ts`: `EventChainsQuerySchema`.
  - `schemas/responses.ts`: `EventChainsResponseSchema`.
  - `routes/response-schema-map.ts`: corresponding entry.
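The `entity_ids` handling described above can be approximated as follows. This is a sketch, not the actual `EventChainsQuerySchema`; the UUID pattern is the standard 8-4-4-4-12 shape, and the 100-id cap mirrors the `MAX_ENTITY_IDS_PER_REQUEST` constant added later in this PR:

```typescript
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const MAX_ENTITY_IDS_PER_REQUEST = 100; // cap added in a later commit

// Split a comma-separated entity_ids param: trim, drop empties, dedupe,
// validate each token as a UUID, and enforce the request cap.
function parseEntityIds(raw: string): string[] {
  const ids = [...new Set(raw.split(",").map((s) => s.trim()).filter(Boolean))];
  if (ids.length === 0 || ids.length > MAX_ENTITY_IDS_PER_REQUEST) {
    throw new Error("entity_ids count out of range");
  }
  for (const id of ids) {
    if (!UUID_RE.test(id)) throw new Error(`invalid UUID: ${id}`);
  }
  return ids;
}
```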

## First-mention extract endpoint

  - `POST /v1/memories/first-mentions/extract` body:
    `{ user_id, conversation_text, source_site, memory_ids_by_turn_id }`
    where `memory_ids_by_turn_id` is `{ "0": "uuid", "5": "uuid", ... }`
    (object form because JSON has no Map). Returns `{ events: [...] }`.
  - `services/memory-service.ts` adds public `extractFirstMentions()`
    wrapper around `FirstMentionService.extractAndStore()`.
  - `schemas/memories.ts`: `FirstMentionsExtractBodySchema` (transforms
    the object into a `Map<number, string>`).
  - `schemas/responses.ts`: `FirstMentionsExtractResponseSchema`.
  - `routes/response-schema-map.ts`: corresponding entry.
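The object-to-Map transform can be sketched as a plain helper (hypothetical name; the real version is a transform inside `FirstMentionsExtractBodySchema`):

```typescript
// JSON objects key by string, so turn ids arrive as "0", "5", ... and are
// converted into a Map<number, string> of turnId -> memoryId.
function toTurnIdMap(obj: Record<string, string>): Map<number, string> {
  const map = new Map<number, string>();
  for (const [key, memoryId] of Object.entries(obj)) {
    const turnId = Number(key);
    if (!Number.isInteger(turnId) || turnId < 0) {
      throw new Error(`invalid turn id key: ${key}`);
    }
    map.set(turnId, memoryId);
  }
  return map;
}
```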

Both endpoints fail closed (404/400 on validation) and are additive — no
existing behaviour changes.

## Tests

  - `services/__tests__/tll-retrieval.test.ts` — 26 cases covering
    `shouldUseTLL` regex coverage (positive + negative + case
    insensitivity), `entitiesForMemories` SQL-shape verification, and
    `expandViaTLL` call ordering / 10-id slice / userId pass-through.
  - `db/__tests__/repository-tll.test.ts` — 13 integration tests
    against test Postgres covering `append` idempotency + predecessor
    wiring, `chain` and `chainsFor` ordering, and
    `chainEventsForEntities` enriched-join + soft-delete filtering.
  - `services/__tests__/first-mention-service.test.ts` — 9 unit tests
    covering happy path, salvage of truncated JSON, garbage-text
    fallback, non-array JSON, chatFn throw, missing `memoryId` mapping
    drop, schema validation drop, anchor_date parsing
    (valid/invalid/null), ascending sort. No DB required.

The previously-needed fallow suppression on `entitiesForMemories` is
removed now that the unit test consumes it.
@ethanj ethanj marked this pull request as ready for review May 5, 2026 23:50
@ethanj ethanj self-requested a review as a code owner May 5, 2026 23:50
@ethanj
Contributor

ethanj commented May 6, 2026

Code review

Read all 18 changed files end-to-end. Tests look thoughtful (26 + 13 + 9 cases) and the architectural shape (repository → service → route, threading both new deps through MemoryServiceDeps as nullable fields) matches the rest of the codebase. The two endpoints are additive and won't break existing callers. Three findings below are blocking; the rest are smaller.

Critical

1. TllRepository.append() has a TOCTOU race that produces duplicate position_in_chain entries
src/db/repository-tll.ts:41–80

append() first does a SELECT MAX(position_in_chain) per entity, then issues a separate INSERT at tip+1. This runs without a transaction or advisory lock, and the unique constraint is only on (user_id, entity_id, memory_id). Two concurrent appends to the same (user_id, entity_id) (the normal case: two ingest requests sharing one entity, e.g. "Postgres") will both read the same tip and both insert at the same position, both with the same predecessor_memory_id. chain() and chainsFor() order by position_in_chain ASC, so tie-break is undefined and chain traversal becomes ambiguous.

The fire-and-forget call site in src/services/memory-storage.ts:226-231 makes this worse — there's no upstream serialization, and failures are only printed to console.error. With batch ingest or a busy user this is not theoretical.

Fix options:

  • Replace the read+insert with a single atomic INSERT ... SELECT (SELECT MAX(...)+1 ... FOR UPDATE) so the whole assignment happens inside one statement, or
  • Wrap append in BEGIN/COMMIT with SELECT ... FOR UPDATE on the existing chain rows for that (user_id, entity_id).
  • Either way, add a UNIQUE (user_id, entity_id, position_in_chain) index so a duplicate-position bug fails loudly at the DB layer instead of corrupting the chain silently.
  • Add a concurrent-write integration test (Promise.all([append, append]) with a real pool, then assert chain integrity).
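The race itself can be made concrete with a toy in-memory model of the read-then-insert sequence. This is purely illustrative (no Postgres involved); it interleaves the two steps the way concurrent requests can:

```typescript
// Toy model: position assignment as a SELECT MAX(...) read followed by a
// separate INSERT, with no lock or transaction between the two steps.
const chain: { memoryId: string; position: number }[] = [];
const readTip = () => chain.reduce((m, r) => Math.max(m, r.position), -1);

// Both callers read the tip BEFORE either inserts (the TOCTOU window)...
const tipA = readTip(); // -1
const tipB = readTip(); // -1
// ...so both insert at the same position with the same predecessor.
chain.push({ memoryId: "a", position: tipA + 1 });
chain.push({ memoryId: "b", position: tipB + 1 }); // duplicate position 0
```

Ordering by position_in_chain now has an undefined tie-break, which is exactly why the suggested UNIQUE (user_id, entity_id, position_in_chain) index matters: the duplicate insert should fail loudly instead of corrupting the chain.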

2. Hardcoded similarity: 0.5 on TLL-augmented rows passes through the relevance filter and either silently drops them or pollutes ranking
src/services/memory-search.ts:160-180

Trace: executeSearchStep → maybeExpandViaTLL → hydrateChainMemories returns rows with similarity: 0.5. The combined set is then passed to postProcessResults (line 368), which calls applySearchRelevanceFilter → applyRelevanceFilter. Inside relevance-policy.ts:100-102, relevance = memory.relevance ?? memory.semantic_similarity ?? memory.similarity — so for hydrated rows relevance = 0.5.

Net effect:

  • If similarityThreshold > 0.5 (a common 0.6–0.7 production value), every TLL-augmented memory gets filtered out and the entire feature does nothing.
  • If similarityThreshold <= 0.5, augmentations pass with a meaningless score that pollutes downstream ranking and audit decisions.

In either case the score is noise, not a signal. TLL membership is a different retrieval signal and shouldn't share the similarity gate. Suggest one of:

  • Mark hydrated rows with a dedicated field (retrieval_signal: 'tll-chain') and short-circuit the relevance filter for them.
  • Append TLL rows after applySearchRelevanceFilter rather than before, so they're not gated against a similarity score they don't have.
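The coalescing chain cited from relevance-policy.ts makes the failure mode easy to see. A sketch of that expression with a hypothetical row shape:

```typescript
type Row = {
  relevance?: number;
  semantic_similarity?: number;
  similarity?: number;
};

// The expression cited from relevance-policy.ts:100-102.
const effectiveRelevance = (m: Row) =>
  m.relevance ?? m.semantic_similarity ?? m.similarity;

// A TLL-hydrated row carries only the hardcoded similarity: 0.5 ...
const hydrated: Row = { similarity: 0.5 };

// ... so any threshold above 0.5 drops it, and any threshold at or below
// 0.5 ranks it on a number that was never actually computed.
const passes = (m: Row, threshold: number) =>
  (effectiveRelevance(m) ?? 0) >= threshold;
```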

3. observation_date semantic gap — new Date() at TLL-append time, not the memory's observed_at
src/services/memory-storage.ts:225

memories.observed_at already exists in schema.sql:66 and is documented as "when the conversation actually happened (vs created_at = DB insertion time)." The TLL append uses const observationDate = new Date() instead — wall-clock at HTTP-request time. chainsFor() orders by observation_date ASC, so any backfill, batch import, or client passing an explicit observed_at produces a chain in ingest-arrival order rather than conversation chronology. Every downstream temporal-reasoning query will be wrong for that data.

Fix: pass the memory's observed_at (already available on the stored record at this call site) into append() instead of new Date().

Important

4. predecessor_memory_id ON DELETE SET NULL corrupts chain backward-traversal on hard deletes
src/db/schema.sql:337

Hard-delete paths exist (resetBySource in repository-write.ts, full reset). When a node referenced as a predecessor_memory_id is hard-deleted, the FK silently nulls its successor's predecessor pointer — the successor now looks like a chain root and backward traversal stops. Suggest ON DELETE CASCADE for predecessor_memory_id (matches the policy already in place for memory_id); orphaned nodes get removed rather than leaving a half-broken chain.

5. No HTTP-level tests for either new endpoint
src/routes/memories.ts:457-520

Schema validation (missing user_id, invalid UUID in entity_ids, conversation_text exceeding MAX_CONVERSATION_LENGTH, empty memory_ids_by_turn_id), error shapes, and successful-response shapes are all untested at the route layer. Other endpoints in this file have route-level integration tests; these two ship without any. Recommend at least: missing user_id → 400, invalid UUID → 400, oversized body → 400, plus a happy-path response-shape assertion per endpoint.

6. EventChainsQuerySchema has no upper bound on entity_ids count
src/schemas/memories.ts:543-562

A request with thousands of UUIDs generates a single ANY($2::uuid[]) joined against memories with no pagination. An authenticated user has a straightforward amplification vector. Add .refine(q => q.entityIds.length <= MAX_ENTITY_IDS_PER_REQUEST, ...) with a named constant (e.g. 50–100).

7. positionInConversation = m.turn_id conflates two separate concepts
src/services/first-mention-service.ts:198

positionInConversation becomes a duplicate of turnId rather than an independent ordinal. If extractAndStore is called twice on the same conversation and the LLM returns slightly different turn_id values for the same logical topic, the (user_id, memory_id) UNIQUE swallows the second write and you get whichever was first. Suggest computing positionInConversation as the 0-based index in the post-sorted output, and adding a re-extraction idempotency test.

Minor

8. Silent error catching at three sites

  • memory-storage.ts:228-231 — fire-and-forget .catch(err => console.error(...)) swallows TLL-append failures and the ingest caller gets a 200 even when chain state is now inconsistent.
  • memory-search.ts:143-147 — maybeExpandViaTLL swallows all errors as fail-open augmentation.
  • first-mention-service.ts invokeLlm — catches errors and silently returns null, logging only to stderr.

The fail-open behavior may be deliberate, but per CLAUDE.md "no silent error catching" / "no fallback modes," at minimum: route through the structured logger rather than console.error, surface a tllExpansionFailed flag in the trace, and document the deliberate fail-open in tech-debt.md if it stays.

9. shouldUseTLL regex false-positives
src/services/tll-retrieval.ts:22-27

Bare \b(first|last|before|after|then|later|track)\b fires on common factual lookups: "what is my first name", "the model used before GPT-4", "track my spending". Direct harm is modest because expansion is fail-open and additive — but see finding #2, where over-triggering compounds the similarity-gate problem. Tightening the regex (e.g. require two ordering signals) would reduce both.

10. slice(0, 10) magic number duplicated in two files
src/services/tll-retrieval.ts:59 and src/services/memory-search.ts:135. Pull into a named constant (e.g. TLL_ENTITY_LOOKUP_SEED_LIMIT).

11. Hardcoded inputTokens: 0, outputTokens: 0 in the runtime-container.ts chat adapter
src/app/runtime-container.ts:228-229. Cost telemetry from FirstMentionService will be wrong if anything starts reading these fields. Either thread real token counts through llm.chat, or remove the fields from the ChatResult shape.


Findings #1–#3 are blocking; #4–#7 are important to address before merge or in a fast-follow. Strong shape overall — the architectural primitives (TLL, first-mention) are well-isolated, repositories are thin, and the test files demonstrate the intended invariants clearly. Just need the concurrency, scoring, and timestamp issues nailed down before the read-path goes live.

…eview #1)

Replace the read-then-insert pattern in TllRepository.append() with an
atomic INSERT...SELECT serialized by pg_advisory_xact_lock keyed on
(user_id, entity_id). Concurrent appends to the same entity chain now
serialize on the lock and compute predecessor + position from the latest
committed row, eliminating the TOCTOU race where two parallel callers
read the same MAX(position_in_chain) and both wrote at tip+1.

Add a UNIQUE (user_id, entity_id, position_in_chain) index as
defense-in-depth so any future code path that bypasses the lock fails
loudly at the DB layer rather than silently producing duplicate
positions.

Add an integration test that fires three concurrent appends to the same
chain via Promise.all and asserts positions 0,1,2 with correctly wired
predecessor pointers.
The previous flow ran TLL chain augmentation inside executeSearchStep
and tagged hydrated rows with `similarity: 0.5` so they could pass
through applySearchRelevanceFilter. That magic constant either filtered
chain rows out (when the threshold was higher) or polluted ranking with
a meaningless score.

Move TLL augmentation AFTER postProcessResults / applySearchRelevanceFilter.
Hydrated rows now carry `similarity: null` and `retrieval_signal:
'tll-chain'`, so they ride around the similarity gate entirely —
chain-membership is a structurally different retrieval signal than
semantic similarity. The trace adds a `tll-augmentation` stage with the
ids that were appended.

Replace the `slice(0, 10)` magic number with a named constant
(TLL_SEED_CANDIDATE_COUNT).
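The reordering can be sketched as a pure post-filter append. Hypothetical shapes (the real code also threads the results onto a trace stage):

```typescript
type Scored = {
  memoryId: string;
  similarity: number | null;
  retrieval_signal?: "tll-chain";
};

// Append TLL chain rows AFTER the similarity gate has run, so they are
// never scored against a threshold they don't have.
function appendTllRows(filtered: Scored[], chainIds: string[]): Scored[] {
  const seen = new Set(filtered.map((r) => r.memoryId));
  const extras = chainIds
    .filter((id) => !seen.has(id))
    .map((id) => ({
      memoryId: id,
      similarity: null, // structurally different signal, not a score
      retrieval_signal: "tll-chain" as const,
    }));
  return [...filtered, ...extras];
}
```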
…eview #3)

The TLL append in resolveAndLinkEntities was passing `new Date()` as
observation_date, which is the ingest-arrival time. Chains order by
observation_date ASC, so out-of-order or backfilled conversations would
chain by ingest order — destroying conversation chronology that EO and
MSR queries rely on.

Thread the caller-supplied logicalTimestamp through from
storeCanonicalFact to the new maybeAppendTll helper. When a logical
timestamp isn't supplied, look up the just-stored memory's observed_at
column rather than fabricating one. Last-resort new Date() fallback
only fires if the row lookup fails — keeps the append from silently
dropping.
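The fallback chain described here reduces to a one-line priority pick (hypothetical helper name, mirroring the commit's description):

```typescript
// Pick the observation date for a TLL append, in priority order:
// caller-supplied logical timestamp, then the stored memory's observed_at,
// then wall-clock as a last resort so the append never silently drops.
function pickObservationDate(
  logicalTimestamp: Date | null,
  storedObservedAt: Date | null,
): Date {
  return logicalTimestamp ?? storedObservedAt ?? new Date();
}
```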
Hard-deletes (e.g. resetBySource in repository-write) silently nulled
out predecessor_memory_id pointers, breaking backward chain traversal
and leaving half-broken chains where some events still reference a
deleted ancestor.

Change the predecessor FK to ON DELETE CASCADE so the dependent chain
node gets removed cleanly when its predecessor goes. Matches the
memory_id FK policy on the same table.

Schema migration runs an idempotent DROP CONSTRAINT / ADD CONSTRAINT in
a DO block because CREATE TABLE IF NOT EXISTS won't update column-level
FK definitions on an existing table.
The /v1/memories/event-chains endpoint fans out per entity. Without an
upper bound on entity_ids, a single caller could pull tens of thousands
of chain rows in one request — straightforward amplification target.

Add a MAX_ENTITY_IDS_PER_REQUEST = 100 named constant and refine the
schema to reject larger lists with a 400 response. Cap chosen to match
the existing MAX_SEARCH_LIMIT ceiling.
@moralespanitz
Contributor Author

moralespanitz commented May 6, 2026

Thanks for the detailed review. Addressed in commits dc7b9ff, 91676f4, 4020135, d6fa8ed, 63efbea (initial round) and 58c3095, 623ac96, 9eacd79, 36701b0, 35dc36a, c46cddf (fast-follow round).

Blockers (fixed)

  • #1 TOCTOU race in TllRepository.append() (dc7b9ff) — replaced read+insert with atomic INSERT...SELECT under pg_advisory_xact_lock keyed on (user_id, entity_id); added UNIQUE (user_id, entity_id, position_in_chain) index as defense-in-depth; added a concurrent-write integration test that fires Promise.all([append, append, append]) to the same chain and asserts positions 0/1/2 with correctly wired predecessors.
  • #2 hardcoded similarity: 0.5 on TLL-augmented rows (91676f4) — moved TLL augmentation AFTER applySearchRelevanceFilter. Hydrated rows now carry similarity: null and retrieval_signal: 'tll-chain' and ride around the similarity gate entirely.
  • #3 observation_date semantic gap (4020135) — TLL append now threads the caller's logicalTimestamp through from storeCanonicalFact, falling back to the just-stored memory's observed_at column when absent. No more new Date() reordering chains by ingest-arrival time.

Important (fixed)

Fast-follow round (now also fixed)

  • #5 HTTP-level route tests (c46cddf) — new src/routes/__tests__/event-chains-and-first-mentions.test.ts covers the schema-validation 400 paths for both endpoints (missing user_id, invalid UUID in entity_ids, > 100 cap, empty tokens, missing/oversized conversation_text, missing memory_ids_by_turn_id, missing source_site) plus a happy-path shape assertion per endpoint that parses the response with EventChainsResponseSchema / FirstMentionsExtractResponseSchema. 12/12 new tests pass.
  • #7 positionInConversation = m.turn_id conflation (35dc36a) — mapToEvents now sorts candidates by turnId ASC and assigns positionInConversation as the 0-based post-sort index. Re-runs of extractAndStore on the same conversation now produce identical (position, topic) tuples even when the LLM's turn_id assignment drifts between runs. Added a service-level idempotency test plus a new DB-integration test in repository-first-mentions.test.ts that exercises ON CONFLICT DO NOTHING behaviour with drifted turn_ids and asserts only 2 rows survive.
  • #8 silent error catching at 3 sites (36701b0) — TLL append (memory-storage.ts), TLL expansion (memory-search.ts), and first-mention LLM call (first-mention-service.ts) now log via structured prefixes ([tll-append-failed], [tll-expansion-failed], [first-mention-llm-failed]). The TLL-expansion path additionally surfaces a tll_expansion_failed event on the active retrieval TraceCollector so the failure is observable in trace artifacts. Comments at each site document the deliberate fail-open (TLL augmentation and first-mention extraction are best-effort augmentations, not primary mutations — fail-open here is the correct policy under CLAUDE.md's "no fallback modes" rule, which targets AUDN mutations).
  • #9 shouldUseTLL regex false-positives (9eacd79) — replaced the single-alternation regex with a two-tier check: an ORDERING_TERMS_RE set of single-token signals (first/last/before/after/then/later/earlier/previous/next/prior) where TWO must co-occur to fire, plus a curated SEQUENCE_PATTERNS list of structural phrases (in (chronological/reverse/the) order, when did, since when, over time, evolution of, history|timeline of, originally/initially, progression of, how X evolved/shifted/changed, brought up) where one phrase suffices. Removed track, sequence, and bare order from the gate. Updated test asserts false for the three reviewer-cited false-positives ("what is my first name", "the model used before GPT-4", "track my spending") and true for canonical EO/MSR/TR shapes. 41/41 unit tests pass.
  • #10 slice(0, 10) duplicated magic number (623ac96) — defined and exported TLL_ENTITY_LOOKUP_SEED_LIMIT = 10 in tll-retrieval.ts; removed the duplicate TLL_SEED_CANDIDATE_COUNT declaration from memory-search.ts and imported the canonical constant; updated the unit test to reference the exported name instead of the literal 10.
  • #11 hardcoded inputTokens: 0 in chat adapter (58c3095) — LLMProvider.chat() returns Promise<string> (no usage), and nothing in the FirstMentionService path actually consumed the fields. Took the safer of the two reviewer-suggested options: dropped inputTokens / outputTokens from the local ChatResult shape entirely. Per-call cost telemetry continues to flow from LLMProvider.chat -> writeCostEvent unchanged. JSDoc on the trimmed type and on the runtime-container adapter both document the deliberate decision.

Pre-commit status (this push)

  • npx tsc --noEmit: clean
  • npm test: 1277/1277 tests pass across 127 test files
  • npx fallow audit: clean (audit gate excluded 1 inherited finding; new findings: 0)

PR is now ready for re-review.

Same root cause as benchmarks repo `327326a`: git sets GIT_INDEX_FILE
before invoking pre-commit hooks; fallow's `git worktree add` for
base-ref scanning performs a checkout that writes to GIT_INDEX_FILE,
corrupting the main worktree's index by replacing it with the base
ref's tree (silently deletes files the feature branch added that
don't exist on base, then commits those deletions).

Reproduced during the PR #18 review-response work — 7 unrelated
files briefly marked deleted in the index after a fallow run; had to
recover with git reset --soft + git read-tree HEAD.

Fix: unset both vars at the top of the hook so nested git invocations
run against the worktree's own default index.
…iew #11)

The `chatFn` adapter wired in `runtime-container.ts` returned hardcoded
zero token counts for every call. `LLMProvider.chat()` returns
`Promise<string>` (no usage), so threading real counts here would require
widening that interface across every adapter. Nothing in the
`FirstMentionService` path actually consumed the fields — they only
existed to satisfy the local `ChatResult` shape — so dropping them is
strictly safer than leaving misleading zeros in place. Per-call cost
telemetry continues to flow from `LLMProvider.chat` -> `writeCostEvent`
unchanged.

Updated:
  - `ChatResult` in `first-mention-service.ts` -> `{ text: string }` only,
    with a comment documenting the deliberate decision.
  - `runtime-container.ts` adapter no longer fabricates zero usage.
  - `first-mention-service.test.ts` fixture updated to match.
The same magic 10 lived in two places: `tll-retrieval.ts:expandViaTLL`
sliced its input ids before entity lookup, and `memory-search.ts`
re-declared a private `TLL_SEED_CANDIDATE_COUNT = 10` for the same
purpose. Defined the constant once in `tll-retrieval.ts` and re-used it
from both call sites so a tuning change can't drift between them.

Updated the unit test to reference the exported constant directly
instead of asserting against the literal 10.
…#9)

The original gate was a single alternation regex that fired on any
single occurrence of `first|last|before|after|then|later|track|...`.
That over-fired on plain factual queries that incidentally contained
one of those tokens — `what is my first name`, `the model used before
GPT-4`, `track my spending` — pulling in unrelated TLL chain memories
on the augmented retrieval path.

Replaced the gate with a two-tier check:

  1. ORDERING_TERMS_RE — a curated set of single-token signals
     (first/last/before/after/then/later/earlier/previous/next/prior).
     Only fires TLL when TWO co-occur, e.g. "what aspects did I
     discuss BEFORE and AFTER X".
  2. SEQUENCE_PATTERNS — phrase-level structural signals
     (`in (chronological/reverse/the) order`, `when did`, `since when`,
     `over time`, `evolution of`, `history|timeline of`,
     `originally`/`initially`, `progression of`,
     `how X evolved/shifted/changed`, `brought up`). Single phrase
     hit is enough.

Removed `track`, `sequence`, and bare `order` from the gate — they
were the largest false-positive contributors.

Updated `src/services/__tests__/tll-retrieval.test.ts`:
  - Positive list rewritten to canonical EO/MSR/TR shapes that hit
    one of the structural patterns or co-occurring ordering terms.
  - Negative list now includes the false-positive shapes the loose
    regex used to match (the three reviewer-cited ones plus a handful
    of single-ordering-term factual queries).

41/41 unit tests pass against the updated gate.
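The two-tier gate can be approximated as follows. This is a sketch under the commit's description, not the production lists (which are more complete); the co-occurrence check counts term hits rather than distinct terms:

```typescript
// Tier 1: single-token ordering terms. TWO must co-occur to fire.
const ORDERING_TERMS_RE =
  /\b(first|last|before|after|then|later|earlier|previous|next|prior)\b/gi;

// Tier 2: phrase-level structural signals. A single hit is enough.
const SEQUENCE_PATTERNS: RegExp[] = [
  /\bin (chronological|reverse|the) order\b/i,
  /\bwhen did\b/i,
  /\bsince when\b/i,
  /\bover time\b/i,
  /\b(evolution|history|timeline|progression) of\b/i,
  /\b(originally|initially)\b/i,
  /\bbrought up\b/i,
];

function shouldUseTLL(query: string): boolean {
  if (SEQUENCE_PATTERNS.some((re) => re.test(query))) return true;
  const hits = query.match(ORDERING_TERMS_RE) ?? [];
  return hits.length >= 2;
}
```

Single-term factual queries ("what is my first name") fall through both tiers, while genuine ordering queries fire on either the phrase list or two co-occurring terms.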
…n paths (review #8)

Three deliberate fail-open sites were swallowing errors with weak or
no log signal, hiding production failures behind ephemeral
`console.error('[tll]', ...)` lines and `process.stderr.write` calls
that no log scraper greps for. Behaviour stays fail-open by design
(per CLAUDE.md "no fallback modes" applies to mutations, not to
augmentation paths) — only the observability changes.

Changes:

  1. `memory-storage.ts:maybeAppendTll` — fire-and-forget TLL append now
     logs `[tll-append-failed]` with the message and stringified
     fallback for non-Error throws. Comment documents the deliberate
     fire-and-forget choice (ingest hot path can't block on chain
     bookkeeping).

  2. `memory-search.ts:maybeExpandViaTLL` — return type widened from
     `SearchResult[]` to `{ memories, failed, errorMessage? }` so the
     caller can surface the failure on the retrieval trace. On the
     catch path: log `[tll-expansion-failed]` and propagate
     `{ failed: true }`. Added comment marking the fail-open as
     deliberate.

  3. `appendTllAugmentation` — emits a `tll_expansion_failed` event on
     the active retrieval `TraceCollector` whenever
     `maybeExpandViaTLL` reports `failed: true`, so the trace artifacts
     written to disk capture the failure instead of dropping it.

  4. `first-mention-service.ts:invokeLlm` and helpers — replaced ad-hoc
     `process.stderr.write` lines with structured
     `[first-mention-llm-failed]` / `[first-mention-llm-salvaged]` /
     `[first-mention-mapping]` prefixes routed through `console.error`
     / `console.warn`. JSDoc on `invokeLlm` now documents the
     deliberate fail-open (the EO read path treats no-events as
     "no signal", not an error).

Existing first-mention and tll-retrieval unit tests pass unchanged.
…view #7)

`positionInConversation` was set directly to `turn_id`. That looked
correct for a single extraction, but the (user_id, memory_id) UNIQUE
on `first_mention_events` means a re-run of `extractAndStore` for the
same conversation silently keeps the FIRST inserted row — including
its position. If the LLM's turn_id assignment drifted between runs
(which it does in practice — non-deterministic decoding even at
temperature=0 plus prompt-cache-state variation), readers would see
position values that depend on which run happened to write first,
breaking deterministic chronological ordering.

Fix: `positionInConversation` is now the 0-based index in the FINAL
turn-id-sorted output, NOT `turn_id` itself. Sort first, then enumerate.
Re-runs produce identical (position, topic) tuples regardless of any
turn_id drift, so the post-write read is stable.

Updated `mapToEvents` in `src/services/first-mention-service.ts`:
  - Build candidates first (without position).
  - Sort by `turnId` ASC.
  - Assign `positionInConversation = index` during the final map.
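The sort-then-enumerate fix can be sketched in isolation (types and names simplified from the real `mapToEvents`):

```typescript
interface Candidate {
  topic: string;
  turnId: number;
}

interface FirstMentionEvent extends Candidate {
  positionInConversation: number;
}

function mapToEvents(candidates: Candidate[]): FirstMentionEvent[] {
  // Sort first, then enumerate: position is the 0-based index in the
  // final turn-id-sorted output, so turn_id drift between runs cannot
  // change the (position, topic) tuples.
  return [...candidates]
    .sort((a, b) => a.turnId - b.turnId)
    .map((c, index) => ({ ...c, positionInConversation: index }));
}

// Two runs with drifted turn_ids yield the same position sequence.
const runA = mapToEvents([{ topic: 'hiking', turnId: 5 }, { topic: 'visa', turnId: 9 }]);
const runB = mapToEvents([{ topic: 'hiking', turnId: 4 }, { topic: 'visa', turnId: 11 }]);
console.log(runA.map((e) => e.positionInConversation)); // positions: [0, 1]
console.log(runB.map((e) => e.positionInConversation)); // positions: [0, 1]
```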

Tests:

  - `src/services/__tests__/first-mention-service.test.ts`:
    * existing happy-path / sort tests now assert position 0/1/...
      instead of position == turn_id.
    * new `produces stable positionInConversation across re-runs even
      when LLM turn_id drifts` test runs `extractAndStore` twice with
      drifted turn_ids and confirms both runs produce the same
      `[0, 1]` position sequence.

  - `src/db/__tests__/repository-first-mentions.test.ts` (new):
    integration test seeding a memory + running `store()` twice with
    drifted turn_ids — asserts only 2 rows survive, position sequence
    is `[0, 1]`, and the first-write turn_id (5) is what the read-back
    returns (ON CONFLICT DO NOTHING semantics).
…xtract (review #5)

The two PR #18 read endpoints had no HTTP-level tests — only the
underlying repository / service unit tests existed, which left the
schema-validation middleware and route-level wiring uncovered. A
schema rename or a route-handler regression could ship green.

New file `src/routes/__tests__/event-chains-and-first-mentions.test.ts`
mirrors the route-test pattern from `src/__tests__/route-validation.test.ts`:
spin up an Express app on `app.listen(0, ...)`, wire `createMemoryRouter`
against a real `MemoryService` backed by the test DB, drive endpoints
via `fetch`. The MemoryService gets a real `FirstMentionRepository`
plus a stubbed `chatFn` so the LLM call returns a deterministic JSON
array.

Coverage:

  GET /v1/memories/event-chains
    - 400 when `user_id` is missing
    - 400 when `entity_ids` is missing
    - 400 when `entity_ids` contains an invalid UUID
    - 400 when `entity_ids` exceeds the 100-entry cap (review #6)
    - 400 when `entity_ids` is present but holds only empty tokens
    - happy path: seed memory + entity + TLL row, hit the route, parse
      the response with `EventChainsResponseSchema`

  POST /v1/memories/first-mentions/extract
    - 400 when `user_id` is missing
    - 400 when `conversation_text` is empty
    - 400 when `conversation_text` exceeds MAX_CONVERSATION_LENGTH
      (100_000 chars)
    - 400 when `memory_ids_by_turn_id` is missing entirely
    - 400 when `source_site` is missing
    - happy path: stub LLM returns 2 events, route stores+returns
      them, response parsed with `FirstMentionsExtractResponseSchema`,
      `position_in_conversation` is the post-sorted [0, 1] sequence
      from review #7.

12/12 new tests pass.
@ethanj
Contributor

ethanj commented May 6, 2026

Code review (v2 — after dc7b9ff..c46cddf)

Re-read the 12 follow-up commits, each labeled to the v1 finding it addresses. 10 of the 11 prior findings are properly resolved. One fix (#2 — move TLL augmentation past the similarity gate) is the right design but introduces a new blocking regression on the response-formatting path. Details below.

✅ Verified resolved

❌ New regression introduced by fix #2

Blocking. hydrateChainMemories returns objects that aren't valid SearchResults and will crash the response-rendering path (91676f4, src/services/memory-search.ts:198-210).

The intent is right — chain-augmented rows shouldn't pass through the similarity gate. But the new shape sets similarity: null and omits score, source_site, summary, and several other required SearchResult/MemoryRow fields. The as unknown as SearchResult cast at line 210 hides this from the typechecker.

When the augmented rows reach retrieval-format.ts via appendTllAugmentation → assembleResponse → packageSearchOutput → buildInjection → formatFullLine/formatStagedLine → buildCommonAttrs, the formatter crashes on missing/null fields:

// src/services/retrieval-format.ts:275-286
function buildCommonAttrs(memory: SearchResult, index: number): string {
  return [
    `index="${index + 1}"`,
    `source="${escapeXml(memory.source_site)}"`,         // undefined → "undefined" in output
    `memory_id="${memory.id}"`,
    `created_at="${memory.created_at.toISOString()}"`,
    `importance="${memory.importance.toFixed(1)}"`,
    `similarity="${memory.similarity.toFixed(2)}"`,      // 💥 TypeError: null
    `score="${memory.score.toFixed(2)}"`,                // 💥 TypeError: undefined
    ...
  ].join(' ');
}

SearchResult requires similarity: number (not nullable) and score: number, plus the full MemoryRow surface (source_site, summary, embedding, memory_type, metadata, keywords, overview, trust_score, observed_at, last_accessed_at, access_count, expired_at, deleted_at, network, opinion_confidence, observation_subject, source_url, episode_id, status).

Why CI is green:

  1. The as unknown as SearchResult cast silences tsc --noEmit.
  2. Unit tests for maybeExpandViaTLL (tll-retrieval.test.ts) mock at the expandViaTLL boundary, never running real hydration.
  3. The new HTTP tests in event-chains-and-first-mentions.test.ts hit the read endpoints (which use chainEventsForEntities directly), not the search retrieval path that triggers appendTllAugmentation.
  4. There is no end-to-end test where a shouldUseTLL-positive query goes through performSearch with a real seeded chain.

This will fire the first time a production user issues an ordering query against a workspace that has TLL chains populated — i.e., the moment the feature does what it's supposed to.

Suggested fix. Either of these lines up cleanly:

  • (a) Hydrate the full MemoryRow shape, with similarity: 0 and score: 0 (or copy the top-candidate's similarity as an inherited signal). The query becomes SELECT * (or an explicit field list matching MemoryRow) and the synthesized object covers every required field. Strictest fix — the type cast can come out.

  • (b) Special-case retrieval_signal === 'tll-chain' in buildCommonAttrs — emit the chain-membership flag instead of similarity/score, and tolerate missing fields. Keeps the projection narrow but spreads TLL awareness into the formatter.

  • (c) Fragile but minimal: set similarity: 0, score: 0, and leave the missing string fields as empty strings or 'tll-chain'. Stops the crash but pollutes the rendered output with placeholder values.

Add an integration test that drives a shouldUseTLL-positive query through performSearch against a populated chain and asserts the response renders without throwing. The unit tests as written cannot catch this class of bug.

Other notes

  • The advisory-lock namespace 0x544c4c00 ("TLL\0") is a nice touch and avoids collisions with unrelated callers; worth a comment pointing at any other namespaces in use elsewhere if/when they're added.
  • Per-entity transactions in appendOne (one DB round-trip per entity) is a bigger correctness win than a perf cost in the typical case (1–3 entities per memory). If batch-ingest perf becomes an issue later, consider a sorted-by-entity-id batched path that still serializes via the advisory lock per chain.
  • The structured-logging change in #8 (feat(search): EXP-14 — retrieval-side abstention gate) is good, but I'd lean toward routing through the project's structured logger (when one exists) rather than console.error. Still, the prefix scheme ([tll-expansion-failed]) is grep-friendly and a strict improvement over the prior state.
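As an aside, the namespace value decodes exactly as the "TLL\0" gloss suggests. A tiny sketch (the helper is hypothetical, not project code) showing how the four ASCII bytes pack into the 32-bit advisory-lock key:

```typescript
// Pack a short ASCII tag into a 32-bit advisory-lock namespace,
// big-endian, zero-padded: 'T' = 0x54, 'L' = 0x4c, so "TLL" + NUL
// becomes 0x544c4c00.
function lockNamespace(tag: string): number {
  const bytes = [...tag.padEnd(4, '\0')].slice(0, 4).map((c) => c.charCodeAt(0));
  return ((bytes[0] << 24) | (bytes[1] << 16) | (bytes[2] << 8) | bytes[3]) >>> 0;
}

console.log(lockNamespace('TLL').toString(16)); // "544c4c00"
```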

Net: ship-ready once #2's hydration shape is fixed and an integration test covers the augmentation path.

@ethanj
Contributor

ethanj commented May 6, 2026

additional review from codex:

Findings

  • High: TLL's public event-chain read path is still append-order, not observation-time order. appendOne() selects predecessor and position from the current chain tip by position_in_chain
    (atomicmemory-core/src/db/repository-tll.ts:81), and chainEventsForEntities() returns endpoint events ordered by that same append position (atomicmemory-core/src/db/repository-tll.ts:165). That
    contradicts the stated backfill/chronology contract in atomicmemory-core/src/services/memory-storage.ts:228. Out-of-order/backfilled appends for the same entity will be returned and
    predecessor-linked in ingest order, so EO/TR claims are not proven.
  • Medium: POST /memories/first-mentions/extract validates memory_ids_by_turn_id as arbitrary string values (atomicmemory-core/src/schemas/memories.ts:584), then passes them through to an insert
    into a UUID column (atomicmemory-core/src/db/repository-first-mentions.ts:58). If the extractor emits a matching turn for an invalid memory ID, this becomes a DB error and the route returns
    500 via atomicmemory-core/src/routes/route-errors.ts:14, not a 400. The route tests cover missing fields, but not invalid mapping values
    (atomicmemory-core/src/routes/__tests__/event-chains-and-first-mentions.test.ts:215).

Squashed across four review rounds on PR #18, all of which surfaced after
the initial wave of fixes (#1–#11). Each item below maps to a finding in
the PR's review threads.

Search path

- hydrateChainMemories now returns fully-shaped SearchResults via SELECT *
  + normalizeMemoryRow. The previous projection set similarity: null and
  omitted source_site / score / summary / observed_at, crashing the
  buildInjection formatter (`memory.similarity.toFixed(2)`) the first
  time TLL augmentation actually fired against a populated chain. The
  `as unknown as SearchResult` cast was hiding it from tsc. (review v2 #1)
- Hydration query now uses `unnest($2::uuid[]) WITH ORDINALITY ... ORDER
  BY req.ord` so the chronological order chainsFor returns through the
  augmentation pipeline survives. (review v4 #2)
- Workspace isolation: hydrateChainMemories filters `m.workspace_id IS NULL`
  to match the gate behavior performSearch's postProcessResults already
  applies. Without it, a workspace memory chained from a global memory's
  entity could surface in a global response. (review v4 #1)
- Defensive `relevance: 1.0` on hydrated rows locks in the chain-membership
  bypass invariant against future filter drift. The augmented rows are
  appended after applySearchRelevanceFilter today, but `similarity: 0`
  + `score: 0` would make them load-bearing on `relevance` if any future
  filter past appendTllAugmentation checked `memory.relevance >= threshold`.
  Regression test drives performSearch with a high
  retrievalOptions.relevanceThreshold and confirms the augmented row
  survives. (review v5 #2)
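The ordering guarantee in the second bullet can be pictured as follows. The query string mirrors the `unnest ... WITH ORDINALITY` shape described above, but its table and column names are assumptions; the `orderByRequest` helper is a pure-TypeScript statement of the same invariant (rows come back in requested-id order, regardless of storage order):

```typescript
// Illustrative SQL shape only; table/column names are assumed, not
// taken from the real repository.
const hydrationSql = `
  SELECT m.*
  FROM unnest($2::uuid[]) WITH ORDINALITY AS req(id, ord)
  JOIN memories m ON m.id = req.id
  WHERE m.user_id = $1 AND m.deleted_at IS NULL
  ORDER BY req.ord
`;

// Pure-TypeScript equivalent of the guarantee the ORDER BY req.ord
// clause provides: output order follows the requested-id order.
function orderByRequest<T extends { id: string }>(
  requestedIds: string[],
  rows: T[],
): T[] {
  const byId = new Map(rows.map((r) => [r.id, r]));
  return requestedIds.flatMap((id) => {
    const row = byId.get(id);
    return row ? [row] : [];
  });
}

console.log(hydrationSql.includes('WITH ORDINALITY')); // true
```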

Repository

- TLL chain reads (chain, chainEventsForEntities) now derive chronological
  position and predecessor via `ROW_NUMBER()` and `LAG()` window functions
  ordered by observation_date ASC (with stored position_in_chain as a
  deterministic tiebreaker for events sharing an observation_date). The
  stored predecessor_memory_id and position_in_chain columns become
  insertion-order audit metadata; the API surface returns chronological
  ordering. Backfilled out-of-order events surface in their true position
  with chronologically-correct predecessors. (review v3 #1)
- chainEventsForEntities adds `m.workspace_id IS NULL` for the same
  reason as the search-path fix above — the global event-chains HTTP
  endpoint must not surface workspace memories. (review v4 #1)
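The window-function read above can be stated as a pure-TypeScript equivalent (field names assumed): sorting by observation date ASC with the stored append position as a deterministic tiebreaker, then deriving position and predecessor from the sorted order, is what `ROW_NUMBER()`/`LAG()` over that ordering yield.

```typescript
interface ChainRow {
  memoryId: string;
  observationDate: string;  // ISO date string
  positionInChain: number;  // stored append-order position (tiebreaker only)
}

function chronologicalChain(rows: ChainRow[]) {
  const sorted = [...rows].sort(
    (a, b) =>
      a.observationDate.localeCompare(b.observationDate) ||
      a.positionInChain - b.positionInChain,
  );
  return sorted.map((row, i) => ({
    memoryId: row.memoryId,
    position: i,                                          // ROW_NUMBER() - 1
    predecessorMemoryId: sorted[i - 1]?.memoryId ?? null, // LAG()
  }));
}
```

A backfilled event with an earlier observation date surfaces in its true chronological position with a chronologically-correct predecessor, while the stored columns remain untouched audit metadata.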

Schema

- FirstMentionsExtractBodySchema validates memory_ids_by_turn_id values
  as UUIDs so a non-UUID returns 400 (schema layer) instead of leaking a
  Postgres "invalid input syntax for type uuid" as 500 from the route.
  (review v3 #2)
- New SearchResult.retrieval_signal optional field tags chain-augmented
  rows so observability and any future ranker can distinguish them from
  similarity-ranked candidates. (review v2 #1, plumbed through v4)

Refactor

- Extracted maybeExpandViaTLL, hydrateChainMemories, appendTllAugmentation
  out of memory-search.ts into a sibling tll-augmentation.ts module. The
  search file dropped from 551 → 385 LOC, back under the 400-LOC project
  cap. Shared internal types (PostProcessedSearch, RelevanceFilterSummary)
  pulled into memory-search-types.ts so the two consumers don't duplicate.
  (review v5 #4)

Test coverage

- New integration test (services/__tests__/tll-augmentation-integration.test.ts)
  drives performSearch end-to-end through appendTllAugmentation, with
  cases for: rendering augmented rows through buildInjection without
  crashing, the SQL contract (unnest ORDINALITY + ORDER BY req.ord),
  workspace-leak prevention, the relevance-1.0 bypass invariant, and
  no-augmentation for non-TLL queries.
- New repository tests for backfill chronological ordering of chain()
  and chainEventsForEntities() and for chainEventsForEntities workspace
  isolation.
- New route test asserts 400 (not 500) on non-UUID memory_ids_by_turn_id.

Verification: `npx tsc --noEmit` clean · 1286/1286 vitest pass against the
test DB · `npx fallow audit --no-cache` exit 0.

Deferrals (parked durably in the research repo's tech-debt log at
Atomicmemory-research/docs/core-repo/tech-debt.md):
- predecessor_memory_id ON DELETE CASCADE vs SET NULL — design call,
  contested between two reviewers; current default kept.
- process.env.ALLOWED_ORIGINS direct read in src/routes/memories.ts —
  pre-existing CLAUDE.md violation, out of scope for this PR.
- shouldUseTLL non-adjacent "after did" — low operational risk;
  tightening the regex was the explicit goal of review #9 and
  re-broadening risks reintroducing false positives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@moralespanitz
Contributor Author

it's ok @ethanj

@ethanj ethanj merged commit 6e9e51e into main May 6, 2026
2 checks passed
@ethanj ethanj deleted the feat/first-mention-tll-on-main branch May 6, 2026 23:13