Skip to content

Commit 216abc9

Browse files
ethanjclaude
andcommitted
fix(tll, schemas): address PR #18 review v2–v5 follow-ups
Squashed across four review rounds on PR #18, all of which surfaced after the initial wave of fixes (#1#11). Each item below maps to a finding in the PR's review threads. Search path - hydrateChainMemories now returns fully-shaped SearchResults via SELECT * + normalizeMemoryRow. The previous projection set similarity: null and omitted source_site / score / summary / observed_at, crashing the buildInjection formatter (`memory.similarity.toFixed(2)`) the first time TLL augmentation actually fired against a populated chain. The `as unknown as SearchResult` cast was hiding it from tsc. (review v2 #1) - Hydration query now uses `unnest($2::uuid[]) WITH ORDINALITY ... ORDER BY req.ord` so the chronological order chainsFor returns through the augmentation pipeline survives. (review v4 #2) - Workspace isolation: hydrateChainMemories filters `m.workspace_id IS NULL` to match the gate behavior performSearch's postProcessResults already applies. Without it, a workspace memory chained from a global memory's entity could surface in a global response. (review v4 #1) - Defensive `relevance: 1.0` on hydrated rows locks in the chain-membership bypass invariant against future filter drift. The augmented rows are appended after applySearchRelevanceFilter today, but `similarity: 0` + `score: 0` would make them load-bearing on `relevance` if any future filter past appendTllAugmentation checked `memory.relevance >= threshold`. Regression test drives performSearch with a high retrievalOptions.relevanceThreshold and confirms the augmented row survives. (review v5 #2) Repository - TLL chain reads (chain, chainEventsForEntities) now derive chronological position and predecessor via `ROW_NUMBER()` and `LAG()` window functions ordered by observation_date ASC (with stored position_in_chain as a deterministic tiebreaker for events sharing an observation_date). The stored predecessor_memory_id and position_in_chain columns become insertion-order audit metadata; the API surface returns chronological ordering. Backfilled out-of-order events surface in their true position with chronologically-correct predecessors. (review v3 #1) - chainEventsForEntities adds `m.workspace_id IS NULL` for the same reason as the search-path fix above — the global event-chains HTTP endpoint must not surface workspace memories. (review v4 #1) Schema - FirstMentionsExtractBodySchema validates memory_ids_by_turn_id values as UUIDs so a non-UUID returns 400 (schema layer) instead of leaking a Postgres "invalid input syntax for type uuid" as 500 from the route. (review v3 #2) - New SearchResult.retrieval_signal optional field tags chain-augmented rows so observability and any future ranker can distinguish them from similarity-ranked candidates. (review v2 #1, plumbed through v4) Refactor - Extracted maybeExpandViaTLL, hydrateChainMemories, appendTllAugmentation out of memory-search.ts into a sibling tll-augmentation.ts module. The search file dropped from 551 → 385 LOC, back under the 400-LOC project cap. Shared internal types (PostProcessedSearch, RelevanceFilterSummary) pulled into memory-search-types.ts so the two consumers don't duplicate. (review v5 #4) Test coverage - New integration test (services/__tests__/tll-augmentation-integration.test.ts) drives performSearch end-to-end through appendTllAugmentation, with cases for: rendering augmented rows through buildInjection without crashing, the SQL contract (unnest ORDINALITY + ORDER BY req.ord), workspace-leak prevention, the relevance-1.0 bypass invariant, and no-augmentation for non-TLL queries. - New repository tests for backfill chronological ordering of chain() and chainEventsForEntities() and for chainEventsForEntities workspace isolation. - New route test asserts 400 (not 500) on non-UUID memory_ids_by_turn_id. Verification: `npx tsc --noEmit` clean · 1286/1286 vitest pass against the test DB · `npx fallow audit --no-cache` exit 0. Deferrals (parked durably in the research repo's tech-debt log at Atomicmemory-research/docs/core-repo/tech-debt.md): - predecessor_memory_id ON DELETE CASCADE vs SET NULL — design call, contested between two reviewers; current default kept. - process.env.ALLOWED_ORIGINS direct read in src/routes/memories.ts — pre-existing CLAUDE.md violation, out of scope for this PR. - shouldUseTLL non-adjacent "after did" — low operational risk; tightening the regex was the explicit goal of review #9 and re-broadening risks reintroducing false positives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c46cddf commit 216abc9

9 files changed

Lines changed: 606 additions & 161 deletions

File tree

src/db/__tests__/repository-tll.test.ts

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -66,6 +66,25 @@ describe('TllRepository', () => {
6666
return { id, date };
6767
}
6868

69+
/**
70+
* Store a workspace-scoped memory. Used to verify that the global
71+
* `chainEventsForEntities` endpoint never surfaces workspace memories.
72+
* Any non-null UUID for `workspaceId` is sufficient for the
73+
* `workspace_id IS NULL` filter to drop the row.
74+
*/
75+
async function makeWorkspaceMemory(content: string, seed: number, date: Date): Promise<SeededMemory> {
76+
const id = await memoryRepo.storeMemory({
77+
userId: TEST_USER,
78+
content,
79+
embedding: unitVector(seed),
80+
importance: 0.6,
81+
sourceSite: 'test',
82+
workspaceId: '00000000-0000-0000-0000-00000000ffff',
83+
agentId: '00000000-0000-0000-0000-0000000000aa',
84+
});
85+
return { id, date };
86+
}
87+
6988
/** Count rows in TLL for a (user, entity, memory) triple. */
7089
async function countRows(entityId: string, memoryId: string): Promise<number> {
7190
const r = await pool.query<{ c: string }>(
@@ -169,6 +188,22 @@ describe('TllRepository', () => {
169188
const events = await tllRepo.chain(TEST_USER, ent.id);
170189
expect(events).toEqual([]);
171190
});
191+
192+
it('orders by observation_date — backfilled out-of-order events get chronological positions and predecessors', async () => {
193+
const ent = await makeEntity('BackfillChain', 56);
194+
const middle = await makeMemory('mid', 57, new Date('2026-02-01'));
195+
const latest = await makeMemory('late', 58, new Date('2026-03-01'));
196+
const earliest = await makeMemory('early', 59, new Date('2026-01-01'));
197+
198+
await tllRepo.append(TEST_USER, middle.id, [ent.id], middle.date);
199+
await tllRepo.append(TEST_USER, latest.id, [ent.id], latest.date);
200+
await tllRepo.append(TEST_USER, earliest.id, [ent.id], earliest.date);
201+
202+
const events = await tllRepo.chain(TEST_USER, ent.id);
203+
expect(events.map((e) => e.memoryId)).toEqual([earliest.id, middle.id, latest.id]);
204+
expect(events.map((e) => e.positionInChain)).toEqual([0, 1, 2]);
205+
expect(events.map((e) => e.predecessorMemoryId)).toEqual([null, earliest.id, middle.id]);
206+
});
172207
});
173208

174209
describe('chainsFor()', () => {
@@ -261,5 +296,39 @@ describe('TllRepository', () => {
261296
expect(result).toHaveLength(1);
262297
expect(result[0].events.map((e) => e.memoryId)).toEqual([live.id]);
263298
});
299+
300+
it('orders backfilled events by observation_date with chronological predecessors', async () => {
301+
const ent = await makeEntity('Backfill', 111);
302+
// Insertion order: middle, latest, earliest. Observation order: jan, feb, mar.
303+
const middle = await makeMemory('middle', 112, new Date('2026-02-01'));
304+
const latest = await makeMemory('latest', 113, new Date('2026-03-01'));
305+
const earliest = await makeMemory('earliest', 114, new Date('2026-01-01'));
306+
307+
await tllRepo.append(TEST_USER, middle.id, [ent.id], middle.date);
308+
await tllRepo.append(TEST_USER, latest.id, [ent.id], latest.date);
309+
await tllRepo.append(TEST_USER, earliest.id, [ent.id], earliest.date);
310+
311+
const result = await tllRepo.chainEventsForEntities(TEST_USER, [ent.id]);
312+
expect(result).toHaveLength(1);
313+
const events = result[0].events;
314+
expect(events.map((e) => e.memoryId)).toEqual([earliest.id, middle.id, latest.id]);
315+
expect(events.map((e) => e.positionInChain)).toEqual([0, 1, 2]);
316+
expect(events[0].predecessorMemoryId).toBeNull();
317+
expect(events[1].predecessorMemoryId).toBe(earliest.id);
318+
expect(events[2].predecessorMemoryId).toBe(middle.id);
319+
});
320+
321+
it('omits events whose memory has a non-null workspace_id (global endpoint isolation)', async () => {
322+
const ent = await makeEntity('GlobalEntity', 121);
323+
const global = await makeMemory('global memory', 122, new Date('2026-01-01'));
324+
const workspaceMem = await makeWorkspaceMemory('workspace memory', 123, new Date('2026-02-01'));
325+
326+
await tllRepo.append(TEST_USER, global.id, [ent.id], global.date);
327+
await tllRepo.append(TEST_USER, workspaceMem.id, [ent.id], workspaceMem.date);
328+
329+
const result = await tllRepo.chainEventsForEntities(TEST_USER, [ent.id]);
330+
expect(result).toHaveLength(1);
331+
expect(result[0].events.map((e) => e.memoryId)).toEqual([global.id]);
332+
});
264333
});
265334
});

src/db/repository-tll.ts

Lines changed: 42 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -102,22 +102,40 @@ export class TllRepository {
102102
}
103103

104104
/**
105-
* Get the full event chain for an entity, ordered chronologically.
106-
* Used for "list events in order" (EO) and "how did X evolve" (TR/MSR).
105+
* Get the full event chain for an entity, ordered by observation_date
106+
* (the conversation timestamp). Used for EO/TR/MSR queries that need
107+
* chronological order — a backfilled event with an earlier
108+
* observation_date surfaces in its true chronological position even
109+
* though it was inserted last.
110+
*
111+
* `positionInChain` returned here is the 0-based chronological rank,
112+
* derived via `ROW_NUMBER()` over the chronological order. The stored
113+
* `position_in_chain` column is insertion-order audit metadata and is
114+
* not exposed by this API. `predecessorMemoryId` is the immediate
115+
* chronologically-prior memory (via `LAG()`) — also chronological,
116+
* not the position-tip predecessor recorded at insert time.
117+
*
118+
* Tiebreaker for events sharing an `observation_date` (e.g. the same
119+
* conversation timestamp) is the stored `position_in_chain` (insertion
120+
* order), keeping the result deterministic.
107121
*/
108122
async chain(userId: string, entityId: string): Promise<TLLEvent[]> {
109123
const result = await this.pool.query(
110-
`SELECT memory_id, predecessor_memory_id, observation_date, position_in_chain
124+
`SELECT memory_id,
125+
observation_date,
126+
LAG(memory_id) OVER w AS chronological_predecessor,
127+
ROW_NUMBER() OVER w - 1 AS chronological_position
111128
FROM temporal_linkage_list
112129
WHERE user_id = $1 AND entity_id = $2
113-
ORDER BY position_in_chain ASC`,
130+
WINDOW w AS (ORDER BY observation_date ASC, position_in_chain ASC)
131+
ORDER BY observation_date ASC, position_in_chain ASC`,
114132
[userId, entityId],
115133
);
116134
return result.rows.map((row) => ({
117135
memoryId: row.memory_id,
118-
predecessorMemoryId: row.predecessor_memory_id,
136+
predecessorMemoryId: row.chronological_predecessor ?? null,
119137
observationDate: row.observation_date,
120-
positionInChain: row.position_in_chain,
138+
positionInChain: Number(row.chronological_position),
121139
}));
122140
}
123141

@@ -144,7 +162,11 @@ export class TllRepository {
144162
* by EO-shaped read paths that need content alongside chain position.
145163
*
146164
* Returns one entry per entity; entities with no events are dropped.
147-
* Within an entity, events are ordered by position_in_chain ASC.
165+
* Within an entity, events are ordered by `observation_date` (the
166+
* conversation timestamp), not insertion order — see `chain()` for
167+
* the rationale and the `LAG`/`ROW_NUMBER` derivation of
168+
* `predecessorMemoryId` and `positionInChain`. Tiebreaker for events
169+
* sharing an observation_date is the stored insertion `position_in_chain`.
148170
*/
149171
async chainEventsForEntities(
150172
userId: string,
@@ -161,20 +183,28 @@ export class TllRepository {
161183
}>> {
162184
if (entityIds.length === 0) return [];
163185
const unique = [...new Set(entityIds)];
186+
// `m.workspace_id IS NULL` — this is the global event-chain endpoint;
187+
// workspace-scoped memories must not surface here even if they share
188+
// an entity with a global memory.
164189
const result = await this.pool.query(
165190
`SELECT t.entity_id,
166191
t.memory_id,
167-
t.predecessor_memory_id,
168192
t.observation_date,
169-
t.position_in_chain,
193+
LAG(t.memory_id) OVER w AS chronological_predecessor,
194+
ROW_NUMBER() OVER w - 1 AS chronological_position,
170195
m.content
171196
FROM temporal_linkage_list t
172197
JOIN memories m ON m.id = t.memory_id
173198
WHERE t.user_id = $1
174199
AND t.entity_id = ANY($2::uuid[])
175200
AND m.deleted_at IS NULL
176201
AND m.status = 'active'
177-
ORDER BY t.entity_id, t.position_in_chain ASC`,
202+
AND m.workspace_id IS NULL
203+
WINDOW w AS (
204+
PARTITION BY t.entity_id
205+
ORDER BY t.observation_date ASC, t.position_in_chain ASC
206+
)
207+
ORDER BY t.entity_id, t.observation_date ASC, t.position_in_chain ASC`,
178208
[userId, unique],
179209
);
180210
const grouped = new Map<string, Array<{
@@ -190,8 +220,8 @@ export class TllRepository {
190220
memoryId: row.memory_id,
191221
content: row.content,
192222
observationDate: row.observation_date,
193-
positionInChain: row.position_in_chain,
194-
predecessorMemoryId: row.predecessor_memory_id,
223+
positionInChain: Number(row.chronological_position),
224+
predecessorMemoryId: row.chronological_predecessor ?? null,
195225
});
196226
grouped.set(row.entity_id, list);
197227
}

src/db/repository-types.ts

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -169,6 +169,12 @@ export interface SearchResult extends MemoryRow {
169169
matched_facts?: string[];
170170
matched_fact_ids?: string[];
171171
retrieval_layer?: 'memory' | 'atomic_fact';
172+
/**
173+
* Origin of the row in the candidate pool. Absent for similarity-ranked
174+
* rows; set to `'tll-chain'` for rows hydrated by TLL chain expansion
175+
* after the relevance gate (see `hydrateChainMemories`).
176+
*/
177+
retrieval_signal?: 'tll-chain';
172178
}
173179

174180
export type AtomicFactType = 'preference' | 'project' | 'knowledge' | 'person' | 'plan';

src/routes/__tests__/event-chains-and-first-mentions.test.ts

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -272,6 +272,19 @@ describe('POST /memories/first-mentions/extract — schema validation', () => {
272272
});
273273
expect(error).toMatch(/source_site/i);
274274
});
275+
276+
it('returns 400 (not 500) when a memory_ids_by_turn_id value is not a UUID', async () => {
277+
// Without schema-layer UUID validation, a bad value reaches Postgres
278+
// and crashes with "invalid input syntax for type uuid", which the
279+
// route handler maps to 500. The schema must reject it as 400 instead.
280+
const { error } = await postExpecting400({
281+
user_id: TEST_USER,
282+
conversation_text: 'hi',
283+
source_site: 'beam',
284+
memory_ids_by_turn_id: { '1': INVALID_UUID },
285+
});
286+
expect(error).toMatch(/memory_ids_by_turn_id/i);
287+
});
275288
});
276289

277290
// ---------------------------------------------------------------------------

src/schemas/memories.ts

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -586,7 +586,11 @@ export const FirstMentionsExtractBodySchema = z
586586
user_id: z.string().min(1),
587587
conversation_text: z.string().min(1).max(MAX_CONVERSATION_LENGTH),
588588
source_site: z.string().min(1),
589-
memory_ids_by_turn_id: z.record(z.string(), z.string()),
589+
// Values must be UUIDs — they're inserted into a UUID column
590+
// (`first_mention_events.memory_id`). Validate at the schema layer so a
591+
// bad value returns 400 instead of leaking a Postgres "invalid input
592+
// syntax for type uuid" error as a 500.
593+
memory_ids_by_turn_id: z.record(z.string(), z.string().uuid()),
590594
})
591595
.transform(b => {
592596
const map = new Map<number, string>();

0 commit comments

Comments
 (0)