feat(search): Phase 4B — broader IF retrieval preference (two-stage) by moralespanitz · Pull Request #11 · atomicmemory/atomicmemory-core

moralespanitz · 2026-05-01T20:12:44Z

Summary

Two-stage retrieval that prefers instruction-tagged memories when the query is itself instruction-style. Targets BEAM IF — currently 0/2 across all 19 sprint-2 iters and 0/2 on synth slices.

Why

Sprint 2 EXP-05 H-310 markers extension lifted fact_role: 'instruction' coverage from 0.8% → 4.1%, but the boost on a 4% slice didn't surface instruction memories above non-instruction memories. The fix is at the retrieval layer.

When query is instruction-style ("how should I format X?", "what did I tell you to always do?"):

Vector search filtered to metadata.fact_role === 'instruction' — top instructionPreferenceTopK (default 5)
Regular vector search for the remainder
Merge with instruction-stage first, then dedupe

When flag off OR query isn't instruction-style: existing pipeline unchanged.

Scope (525 insertions, 1 deletion)

src/services/instruction-query-detector.ts (new, 59 LOC) — keyword classifier
src/services/instruction-preference-retrieval.ts (new, 130 LOC) — two-stage helper
src/services/__tests__/instruction-preference-retrieval.test.ts (new, 288 LOC, 12 tests passing)
src/services/search-pipeline.ts (+35 LOC) — wires new stage between initial retrieval and entity co-retrieval
src/config.ts (+14 LOC) — two new flags + env loaders + allowlist
src/app/runtime-container.ts (+2 LOC) — CoreRuntimeConfig extension

Implementation note (flagged by subagent)

The existing repository-vector-search.ts does NOT support metadata filters in SQL — SearchStore.searchSimilar(userId, queryEmbedding, limit, sourceSite?, referenceTime?) has no seam for a fact_role filter. Rather than invent new SQL infrastructure inline (out of scope), this PR oversamples via existing searchSimilar with max(candidateDepth, topK * 5) and post-filters in JS for metadata.fact_role === 'instruction'.

Follow-up could push the filter into SQL (new searchSimilarFilteredByMetadata store method or indexed partial expression on (metadata->>'fact_role')) to skip oversampling on instruction-heavy corpora.

New config keys (defaults-off)

instructionPreferenceRetrievalEnabled: boolean = false
instructionPreferenceTopK: number = 5

Tests

12 new cases all passing:

detector tests: instruction phrasings, non-instruction, empty, case-insensitivity
retrieval tests: 4 required cases (flag-off no-op, non-instruction no-op, instruction-style surfacing, mixed-corpus low-similarity surfacing) + 5 edge cases
pre-existing tests still green: search-pipeline-runtime-config (5), instruction-boost (8)

Test plan

Typecheck clean
All new + existing tests pass
Stage-7 dryrun with this enabled — measure IF lift from 0/2 baseline

Routes instruction-style queries ("what did I tell you", "what's my preference", "how should I", "always", "never", etc.) to a dedicated retrieval stage that surfaces memories tagged `metadata.fact_role: 'instruction'` ahead of general results. Targets the BEAM IF slice where the EXP-05 boost on a 4% subset is insufficient to move the score — fix is at the routing layer, not the boost layer. Defaults-OFF behind `instructionPreferenceRetrievalEnabled`. When the flag is on AND the query matches the instruction-style detector, the pipeline oversamples via `searchSimilar` and post-filters by `metadata.fact_role === 'instruction'`, then merges the top `instructionPreferenceTopK` (default 5) instruction-tagged results ahead of the existing candidate pool with id-dedup. Design note: the existing `repository-vector-search` does NOT support a metadata filter at the SQL layer. Adding one is out-of-scope for this PR. The post-filter approach delivers the routing behavior the BEAM IF slice needs without inventing SQL infrastructure inline. A follow-up can push the filter into SQL once we've measured the score lift. Files: - src/services/instruction-query-detector.ts (new, 59 LOC) — pure regex/keyword classifier; no LLM. - src/services/instruction-preference-retrieval.ts (new, 130 LOC) — two-stage retrieval helper. - src/services/search-pipeline.ts — wires the new stage between initial retrieval and entity co-retrieval; gated by the new flag. - src/config.ts — adds `instructionPreferenceRetrievalEnabled` (default false) and `instructionPreferenceTopK` (default 5) to RuntimeConfig, env loader, and INTERNAL_POLICY_CONFIG_FIELDS. - src/app/runtime-container.ts — adds both fields to CoreRuntimeConfig. - src/services/__tests__/instruction-preference-retrieval.test.ts (new, 288 LOC, 12 tests) — flag-off no-op, non-instruction-query no-op, instruction-tagged surfacing, mixed-corpus low-similarity surfacing, zero-instructions corpus, dedup, topK=0 short-circuit, sourceSite/ referenceTime threading. Verification: - npx tsc --noEmit → clean. - 12/12 new tests pass; existing search-pipeline-runtime-config and instruction-boost tests still green.

…iew #11) The `chatFn` adapter wired in `runtime-container.ts` returned hardcoded zero token counts for every call. `LLMProvider.chat()` returns `Promise<string>` (no usage), so threading real counts here would require widening that interface across every adapter. Nothing in the `FirstMentionService` path actually consumed the fields — they only existed to satisfy the local `ChatResult` shape — so dropping them is strictly safer than leaving misleading zeros in place. Per-call cost telemetry continues to flow from `LLMProvider.chat` -> `writeCostEvent` unchanged. Updated: - `ChatResult` in `first-mention-service.ts` -> `{ text: string }` only, with a comment documenting the deliberate decision. - `runtime-container.ts` adapter no longer fabricates zero usage. - `first-mention-service.test.ts` fixture updated to match.

Squashed across four review rounds on PR #18, all of which surfaced after the initial wave of fixes (#1–#11). Each item below maps to a finding in the PR's review threads. Search path - hydrateChainMemories now returns fully-shaped SearchResults via SELECT * + normalizeMemoryRow. The previous projection set similarity: null and omitted source_site / score / summary / observed_at, crashing the buildInjection formatter (`memory.similarity.toFixed(2)`) the first time TLL augmentation actually fired against a populated chain. The `as unknown as SearchResult` cast was hiding it from tsc. (review v2 #1) - Hydration query now uses `unnest($2::uuid[]) WITH ORDINALITY ... ORDER BY req.ord` so the chronological order chainsFor returns through the augmentation pipeline survives. (review v4 #2) - Workspace isolation: hydrateChainMemories filters `m.workspace_id IS NULL` to match the gate behavior performSearch's postProcessResults already applies. Without it, a workspace memory chained from a global memory's entity could surface in a global response. (review v4 #1) - Defensive `relevance: 1.0` on hydrated rows locks in the chain-membership bypass invariant against future filter drift. The augmented rows are appended after applySearchRelevanceFilter today, but `similarity: 0` + `score: 0` would make them load-bearing on `relevance` if any future filter past appendTllAugmentation checked `memory.relevance >= threshold`. Regression test drives performSearch with a high retrievalOptions.relevanceThreshold and confirms the augmented row survives. (review v5 #2) Repository - TLL chain reads (chain, chainEventsForEntities) now derive chronological position and predecessor via `ROW_NUMBER()` and `LAG()` window functions ordered by observation_date ASC (with stored position_in_chain as a deterministic tiebreaker for events sharing an observation_date). The stored predecessor_memory_id and position_in_chain columns become insertion-order audit metadata; the API surface returns chronological ordering. Backfilled out-of-order events surface in their true position with chronologically-correct predecessors. (review v3 #1) - chainEventsForEntities adds `m.workspace_id IS NULL` for the same reason as the search-path fix above — the global event-chains HTTP endpoint must not surface workspace memories. (review v4 #1) Schema - FirstMentionsExtractBodySchema validates memory_ids_by_turn_id values as UUIDs so a non-UUID returns 400 (schema layer) instead of leaking a Postgres "invalid input syntax for type uuid" as 500 from the route. (review v3 #2) - New SearchResult.retrieval_signal optional field tags chain-augmented rows so observability and any future ranker can distinguish them from similarity-ranked candidates. (review v2 #1, plumbed through v4) Refactor - Extracted maybeExpandViaTLL, hydrateChainMemories, appendTllAugmentation out of memory-search.ts into a sibling tll-augmentation.ts module. The search file dropped from 551 → 385 LOC, back under the 400-LOC project cap. Shared internal types (PostProcessedSearch, RelevanceFilterSummary) pulled into memory-search-types.ts so the two consumers don't duplicate. (review v5 #4) Test coverage - New integration test (services/__tests__/tll-augmentation-integration.test.ts) drives performSearch end-to-end through appendTllAugmentation, with cases for: rendering augmented rows through buildInjection without crashing, the SQL contract (unnest ORDINALITY + ORDER BY req.ord), workspace-leak prevention, the relevance-1.0 bypass invariant, and no-augmentation for non-TLL queries. - New repository tests for backfill chronological ordering of chain() and chainEventsForEntities() and for chainEventsForEntities workspace isolation. - New route test asserts 400 (not 500) on non-UUID memory_ids_by_turn_id. Verification: `npx tsc --noEmit` clean · 1286/1286 vitest pass against the test DB · `npx fallow audit --no-cache` exit 0. Deferrals (parked durably in the research repo's tech-debt log at Atomicmemory-research/docs/core-repo/tech-debt.md): - predecessor_memory_id ON DELETE CASCADE vs SET NULL — design call, contested between two reviewers; current default kept. - process.env.ALLOWED_ORIGINS direct read in src/routes/memories.ts — pre-existing CLAUDE.md violation, out of scope for this PR. - shouldUseTLL non-adjacent "after did" — low operational risk; tightening the regex was the explicit goal of review #9 and re-broadening risks reintroducing false positives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

moralespanitz mentioned this pull request May 6, 2026

feat(core): productize first-mention events + TLL EO read-path #18

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(search): Phase 4B — broader IF retrieval preference (two-stage)#11

feat(search): Phase 4B — broader IF retrieval preference (two-stage)#11
moralespanitz wants to merge 1 commit intoexperiment/phase2-combined-stackfrom
feature/exp-if-retrieval-preference

moralespanitz commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

moralespanitz commented May 1, 2026

Summary

Why

Scope (525 insertions, 1 deletion)

Implementation note (flagged by subagent)

New config keys (defaults-off)

Tests

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant