Skip to content

feat(search): Phase 4B — broader IF retrieval preference (two-stage)#11

Draft
moralespanitz wants to merge 1 commit intoexperiment/phase2-combined-stackfrom
feature/exp-if-retrieval-preference
Draft

feat(search): Phase 4B — broader IF retrieval preference (two-stage)#11
moralespanitz wants to merge 1 commit intoexperiment/phase2-combined-stackfrom
feature/exp-if-retrieval-preference

Conversation

@moralespanitz
Copy link
Copy Markdown
Contributor

Summary

Two-stage retrieval that prefers instruction-tagged memories when the query is itself instruction-style. Targets BEAM IF — currently 0/2 across all 19 sprint-2 iters and 0/2 on synth slices.

Why

Sprint 2 EXP-05 H-310 markers extension lifted fact_role: 'instruction' coverage from 0.8% → 4.1%, but the boost on a 4% slice didn't surface instruction memories above non-instruction memories. The fix is at the retrieval layer.

When query is instruction-style ("how should I format X?", "what did I tell you to always do?"):

  1. Vector search filtered to metadata.fact_role === 'instruction' — top instructionPreferenceTopK (default 5)
  2. Regular vector search for the remainder
  3. Merge with instruction-stage first, then dedupe

When flag off OR query isn't instruction-style: existing pipeline unchanged.

Scope (525 insertions, 1 deletion)

  • src/services/instruction-query-detector.ts (new, 59 LOC) — keyword classifier
  • src/services/instruction-preference-retrieval.ts (new, 130 LOC) — two-stage helper
  • src/services/__tests__/instruction-preference-retrieval.test.ts (new, 288 LOC, 12 tests passing)
  • src/services/search-pipeline.ts (+35 LOC) — wires new stage between initial retrieval and entity co-retrieval
  • src/config.ts (+14 LOC) — two new flags + env loaders + allowlist
  • src/app/runtime-container.ts (+2 LOC) — CoreRuntimeConfig extension

Implementation note (flagged by subagent)

The existing repository-vector-search.ts does NOT support metadata filters in SQL — SearchStore.searchSimilar(userId, queryEmbedding, limit, sourceSite?, referenceTime?) has no seam for a fact_role filter. Rather than invent new SQL infrastructure inline (out of scope), this PR oversamples via existing searchSimilar with max(candidateDepth, topK * 5) and post-filters in JS for metadata.fact_role === 'instruction'.

Follow-up could push the filter into SQL (new searchSimilarFilteredByMetadata store method or indexed partial expression on (metadata->>'fact_role')) to skip oversampling on instruction-heavy corpora.

New config keys (defaults-off)

instructionPreferenceRetrievalEnabled: boolean = false
instructionPreferenceTopK: number = 5

Tests

12 new cases all passing:

  • detector tests: instruction phrasings, non-instruction, empty, case-insensitivity
  • retrieval tests: 4 required cases (flag-off no-op, non-instruction no-op, instruction-style surfacing, mixed-corpus low-similarity surfacing) + 5 edge cases
  • pre-existing tests still green: search-pipeline-runtime-config (5), instruction-boost (8)

Test plan

  • Typecheck clean
  • All new + existing tests pass
  • Stage-7 dryrun with this enabled — measure IF lift from 0/2 baseline

Routes instruction-style queries ("what did I tell you", "what's my
preference", "how should I", "always", "never", etc.) to a dedicated
retrieval stage that surfaces memories tagged
`metadata.fact_role: 'instruction'` ahead of general results. Targets the
BEAM IF slice where the EXP-05 boost on a 4% subset is insufficient to
move the score — fix is at the routing layer, not the boost layer.

Defaults-OFF behind `instructionPreferenceRetrievalEnabled`. When the
flag is on AND the query matches the instruction-style detector, the
pipeline oversamples via `searchSimilar` and post-filters by
`metadata.fact_role === 'instruction'`, then merges the top
`instructionPreferenceTopK` (default 5) instruction-tagged results ahead
of the existing candidate pool with id-dedup.

Design note: the existing `repository-vector-search` does NOT support a
metadata filter at the SQL layer. Adding one is out-of-scope for this
PR. The post-filter approach delivers the routing behavior the BEAM IF
slice needs without inventing SQL infrastructure inline. A follow-up can
push the filter into SQL once we've measured the score lift.

Files:
- src/services/instruction-query-detector.ts (new, 59 LOC) — pure
  regex/keyword classifier; no LLM.
- src/services/instruction-preference-retrieval.ts (new, 130 LOC) —
  two-stage retrieval helper.
- src/services/search-pipeline.ts — wires the new stage between initial
  retrieval and entity co-retrieval; gated by the new flag.
- src/config.ts — adds `instructionPreferenceRetrievalEnabled` (default
  false) and `instructionPreferenceTopK` (default 5) to RuntimeConfig,
  env loader, and INTERNAL_POLICY_CONFIG_FIELDS.
- src/app/runtime-container.ts — adds both fields to CoreRuntimeConfig.
- src/services/__tests__/instruction-preference-retrieval.test.ts (new,
  288 LOC, 12 tests) — flag-off no-op, non-instruction-query no-op,
  instruction-tagged surfacing, mixed-corpus low-similarity surfacing,
  zero-instructions corpus, dedup, topK=0 short-circuit, sourceSite/
  referenceTime threading.

Verification:
- npx tsc --noEmit → clean.
- 12/12 new tests pass; existing search-pipeline-runtime-config and
  instruction-boost tests still green.
moralespanitz added a commit that referenced this pull request May 6, 2026
…iew #11)

The `chatFn` adapter wired in `runtime-container.ts` returned hardcoded
zero token counts for every call. `LLMProvider.chat()` returns
`Promise<string>` (no usage), so threading real counts here would require
widening that interface across every adapter. Nothing in the
`FirstMentionService` path actually consumed the fields — they only
existed to satisfy the local `ChatResult` shape — so dropping them is
strictly safer than leaving misleading zeros in place. Per-call cost
telemetry continues to flow from `LLMProvider.chat` -> `writeCostEvent`
unchanged.

Updated:
  - `ChatResult` in `first-mention-service.ts` -> `{ text: string }` only,
    with a comment documenting the deliberate decision.
  - `runtime-container.ts` adapter no longer fabricates zero usage.
  - `first-mention-service.test.ts` fixture updated to match.
ethanj added a commit that referenced this pull request May 6, 2026
Squashed across four review rounds on PR #18, all of which surfaced after
the initial wave of fixes (#1#11). Each item below maps to a finding in
the PR's review threads.

Search path

- hydrateChainMemories now returns fully-shaped SearchResults via SELECT *
  + normalizeMemoryRow. The previous projection set similarity: null and
  omitted source_site / score / summary / observed_at, crashing the
  buildInjection formatter (`memory.similarity.toFixed(2)`) the first
  time TLL augmentation actually fired against a populated chain. The
  `as unknown as SearchResult` cast was hiding it from tsc. (review v2 #1)
- Hydration query now uses `unnest($2::uuid[]) WITH ORDINALITY ... ORDER
  BY req.ord` so the chronological order chainsFor returns through the
  augmentation pipeline survives. (review v4 #2)
- Workspace isolation: hydrateChainMemories filters `m.workspace_id IS NULL`
  to match the gate behavior performSearch's postProcessResults already
  applies. Without it, a workspace memory chained from a global memory's
  entity could surface in a global response. (review v4 #1)
- Defensive `relevance: 1.0` on hydrated rows locks in the chain-membership
  bypass invariant against future filter drift. The augmented rows are
  appended after applySearchRelevanceFilter today, but `similarity: 0`
  + `score: 0` would make them load-bearing on `relevance` if any future
  filter past appendTllAugmentation checked `memory.relevance >= threshold`.
  Regression test drives performSearch with a high
  retrievalOptions.relevanceThreshold and confirms the augmented row
  survives. (review v5 #2)

Repository

- TLL chain reads (chain, chainEventsForEntities) now derive chronological
  position and predecessor via `ROW_NUMBER()` and `LAG()` window functions
  ordered by observation_date ASC (with stored position_in_chain as a
  deterministic tiebreaker for events sharing an observation_date). The
  stored predecessor_memory_id and position_in_chain columns become
  insertion-order audit metadata; the API surface returns chronological
  ordering. Backfilled out-of-order events surface in their true position
  with chronologically-correct predecessors. (review v3 #1)
- chainEventsForEntities adds `m.workspace_id IS NULL` for the same
  reason as the search-path fix above — the global event-chains HTTP
  endpoint must not surface workspace memories. (review v4 #1)

Schema

- FirstMentionsExtractBodySchema validates memory_ids_by_turn_id values
  as UUIDs so a non-UUID returns 400 (schema layer) instead of leaking a
  Postgres "invalid input syntax for type uuid" as 500 from the route.
  (review v3 #2)
- New SearchResult.retrieval_signal optional field tags chain-augmented
  rows so observability and any future ranker can distinguish them from
  similarity-ranked candidates. (review v2 #1, plumbed through v4)

Refactor

- Extracted maybeExpandViaTLL, hydrateChainMemories, appendTllAugmentation
  out of memory-search.ts into a sibling tll-augmentation.ts module. The
  search file dropped from 551 → 385 LOC, back under the 400-LOC project
  cap. Shared internal types (PostProcessedSearch, RelevanceFilterSummary)
  pulled into memory-search-types.ts so the two consumers don't duplicate.
  (review v5 #4)

Test coverage

- New integration test (services/__tests__/tll-augmentation-integration.test.ts)
  drives performSearch end-to-end through appendTllAugmentation, with
  cases for: rendering augmented rows through buildInjection without
  crashing, the SQL contract (unnest ORDINALITY + ORDER BY req.ord),
  workspace-leak prevention, the relevance-1.0 bypass invariant, and
  no-augmentation for non-TLL queries.
- New repository tests for backfill chronological ordering of chain()
  and chainEventsForEntities() and for chainEventsForEntities workspace
  isolation.
- New route test asserts 400 (not 500) on non-UUID memory_ids_by_turn_id.

Verification: `npx tsc --noEmit` clean · 1286/1286 vitest pass against the
test DB · `npx fallow audit --no-cache` exit 0.

Deferrals (parked durably in the research repo's tech-debt log at
Atomicmemory-research/docs/core-repo/tech-debt.md):
- predecessor_memory_id ON DELETE CASCADE vs SET NULL — design call,
  contested between two reviewers; current default kept.
- process.env.ALLOWED_ORIGINS direct read in src/routes/memories.ts —
  pre-existing CLAUDE.md violation, out of scope for this PR.
- shouldUseTLL non-adjacent "after did" — low operational risk;
  tightening the regex was the explicit goal of review #9 and
  re-broadening risks reintroducing false positives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant