feat(extraction): EXP-06 — generic event anchors for As-of facts by moralespanitz · Pull Request #7 · atomicmemory/atomicmemory-core

moralespanitz · 2026-04-29T20:34:35Z

Summary

When a fact begins with As of <date>, ... and no DESCRIPTOR_RULE matches but a subject is recoverable, emit a generic event.occurred anchor with the date and subject. Behind a new feature flag genericEventAnchorEnabled (default false).

This is EXP-06 from the Sprint 2 phase-2 implementation plan.

Why this exists

event-anchor-facts.ts ships a list of LoCoMo-style DESCRIPTOR_RULES (mentorship, internship, networking, Paris/Rome trips, etc.). Any BEAM fact with an As of <date>, prefix that doesn't match one of those rules is emitted without an anchor and is invisible to the temporal-anchor retrieval path.

The Stage 7 dry-run on iter 7 v3 measured TR 1/2 and EO 0/2; manual inspection of the failing facts showed clear temporal phrasing ("As of January 2026, user is using PostgreSQL"; "As of March 15 2025, user completed the API migration") that the rules silently dropped.

The fall-through anchor restores those facts at retrieval time without any new LLM call.

The rule

In inferDescriptors:

Run the existing DESCRIPTOR_RULES loop.
If descriptors.length === 0 AND options.genericEventAnchorEnabled === true, push { label: 'event.occurred', subject, eventDateIso } and let the rest of the pipeline build the anchor fact.
The subject is recovered through the existing inferSubject helper, which already returns null (no anchor) when neither a person entity nor \buser\b is present in the fact text.
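
The fall-through described above can be sketched roughly as follows. This is a minimal illustration, not the code in event-anchor-facts.ts: the Descriptor, DescriptorRule and AnchorOptions shapes, the single stand-in rule, and the choice to pass the subject and date in as arguments are all assumptions made to keep the example self-contained.

```typescript
// Hypothetical shapes; the real types in event-anchor-facts.ts may differ.
interface Descriptor {
  label: string;
  subject: string;
  eventDateIso: string;
}

interface DescriptorRule {
  label: string;
  pattern: RegExp;
}

interface AnchorOptions {
  genericEventAnchorEnabled?: boolean;
}

// Stand-in for the LoCoMo-style rule list (one illustrative rule only).
const DESCRIPTOR_RULES: DescriptorRule[] = [
  { label: "mentorship.received", pattern: /\bmentor(ed|ship)?\b/i },
];

function inferDescriptors(
  factText: string,
  subject: string | null,
  eventDateIso: string,
  options: AnchorOptions = {},
): Descriptor[] {
  // No recoverable subject means no anchor at all.
  if (subject === null) return [];

  // Step 1: run the existing rule loop.
  const descriptors: Descriptor[] = [];
  for (const rule of DESCRIPTOR_RULES) {
    if (rule.pattern.test(factText)) {
      descriptors.push({ label: rule.label, subject, eventDateIso });
    }
  }

  // Step 2: fall through to the generic anchor only when no rule matched
  // and the flag is explicitly on, so a fact is never double-emitted.
  if (descriptors.length === 0 && options.genericEventAnchorEnabled === true) {
    descriptors.push({ label: "event.occurred", subject, eventDateIso });
  }
  return descriptors;
}
```

With the flag omitted or false, the function returns exactly what the rule loop produced, which is what keeps default-off output unchanged.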

The recorded-date prefix parser is widened to accept Month Year (e.g. January 2026) in addition to the existing Month Day Year form. When only month-year is present, the synthesized event date is the first day of the month — sufficient for retrieval keying.
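
A minimal sketch of a widened prefix parser, assuming the prefix is matched with a single regular expression. The name parseAsOfPrefix, the regex, and the ISO output format are illustrative assumptions and not the parser shipped in this PR.

```typescript
const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

// Accepts "As of Month Day Year," and "As of Month Year,".
// The day group is optional; the year is always four digits.
const AS_OF_PREFIX = /^As of ([A-Za-z]+)(?: (\d{1,2}),?)? (\d{4}),/;

// Zero-pad a one-digit number to two characters.
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

// Hypothetical helper: returns an ISO date (YYYY-MM-DD) or null
// when the text carries no recognizable "As of <date>," prefix.
function parseAsOfPrefix(factText: string): string | null {
  const match = AS_OF_PREFIX.exec(factText);
  if (match === null) return null;

  const monthIndex = MONTHS.indexOf(match[1].toLowerCase());
  if (monthIndex === -1) return null;

  // Month-year only: synthesize the first day of the month.
  const day = match[2] === undefined ? 1 : Number(match[2]);
  // Coarse range check only; this does not reject dates such as February 30.
  if (day < 1 || day > 31) return null;

  return match[3] + "-" + pad2(monthIndex + 1) + "-" + pad2(day);
}
```

Under these assumptions, "As of January 2026, ..." maps to 2026-01-01 and "As of March 15 2025, ..." maps to 2025-03-15.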

Risks

Anchor inflation. With the flag on, every As of <date>, fact becomes at least one anchor. The flag is off by default to bound this.
Subject collapse to User. inferSubject falls back to User when no person entity is present but the fact text mentions the user. This weakens multi-event ordering (an EO concern, addressed by EXP-13). When neither is present, subject extraction returns null rather than guessing.
Co-existence with DESCRIPTOR_RULES. The descriptors.length === 0 guard ensures we never double-emit on a single fact. Existing LoCoMo regression tests still pass with the flag on.
Co-existence with EXP-13 boundary fields. Anchors are post-extraction synthetic artifacts and intentionally do not carry the event_boundary / boundary_prob fields the LLM-judged extraction adds — anchors get their own retrieval boost.

Test cases

src/services/__tests__/event-anchor-facts.test.ts — extended:

Generic anchor emitted for "As of January 2026, user is using PostgreSQL." (month-year prefix, flag on).
Generic anchor emitted for "As of March 15 2025, user completed the API migration." (full-date prefix, flag on).
No anchor emitted when flag is off.
No anchor emitted for facts without an As of <date> prefix (flag on).
DESCRIPTOR_RULE regression: mentorship.received still fires for the existing LoCoMo fixture, and the generic fall-through does not also fire on the same source fact when the flag is on.
Subject-extraction fallback returns [] (no anchor) on "As of January 2026, the situation continues." rather than crashing or guessing.
Non-prefixed weird input ("Random unstructured text without temporal prefix.") returns [] without throwing.

All 12 tests pass (5 existing regression + 7 new). Related runtime-config tests for consensusExtractFacts were updated to thread the new field through and all 18 of those continue to pass.

Config override

To enable for a single ingest call without restarting the server:

{
  "config_override": {
    "genericEventAnchorEnabled": true
  }
}

Or via env:

GENERIC_EVENT_ANCHOR_ENABLED=true

The field is also added to INTERNAL_POLICY_CONFIG_FIELDS so PUT /v1/memories/config accepts it on dev/test deployments.
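
One way the two switches could combine is sketched below. The precedence (per-call override first, then the environment variable, then the default of false) and the rule that only the literal string "true" enables the flag are assumptions made for illustration; the PR does not spell out either, and resolveGenericEventAnchorEnabled is a hypothetical helper name.

```typescript
interface ConfigOverride {
  genericEventAnchorEnabled?: boolean;
}

// Hypothetical resolver. Assumed precedence: override > env > default (false).
function resolveGenericEventAnchorEnabled(
  env: Record<string, string | undefined>,
  override: ConfigOverride = {},
): boolean {
  // A per-call override, when present, decides the outcome either way.
  if (typeof override.genericEventAnchorEnabled === "boolean") {
    return override.genericEventAnchorEnabled;
  }
  // Assumption: only the exact string "true" turns the flag on.
  return env.GENERIC_EVENT_ANCHOR_ENABLED === "true";
}
```

In a Node process this would be called with process.env as the first argument and the request's config_override object as the second.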

Wiring

RuntimeConfig.genericEventAnchorEnabled: boolean (default false).
IngestRuntimeConfig extended with the same field; MemoryServiceDeps.config already pulls it through via its intersection with IngestRuntimeConfig.
ConsensusExtractionConfig.genericEventAnchorEnabled: boolean — buildExtractionOptions forwards it into extractFacts(...).
ExtractionOptions.genericEventAnchorEnabled?: boolean (in observation-date-extraction.ts, the existing pattern).
extraction.ts:323 forwards the flag into enrichExtractedFacts(..., { genericEventAnchorEnabled }).
enrichExtractedFacts and inferEventAnchorFacts accept the new option; default-off preserves bit-identical output.
quickExtractFacts accepts an optional EnrichmentOptions; memory-ingest.ts:performQuickIngest threads deps.config.genericEventAnchorEnabled through so the quick path also benefits.

Test plan

npx tsc --noEmit — exit 0
npx vitest run src/services/__tests__/event-anchor-facts.test.ts — 12/12 passing
npx vitest run src/services/__tests__/consensus-extraction-runtime-config.test.ts observation-date-extraction.test.ts quick-extraction-assistant.test.ts — 18/18 passing
npx vitest run src/services/__tests__/extraction.test.ts extraction-enrichment.test.ts extraction-cache.test.ts — 64/64 passing
npx vitest run src/services/__tests__/memory-ingest-runtime-config.test.ts ingest-trace-branches.test.ts — 12/12 passing
BEAM TR/EO sweep with the flag on (follow-up: dispatcher run after merge)

When a fact starts with 'As of <date>, ...' and no DESCRIPTOR_RULE matches, emit a generic event.occurred anchor with the date and subject recovered from the prefix. Behind new flag genericEventAnchorEnabled (default false).

Targets BEAM TR. Stage 7 dry-run on iter 7 v3 had TR 1/2 and EO 0/2; much of the variance was on facts that had clear temporal phrasing but didn't match LoCoMo-style descriptors. The fall-through anchor restores them at retrieval time.

Risks: anchor inflation (new flag is off by default to bound this); subject collapse on User-only facts (subject extractor returns null in ambiguous cases rather than emitting a wrong subject).

New config keys (defaults-off):
- genericEventAnchorEnabled: false

Behind feature flag. Defaults preserve current behavior.

…view #7)

`positionInConversation` was set directly to `turn_id`. That looked correct for a single extraction, but the (user_id, memory_id) UNIQUE on `first_mention_events` means a re-run of `extractAndStore` for the same conversation silently keeps the FIRST inserted row, including its position. If the LLM's turn_id assignment drifted between runs (which it does in practice: non-deterministic decoding even at temperature=0 plus prompt-cache-state variation), readers would see position values that depend on which run happened to write first, breaking deterministic chronological ordering.

Fix: `positionInConversation` is now the 0-based index in the FINAL turn-id-sorted output, NOT `turn_id` itself. Sort first, then enumerate. Re-runs produce identical (position, topic) tuples regardless of any turn_id drift, so the post-write read is stable.

Updated `mapToEvents` in `src/services/first-mention-service.ts`:
- Build candidates first (without position).
- Sort by `turnId` ASC.
- Assign `positionInConversation = index` during the final map.

Tests:
- `src/services/__tests__/first-mention-service.test.ts`:
  * existing happy-path / sort tests now assert position 0/1/... instead of position == turn_id.
  * new `produces stable positionInConversation across re-runs even when LLM turn_id drifts` test runs `extractAndStore` twice with drifted turn_ids and confirms both runs produce the same `[0, 1]` position sequence.
- `src/db/__tests__/repository-first-mentions.test.ts` (new): integration test seeding a memory + running `store()` twice with drifted turn_ids; asserts only 2 rows survive, position sequence is `[0, 1]`, and the first-write turn_id (5) is what the read-back returns (ON CONFLICT DO NOTHING semantics).
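
The sort-then-enumerate fix can be illustrated with a short sketch. The RawMention and FirstMentionEvent shapes are assumptions, and this is not the actual `mapToEvents` from `src/services/first-mention-service.ts`; it only shows the three steps listed in the commit message.

```typescript
// Hypothetical input/output shapes for illustration.
interface RawMention {
  turnId: number;
  topic: string;
}

interface FirstMentionEvent {
  topic: string;
  turnId: number;
  positionInConversation: number;
}

function mapToEvents(raw: RawMention[]): FirstMentionEvent[] {
  // 1. Build candidates first, without a position.
  const candidates = raw.map((r) => ({ topic: r.topic, turnId: r.turnId }));

  // 2. Sort by turnId ascending (the copy above keeps the input untouched).
  candidates.sort((a, b) => a.turnId - b.turnId);

  // 3. Assign the 0-based index in the sorted output as the position.
  return candidates.map((c, index) => ({
    topic: c.topic,
    turnId: c.turnId,
    positionInConversation: index,
  }));
}
```

In this sketch, two runs whose turn ids differ (say 5 and 9 versus 6 and 11) both yield positions 0 and 1 for the same topics. The positions match only while the drift leaves the relative order of the turns unchanged.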

…xtract (review #5)

The two PR #18 read endpoints had no HTTP-level tests; only the underlying repository / service unit tests existed, which left the schema-validation middleware and route-level wiring uncovered. A schema rename or a route-handler regression could ship green.

New file `src/routes/__tests__/event-chains-and-first-mentions.test.ts` mirrors the route-test pattern from `src/__tests__/route-validation.test.ts`: spin up an Express app on `app.listen(0, ...)`, wire `createMemoryRouter` against a real `MemoryService` backed by the test DB, drive endpoints via `fetch`. The MemoryService gets a real `FirstMentionRepository` plus a stubbed `chatFn` so the LLM call returns a deterministic JSON array.

Coverage:

GET /v1/memories/event-chains
- 400 when `user_id` is missing
- 400 when `entity_ids` is missing
- 400 when `entity_ids` contains an invalid UUID
- 400 when `entity_ids` exceeds the 100-entry cap (review #6)
- 400 when `entity_ids` is present but holds only empty tokens
- happy path: seed memory + entity + TLL row, hit the route, parse the response with `EventChainsResponseSchema`

POST /v1/memories/first-mentions/extract
- 400 when `user_id` is missing
- 400 when `conversation_text` is empty
- 400 when `conversation_text` exceeds MAX_CONVERSATION_LENGTH (100_000 chars)
- 400 when `memory_ids_by_turn_id` is missing entirely
- 400 when `source_site` is missing
- happy path: stub LLM returns 2 events, route stores+returns them, response parsed with `FirstMentionsExtractResponseSchema`, `position_in_conversation` is the post-sorted [0, 1] sequence from review #7.

12/12 new tests pass.
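
The `entity_ids` cases in the coverage list suggest validation along the following lines. This is only a sketch: it assumes `entity_ids` arrives as a comma-separated query string (implied by the "empty tokens" case but not confirmed here), and parseEntityIds, ParseResult and the error strings are hypothetical names, not the route's actual schema middleware.

```typescript
type ParseResult =
  | { ok: true; ids: string[] }
  | { ok: false; error: string };

const MAX_ENTITY_IDS = 100;
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical validator; a route would answer 400 whenever ok is false.
function parseEntityIds(raw: string | undefined): ParseResult {
  if (raw === undefined) {
    return { ok: false, error: "entity_ids is required" };
  }
  // Drop empty tokens such as those produced by ",,".
  const tokens = raw
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t.length > 0);

  if (tokens.length === 0) {
    return { ok: false, error: "entity_ids must contain at least one id" };
  }
  if (tokens.length > MAX_ENTITY_IDS) {
    return { ok: false, error: "entity_ids exceeds the 100-entry cap" };
  }
  const invalid = tokens.find((t) => !UUID_PATTERN.test(t));
  if (invalid !== undefined) {
    return { ok: false, error: "entity_ids contains an invalid UUID" };
  }
  return { ok: true, ids: tokens };
}
```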

moralespanitz requested a review from ethanj as a code owner April 29, 2026 20:34

moralespanitz marked this pull request as draft April 30, 2026 05:19

ethanj mentioned this pull request May 6, 2026

feat(core): productize first-mention events + TLL EO read-path #18

Merged

3 tasks

moralespanitz wants to merge 1 commit into main from feature/exp-06-generic-event-anchors
