
feat(backend): MemoryEnvelope metadata model, scoped retrieval, and memory hardening#12765

Merged
ntindle merged 58 commits into dev from feat/graphiti-memory-envelope on Apr 15, 2026
Conversation

@ntindle
Member

@ntindle ntindle commented Apr 13, 2026

Why / What / How

Why: CoPilot's Graphiti memory system needed structured metadata to distinguish memory types (rules, procedures, facts, preferences), support scoped retrieval, enable targeted deletion, and track memory costs under the AutoPilot billing account separately from the platform.

What: Adds the MemoryEnvelope metadata model, structured rule/procedure memory types, a derived-finding lane for assistant-distilled knowledge, two-step forget tools, scope-aware retrieval filtering, AutoPilot-dedicated API key routing, and several reliability fixes (streaming socket leaks, event-loop-scoped caches, ingestion hardening).

How: MemoryEnvelope wraps every stored episode with typed metadata (source_kind, memory_kind, scope, status, confidence) serialized as JSON. Retrieval filters by scope at the context layer. The forget flow uses a search-then-confirm two-step pattern. Ingestion queues and client caches are scoped per event loop via WeakKeyDictionary to prevent cross-loop RuntimeErrors in multi-worker deployments. API key resolution falls back to AutoPilot-dedicated keys (CHAT_API_KEY, CHAT_OPENAI_API_KEY) before platform-wide keys.

Changes 🏗️

New: MemoryEnvelope metadata model (memory_model.py)

  • Typed memory categories: fact, preference, rule, finding, plan, event, procedure
  • Source tracking: user_asserted, assistant_derived, tool_observed
  • Scope namespacing: real:global, project:<name>, book:<title>, session:<id>
  • Status lifecycle: active, tentative, superseded, contradicted
  • Structured RuleMemory and ProcedureMemory models for complex instructions
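A minimal sketch of what the envelope model might look like, using stdlib dataclasses and enums (the real memory_model.py presumably uses pydantic; field names follow the PR description, everything else is assumed):

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional

class SourceKind(str, Enum):
    USER_ASSERTED = "user_asserted"
    ASSISTANT_DERIVED = "assistant_derived"
    TOOL_OBSERVED = "tool_observed"

class MemoryKind(str, Enum):
    FACT = "fact"
    PREFERENCE = "preference"
    RULE = "rule"
    FINDING = "finding"
    PLAN = "plan"
    EVENT = "event"
    PROCEDURE = "procedure"

class MemoryStatus(str, Enum):
    ACTIVE = "active"
    TENTATIVE = "tentative"
    SUPERSEDED = "superseded"
    CONTRADICTED = "contradicted"

@dataclass
class MemoryEnvelope:
    content: str
    source_kind: SourceKind = SourceKind.USER_ASSERTED
    memory_kind: MemoryKind = MemoryKind.FACT
    scope: str = "real:global"  # or "project:<name>", "book:<title>", "session:<id>"
    status: MemoryStatus = MemoryStatus.ACTIVE
    confidence: Optional[float] = None

env = MemoryEnvelope(content="User prefers dark mode",
                     memory_kind=MemoryKind.PREFERENCE)
# Stored as the body of an EpisodeType.json episode
serialized = json.dumps(asdict(env))
print(serialized)
```

Because the enums subclass `str`, they serialize directly to their string values, so plain-text episodes without an envelope can be treated as `scope="real:global"` defaults on the read path.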

New: Targeted forget tools (graphiti_forget.py)

  • memory_forget_search: returns candidate facts with UUIDs for user confirmation
  • memory_forget_confirm: deletes specific edges by UUID after confirmation

New: Architecture test (architecture_test.py)

  • Validates no new @cached(...) usage around event-loop-bound async clients
  • Allowlists pre-existing violations for future cleanup
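One plausible shape for such a structural check — a source scan for the decorator, with an allowlist of known offenders (the regex, allowlist path, and function name here are assumptions, not the actual test):

```python
import re
from pathlib import Path

# Pre-existing violations grandfathered in for future cleanup (hypothetical path).
ALLOWLIST = {"backend/copilot/legacy_module.py"}

CACHED_DECORATOR = re.compile(r"^\s*@cached\(", re.MULTILINE)

def find_cached_violations(root: str) -> list[str]:
    """Return files using @cached(...) that are not on the allowlist."""
    violations = []
    for path in Path(root).rglob("*.py"):
        rel = path.as_posix()
        if rel in ALLOWLIST:
            continue
        if CACHED_DECORATOR.search(path.read_text()):
            violations.append(rel)
    return violations
```

The point of process-level caching bans like this: a client cached with `@cached` outlives the event loop it was created on, so any coroutine it holds fails on the next loop. Catching new usages structurally is cheaper than debugging the resulting cross-loop errors.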

Enhanced: memory_store tool (graphiti_store.py)

  • Accepts MemoryEnvelope metadata fields (source_kind, scope, memory_kind, rule, procedure)
  • Wraps content in MemoryEnvelope before ingestion

Enhanced: memory_search tool (graphiti_search.py)

  • Scope-aware retrieval with hard filtering on group_id

Enhanced: Ingestion pipeline (ingest.py)

  • Derived-finding lane: distills substantive assistant responses into tentative findings
  • Event-loop-scoped queues and workers via WeakKeyDictionary (fixes multi-worker RuntimeError)
  • Improved error handling and dropped-episode reporting

Enhanced: Client cache (client.py)

  • Per-loop client cache and lock via WeakKeyDictionary (fixes "Future attached to a different loop")
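A minimal sketch of the per-loop pattern (the cache key is the running loop itself; `FakeClient` stands in for the real Graphiti client):

```python
import asyncio
from weakref import WeakKeyDictionary

# Keyed by event loop: when a loop is garbage-collected, its entry goes
# with it, so a client created on one loop is never handed to another.
_clients: WeakKeyDictionary = WeakKeyDictionary()

class FakeClient:  # stands in for the real Graphiti client
    pass

def get_client() -> FakeClient:
    loop = asyncio.get_running_loop()
    if loop not in _clients:
        _clients[loop] = FakeClient()
    return _clients[loop]

async def main() -> FakeClient:
    a = get_client()
    b = get_client()
    assert a is b  # same loop, same client
    return a

first = asyncio.run(main())   # worker 1's loop
second = asyncio.run(main())  # worker 2's loop gets a fresh client
print(first is second)  # False
```

A process-wide cache keyed on nothing (as with a plain `@cached` getter) would hand worker 2 a client whose internal futures are bound to worker 1's loop — the exact "Future attached to a different loop" failure this fixes.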

Enhanced: Warm context (context.py)

  • Filters out non-global-scope episodes from warm context

Fix: Streaming socket leak (baseline/service.py)

  • try/finally around async stream iteration to release httpx connections on early exit
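The fix in miniature — `FakeStream` stands in for the httpx/OpenAI async stream holding a socket; the real code closes the provider stream the same way:

```python
import asyncio

class FakeStream:
    """Stand-in for an async HTTP stream that holds a connection."""
    def __init__(self):
        self.closed = False

    def __aiter__(self):
        return self._gen()

    async def _gen(self):
        for chunk in ["Hel", "lo", "!"]:
            yield chunk

    async def aclose(self):
        self.closed = True

async def consume(stream, stop_after: int) -> list[str]:
    chunks = []
    try:
        async for chunk in stream:
            chunks.append(chunk)
            if len(chunks) >= stop_after:
                break  # early exit, e.g. a tool-call interrupt
    finally:
        await stream.aclose()  # release the connection even on early exit
    return chunks

stream = FakeStream()
out = asyncio.run(consume(stream, stop_after=1))
print(out, stream.closed)  # ['Hel'] True
```

Without the `finally`, breaking out of the `async for` (on a tool call or client disconnect) abandons the iterator and the underlying socket lingers in CLOSE_WAIT until garbage collection.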

Config: AutoPilot key routing (config.py, .env.default)

  • LLM key fallback: GRAPHITI_LLM_API_KEY → CHAT_API_KEY → OPEN_ROUTER_API_KEY
  • Embedder key fallback: GRAPHITI_EMBEDDER_API_KEY → CHAT_OPENAI_API_KEY → OPENAI_API_KEY
  • Backwards-compatible: existing behavior unchanged until new keys are provisioned

Checklist 📋

For code changes:

  • I have clearly listed my changes in the PR description
  • I have made a test plan
  • I have tested my changes according to the test plan:
    • poetry run pytest backend/copilot/graphiti/config_test.py — 16 tests pass (key fallback priority)
    • poetry run pytest backend/copilot/tools/graphiti_store_test.py — store envelope tests pass
    • poetry run pytest backend/copilot/graphiti/ingest_test.py — ingestion tests pass
    • poetry run pytest backend/util/architecture_test.py — structural validation passes
    • Verify memory store/retrieve/forget cycle via copilot chat
    • Run AgentProbe multi-session memory benchmark (31 scenarios x3 repeats)
    • Confirm no CLOSE_WAIT socket accumulation under sustained streaming load
    • Verify multi-worker deployment doesn't produce loop-binding errors

For configuration changes:

  • .env.default is updated or already compatible with my changes
  • docker-compose.yml is updated or already compatible with my changes
  • Configuration changes:
    • New optional env var CHAT_OPENAI_API_KEY — AutoPilot-dedicated OpenAI key for Graphiti embeddings (falls back to OPENAI_API_KEY if not set)
    • CHAT_API_KEY now used as first fallback for Graphiti LLM calls (was OPEN_ROUTER_API_KEY)
    • Infra action needed: add CHAT_OPENAI_API_KEY sealed secret in autogpt-shared-config values (dev + prod)

🤖 Generated with Claude Code


Note

Medium Risk
Touches Graphiti memory ingestion/retrieval and introduces hard-delete capabilities plus event-loop–scoped caching/queues; failures could affect memory correctness or delete the wrong edges. Also changes streaming resource cleanup and key routing, which could surface as connection or billing/cost attribution issues if misconfigured.

Overview
Graphiti memory is upgraded from plain text episodes to a structured JSON MemoryEnvelope. memory_store now wraps content with typed metadata (source, kind, scope, status) and optional structured rule/procedure payloads, and ingestion supports JSON episodes.

Memory retrieval and lifecycle controls are expanded. memory_search adds optional scope hard-filtering to prevent cross-scope leakage, warm-context formatting drops non-global scoped episodes (and avoids empty wrappers), and new two-step tools (memory_forget_search / memory_forget_confirm) enable targeted soft- or hard-deletion of specific graph edges by UUID.

Reliability and multi-worker safety improvements. Graphiti client caching and ingestion worker registries are now per-event-loop (avoiding cross-loop Future errors), streaming chat completions explicitly close async streams to prevent CLOSE_WAIT socket leaks, warm-context is injected into the first user message to keep the system prompt cacheable, and a new architecture_test.py blocks future process-wide caching of event-loop–bound async clients. Config updates route Graphiti LLM/embedder keys to AutoPilot-specific env vars first, and OpenAPI schema exports include the new memory response types.

Reviewed by Cursor Bugbot for commit 5fb4bd0.

ntindle and others added 30 commits April 7, 2026 05:23
Integrate graphiti-core as an in-process temporal knowledge graph for
persistent cross-session memory in AutoPilot. Works in both SDK and
baseline/fast execution paths via the existing BaseTool → TOOL_REGISTRY
→ create_copilot_mcp_server() bridge.

Infrastructure:
- FalkorDB added to Docker Compose as graph database backend
- graphiti-core + cachetools + falkordb added to Python deps
- Per-group_id client isolation with LRU/TTL cache
- Custom FalkorDB driver with fulltext query fix

Tools (3 new BaseTool implementations):
- graphiti_store: save memories with EpisodeType.text + custom
  extraction instructions to suppress meta-entity pollution
- graphiti_search: hybrid search (edges + episodes) with ep.content
  surfacing and 500-char episode bodies
- graphiti_delete_user_data: GDPR deletion via clear_data()

Memory quality:
- Episode body uses "Speaker: content" format (not JSON blobs)
- Only user messages ingested into graph (Zep Cloud approach)
- custom_extraction_instructions block meta-entities (assistant,
  human, tool names, block names)
- small_model set to match main model (avoids gpt-4.1-nano dedup
  hallucination bug #760)
- Per-user asyncio.Queue serializes add_episode() calls

Integration:
- Warm context pre-loaded at session start (8s timeout, graceful
  degradation)
- System prompt supplement with ALWAYS SEARCH instruction
- Fire-and-forget episode ingestion after each turn via
  _background_tasks pattern
- MemoryEpisodeLog replay table (append-only, full user+assistant
  turn for migration safety)
- LaunchDarkly flag "graphiti-memory" for per-user rollout
- OpenRouter for extraction LLM, direct OpenAI for embeddings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Gate graphiti prompt supplement behind feature flag (baseline + SDK)
- Gate ingestion behind feature flag (was running for all users)
- Fix fulltext query length check to measure final string, not token count
- Remove erroneous GIN index drop from migration
- Remove GDPR compliance claim from delete tool
- Write replay log from graphiti_store tool
- Add FalkorDB to app-network and declare falkordb_data volume

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Missed the SDK path — enqueue_conversation_turn was running for all
authenticated users. Now gated behind is_enabled_for_user() like the
baseline path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…add tests

- Remove MemoryEpisodeLog table, migration, and all related code
- Fix TOCTOU race in get_graphiti_client (hold lock through init)
- Guard queue/worker creation with asyncio.Lock in ingest.py
- Handle CancelledError in ingestion workers
- Add worker idle timeout (60s) to prevent unbounded memory leak
- Fix derive_group_id to raise on sanitized input (prevent collisions)
- Route graphiti_store through ingestion queue (no more inline blocking)
- Fix falkordb_port default to 6380 (was 6379, mismatched .env.default)
- Make GraphitiConfig lazy (prevent import-time crash on bad .env)
- Consolidate duplicate is_enabled_for_user imports in service files
- Use explicit typed params in tool _execute methods
- Extract shared edge/episode formatters into _format.py
- Fix broken import path in graphiti_store.py (was crashing tool registry)
- Add 65 tests covering all graphiti modules (was 3 tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…seline service

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove blank GRAPHITI_* entries that fall back to OPEN_ROUTER/OPENAI keys
- Keep PASSWORD, HOST, PORT, model defaults, and semaphore limit
- Add web UI check to FalkorDB healthcheck (redis-cli + wget)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…stion

- Fix permissions.py: graphiti_* → memory_* to match TOOL_REGISTRY
- Fix prompting.py: graphiti_search/store → memory_search/store in LLM prompt
- Remove unused assistant_msg param from enqueue_conversation_turn
- Guard derive_group_id ValueError in search/delete tools
- Only ingest user messages (skip system/assistant turns)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- FalkorDB container now consumes GRAPHITI_FALKORDB_PASSWORD via REDIS_ARGS
- Healthcheck passes password to redis-cli
- enqueue_conversation_turn and enqueue_episode catch ValueError from
  derive_group_id and log+return instead of silently crashing the task

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents queued episodes from re-populating the graph after clear_data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove MemoryDeleteTool, its test, response model, permission entry,
and openapi.json enum value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TTL eviction now closes the driver via _EvictingTTLCache subclass
- evict_client is now async and explicitly closes the driver
- Prevents leaked FalkorDB connections on TTL expiry or manual eviction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
enqueue_episode now returns bool. MemoryStoreTool returns ErrorResponse
when the queue is full instead of a false success confirmation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Was duplicated as a deferred import in two functions. The module
already imports graphiti-core transitively via .client, so deferring
added no benefit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Override expire() instead of __delitem__ per cachetools maintainer
  guidance (github.com/tkem/cachetools/issues/205) — __delitem__ is
  bypassed by the internal TTL expiry path, so connections were silently
  leaked. expire() is the correct hook for TTL-expired items.
- Remove __delitem__ override that caused double-close in evict_client
- Add try/except around graphiti calls in MemorySearchTool so FalkorDB
  failures return ErrorResponse instead of crashing the tool-call round

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_ensure_worker now returns the queue directly. Callers use the returned
reference instead of re-looking up _user_queues[user_id], which could
KeyError if the worker timed out and cleaned up between ensure and put.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ervices

The lazy config proxy makes this import lightweight — no graphiti-core
pulled in, no .env parsed until first attribute access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tools

Phase 2 of Graphiti memory: structured explicit memories with
domain-agnostic metadata and two-step targeted deletion.

MemoryEnvelope (memory_model.py):
- source_kind: user_asserted / assistant_derived / tool_observed
- scope: real:global, project:<name>, book:<title>, session:<id>
- memory_kind: fact / preference / rule / finding / plan / event / procedure
- status: active / tentative / superseded / contradicted
- Optional confidence and provenance fields

memory_store tool updated:
- Accepts source_kind, scope, memory_kind optional params
- Wraps content in MemoryEnvelope, ingests as EpisodeType.json
- Preserves backward compat (all new params have defaults)

Targeted forget (two-step flow):
- memory_forget_search: NL query → returns candidate edges with UUIDs
- memory_forget_confirm: deletes specific edges by UUID via Cypher
- Agent shows candidates, user confirms before deletion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion

Phase 3 of Graphiti memory: substantive assistant findings are
distilled into structured MemoryEnvelope episodes tagged with
source_kind=assistant_derived and status=tentative.

Heuristic gate (_is_finding_worthy):
- Skip short acknowledgments (<150 chars)
- Skip workflow chatter ("Done", "Here's", "I've created", etc.)
- Only pass through substantive responses likely containing research
  results, analysis, or conclusions

Distillation (_distill_finding):
- Simple truncation for now (first 500 chars)
- Queued as EpisodeType.json with MemoryEnvelope metadata
- Best-effort: if queue is full, user canonical episode takes priority

Both SDK and baseline paths now pass assistant_msg to
enqueue_conversation_turn for distillation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 4 of Graphiti memory: search results and warm context now
support scope-based filtering to prevent cross-domain memory bleed.

memory_search tool:
- New optional `scope` parameter for hard filtering
- When set (e.g. scope="real:global"), only episodes whose
  MemoryEnvelope JSON matches that scope are returned
- Plain conversation episodes (no JSON envelope) default to
  real:global scope
- Omit scope to search all scopes (backward compatible)

Warm context (context.py):
- Filters episodes to real:global scope by default
- Fiction/project-scoped episodes excluded from session-start
  context to prevent bleed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When memory_kind=rule, the agent can now pass a structured `rule`
object with instruction, actor, trigger, and negation fields.  This
preserves exact user intent instead of relying on LLM extraction to
reconstruct operational rules from prose.

Example: "CC Sarah on client stuff" is stored as:
  rule.instruction = "CC Sarah on client communications"
  rule.actor = "Sarah"
  rule.trigger = "client-related communications"

When memory_kind=procedure, a structured `procedure` object captures
multi-step workflows with ordered steps, tools, conditions, and
negations that don't decompose into fact triples.

Both are optional extensions of MemoryEnvelope — backward compatible
with existing fact/preference/finding storage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace raw RELATES_TO-only Cypher with correct deletion that handles
all graph artifacts.

Soft delete (default):
- Sets invalid_at + expired_at on RELATES_TO edges
- Edges excluded from default search but history preserved
- Reversible — appropriate for "I don't want to see this"

Hard delete (hard_delete=true, for GDPR):
- Deletes edges across all types (MENTIONS|RELATES_TO|HAS_MEMBER)
- Cleans up EpisodicNode.entity_edges back-references
- Does NOT delete orphaned entity nodes — they may have summaries,
  embeddings, or future connections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wrap streaming response iteration in try/finally to prevent httpx
  connection leaks (CLOSE_WAIT sockets) on early exit or tool-call break
- Scope graphiti client cache per event loop using WeakKeyDictionary to
  fix "Future attached to a different loop" errors in multi-worker setups
- Harden ingest pipeline with improved error handling and test coverage
- Add architecture_test.py for structural validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

@ntindle
Member Author

ntindle commented Apr 15, 2026

@copilot resolve the merge conflicts in this pull request

…nvelope

# Conflicts:
#	autogpt_platform/backend/backend/copilot/sdk/service.py

Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>
Contributor

Copilot AI commented Apr 15, 2026

@copilot resolve the merge conflicts in this pull request

Resolved in merge commit 8455a13. There was one conflict in service.py — the warm-context injection (our branch) overlapped with the transcript seeding addition from dev. Both changes are independent and have been kept in the correct order.

@github-actions github-actions bot removed the conflicts Automatically applied to PRs with merge conflicts label Apr 15, 2026
@github-actions
Contributor

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

@github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Apr 15, 2026
@github-actions
Contributor

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

- sdk/service.py: keep both pre_attempt_msg_count init and
  sdk_model/model_cost_multiplier defaults; take dev's warm_ctx=""
  (always string, never None) and `or ""` fallback.
- service.py: take dev's MEMORY_CONTEXT_TAG + ENV_CONTEXT_TAG
  instructions, replacing our simpler temporal_context line.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot removed the conflicts Automatically applied to PRs with merge conflicts label Apr 15, 2026
@github-actions
Contributor

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit e7ae7b5.

Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
Dev merged inject_user_context(warm_ctx=warm_ctx) which handles
warm_ctx injection on first turn. Our standalone append at line 2797
and retry reattach at line 2945 caused double injection. Removed both
since current_message already carries warm_ctx after inject_user_context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ntindle ntindle merged commit ab3221a into dev Apr 15, 2026
45 checks passed
@ntindle ntindle deleted the feat/graphiti-memory-envelope branch April 15, 2026 14:40
@github-project-automation github-project-automation bot moved this to Done in Frontend Apr 15, 2026
@github-project-automation github-project-automation bot moved this from 🆕 Needs initial review to ✅ Done in AutoGPT development kanban Apr 15, 2026

Labels

platform/backend (AutoGPT Platform - Back end), platform/frontend (AutoGPT Platform - Front end), size/xl

Projects

Status: ✅ Done

2 participants