
feat(backend): MemoryEnvelope metadata model, scoped retrieval, and memory hardening#12765

Merged
ntindle merged 58 commits into dev from feat/graphiti-memory-envelope on Apr 15, 2026
Conversation

@ntindle
Member

@ntindle ntindle commented Apr 13, 2026

Why / What / How

Why: CoPilot's Graphiti memory system needed structured metadata to distinguish memory types (rules, procedures, facts, preferences), support scoped retrieval, enable targeted deletion, and track memory costs under the AutoPilot billing account separately from the platform.

What: Adds the MemoryEnvelope metadata model, structured rule/procedure memory types, a derived-finding lane for assistant-distilled knowledge, two-step forget tools, scope-aware retrieval filtering, AutoPilot-dedicated API key routing, and several reliability fixes (streaming socket leaks, event-loop-scoped caches, ingestion hardening).

How: MemoryEnvelope wraps every stored episode with typed metadata (source_kind, memory_kind, scope, status, confidence) serialized as JSON. Retrieval filters by scope at the context layer. The forget flow uses a search-then-confirm two-step pattern. Ingestion queues and client caches are scoped per event loop via WeakKeyDictionary to prevent cross-loop RuntimeErrors in multi-worker deployments. API key resolution falls back to AutoPilot-dedicated keys (CHAT_API_KEY, CHAT_OPENAI_API_KEY) before platform-wide keys.

Changes 🏗️

New: MemoryEnvelope metadata model (memory_model.py)

  • Typed memory categories: fact, preference, rule, finding, plan, event, procedure
  • Source tracking: user_asserted, assistant_derived, tool_observed
  • Scope namespacing: real:global, project:<name>, book:<title>, session:<id>
  • Status lifecycle: active, tentative, superseded, contradicted
  • Structured RuleMemory and ProcedureMemory models for complex instructions
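A minimal sketch of what the envelope model might look like, using stdlib dataclasses and enums (the real memory_model.py presumably uses pydantic; field names follow the PR description, everything else is assumed):

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional

class SourceKind(str, Enum):
    USER_ASSERTED = "user_asserted"
    ASSISTANT_DERIVED = "assistant_derived"
    TOOL_OBSERVED = "tool_observed"

class MemoryKind(str, Enum):
    FACT = "fact"
    PREFERENCE = "preference"
    RULE = "rule"
    FINDING = "finding"
    PLAN = "plan"
    EVENT = "event"
    PROCEDURE = "procedure"

class MemoryStatus(str, Enum):
    ACTIVE = "active"
    TENTATIVE = "tentative"
    SUPERSEDED = "superseded"
    CONTRADICTED = "contradicted"

@dataclass
class MemoryEnvelope:
    content: str
    source_kind: SourceKind = SourceKind.USER_ASSERTED
    memory_kind: MemoryKind = MemoryKind.FACT
    scope: str = "real:global"  # or "project:<name>", "book:<title>", "session:<id>"
    status: MemoryStatus = MemoryStatus.ACTIVE
    confidence: Optional[float] = None

env = MemoryEnvelope(content="User prefers dark mode",
                     memory_kind=MemoryKind.PREFERENCE)
# Stored as the body of an EpisodeType.json episode
serialized = json.dumps(asdict(env))
print(serialized)
```

Because the enums subclass `str`, they serialize directly to their string values, so plain-text episodes without an envelope can be treated as `scope="real:global"` defaults on the read path.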

New: Targeted forget tools (graphiti_forget.py)

  • memory_forget_search: returns candidate facts with UUIDs for user confirmation
  • memory_forget_confirm: deletes specific edges by UUID after confirmation

New: Architecture test (architecture_test.py)

  • Validates no new @cached(...) usage around event-loop-bound async clients
  • Allowlists pre-existing violations for future cleanup
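One plausible shape for such a structural check — a source scan for the decorator, with an allowlist of known offenders (the regex, allowlist path, and function name here are assumptions, not the actual test):

```python
import re
from pathlib import Path

# Pre-existing violations grandfathered in for future cleanup (hypothetical path).
ALLOWLIST = {"backend/copilot/legacy_module.py"}

CACHED_DECORATOR = re.compile(r"^\s*@cached\(", re.MULTILINE)

def find_cached_violations(root: str) -> list[str]:
    """Return files using @cached(...) that are not on the allowlist."""
    violations = []
    for path in Path(root).rglob("*.py"):
        rel = path.as_posix()
        if rel in ALLOWLIST:
            continue
        if CACHED_DECORATOR.search(path.read_text()):
            violations.append(rel)
    return violations
```

The point of process-level caching bans like this: a client cached with `@cached` outlives the event loop it was created on, so any coroutine it holds fails on the next loop. Catching new usages structurally is cheaper than debugging the resulting cross-loop errors.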

Enhanced: memory_store tool (graphiti_store.py)

  • Accepts MemoryEnvelope metadata fields (source_kind, scope, memory_kind, rule, procedure)
  • Wraps content in MemoryEnvelope before ingestion

Enhanced: memory_search tool (graphiti_search.py)

  • Scope-aware retrieval with hard filtering on group_id

Enhanced: Ingestion pipeline (ingest.py)

  • Derived-finding lane: distills substantive assistant responses into tentative findings
  • Event-loop-scoped queues and workers via WeakKeyDictionary (fixes multi-worker RuntimeError)
  • Improved error handling and dropped-episode reporting

Enhanced: Client cache (client.py)

  • Per-loop client cache and lock via WeakKeyDictionary (fixes "Future attached to a different loop")
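A minimal sketch of the per-loop pattern (the cache key is the running loop itself; `FakeClient` stands in for the real Graphiti client):

```python
import asyncio
from weakref import WeakKeyDictionary

# Keyed by event loop: when a loop is garbage-collected, its entry goes
# with it, so a client created on one loop is never handed to another.
_clients: WeakKeyDictionary = WeakKeyDictionary()

class FakeClient:  # stands in for the real Graphiti client
    pass

def get_client() -> FakeClient:
    loop = asyncio.get_running_loop()
    if loop not in _clients:
        _clients[loop] = FakeClient()
    return _clients[loop]

async def main() -> FakeClient:
    a = get_client()
    b = get_client()
    assert a is b  # same loop, same client
    return a

first = asyncio.run(main())   # worker 1's loop
second = asyncio.run(main())  # worker 2's loop gets a fresh client
print(first is second)  # False
```

A process-wide cache keyed on nothing (as with a plain `@cached` getter) would hand worker 2 a client whose internal futures are bound to worker 1's loop — the exact "Future attached to a different loop" failure this fixes.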

Enhanced: Warm context (context.py)

  • Filters out non-global-scope episodes from warm context

Fix: Streaming socket leak (baseline/service.py)

  • try/finally around async stream iteration to release httpx connections on early exit
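The fix in miniature — `FakeStream` stands in for the httpx/OpenAI async stream holding a socket; the real code closes the provider stream the same way:

```python
import asyncio

class FakeStream:
    """Stand-in for an async HTTP stream that holds a connection."""
    def __init__(self):
        self.closed = False

    def __aiter__(self):
        return self._gen()

    async def _gen(self):
        for chunk in ["Hel", "lo", "!"]:
            yield chunk

    async def aclose(self):
        self.closed = True

async def consume(stream, stop_after: int) -> list[str]:
    chunks = []
    try:
        async for chunk in stream:
            chunks.append(chunk)
            if len(chunks) >= stop_after:
                break  # early exit, e.g. a tool-call interrupt
    finally:
        await stream.aclose()  # release the connection even on early exit
    return chunks

stream = FakeStream()
out = asyncio.run(consume(stream, stop_after=1))
print(out, stream.closed)  # ['Hel'] True
```

Without the `finally`, breaking out of the `async for` (on a tool call or client disconnect) abandons the iterator and the underlying socket lingers in CLOSE_WAIT until garbage collection.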

Config: AutoPilot key routing (config.py, .env.default)

  • LLM key fallback: GRAPHITI_LLM_API_KEY → CHAT_API_KEY → OPEN_ROUTER_API_KEY
  • Embedder key fallback: GRAPHITI_EMBEDDER_API_KEY → CHAT_OPENAI_API_KEY → OPENAI_API_KEY
  • Backwards-compatible: existing behavior unchanged until new keys are provisioned

Checklist 📋

For code changes:

  • I have clearly listed my changes in the PR description
  • I have made a test plan
  • I have tested my changes according to the test plan:
    • poetry run pytest backend/copilot/graphiti/config_test.py — 16 tests pass (key fallback priority)
    • poetry run pytest backend/copilot/tools/graphiti_store_test.py — store envelope tests pass
    • poetry run pytest backend/copilot/graphiti/ingest_test.py — ingestion tests pass
    • poetry run pytest backend/util/architecture_test.py — structural validation passes
    • Verify memory store/retrieve/forget cycle via copilot chat
    • Run AgentProbe multi-session memory benchmark (31 scenarios x3 repeats)
    • Confirm no CLOSE_WAIT socket accumulation under sustained streaming load
    • Verify multi-worker deployment doesn't produce loop-binding errors

For configuration changes:

  • .env.default is updated or already compatible with my changes
  • docker-compose.yml is updated or already compatible with my changes
  • Configuration changes:
    • New optional env var CHAT_OPENAI_API_KEY — AutoPilot-dedicated OpenAI key for Graphiti embeddings (falls back to OPENAI_API_KEY if not set)
    • CHAT_API_KEY now used as first fallback for Graphiti LLM calls (was OPEN_ROUTER_API_KEY)
    • Infra action needed: add CHAT_OPENAI_API_KEY sealed secret in autogpt-shared-config values (dev + prod)

🤖 Generated with Claude Code


Note

Medium Risk
Touches Graphiti memory ingestion/retrieval and introduces hard-delete capabilities plus event-loop–scoped caching/queues; failures could affect memory correctness or delete the wrong edges. Also changes streaming resource cleanup and key routing, which could surface as connection or billing/cost attribution issues if misconfigured.

Overview
Graphiti memory is upgraded from plain text episodes to a structured JSON MemoryEnvelope. memory_store now wraps content with typed metadata (source, kind, scope, status) and optional structured rule/procedure payloads, and ingestion supports JSON episodes.

Memory retrieval and lifecycle controls are expanded. memory_search adds optional scope hard-filtering to prevent cross-scope leakage, warm-context formatting drops non-global scoped episodes (and avoids empty wrappers), and new two-step tools (memory_forget_search / memory_forget_confirm) enable targeted soft- or hard-deletion of specific graph edges by UUID.

Reliability and multi-worker safety improvements. Graphiti client caching and ingestion worker registries are now per-event-loop (avoiding cross-loop Future errors), streaming chat completions explicitly close async streams to prevent CLOSE_WAIT socket leaks, warm-context is injected into the first user message to keep the system prompt cacheable, and a new architecture_test.py blocks future process-wide caching of event-loop–bound async clients. Config updates route Graphiti LLM/embedder keys to AutoPilot-specific env vars first, and OpenAPI schema exports include the new memory response types.

Reviewed by Cursor Bugbot for commit 5fb4bd0.

ntindle and others added 30 commits April 7, 2026 05:23
Integrate graphiti-core as an in-process temporal knowledge graph for
persistent cross-session memory in AutoPilot. Works in both SDK and
baseline/fast execution paths via the existing BaseTool → TOOL_REGISTRY
→ create_copilot_mcp_server() bridge.

Infrastructure:
- FalkorDB added to Docker Compose as graph database backend
- graphiti-core + cachetools + falkordb added to Python deps
- Per-group_id client isolation with LRU/TTL cache
- Custom FalkorDB driver with fulltext query fix

Tools (3 new BaseTool implementations):
- graphiti_store: save memories with EpisodeType.text + custom
  extraction instructions to suppress meta-entity pollution
- graphiti_search: hybrid search (edges + episodes) with ep.content
  surfacing and 500-char episode bodies
- graphiti_delete_user_data: GDPR deletion via clear_data()

Memory quality:
- Episode body uses "Speaker: content" format (not JSON blobs)
- Only user messages ingested into graph (Zep Cloud approach)
- custom_extraction_instructions block meta-entities (assistant,
  human, tool names, block names)
- small_model set to match main model (avoids gpt-4.1-nano dedup
  hallucination bug #760)
- Per-user asyncio.Queue serializes add_episode() calls

Integration:
- Warm context pre-loaded at session start (8s timeout, graceful
  degradation)
- System prompt supplement with ALWAYS SEARCH instruction
- Fire-and-forget episode ingestion after each turn via
  _background_tasks pattern
- MemoryEpisodeLog replay table (append-only, full user+assistant
  turn for migration safety)
- LaunchDarkly flag "graphiti-memory" for per-user rollout
- OpenRouter for extraction LLM, direct OpenAI for embeddings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Gate graphiti prompt supplement behind feature flag (baseline + SDK)
- Gate ingestion behind feature flag (was running for all users)
- Fix fulltext query length check to measure final string, not token count
- Remove erroneous GIN index drop from migration
- Remove GDPR compliance claim from delete tool
- Write replay log from graphiti_store tool
- Add FalkorDB to app-network and declare falkordb_data volume

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Missed the SDK path — enqueue_conversation_turn was running for all
authenticated users. Now gated behind is_enabled_for_user() like the
baseline path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…add tests

- Remove MemoryEpisodeLog table, migration, and all related code
- Fix TOCTOU race in get_graphiti_client (hold lock through init)
- Guard queue/worker creation with asyncio.Lock in ingest.py
- Handle CancelledError in ingestion workers
- Add worker idle timeout (60s) to prevent unbounded memory leak
- Fix derive_group_id to raise on sanitized input (prevent collisions)
- Route graphiti_store through ingestion queue (no more inline blocking)
- Fix falkordb_port default to 6380 (was 6379, mismatched .env.default)
- Make GraphitiConfig lazy (prevent import-time crash on bad .env)
- Consolidate duplicate is_enabled_for_user imports in service files
- Use explicit typed params in tool _execute methods
- Extract shared edge/episode formatters into _format.py
- Fix broken import path in graphiti_store.py (was crashing tool registry)
- Add 65 tests covering all graphiti modules (was 3 tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…seline service

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove blank GRAPHITI_* entries that fall back to OPEN_ROUTER/OPENAI keys
- Keep PASSWORD, HOST, PORT, model defaults, and semaphore limit
- Add web UI check to FalkorDB healthcheck (redis-cli + wget)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…stion

- Fix permissions.py: graphiti_* → memory_* to match TOOL_REGISTRY
- Fix prompting.py: graphiti_search/store → memory_search/store in LLM prompt
- Remove unused assistant_msg param from enqueue_conversation_turn
- Guard derive_group_id ValueError in search/delete tools
- Only ingest user messages (skip system/assistant turns)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- FalkorDB container now consumes GRAPHITI_FALKORDB_PASSWORD via REDIS_ARGS
- Healthcheck passes password to redis-cli
- enqueue_conversation_turn and enqueue_episode catch ValueError from
  derive_group_id and log+return instead of silently crashing the task

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents queued episodes from re-populating the graph after clear_data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove MemoryDeleteTool, its test, response model, permission entry,
and openapi.json enum value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TTL eviction now closes the driver via _EvictingTTLCache subclass
- evict_client is now async and explicitly closes the driver
- Prevents leaked FalkorDB connections on TTL expiry or manual eviction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
enqueue_episode now returns bool. MemoryStoreTool returns ErrorResponse
when the queue is full instead of a false success confirmation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Was duplicated as a deferred import in two functions. The module
already imports graphiti-core transitively via .client, so deferring
added no benefit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Override expire() instead of __delitem__ per cachetools maintainer
  guidance (github.com/tkem/cachetools/issues/205) — __delitem__ is
  bypassed by the internal TTL expiry path, so connections were silently
  leaked. expire() is the correct hook for TTL-expired items.
- Remove __delitem__ override that caused double-close in evict_client
- Add try/except around graphiti calls in MemorySearchTool so FalkorDB
  failures return ErrorResponse instead of crashing the tool-call round

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_ensure_worker now returns the queue directly. Callers use the returned
reference instead of re-looking up _user_queues[user_id], which could
KeyError if the worker timed out and cleaned up between ensure and put.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ervices

The lazy config proxy makes this import lightweight — no graphiti-core
pulled in, no .env parsed until first attribute access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tools

Phase 2 of Graphiti memory: structured explicit memories with
domain-agnostic metadata and two-step targeted deletion.

MemoryEnvelope (memory_model.py):
- source_kind: user_asserted / assistant_derived / tool_observed
- scope: real:global, project:<name>, book:<title>, session:<id>
- memory_kind: fact / preference / rule / finding / plan / event / procedure
- status: active / tentative / superseded / contradicted
- Optional confidence and provenance fields

memory_store tool updated:
- Accepts source_kind, scope, memory_kind optional params
- Wraps content in MemoryEnvelope, ingests as EpisodeType.json
- Preserves backward compat (all new params have defaults)

Targeted forget (two-step flow):
- memory_forget_search: NL query → returns candidate edges with UUIDs
- memory_forget_confirm: deletes specific edges by UUID via Cypher
- Agent shows candidates, user confirms before deletion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion

Phase 3 of Graphiti memory: substantive assistant findings are
distilled into structured MemoryEnvelope episodes tagged with
source_kind=assistant_derived and status=tentative.

Heuristic gate (_is_finding_worthy):
- Skip short acknowledgments (<150 chars)
- Skip workflow chatter ("Done", "Here's", "I've created", etc.)
- Only pass through substantive responses likely containing research
  results, analysis, or conclusions

Distillation (_distill_finding):
- Simple truncation for now (first 500 chars)
- Queued as EpisodeType.json with MemoryEnvelope metadata
- Best-effort: if queue is full, user canonical episode takes priority

Both SDK and baseline paths now pass assistant_msg to
enqueue_conversation_turn for distillation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 4 of Graphiti memory: search results and warm context now
support scope-based filtering to prevent cross-domain memory bleed.

memory_search tool:
- New optional `scope` parameter for hard filtering
- When set (e.g. scope="real:global"), only episodes whose
  MemoryEnvelope JSON matches that scope are returned
- Plain conversation episodes (no JSON envelope) default to
  real:global scope
- Omit scope to search all scopes (backward compatible)

Warm context (context.py):
- Filters episodes to real:global scope by default
- Fiction/project-scoped episodes excluded from session-start
  context to prevent bleed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When memory_kind=rule, the agent can now pass a structured `rule`
object with instruction, actor, trigger, and negation fields.  This
preserves exact user intent instead of relying on LLM extraction to
reconstruct operational rules from prose.

Example: "CC Sarah on client stuff" is stored as:
  rule.instruction = "CC Sarah on client communications"
  rule.actor = "Sarah"
  rule.trigger = "client-related communications"

When memory_kind=procedure, a structured `procedure` object captures
multi-step workflows with ordered steps, tools, conditions, and
negations that don't decompose into fact triples.

Both are optional extensions of MemoryEnvelope — backward compatible
with existing fact/preference/finding storage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace raw RELATES_TO-only Cypher with correct deletion that handles
all graph artifacts.

Soft delete (default):
- Sets invalid_at + expired_at on RELATES_TO edges
- Edges excluded from default search but history preserved
- Reversible — appropriate for "I don't want to see this"

Hard delete (hard_delete=true, for GDPR):
- Deletes edges across all types (MENTIONS|RELATES_TO|HAS_MEMBER)
- Cleans up EpisodicNode.entity_edges back-references
- Does NOT delete orphaned entity nodes — they may have summaries,
  embeddings, or future connections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wrap streaming response iteration in try/finally to prevent httpx
  connection leaks (CLOSE_WAIT sockets) on early exit or tool-call break
- Scope graphiti client cache per event loop using WeakKeyDictionary to
  fix "Future attached to a different loop" errors in multi-worker setups
- Harden ingest pipeline with improved error handling and test coverage
- Add architecture_test.py for structural validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

@ntindle
Member Author

ntindle commented Apr 15, 2026

@copilot resolve the merge conflicts in this pull request

…nvelope

# Conflicts:
#	autogpt_platform/backend/backend/copilot/sdk/service.py

Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>
Contributor

Copilot AI commented Apr 15, 2026

@copilot resolve the merge conflicts in this pull request

Resolved in merge commit 8455a13. There was one conflict in service.py — the warm-context injection (our branch) overlapped with the transcript seeding addition from dev. Both changes are independent and have been kept in the correct order.

@github-actions github-actions bot removed the conflicts Automatically applied to PRs with merge conflicts label Apr 15, 2026
@github-actions
Contributor

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

@github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Apr 15, 2026
@github-actions
Contributor

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

- sdk/service.py: keep both pre_attempt_msg_count init and
  sdk_model/model_cost_multiplier defaults; take dev's warm_ctx=""
  (always string, never None) and `or ""` fallback.
- service.py: take dev's MEMORY_CONTEXT_TAG + ENV_CONTEXT_TAG
  instructions, replacing our simpler temporal_context line.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot removed the conflicts Automatically applied to PRs with merge conflicts label Apr 15, 2026
@github-actions
Contributor

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit e7ae7b5.

Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
Dev merged inject_user_context(warm_ctx=warm_ctx) which handles
warm_ctx injection on first turn. Our standalone append at line 2797
and retry reattach at line 2945 caused double injection. Removed both
since current_message already carries warm_ctx after inject_user_context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ntindle ntindle merged commit ab3221a into dev Apr 15, 2026
45 checks passed
@ntindle ntindle deleted the feat/graphiti-memory-envelope branch April 15, 2026 14:40
@github-project-automation github-project-automation bot moved this to Done in Frontend Apr 15, 2026
@github-project-automation github-project-automation bot moved this from 🆕 Needs initial review to ✅ Done in AutoGPT development kanban Apr 15, 2026

Labels

platform/backend (AutoGPT Platform - Back end), platform/frontend (AutoGPT Platform - Front end), size/xl

Projects

Status: ✅ Done

2 participants