Production-Ready Memory Service for LLM Agents via Model Context Protocol (MCP)
SimpleMem MCP Server is a cloud-hosted long-term memory service for LLM agents, implementing the Streamable HTTP transport (MCP 2025-03-26 spec). It enables AI assistants like Claude, Cursor, and other MCP-compatible clients to store, retrieve, and query conversational memories with ease.
- Semantic Lossless Compression: Converts dialogues into atomic, self-contained facts
- Coreference Resolution: Automatically replaces pronouns (he/she/it) with actual names
- Temporal Anchoring: Converts relative times (tomorrow, next week) to absolute timestamps
- Hybrid Retrieval: Semantic search + keyword matching + metadata filtering
- Intelligent Planning: Automatic query decomposition and reflection for complex queries
- Multi-tenant Isolation: Per-user data tables with token authentication
- OpenRouter Integration: Powered by OpenRouter's LLM and Embedding services
- Production Optimized: Faster response times compared to the academic reference implementation
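To make the first three features concrete, here is a hypothetical before/after based on the pipeline example later in this document (the field names are illustrative assumptions, not the server's actual schema):

```python
# Raw dialogue turn, as a client would pass it to memory_add
dialogue = {"speaker": "user", "content": "I'll meet Bob at Starbucks tomorrow at 3pm"}

# After semantic compression, coreference resolution, and temporal
# anchoring (assuming the turn arrived on 2025-01-14):
atomic_fact = {
    "text": "User will meet Bob at Starbucks on 2025-01-15 at 15:00",
    "persons": ["User", "Bob"],          # "I" resolved to the user's name
    "location": "Starbucks",
    "timestamp": "2025-01-15T15:00:00",  # "tomorrow at 3pm" anchored
    "topic": "Meeting",
}
```

Each stored fact is self-contained: it can be retrieved and understood without the surrounding conversation.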
┌─────────────────────────────────────────────────────────────────┐
│ SimpleMem MCP Server │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ HTTP Server (FastAPI) │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │ │
│ │ │ Web UI │ │ REST API │ │ MCP Streamable │ │ │
│ │ │ (/) │ │ (/api/*) │ │ HTTP (/mcp) │ │ │
│ │ └────────────┘ └────────────┘ └────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Token Authentication │ │
│ │ (JWT + AES-256 Encrypted API Keys) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────┼────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ User A │ │ User B │ │ User C │ │
│ │ Table │ │ Table │ │ Table │ │
│ └────────────┘ └────────────┘ └────────────┘ │
│ └─────────────────── LanceDB ──────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ OpenRouter API Integration │ │
│ │ LLM: openai/gpt-4.1-mini Embed: qwen/qwen3-embedding-4b │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
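The Token Authentication layer in the diagram issues JWTs signed with the server's JWT_SECRET_KEY. The shape of such a token can be sketched with the standard library alone; the claim set here is an assumption, and a real deployment would use a vetted JWT library such as PyJWT:

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    """Unpadded base64url, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_token(secret: str, user_id: str) -> str:
    """Build an HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({"sub": user_id}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

token = make_token("your-secure-random-secret-key", "user-a")
```

The per-user OpenRouter API keys are additionally encrypted at rest with AES-256, so the token never carries the raw key.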
The easiest way to use SimpleMem is via our hosted service at https://mcp.simplemem.cloud
1. Visit https://mcp.simplemem.cloud
2. Enter your OpenRouter API Key
3. Get your authentication token
4. Configure your MCP client (see below)
```bash
cd MCP
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

```bash
# Production environment recommended settings
export JWT_SECRET_KEY="your-secure-random-secret-key"
export ENCRYPTION_KEY="your-32-byte-encryption-key!!"
```

```bash
python run.py
```

Output:
```
============================================================
SimpleMem MCP Server
Multi-tenant Memory Service for LLM Agents
============================================================
Web UI:   http://localhost:8000/
REST API: http://localhost:8000/api/
MCP:      http://localhost:8000/mcp
------------------------------------------------------------
```
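The JWT_SECRET_KEY and ENCRYPTION_KEY exports above use placeholder values. One way to generate real ones, assuming (as the placeholder suggests) that ENCRYPTION_KEY is consumed as a 32-character string:

```shell
# token_urlsafe(48) yields a 64-character secret for signing JWTs;
# token_urlsafe(24) yields exactly 32 characters for the encryption key.
export JWT_SECRET_KEY="$(python -c 'import secrets; print(secrets.token_urlsafe(48))')"
export ENCRYPTION_KEY="$(python -c 'import secrets; print(secrets.token_urlsafe(24))')"
```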
| Item | Value |
|---|---|
| Protocol Version | 2025-03-26 |
| Transport | Streamable HTTP |
| Message Format | JSON-RPC 2.0 |
| Authentication | Bearer Token |
| Endpoint | Method | Description |
|---|---|---|
| /mcp | POST | Send JSON-RPC messages (requests, notifications) |
| /mcp | GET | Server-to-client SSE stream |
| /mcp | DELETE | Terminate session |
All MCP requests require a Bearer token in the Authorization header:
Authorization: Bearer <your-token>
After initialization, include the session ID header:
Mcp-Session-Id: <session-id>
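Putting the transport, authentication, and session rules together, the opening handshake can be sketched with the standard library. The token and client name are placeholders; the `initialize` params follow the MCP 2025-03-26 spec:

```python
import json
import urllib.request

MCP_URL = "https://mcp.simplemem.cloud/mcp"
TOKEN = "YOUR_TOKEN"  # issued by the hosted service

# MCP "initialize" request (JSON-RPC 2.0 over Streamable HTTP)
init = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1"},
    },
}
req = urllib.request.Request(
    MCP_URL,
    data=json.dumps(init).encode(),
    headers={
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        "Authorization": f"Bearer {TOKEN}",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; read the Mcp-Session-Id
# response header and echo it on every subsequent request.
```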
| Tool | Description |
|---|---|
| `memory_add` | Add a single dialogue to memory (auto-extracts facts, resolves pronouns, anchors timestamps) |
| `memory_add_batch` | Add multiple dialogues at once |
| `memory_query` | Query memories and generate AI-synthesized answers (with planning + hybrid retrieval + reflection) |
| `memory_retrieve` | Retrieve relevant memory entries (returns raw data) |
| `memory_stats` | Get memory statistics |
| `memory_clear` | Clear all memories (irreversible) |
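These tools are reached through the standard MCP `tools/call` method. A `memory_add` invocation might be shaped like this; the argument names are illustrative assumptions based on the tool descriptions above:

```python
import json

memory_add_call = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "memory_add",
        "arguments": {
            # One dialogue turn; the server extracts facts, resolves
            # pronouns, and anchors relative times on ingestion.
            "speaker": "user",
            "content": "I'll meet Bob at Starbucks tomorrow at 3pm",
        },
    },
}
payload = json.dumps(memory_add_call)  # POST this to /mcp with auth headers
```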
Add to your MCP JSON settings:
```json
{
  "mcpServers": {
    "simplemem": {
      "url": "https://mcp.simplemem.cloud/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN"
      }
    }
  }
}
```

Dialogue Input            Processing                     Memory Storage
───────────────────────────────────────────────────────────────────────────
"I'll meet Bob ┌─────────────────┐
at Starbucks │ LLM Processing │
tomorrow at 3pm" ──▶│ │ ──────────────▶ Atomic Fact
└─────────────────┘
│
▼
┌─────────────────────────┐
│ Atomic Fact: │
│ "User will meet Bob at │
│ Starbucks on │
│ 2025-01-15 at 15:00" │
│ │
│ persons: [User, Bob] │
│ location: Starbucks │
│ timestamp: 2025-01-15 │
│ topic: Meeting │
└───────────┬─────────────┘
│
▼
┌─────────────────────────┐
│ Embedding │
│ (qwen3-embedding-4b) │
└───────────┬─────────────┘
│
▼
┌─────────────────────────┐
│ LanceDB Vector Store │
└─────────────────────────┘
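The temporal-anchoring step above can be pictured as a function from relative phrases to absolute dates. This toy rule-based stand-in (the server delegates the real work to the LLM) reproduces the example in the diagram:

```python
from datetime import date, timedelta

def anchor_date(phrase: str, now: date) -> date:
    """Resolve a few relative day words against a reference date.
    Toy stand-in for the LLM-driven anchoring step."""
    rules = {"tomorrow": 1, "yesterday": -1, "next week": 7}
    for key, days in rules.items():
        if key in phrase:
            return now + timedelta(days=days)
    return now  # no relative phrase: keep the reference date

# "tomorrow at 3pm" received on 2025-01-14 anchors to 2025-01-15
meeting_day = anchor_date("tomorrow at 3pm", date(2025, 1, 14))
```

The point of anchoring at write time is that "tomorrow" is meaningless weeks later; the stored fact must carry the absolute timestamp.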
User Question: "When am I meeting Bob?"
│
▼
┌───────────────────────────────┐
│ 1. Query Complexity Analysis │
│ - Type: Temporal query │
│ - Entity: Bob │
│ - Complexity: 0.3 (simple)│
└───────────────┬───────────────┘
│
▼
┌───────────────────────────────┐
│ 2. Generate Search Queries │
│ → "Bob meeting time" │
└───────────────┬───────────────┘
│
▼
┌───────────────────────────────┐
│ 3. Hybrid Retrieval │
│ - Semantic (vector) │
│ - Keyword (BM25) │
│ - Metadata (persons) │
└───────────────┬───────────────┘
│
▼
┌───────────────────────────────┐
│ 4. Answer Generation │
│ Context + Question → LLM │
└───────────────┬───────────────┘
│
▼
┌───────────────────────────────┐
│ Response: │
│ { │
│ "answer": "15 January 2025 │
│ at 3:00 PM at │
│ Starbucks", │
│ "confidence": "high", │
│ "contexts_used": 1 │
│ } │
└───────────────────────────────┘
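Step 3 merges ranked lists from the semantic, keyword, and metadata channels. The document does not specify the fusion rule, so here is a common choice, reciprocal rank fusion, as an illustrative sketch:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of ids into one ranking.
    Each id scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["fact-12", "fact-7", "fact-3"]   # vector search
keyword  = ["fact-7", "fact-12"]             # BM25
metadata = ["fact-7"]                        # persons filter: Bob

fused = reciprocal_rank_fusion([semantic, keyword, metadata])
# fact-7 ranks first: it appears in all three channels
```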
| Option | Default | Description |
|---|---|---|
| `window_size` | 20 | Number of dialogues per processing batch |
| `semantic_top_k` | 25 | Semantic search result count |
| `keyword_top_k` | 5 | Keyword search result count |
| `enable_planning` | true | Enable query planning |
| `enable_reflection` | true | Enable reflection iteration |
| `max_reflection_rounds` | 2 | Maximum reflection rounds |
| `llm_model` | openai/gpt-4.1-mini | LLM model |
| `embedding_model` | qwen/qwen3-embedding-4b | Embedding model |
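For scripts that talk to the service, the table above maps naturally onto a small config object. `SimpleMemConfig` is a hypothetical client-side helper, not part of the server's API:

```python
from dataclasses import dataclass

@dataclass
class SimpleMemConfig:
    # Defaults taken from the options table above.
    window_size: int = 20
    semantic_top_k: int = 25
    keyword_top_k: int = 5
    enable_planning: bool = True
    enable_reflection: bool = True
    max_reflection_rounds: int = 2
    llm_model: str = "openai/gpt-4.1-mini"
    embedding_model: str = "qwen/qwen3-embedding-4b"

config = SimpleMemConfig(semantic_top_k=50)  # override a single knob
```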
```bash
# Development mode (auto-reload)
python run.py --reload

# Specify port
python run.py --port 3000

# View help
python run.py --help
```

MIT License
Built upon the SimpleMem research implementation, refactored and optimized for production deployment with multi-tenant support, faster processing, and comprehensive user isolation.