Summary
Concurrent walker invocations that modify the same graph node cause silent data loss due to a last-writer-wins race condition in the persistence layer. The entire node document (including its edge list) is replaced unconditionally at commit time, so the last walker to commit overwrites changes made by earlier concurrent walkers.
Reproduction
- Start a Jac application with `jac start` (or `jac start --scale`)
- Create a walker that adds an edge to the user root: `profile ++> MyNode(...)`
- Immediately fire 10+ concurrent walker requests that read/mutate the same `profile` node (e.g., updating a `last_accessed` timestamp)
- After all requests complete, query `[profile -->(?:MyNode)]`
- Result: the newly created `MyNode` edge is missing — it was overwritten by a concurrent walker's stale snapshot
Concrete example (Jac Builder IDE)
# User creates a project (adds edge to profile node)
POST /walker/project_ops {action: "create", name: "my-project"}
# → success, Project node created, edge added to profile
# Frontend navigates to IDE, fires ~12 concurrent requests:
POST /walker/project_ops {action: "open", project_id: "prj_123"} # mutates profile.updated_at
POST /walker/git_ops {action: "status", project_id: "prj_123"}
POST /walker/ide_file_ops {action: "setup", project_id: "prj_123"}
POST /walker/ai_chat {action: "load_history", ...}
POST /walker/git_ops {action: "branches", ...}
# ... 7 more concurrent requests
# User navigates back to dashboard
POST /walker/project_ops {action: "list"}
# → Project is GONE. Only previously-existing projects remain.
This is 100% reproducible. The more concurrent requests, the higher the chance of data loss.
Root Cause
The read-modify-write cycle (no isolation)
Each HTTP request gets its own ExecutionContext with a private L1 memory dict (__mem__). At request end, commit() flushes changed anchors to the persistent store. There is no concurrency control at any level.
The race
Request A (creates edge)                  Request B (updates timestamp)
────────────────────────                  ──────────────────────────────
1. Load profile from DB
   edges = [existing_proj]
                                          2. Load profile from DB
                                             edges = [existing_proj] ← SAME snapshot
3. profile ++> NewProject()
   L1 edges = [existing_proj, new_proj]
4. commit() → WRITE to DB
   edges = [existing_proj, new_proj] ✓
                                          5. profile.updated_at = now
                                             L1 edges = [existing_proj] ← STALE
                                          6. commit() → WRITE to DB
                                             edges = [existing_proj] ← OVERWRITES step 4

Result: new_proj edge is permanently LOST
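The interleaving above can be reproduced deterministically in plain Python. This is a sketch with a toy in-memory store standing in for the real backend; the names `store`, `load`, and `commit` are illustrative, not actual Jac APIs:

```python
import copy

# Toy persistent store: node id -> serialized document with embedded edge list
store = {"profile": {"edges": ["existing_proj"], "updated_at": None}}

def load(node_id):
    # Each request gets a private L1 snapshot (deep copy = full isolation on read)
    return copy.deepcopy(store[node_id])

def commit(node_id, doc):
    # Last-writer-wins: the whole document is replaced unconditionally
    store[node_id] = doc

# Steps 1-2: both requests load the same snapshot
a_profile = load("profile")
b_profile = load("profile")

# Step 3: A adds an edge in its L1 memory
a_profile["edges"].append("new_proj")

# Step 4: A commits; the store now holds both edges
commit("profile", a_profile)
assert store["profile"]["edges"] == ["existing_proj", "new_proj"]

# Step 5: B mutates an unrelated field on its stale snapshot
b_profile["updated_at"] = "2024-01-01T00:00:00Z"

# Step 6: B commits; B's stale edge list silently overwrites A's write
commit("profile", b_profile)
print(store["profile"]["edges"])  # ['existing_proj'] -- new_proj is lost
```

No thread scheduling is needed to trigger the bug: any ordering where both loads happen before both commits loses one of the writes.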
Affected persistence backends
MongoDB (jac-scale):
# memory_hierarchy.mongo.impl.jac, MongoBackend.put(), lines 71-87
self.collection.update_one(
    {'_id': str(_id)},
    {'$set': {'data': data, 'type': type(anchor).__name__}},
    upsert=True  # ← unconditional full-document replace
)
SQLite (jaclang):
# memory.impl.jac, SqliteMemory.sync(), line ~380
INSERT OR REPLACE INTO anchors (id, type, data) VALUES (?, ?, ?)
# ← unconditional full-row replace
Both backends replace the entire serialized anchor (including embedded edge list) with whatever the committing request had in its private L1 memory. No version check, no merge, no conflict detection.
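The full-row replacement is easy to see with a minimal sqlite3 session. This is a sketch: the `anchors` table below mirrors the report's description, not the actual jaclang schema:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE anchors (id TEXT PRIMARY KEY, type TEXT, data TEXT)")

# Writer A persists the node with the new edge included
doc_a = {"edges": ["existing_proj", "new_proj"]}
conn.execute("INSERT OR REPLACE INTO anchors VALUES (?, ?, ?)",
             ("profile", "Root", json.dumps(doc_a)))

# Writer B commits a stale snapshot of the same row: with no version check,
# INSERT OR REPLACE swaps the whole row, embedded edge list included
doc_b = {"edges": ["existing_proj"]}
conn.execute("INSERT OR REPLACE INTO anchors VALUES (?, ?, ?)",
             ("profile", "Root", json.dumps(doc_b)))

row = conn.execute("SELECT data FROM anchors WHERE id = ?", ("profile",)).fetchone()
print(json.loads(row[0])["edges"])  # ['existing_proj']
```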
Why there's no protection
- No optimistic locking: no version field on anchors, no conditional write (`WHERE version = ?`)
- No pessimistic locking: no `threading.Lock`, `asyncio.Lock`, or database-level advisory locks anywhere in the persistence path
- No atomic edge operations: edges are embedded in the parent node's serialized blob, so any edge mutation requires replacing the entire document
- Double commit (doesn't help): both the middleware (`jfast_api.impl.jac:865`) and `spawn_walker` (`server.impl.jac:305`) call `commit()`, but both use the same stale L1 snapshot
Relevant Files
| File | What |
| --- | --- |
| `jac/jaclang/runtimelib/impl/memory.impl.jac` (line ~340) | `SqliteMemory.sync` — `INSERT OR REPLACE` with no version check |
| `jac-scale/jac_scale/impl/memory_hierarchy.mongo.impl.jac` (line 71) | `MongoBackend.put` — unconditional `$set` upsert |
| `jac-scale/jac_scale/jserver/impl/jfast_api.impl.jac` (line 846) | `request_context_middleware` — creates per-request context, commits at end |
| `jac/jaclang/runtimelib/impl/server.impl.jac` (line 257) | `spawn_walker` — loads root, runs walker, commits |
| `jac/jaclang/jac0core/runtime.jac` (line 82) | `_request_context` ContextVar — per-coroutine isolation (correct, but insufficient) |
Minimal Reproducing Example + Validated Result
Files
server.jac — 3 walkers: add_item (creates edge), touch (concurrent read-modify-write), list_items (verify)
"""Minimal reproduction of graph persistence race condition."""
import from datetime { datetime }
node Item {
has name: str = "";
has created_at: str = "";
}
"""Create a new Item node connected to user root."""
walker add_item {
has name: str = "test";
can do with Root entry {
now = datetime.utcnow().isoformat() + "Z";
item = here ++> Item(name=self.name, created_at=now);
report {"success": True, "name": self.name, "created_at": now};
}
}
"""List all Item nodes on user root."""
walker list_items {
can do with Root entry {
items: list = [];
for item in [-->(?:Item)] {
items.append({"name": item.name, "created_at": item.created_at});
}
report {"success": True, "count": len(items), "items": items};
}
}
node Counter {
has value: int = 0;
}
"""Simulate a concurrent read-modify-write on the root node.
This walker reads children (loading root's edge list into L1),
mutates a counter node, then commits — writing the stale edge list back."""
walker touch {
has ts: str = "";
can do with Root entry {
items = [-->(?:Item)];
counters = [-->(?:Counter)];
if len(counters) == 0 {
c = here ++> Counter(value=1);
} else {
counters[0].value = counters[0].value + 1;
}
report {"success": True, "item_count": len(items), "ts": self.ts};
}
}
test_race.py — registers a user, runs 5 rounds of (create item + 15 concurrent touch requests), checks for data loss
import requests, concurrent.futures, time, sys, uuid

BASE = "http://localhost:8000"  # adjust port if needed
CONCURRENT_TOUCHES = 15
ROUNDS = 5

def register_and_login():
    username = f"test_{uuid.uuid4().hex[:8]}@example.com"
    password = "TestPass123!"
    requests.post(f"{BASE}/user/register", json={"username": username, "password": password})
    r = requests.post(f"{BASE}/user/login", json={"username": username, "password": password})
    data = r.json()
    return data.get("token") or data.get("access_token") or ""

def walker_call(token, walker, payload=None):
    r = requests.post(f"{BASE}/walker/{walker}", json=payload or {},
                      headers={"Authorization": f"Bearer {token}"})
    return r.json()

def extract_report(body):
    if isinstance(body, dict):
        reports = (body.get("data") or {}).get("reports") or body.get("reports") or []
        if reports:
            return reports[0]
        if "success" in body:
            return body
    return body

def main():
    print("=" * 60)
    print("Graph Persistence Race Condition Reproducer")
    print("=" * 60)
    token = register_and_login()
    print(f"Token: {token[:20]}...")
    r = extract_report(walker_call(token, "list_items"))
    initial_count = r.get("count", 0)
    lost_count = 0
    total_created = 0
    for round_num in range(1, ROUNDS + 1):
        item_name = f"item_round_{round_num}"
        print(f"\n[Round {round_num}] add_item('{item_name}') + {CONCURRENT_TOUCHES} concurrent touches...")
        total_created += 1
        with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENT_TOUCHES + 1) as pool:
            create_future = pool.submit(walker_call, token, "add_item", {"name": item_name})
            touch_futures = [pool.submit(walker_call, token, "touch", {"ts": str(i)})
                             for i in range(CONCURRENT_TOUCHES)]
            concurrent.futures.wait([create_future] + touch_futures)
        create_resp = extract_report(create_future.result())
        print(f"  add_item: {create_resp}")
        time.sleep(0.5)
        list_resp = extract_report(walker_call(token, "list_items"))
        current_count = list_resp.get("count", 0)
        expected_count = initial_count + round_num
        items = [i.get("name") for i in list_resp.get("items", [])]
        if item_name not in items:
            lost_count += 1
            print(f"  *** LOST! '{item_name}' not in list ***")
        else:
            print(f"  OK - '{item_name}' survived")
        if current_count < expected_count:
            print(f"  Expected {expected_count} items, got {current_count} — {expected_count - current_count} missing!")
    print(f"\n{'=' * 60}")
    print(f"RESULTS: {total_created} created, {lost_count} lost")
    if lost_count > 0:
        print(f"*** BUG CONFIRMED: {lost_count}/{total_created} items silently lost ***")
    print(f"Final: {extract_report(walker_call(token, 'list_items'))}")

if __name__ == "__main__":
    main()
How to reproduce
1. Create a directory with the files above
mkdir race-repro && cd race-repro
2. Start server
jac start server.jac
3. In another terminal, run test (adjust BASE port if needed)
pip install requests
python test_race.py
Validated result
============================================================
Graph Persistence Race Condition Reproducer
============================================================
[Round 1] OK - 'item_round_1' survived
[Round 2] OK - 'item_round_2' survived
[Round 3] OK - 'item_round_3' survived
[Round 4] *** LOST! 'item_round_4' not in list ***
Expected 4 items, got 3 — 1 missing!
[Round 5] OK - 'item_round_5' survived
Expected 5 items, got 4 — 1 missing!
============================================================
RESULTS: 5 created, 1 lost
*** BUG CONFIRMED: 1/5 items silently lost ***
item_round_4 was created successfully (server returned success: True) but was permanently erased from the graph by a concurrent touch walker's commit overwriting the root node's edge list with a stale snapshot.
Suggested Fixes
Option 1: Per-user-root async lock (quick fix)
Add an asyncio.Lock keyed by user root ID so two walkers for the same user cannot commit simultaneously. Low complexity, prevents the race entirely for single-pod deployments.
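A minimal sketch of the idea, assuming walkers run on a single event loop (the helper names `_root_locks` and `run_walker_serialized` are hypothetical, not existing Jac APIs):

```python
import asyncio
from collections import defaultdict

# One lock per user-root id. defaultdict is safe here because all walker
# handlers run on a single event loop, so lock creation cannot race.
_root_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

async def run_walker_serialized(root_id: str, walker_coro_factory):
    """Serialize the whole load -> run -> commit cycle per user root."""
    async with _root_locks[root_id]:
        return await walker_coro_factory()

# Demo: two "walkers" on the same root can no longer interleave their commits
async def demo():
    order = []

    async def walker(name):
        order.append(f"{name}:load")
        await asyncio.sleep(0.01)  # simulated run + commit latency
        order.append(f"{name}:commit")

    await asyncio.gather(
        run_walker_serialized("root_1", lambda: walker("A")),
        run_walker_serialized("root_1", lambda: walker("B")),
    )
    return order

order = asyncio.run(demo())
print(order)  # ['A:load', 'A:commit', 'B:load', 'B:commit']
```

B's load can only start after A's commit finishes, which is exactly the guarantee the race needs. The trade-off is reduced concurrency per user and no protection across processes or pods.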
Option 2: Optimistic locking with retry (proper fix)
Add a _version field to every anchor. Change writes to:
# MongoDB
result = collection.update_one(
    {'_id': str(_id), '_v': old_version},
    {'$set': {'data': data}, '$inc': {'_v': 1}}
)
if result.modified_count == 0:
    # Conflict — reload, re-merge, retry
    ...

-- SQLite
UPDATE anchors SET data = ?, version = version + 1
WHERE id = ? AND version = ?
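A compare-and-set retry loop over the SQLite variant might look like the following. This is a sketch: the `anchors` schema and the `mutate` callback are assumptions, and a real fix would need a domain-aware merge for edge lists rather than blind reapplication:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE anchors (id TEXT PRIMARY KEY, data TEXT, version INTEGER)")
conn.execute("INSERT INTO anchors VALUES (?, ?, 0)",
             ("profile", json.dumps({"edges": ["existing_proj"]})))

def commit_with_retry(node_id, mutate, max_retries=5):
    """Optimistic concurrency: reload and reapply the mutation on conflict."""
    for _ in range(max_retries):
        row = conn.execute("SELECT data, version FROM anchors WHERE id = ?",
                           (node_id,)).fetchone()
        doc, version = json.loads(row[0]), row[1]
        mutate(doc)  # reapply the walker's change to a fresh snapshot
        cur = conn.execute(
            "UPDATE anchors SET data = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (json.dumps(doc), node_id, version))
        if cur.rowcount == 1:  # conditional write succeeded
            return doc
        # rowcount == 0: someone committed in between; loop and retry
    raise RuntimeError("too many write conflicts")

# Both mutations survive regardless of commit order
commit_with_retry("profile", lambda d: d["edges"].append("new_proj"))
commit_with_retry("profile", lambda d: d.update(updated_at="now"))
doc = json.loads(conn.execute(
    "SELECT data FROM anchors WHERE id = 'profile'").fetchone()[0])
print(doc["edges"])  # ['existing_proj', 'new_proj']
```

The key property is that the conditional `WHERE version = ?` turns the silent overwrite into a detectable conflict, which the loop resolves by rereading before rewriting.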
Option 3: Separate edge storage (best fix)
Move edges out of the parent node's serialized blob into a separate collection/table. Use atomic operations ($addToSet, $pull in MongoDB; separate INSERT/DELETE in SQLite) for edge mutations. This eliminates the need to replace the entire parent document when edges change.
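A sketch of the SQLite side of this design (the `nodes`/`edges` schema is illustrative, not the current jaclang layout):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, data TEXT)")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, PRIMARY KEY (src, dst))")
conn.execute("INSERT INTO nodes VALUES ('profile', '{}')")
conn.execute("INSERT INTO edges VALUES ('profile', 'existing_proj')")

# Walker A: adding an edge is a row INSERT, not a node rewrite
conn.execute("INSERT INTO edges VALUES ('profile', 'new_proj')")

# Walker B: updating the node body touches only the nodes table, so it
# cannot clobber A's edge no matter how the two commits interleave
conn.execute("UPDATE nodes SET data = ? WHERE id = 'profile'",
             ('{"updated_at": "now"}',))

edges = [r[0] for r in conn.execute(
    "SELECT dst FROM edges WHERE src = 'profile' ORDER BY dst")]
print(edges)  # ['existing_proj', 'new_proj']
```

Because the edge list is no longer part of the node's serialized blob, concurrent edge additions and node-field updates commute, eliminating this class of lost writes by construction.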
Impact
- Data loss: Graph nodes and edges silently disappear
- Silent: No errors, no warnings — the overwriting commit succeeds
- Affects all applications: Any Jac app that handles concurrent requests touching the same graph nodes
- Worse at scale: multi-worker deployments (`--scale` with Gunicorn) have completely independent in-memory state with zero inter-process coordination
Environment
- jaclang: installed from `jaseci-labs/jaseci@main`
- jac-scale: installed from `jaseci-labs/jaseci@main`
- Python: 3.12
- OS: Linux (Ubuntu)
- Server: uvicorn (single worker, async)