Autonomous detection and remediation for container-style failures — proven on a deterministic, audit-friendly loop.
Executive snapshot: External replay recall improved from `6/11` to `11/11` on the same captured lab run, while holding `0` false positives.
Repository · Specification · Help & FAQ · Governance template · Quick start
What has been delivered so far, with published evidence:
- Core deterministic loop is operational and validated across five harness experiments.
- Optional local Kubernetes lab pipeline is operational (`bootstrap -> inject -> collect/normalize/replay`).
- Published external replay report shows concrete improvement:
  - before normalization refinement: `6/11` detected/resolved, `0` false positives
  - after normalization refinement: `11/11` detected/resolved, `0` false positives
- Full report: `docs/LAB_RUN_REPORT_20260331.md`
```mermaid
flowchart LR
    subgraph then["Start (initial PoC scope)"]
        A1[data/seed.py synthetic JSON]
        A2[Watcher -> Healer]
        A3[harness.py experiments 1-3]
        A4[metrics/results.db]
        A1 --> A2 --> A3 --> A4
    end
    subgraph now["Current state (expanded scope)"]
        B1[seed + k8s_clean_signals + near_real_stream]
        B2[harness.py experiments 1-5 + integrations/validate.py gate]
        B3[Optional lab pipeline: bootstrap -> inject -> collect]
        B4[tools/normalize_external_capture.py]
        B5[external replay scoring]
        B6[published report + metrics]
        B1 --> B2 --> B6
        B3 --> B4 --> B5 --> B6
    end
```
- Progress snapshot (front and center)
- Data flow: where we started -> where we are now
- Overview
- Why GHOST exists
- What we built
- How it works
- Detection design (reducing bias)
- Kubernetes-style structured signals
- Near-real stream & local adapters
- Validation & results
- Published reports
- Command reference
- Use cases
- Data: synthetic vs real logs
- Production & mission-critical systems
- Research: layered failures & learning
- Quick start
- Quick start by persona
- Help, FAQ & troubleshooting
- References & credits
- Project structure
- Documentation index
- License
GHOST is a reference implementation of a closed control loop:
log signal → structured event → policy lookup → corrective action → measured outcome
It targets workload-agnostic container runtime failure modes (OOM-style kills, crash loops, probe failures, latency thresholds) using explicit patterns and decision tables — not an LLM and not a third-party agent framework. Phase 1 runs entirely on your machine: synthetic logs, an in-memory service model, and a reproducible harness with SQLite metrics.
| Capability | Phase 1 |
|---|---|
| Real cloud / cluster APIs | No (simulated state) |
| LLM reasoning | No (deterministic matching) |
| External Python packages | No (standard library) |
| Repeatable experiment suite | Yes (harness.py — five experiments + SQLite metrics) |
| Policy separated from agent code | Yes (skills/ modules) |
| Integration contract gate | Yes (integrations/validate.py at start of harness.py) |
| Local file adapters (observe / lab) | Optional (adapters/ — not in CI) |
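To make the loop contract concrete, here is a minimal self-contained sketch of the same shape. The pattern lists, decision table, and action names below are illustrative stand-ins, not the repository's actual policy, which lives in `skills/`.

```python
# Illustrative sketch of the closed loop (log -> event -> policy -> action -> outcome).
# Names here are hypothetical; the real patterns and decision table live in skills/.
PATTERNS = {"oom_kill": ["out of memory", "oom"], "crash_loop": ["crashloopbackoff"]}
DECISIONS = {"oom_kill": ("raise_memory_limit", {"increment_mb": 256}),
             "crash_loop": ("restart_container", {})}

state = {"memory_limit_mb": 512, "restarts": 0}

def detect(line: str):
    """Return a structured event for the first matching failure phrase, else None."""
    msg = line.casefold()
    for failure_type, phrases in PATTERNS.items():
        if any(p in msg for p in phrases):
            return {"failure_type": failure_type, "message": line}
    return None

def heal(event: dict, state: dict) -> dict:
    """Look up the corrective action for an event and apply it to the simulated state."""
    action, params = DECISIONS.get(event["failure_type"], ("log_unknown", {}))
    if action == "raise_memory_limit":
        state["memory_limit_mb"] += params["increment_mb"]
    elif action == "restart_container":
        state["restarts"] += 1
    return {"failure_type": event["failure_type"], "action": action, "success": True}

event = detect("ERROR: container killed: Out of memory")
outcome = heal(event, state) if event else None
print(outcome, state)
```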
Containers fail when operators are not staring at dashboards. Logs often already contain the diagnosis; runbooks describe the fix. The weak link is frequently the latency and variance of the human chain: page → wake → context switch → manual execution.
GHOST answers one precise question from our engineering specification:
Can a lightweight system detect a known container runtime failure from a log stream and execute the correct corrective action faster and more reliably than a human — with zero human input after start?
We care because MTTR under automation is measurable. This repository isolates the autonomous loop so we can prove behavior and regression-test it before attaching real infrastructure, identity systems, or richer reasoning layers.
Concretely, this repository delivers:
| Layer | Implementation |
|---|---|
| Detection policy | skills/watcher_skills.py — substring sets per failure type, watched severities, event schema, explicit CANNOT_DO boundaries. |
| Remediation policy | skills/healer_skills.py — decision table (failure_type → action, params), timeouts, default unknown handler, outcome schema. |
| Watcher agent | agents/watcher.py — imports patterns only from watcher skills; emits validated events on ERROR / WARNING lines. |
| K8s signal policy | skills/k8s_signal_skills.py — ordered declarative rules on a signal object (record_type, phase, reason, etc.). |
| K8s signal agent | agents/k8s_watcher.py — imports only k8s_signal_skills; same event envelope as the log Watcher so the Healer stays unified. |
| Healer agent | agents/healer.py — imports the decision table only from healer skills; executes registered actions against shared state. |
| Event fabric | blackboard/event_bus.py — asyncio.Queue with schema validation (typed handoff between agents). |
| Simulated platform | simulator/infra_state.py — app-service baseline dict; container actions plus K8s-shaped fields (image, replicas_*, scheduling_blocked, node_ready) and matching heal actions. |
| Synthetic data | data/seed.py — log datasets, k8s_clean_signals.json, plus near_real_stream.json (200 noisy multi-line / kube-prefixed lines, 20 failures); outputs are gitignored. |
| Streaming | data/generator.py — async replay of JSON records for experiments. |
| Experiments | run_experiment1.py … run_experiment5.py — through mixed stream, K8s signals, and near-real noisy stream stress test. |
| Adapters (optional) | adapters/observe.py (Watcher-only file tail), adapters/lab_run.py (--dry-run or full loop on simulator). Not run in CI. |
| Lab data pipeline (optional) | lab/ + tools/ scripts: bootstrap/inject/collect/normalize/replay for external datasets; local only, not in CI. |
| Harness & metrics | harness.py + metrics/recorder.py — orchestrates all scenarios, prints a summary, persists rows to metrics/results.db. |
Design rule: agents never duplicate patterns or decision tables inline — skills are the single source of truth for review, diff, and compliance-style audits.
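To show that boundary in code, here is an illustrative fragment of the same shape (names simplified; consult `skills/healer_skills.py` and `agents/healer.py` for the real definitions):

```python
# Illustrative shape of the skills-as-policy boundary; actual names and contents differ.

# --- policy side (the kind of data that lives in skills/healer_skills.py) ---
DECISION_TABLE = {
    "oom_kill":   ("raise_memory_limit", {"increment_mb": 256}),
    "crash_loop": ("restart_container",  {"max_restarts": 3}),
}
DEFAULT_ACTION = ("log_unknown", {})

# --- agent side (the kind of lookup that lives in agents/healer.py, importing the table) ---
def resolve(failure_type: str):
    """Look up (action, params); unknown failure types fall back to the default handler."""
    return DECISION_TABLE.get(failure_type, DEFAULT_ACTION)

assert resolve("oom_kill")[0] == "raise_memory_limit"
assert resolve("disk_full")[0] == "log_unknown"
```

Because the table is plain data in one module, a reviewer can diff exactly what the automation is allowed to do without reading agent code.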
- Watcher scans each log record (optionally tagged with a stream index). If severity is in scope, it walks `DETECTABLE_PATTERNS` in order and publishes one event on the first substring hit in `message`.
- Healer awaits an event, resolves `(action, params)` via `DECISION_TABLE` (or `DEFAULT_ACTION`), runs the matching function in `ACTION_REGISTRY` on `infra_state`, then runs `POST_HEAL_VERIFIERS` from `healer_skills.py` on the updated state (unless `dry_run` or `log_unknown`). If the predicate fails, `success` is false even when the action raised no exception. Timing uses `asyncio.wait_for` per skill timeouts. For shadow / lab, `heal_once(..., dry_run=True)` skips mutation and skips verification. A sketch of this flow follows the diagram below.
- Harness resets the metrics DB, runs `integrations/validate.py` (required paths + Hermes policy shape), then drives five experiments: log detection, log full loop, mixed stream (100/10), structured K8s-style signals (`k8s_clean_signals.json`), and near-real noisy stream (200/20, `near_real_stream.json`). On failure it exits non-zero (CI uses the same path).
```mermaid
flowchart TB
    subgraph policy [Policy layer]
        WSK[skills/watcher_skills.py]
        HSK[skills/healer_skills.py]
    end
    subgraph runtime [Runtime loop]
        JSON[Generated JSON logs]
        W[Watcher]
        Q[asyncio Queue]
        H[Healer]
        INFRA[infra_state]
        DB[(metrics/results.db)]
    end
    WSK -.-> W
    HSK -.-> H
    JSON --> W
    W --> Q
    Q --> H
    H --> INFRA
    H --> DB
```
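A simplified sketch of the Healer path described above, assuming hypothetical stand-ins for the skills tables; the timeout, post-heal verification, and `dry_run` behavior mirror the description, not the exact repository code:

```python
import asyncio

# Hypothetical stand-ins for the skills-layer tables described above.
ACTION_REGISTRY = {"restart_container": lambda state, **p: state.update(restarts=state["restarts"] + 1)}
DECISION_TABLE = {"crash_loop": ("restart_container", {})}
DEFAULT_ACTION = ("log_unknown", {})
POST_HEAL_VERIFIERS = {"restart_container": lambda state: state["restarts"] >= 1}
TIMEOUT_S = {"restart_container": 5.0}

async def heal_once(event: dict, state: dict, dry_run: bool = False) -> dict:
    action, params = DECISION_TABLE.get(event["failure_type"], DEFAULT_ACTION)
    if dry_run or action == "log_unknown":
        # Shadow / lab mode: no mutation, no verification.
        return {"action": action, "success": True, "dry_run": dry_run}
    # Run the registered action under a per-skill timeout.
    await asyncio.wait_for(
        asyncio.to_thread(ACTION_REGISTRY[action], state, **params),
        timeout=TIMEOUT_S.get(action, 10.0),
    )
    # The outcome is successful only if the post-heal predicate holds on the updated state.
    verified = POST_HEAL_VERIFIERS.get(action, lambda s: True)(state)
    return {"action": action, "success": verified, "dry_run": False}

state = {"restarts": 0}
print(asyncio.run(heal_once({"failure_type": "crash_loop"}, state)))
```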
- Case-insensitive matching — Log lines are matched with Unicode casefold, and severities accept any casing (e.g. `error` / `ERROR`). That avoids favoring one vendor’s capitalization (Kubernetes vs Docker vs PaaS logs).
- Vendor-neutral phrases — `DETECTABLE_PATTERNS` includes multiple paraphrases per class (OOM / cgroup wording, crash-loop and backoff wording, probe and health-check failures, latency and timeout phrasing) so the PoC is not tuned to a single message shape.
- Diverse synthetic failures — `data/seed.py` picks among several templates per failure type for clean and mixed datasets, so experiments are not overfit to four fixed strings.
- Shared healthy check — The seed script uses the same `any_pattern_matches_message()` helper as policy in `watcher_skills.py`, so “no false patterns in healthy logs” is evaluated with the same rules as the Watcher (healthy lines were adjusted so phrases like “response time … within threshold” do not collide with latency rules once matching is case-insensitive).
The first matching failure type in `DETECTABLE_PATTERNS` iteration order wins; patterns are ordered so that higher-signal phrases are checked first, giving a stable priority.
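A minimal sketch of that matching rule, with illustrative phrase lists (the authoritative patterns and their ordering live in `skills/watcher_skills.py`):

```python
# Ordered, case-insensitive, first-match-wins classification (illustrative patterns only).
DETECTABLE_PATTERNS = [
    ("oom_kill",      ["out of memory", "oom-killed", "memory cgroup"]),
    ("crash_loop",    ["crashloopbackoff", "back-off restarting failed container"]),
    ("probe_failure", ["liveness probe failed", "readiness probe failed", "health check failed"]),
    ("high_latency",  ["response time exceeded", "request timed out"]),
]

def any_pattern_matches_message(message: str):
    """Return the first failure type whose phrase appears in the casefolded message, else None."""
    msg = message.casefold()
    for failure_type, phrases in DETECTABLE_PATTERNS:
        if any(phrase in msg for phrase in phrases):
            return failure_type
    return None

assert any_pattern_matches_message("ERROR: OOM-killed by the memory cgroup controller") == "oom_kill"
assert any_pattern_matches_message("response time 40 ms within threshold") is None
```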
This is not a live cluster client: it is the same Watcher → Healer loop fed by JSON that resembles what you would derive from kube-apiserver watches (Pod/Node/Deployment-shaped objects).
| Synthetic class | Typical real-world analogue | Simulated heal |
|---|---|---|
| `ImagePullBackOff` / `ErrImagePull` | Bad image tag, registry auth | Roll back to `image_previous` |
| `SchedulingBlocked` | `FailedScheduling` (resources, taints) | Clear `scheduling_blocked` |
| `NodeNotReady` | Node condition `NotReady` | Set `node_ready` |
| `ReplicaMismatch` | Deployment ready ≠ desired | `sync_replicas` |
| `PodDown` (Evicted) | Pod `Failed` + evicted / node pressure | `restore_workload` |
Why this matters: log substring matching alone is biased toward whatever format your app prints. Production agents usually combine typed API objects + events + metrics. Experiment 4 is a stdlib-only stepping stone: swap signal ingestion for an informer later without changing the Healer contract.
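As a sketch of how the structured-signal path differs from substring matching, ordered predicates can be evaluated over a signal object. Field and rule names below are illustrative; the real rules live in `skills/k8s_signal_skills.py`.

```python
# Ordered declarative rules over a Kubernetes-shaped signal record (illustrative only).
K8S_SIGNAL_RULES = [
    ("ImagePullBackOff",  lambda s: s.get("record_type") == "pod" and s.get("reason") in ("ImagePullBackOff", "ErrImagePull")),
    ("SchedulingBlocked", lambda s: s.get("reason") == "FailedScheduling"),
    ("NodeNotReady",      lambda s: s.get("record_type") == "node" and s.get("ready") is False),
    ("PodDown",           lambda s: s.get("phase") == "Failed"),
]

def classify_signal(signal: dict):
    """First rule whose predicate holds wins; unmatched signals return None and are ignored."""
    for failure_type, predicate in K8S_SIGNAL_RULES:
        if predicate(signal):
            return failure_type
    return None

print(classify_signal({"record_type": "pod", "reason": "ErrImagePull", "phase": "Pending"}))
```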
Experiment 5 replays near_real_stream.json (from seed.py): 200 records with kube-style timestamps, optional multi-line / stack-ish prefixes, and sometimes JSON-shaped log lines; 20 failures are shuffled among 180 healthy records. It applies the same scoring rules as Experiment 3 (detect / false positives / resolve vs near_real_ground_truth.json). This is still synthetic text — it stress-tests the current substring policy, not your production corpus.
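For intuition, one noisy record in that stream looks roughly like the dictionaries below; field names are illustrative, and the authoritative shape is whatever `data/seed.py` writes to `near_real_stream.json`.

```python
# Illustrative noisy near-real records; the Watcher still reads only severity and message.
example_failure = {
    "timestamp": "2026-03-31T08:14:22.531Z",
    "severity": "ERROR",
    "message": ('E0331 08:14:22.531 pod/app-service-7d4f '
                '"Back-off restarting failed container"\n'
                '  at handler.process (worker.py:42)'),
}
example_healthy = {
    "timestamp": "2026-03-31T08:14:23.002Z",
    "severity": "INFO",
    "message": '{"event": "request_served", "response_time_ms": 41, "status": "within threshold"}',
}
```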
Adapters (under adapters/) are optional tools for local workflows and are not executed in CI:
| Script | Purpose |
|---|---|
| `adapters/observe.py` | JSON array file → Watcher only → JSONL detection lines (no Healer). |
| `adapters/lab_run.py` | Same file → Watcher + Healer on the simulator; use `--dry-run` to skip `ACTION_REGISTRY` side effects. |
For rollout tiers, charter, and game-day checklist (process only), see docs/GOVERNANCE.md.
For higher-fidelity local data without expanding CI scope, this repo includes a minimal pipeline:
- Bootstrap the lab and deploy a test workload (`lab/bootstrap_lab.ps1`).
- Inject deterministic failures (`lab/inject_failures.ps1`).
- Collect events/logs (`tools/collect_k8s_lab_data.py`).
- Normalize to GHOST replay shape + ground truth (`tools/normalize_external_capture.py`).
- Score with the same Watcher/Healer loop (`tools/run_external_replay.py` → `experiments/run_experiment_external.py`).
One-command wrapper (PowerShell): lab/collect_and_normalize.ps1.
This path is local-only and not wired into harness.py or CI.
Latest published local lab run report: docs/LAB_RUN_REPORT_20260331.md.
Continuous integration: every push and pull request to main runs seed.py and harness.py on Python 3.11 via GitHub Actions (see the CI badge at the top). harness.py first runs integrations/validate.py (stdlib check for contract files and core paths).
Locally, the same checks run with `python data/seed.py` followed by `python harness.py`. The harness drives five experiments:
| Experiment | What it proves | Expected outcome |
|---|---|---|
| 1 — Detection | Watcher finds all four failure types on clean logs | 4 / 4 scenarios PASS |
| 2 — Full loop | Healer applies correct mutations after each clean failure (infra reset per scenario) | 4 / 4 assertions PASS (memory, port, instances, restart semantics) |
| 3 — Mixed stream | 100 lines: 90 healthy + 10 injected failures | 10 / 10 detected, 0 false positives on healthy lines, 10 / 10 resolved vs ground truth |
| 4 — K8s signals | 6 structured signal records (2× image pull paths + scheduling + node + replicas + evicted pod) | 6 / 6 PASS |
| 5 — Near-real noisy stream | 200 lines: 180 healthy + 20 injected failures (noisy envelopes) | 20 / 20 detected, 0 false positives on healthy lines, 20 / 20 resolved vs ground truth |
Timing: On fast local hardware, reported detect/decide/act milliseconds may round to 0 ms; correctness is enforced by assertions, not wall-clock drama. Add delays in the generator or real I/O when you need representative latency distributions.
All runs append structured rows to metrics/results.db for downstream reporting or dashboards. Each successful harness run also appends a JSON summary to feedback_rows (policy versions, per-experiment pass flags, Experiment 3 and 5 counts) via metrics/feedback.py.
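If you want to inspect that ledger directly, a stdlib query sketch follows; only the `feedback_rows` table name is taken from this README, so check the actual schema before relying on column names.

```python
import sqlite3

# Read the most recent harness feedback summary from metrics/results.db.
# Print the table definition first, since column names here are not guaranteed.
conn = sqlite3.connect("metrics/results.db")
print(conn.execute("SELECT sql FROM sqlite_master WHERE name = 'feedback_rows'").fetchone())
print(conn.execute("SELECT * FROM feedback_rows ORDER BY rowid DESC LIMIT 1").fetchone())
conn.close()
```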
The repo now includes an executed lab report with concrete artifact paths and replay metrics:
Latest published highlights from that report:
- Initial external replay on captured lab data: `detected 6/11`, `resolved 6/11`, `false_positives 0`.
- Follow-up normalization fix (`BackOff` pull-image mapping) on the same run: `detected 11/11`, `resolved 11/11`, `false_positives 0`.
- Production meaning: the recall bottleneck was in normalization semantics, not in the core Watcher/Healer execution path.
Run all commands from repository root.
| Goal | Command |
|---|---|
| Generate all synthetic datasets | python data/seed.py |
| Run full CI-equivalent harness | python harness.py |
| Watcher-only on a file stream | python adapters/observe.py data/mixed_stream.json |
| Full loop in dry-run mode | python adapters/lab_run.py --dry-run data/near_real_stream.json |
| Full loop with simulator mutation | python adapters/lab_run.py data/mixed_stream.json |
| Bootstrap local K8s lab | ./lab/bootstrap_lab.ps1 |
| Inject deterministic lab failures | ./lab/inject_failures.ps1 |
| Collect + normalize + replay lab data | ./lab/collect_and_normalize.ps1 |
| Manual external replay | python tools/run_external_replay.py --data data/external/runs/<run-id>/normalized.json --ground-truth data/external/runs/<run-id>/ground_truth.json --record |
| Validate integration contract files only | python integrations/validate.py |
| Use case | What to run | Output / decision value |
|---|---|---|
| Validate policy correctness before any infra work | `python data/seed.py` then `python harness.py` | Reproducible pass/fail across five experiments; blocks policy regressions early. |
| Observe-only triage on captured logs | `python adapters/observe.py <path-to-json-array>` | Detection events only; no state mutation; safe for shadow analysis. |
| Dry-run autonomous response rehearsal | `python adapters/lab_run.py --dry-run <path-to-json-array>` | End-to-end detect/decide trace without applying actions. |
| Evaluate action correctness in simulator | `python adapters/lab_run.py <path-to-json-array>` | Simulated state transitions + post-heal verification outcomes. |
| Reproduce Kubernetes-style incidents locally | `./lab/bootstrap_lab.ps1`, `./lab/inject_failures.ps1`, `./lab/collect_and_normalize.ps1` | Captured artifacts + replay score on near-real local signals. |
| Measure external replay quality over time | `python tools/run_external_replay.py ... --record` | Detection/precision/resolution metrics appended for trend tracking. |
| Author policy updates safely | Edit `skills/` + run `python harness.py` + `python integrations/validate.py` | Enforces skills-as-policy boundary and integration contract completeness. |
| Prepare production rollout process | Fill `docs/GOVERNANCE.md` | Defines autonomy tiers, blast radius, and change control before live execution. |
Nothing stops you from using real or open-source log data — the project ships synthetic JSON by default for four practical reasons:
| Reason | Detail |
|---|---|
| Reproducibility | CI and contributors need identical inputs; pinned synthetic output from seed.py guarantees that. |
| Safety | Production logs routinely contain secrets, PII, and internal hostnames — they must not land in a public git history. |
| Licensing | Public “open” log corpora still carry terms (attribution, research-only, no redistribution). Compliance is your obligation when you import them. |
| Schema & labels | GHOST experiments expect structured records and (for scoring) known failure classes. Raw downloads need ETL and often manual or semi-automatic labeling. |
“Training” in this PoC does not mean neural-network training. The agents are explicit policies (substring / ordered rules + decision tables). Improving them means engineering: extend skills/watcher_skills.py and skills/k8s_signal_skills.py, validate with harness.py. A future ML layer would be a separate pipeline with its own data governance.
Where to put optional real or redacted samples locally: data/external/README.md — files there stay out of git by default (except that README). See the full operational guide in docs/HELP.md (FAQ: Can we download real scenarios from open-source log providers?).
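If you do drop an external corpus into `data/external/` locally, the practical step is a small ETL pass into the replay-style record shape plus a ground-truth file. The field mapping below is a hedged illustration, not the contract; the authoritative shapes are whatever `data/seed.py` generates and `tools/normalize_external_capture.py` emits.

```python
import json

def normalize_line(raw: dict) -> dict:
    """Map one external log record into a replay-style shape (illustrative field names)."""
    return {
        "timestamp": raw.get("time") or raw.get("ts"),
        "severity": (raw.get("level") or "INFO").upper(),
        "message": raw.get("msg") or raw.get("log", ""),
    }

raw_records = [{"time": "2026-03-31T08:14:22Z", "level": "error",
                "msg": 'Back-off pulling image "app:v2"'}]
print(json.dumps([normalize_line(r) for r in raw_records], indent=2))
```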
GHOST Phase 1 is a laboratory instrument, not a production controller. The ideas it embodies, however, map directly to how serious teams introduce automation safely.
What transfers well
- Explicit policy (versioned patterns + action tables) with separation from execution code — supports review, RBAC on changes, and post-incident audit (“what could the robot do?”).
- Closed-loop tests before prod: the same structure you see in Exp 2–3 should eventually run against staging APIs with frozen golden logs and expected state transitions.
- Fast, bounded remediation for known classes: restarts within caps, scale-out within limits, cache clears — actions that are reversible and idempotent when designed well.
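As an illustration of what "bounded and reversible" can mean in code (not taken from this repository), an action can carry its own hard cap and refuse to act beyond it:

```python
# Illustrative only: a remediation action with a hard cap, so repeated triggers stay bounded.
MAX_RESTARTS_PER_HOUR = 3

def restart_container(state: dict) -> dict:
    """Restart within a cap; beyond the cap, escalate instead of acting again."""
    if state.get("restarts_this_hour", 0) >= MAX_RESTARTS_PER_HOUR:
        return {"action": "escalate_to_human", "success": False, "reason": "restart cap reached"}
    state["restarts_this_hour"] = state.get("restarts_this_hour", 0) + 1
    return {"action": "restart_container", "success": True}

state = {"restarts_this_hour": 2}
print(restart_container(state))  # acts within the cap
print(restart_container(state))  # cap reached, escalates instead
```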
What production must add
| Risk in naive automation | Mitigation in mission-critical environments |
|---|---|
| Log substring false positives | Structured signals, alert correlation, rate limits, dry-run / canary, human approval for destructive classes |
| Blast radius | Hard quotas, multi-account isolation, circuit breakers, automatic rollback hooks |
| Unknown / correlated failures | Escalation paths, SLO-based policy, runbook coverage; LLM/heuristics after guardrails and retrieval — not instead of deterministic paths |
| Governance | IAM-bound actions, change windows, immutable audit trail, integration with ticketing and postmortems |
Practical tiers (how organizations usually evolve)
- Assisted ops — automation gathers context and proposes steps; humans execute risky changes.
- Guardrailed autonomy — small set of low-blast, reversible actions with hard caps and shadow mode first.
- Expanded policy — broader coverage only where harnesses and game days prove safety.
A fill-in template aligned to these ideas (charter, tier definitions, blast radius, drills) lives in docs/GOVERNANCE.md.
Bottom line: GHOST demonstrates that a deterministic autonomous loop can be built clearly and tested. For mission-critical workloads, the long-term value is shorter MTTR on known paths and less cognitive load on operators — provided automation is constrained, observable, and never the only line of defense.
Today’s PoC is intentionally small. The next step toward human-like troubleshooting under incomplete information is to reason across layers (logs, manifests, network, APIs, data) with specialist agents and a coordinator, not a single log grep.
- `docs/VISION_LAYERED_LEARNING.md` — layered failure model, partial observability, swarm-style roles (Hermes-like orchestration without claiming a product), topology-aware bias, an honest taxonomy of feedback loops, and how external development tooling (e.g. gstack) fits next to GHOST as policy/code authoring support — not unguarded prod operators.
- `metrics/feedback.py` — after each `harness.py` run, an append-only `feedback_rows` record is stored in `metrics/results.db` with pass/fail flags for all five experiments, layer tags (including `log_near_real_noisy`), and policy (skills) versions so batch jobs can correlate outcomes with policy state (hook for offline policy improvement — not online learning in agents).
Agents here do not perform online gradient descent; “learning” means closing the loop from verified outcomes into policy updates you promote through tests.
```bash
git clone https://github.com/beejak/GHOST-PoC.git
cd GHOST-PoC
python data/seed.py
python harness.py
```

Important: Generated JSON under `data/` is not committed (see `.gitignore`). Always run `seed.py` after a fresh clone before `harness.py`.
Optional: python data/seed.py --seed 123 — different shuffle of failures inside the mixed stream.
Runtime: Python 3.11+ recommended; 3.9+ may work with the current codebase. Phase 1 requires no pip install.
| Persona | Fastest path | Why this path |
|---|---|---|
| Operator / SRE evaluator | `python data/seed.py` -> `python harness.py` | Confirms baseline policy correctness before touching any lab tooling. |
| Policy author (skills editor) | Edit `skills/` -> `python data/seed.py` -> `python harness.py` | Ensures every rule change is validated across all five experiments. |
| Shadow-mode reviewer | `python adapters/observe.py data/mixed_stream.json` | Lets you inspect detections without mutation side effects. |
| Autonomy rehearsal owner | `python adapters/lab_run.py --dry-run data/near_real_stream.json` | Exercises full detect/decide flow while staying non-destructive. |
| Lab pipeline engineer | `./lab/bootstrap_lab.ps1` -> `./lab/inject_failures.ps1` -> `./lab/collect_and_normalize.ps1` | Produces replayable local K8s-derived data with measurable outcomes. |
| Governance / risk lead | Read `docs/GOVERNANCE.md` + `docs/LAB_RUN_REPORT_20260331.md` | Maps technical results to rollout tiers, blast radius, and controls. |
| Question | Short answer |
|---|---|
| Why no real logs in the repo? | Reproducibility, CI, licensing, and secret/PII risk — see Data: synthetic vs real above. |
| Can we download open-source log datasets to “train” the agents? | Yes, locally, if the license fits your use case. Today’s agents are rule-based; you refine skills and re-run the harness, not a model trainer. Normalize into the same JSON shape as generated clean_failures.json. |
| Harness failed on experiment N | Re-run python data/seed.py. If it persists, open docs/HELP.md → Quick troubleshooting and match the error pattern. Integration contract validation failed means integrations/validate.py exited non-zero (missing contract file or expected path). |
| Where is detailed help? | docs/HELP.md — troubleshooting table, extended FAQ, extension patterns, support pointers. |
| How do I change detection or healing? | Only via skills/ and simulator/infra_state.py; never duplicate tables inside agents/. |
Common fixes
- `FileNotFoundError` on `data/*.json` → run `python data/seed.py` from the repository root.
- Healthy baseline assertion failed → template overlap with patterns; adjust `data/seed.py` or `skills/watcher_skills.py`.
- All timings `0 ms` → expected on fast CPUs; assertions still prove correctness.
For incident-style walkthroughs, licensing notes, and a path to optional data/external/ workflows, read docs/HELP.md.
Attribution for external repositories and upstream projects referenced by this PoC:
| Project | Link | How it is used here |
|---|---|---|
| GHOST-PoC (this repository) | beejak/GHOST-PoC | Primary implementation, experiments, docs, and CI. |
| gstack | garrytan/gstack | Referenced for skill-oriented AI development workflows; integrated via compatibility docs and maintainer skill patterns under integrations/gstack/. |
| Hermes Agent (Nous Research) | NousResearch/hermes-agent | Referenced as an optional external agent runtime; this repo ships only integration contracts/policies in integrations/hermes/, not Hermes runtime code. |
Notes on scope and credit:
- This repo does not vendor third-party runtime source from gstack or Hermes.
- Integration is contract-based (policy files, prompts, maintainer guidance), with explicit upstream links for install and licensing.
- If additional external repos are adopted later, add them here with link, license, and exact usage boundary.
```
GHOST-PoC/
├── docs/
│   ├── HELP.md              # In-depth help, FAQ, real-log guidance, troubleshooting
│   └── GOVERNANCE.md        # Template: tiers, charter, game days (org process; not enforced in code)
├── adapters/                # Optional observe / lab_run (local files → agents)
├── lab/                     # Optional K8s lab scripts/manifests (bootstrap/inject/pipeline wrapper)
├── tools/                   # External data pipeline scripts (collect/normalize/replay)
├── integrations/            # Hermes + gstack-compatible contracts; validate.py (no LLM in CI)
├── skills/                  # Policy: log patterns, K8s signal rules, decision table
├── agents/                  # Watcher, K8s watcher & Healer (import skills only)
├── blackboard/              # Event bus (asyncio queue + validation)
├── simulator/               # Fake infra state + action implementations
├── data/
│   ├── seed.py              # Synthetic dataset generator
│   ├── generator.py         # Async JSON stream for harness
│   ├── scenarios.json       # Scenario metadata
│   └── external/            # Gitignored drops for redacted real samples (README only in git)
├── experiments/             # Experiment 1–5 runners
├── metrics/                 # SQLite recorder, reporter, harness feedback ledger
├── harness.py               # Single entrypoint: all experiments
├── Ghost PoC.md.txt         # Full build specification
├── README.md
├── LICENSE
└── requirements.txt         # Phase 2 placeholders only
```
| Document | When to read it |
|---|---|
| README.md (this file) | First-time orientation, architecture, validation summary, quick start. |
| docs/HELP.md | World-class operational help: troubleshooting matrix, full FAQ (including real vs synthetic logs), extension guide, support. |
| docs/VISION_LAYERED_LEARNING.md | Research architecture: layered failures, partial info, runtime swarm pattern, feedback roadmap, virtual dev team vs GHOST boundary. |
| docs/GOVERNANCE.md | Rollout template: autonomy tiers, policy change control, blast radius, game days (fill in for your org). |
| docs/LAB_RUN_REPORT_20260331.md | First executed local lab pipeline report (artifacts + replay metrics). |
| Ghost PoC.md.txt | Formal specification, definition of done, build order, synthetic vs real appendix. |
| data/external/README.md | Where optional local / redacted corpora go and what not to commit. |
| lab/README.md | Minimal lab workflow to generate external data and replay it locally. |
| tools/README.md | Collect / normalize / replay scripts for external datasets. |
| integrations/README.md | Hermes (Nous) tool policy + maintainer skill aligned with gstack; validate.py runs inside harness.py. |
| integrations/hermes/README.md | Installing Hermes upstream; mapping TOOL_POLICY.json to your tool config. |
| integrations/gstack/README.md | Vendoring / using the gstack-compatible maintainer skill next to upstream gstack. |
Licensed under the Apache License 2.0 — see LICENSE.
GHOST · Prove the loop in the lab. Earn the right to run it in production.
If you extend this work, preserve the skills-as-policy pattern — it is the primary maintainability and auditability lesson from Phase 1.