Continuous Operation Playbook (1h → 30d)

This playbook shows how to run an agent continuously while using ACP as the governance layer.

Core pattern

Run the worker continuously, but rotate ACP sessions in windows.

Worker lifetime: long-lived (days to weeks).
Session lifetime: bounded control window (hourly or daily).
All mutating calls use command_id for replay-safe retries.
Events marked state_bearing=True stay fail-closed: if the write fails, the operation fails.

This gives you unattended execution without leaving one giant unbounded session open for a month.
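The worker/session split above can be sketched as a single loop that rotates sessions at fixed boundaries. This is a minimal illustration, not ACP's client API: `SessionLog`, `open_session`, `close_session`, and `run_worker` are hypothetical names, and the window is counted in cycles rather than wall-clock time to keep the example deterministic.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SessionLog:
    """Hypothetical stand-in for a session client; records open/close calls."""
    opened: list = field(default_factory=list)
    closed: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def open_session(self) -> str:
        session_id = f"session-{next(self._ids)}"
        self.opened.append(session_id)
        return session_id

    def close_session(self, session_id: str) -> None:
        self.closed.append(session_id)


def run_worker(client: SessionLog, total_cycles: int, cycles_per_window: int) -> list:
    """Run one long-lived loop, rotating the session every `cycles_per_window` cycles."""
    work = []  # (session_id, cycle) pairs, showing which window handled which cycle
    session_id = client.open_session()
    try:
        for cycle in range(total_cycles):
            if cycle and cycle % cycles_per_window == 0:
                # Window boundary: close the old session before opening the next.
                client.close_session(session_id)
                session_id = client.open_session()
            work.append((session_id, cycle))
    finally:
        client.close_session(session_id)  # never leave a window open on exit
    return work


client = SessionLog()
work = run_worker(client, total_cycles=7, cycles_per_window=3)
```

The worker loop never exits between windows; only the session handle changes, so each window has a clear start and end in the audit trail.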

Time horizons

First hour

Run one worker and one session.
Enable budgets and approval gates on risky actions.
Verify replay safety across restarts by resending a mutating call with the same command_id.
Confirm that no state-bearing writes failed.

Example:

uv run python examples/long_running_autonomous_agent.py --horizon hour
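The replay-safety check from the first-hour list can be pictured with a small deduplicating store. This is a sketch under stated assumptions: `CommandStore` and `execute` are hypothetical names, and the store is in-memory, whereas a real deployment would need a durable one for the guarantee to survive a restart.

```python
class CommandStore:
    """Hypothetical in-memory stand-in for a store that deduplicates by command_id."""

    def __init__(self):
        self._results = {}
        self.executions = 0

    def execute(self, command_id: str, action):
        # A repeated command_id returns the recorded result instead of re-running.
        if command_id in self._results:
            return self._results[command_id]
        result = action()
        self.executions += 1
        self._results[command_id] = result
        return result


store = CommandStore()
ledger = []


def append_entry():
    """A mutating action: running it twice would duplicate the ledger entry."""
    ledger.append("charge-42")
    return len(ledger)


first = store.execute("cmd-001", append_entry)
# Simulate a retry after a restart: same command_id, same action.
replayed = store.execute("cmd-001", append_entry)
```

The check passes when the replayed call returns the original result and the side effect has happened exactly once.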

First day

Keep the worker running continuously.
Rotate sessions every 1-4 hours.
Add periodic checkpoints.
Alert on pending-approval age and budget exhaustion.

Example:

uv run python examples/long_running_autonomous_agent.py --horizon day
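An alert on pending-approval age could be computed along these lines. The function name, the shape of `pending` (approval ID mapped to request time), and the 30-minute threshold are all illustrative assumptions, not values taken from ACP.

```python
from datetime import datetime, timedelta, timezone


def stale_approvals(pending: dict, now: datetime, max_age: timedelta) -> list:
    """Return approval IDs pending longer than `max_age`, oldest first."""
    stale = [
        (requested_at, approval_id)
        for approval_id, requested_at in pending.items()
        if now - requested_at > max_age
    ]
    return [approval_id for _, approval_id in sorted(stale)]


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
pending = {
    "appr-a": now - timedelta(minutes=5),
    "appr-b": now - timedelta(minutes=45),
    "appr-c": now - timedelta(hours=3),
}
alerts = stale_approvals(pending, now, max_age=timedelta(minutes=30))
```

Sorting oldest first puts the request closest to expiring at the top of the alert.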

First week

Move to Postgres and multi-worker deployment.
Run recovery and timeout escalation on startup and on schedule.
Run kill-switch and approval-backlog drills.
Review denial reasons and adjust policy thresholds.

First month

Run daily session windows for audit and blast-radius control.
Keep low-risk actions auto-approved and high-risk actions denied or escalated.
Track scorecard trends: approvals, denials, budget exhaustions, recovery counts.
Involve operators only when an alert fires or an incident runbook calls for it.
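Scorecard trends can be as simple as per-day counts compared across days. In this sketch the event labels (`approval`, `denial`, `recovery`) and the `(date, kind)` input shape are placeholders rather than ACP's event schema.

```python
from collections import Counter


def daily_scorecard(events):
    """Count events per day and kind from (iso_date, kind) pairs."""
    card = {}
    for day, kind in events:
        card.setdefault(day, Counter())[kind] += 1
    return card


events = [
    ("2025-01-01", "approval"),
    ("2025-01-01", "denial"),
    ("2025-01-02", "denial"),
    ("2025-01-02", "denial"),
    ("2025-01-02", "recovery"),
]
card = daily_scorecard(events)
# Day-over-day change in denials; a Counter returns 0 for kinds that never occurred.
denial_trend = card["2025-01-02"]["denial"] - card["2025-01-01"]["denial"]
```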

Different operating angles

Reliability

Set command_id on all mutating operations.
Short session windows with deterministic close/open boundaries.
Recovery scans before taking traffic.
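One way to get deterministic close/open boundaries is to derive each window from the clock alone, so that every worker computes the same edges with no coordination. The sketch below aligns windows to the Unix epoch in UTC; `window_bounds` is a hypothetical helper, and the alignment choice is an assumption.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_bounds(ts: datetime, window: timedelta):
    """Return the [start, end) window containing `ts`, aligned to the Unix epoch."""
    index = (ts - EPOCH) // window  # timedelta // timedelta yields an int
    start = EPOCH + index * window
    return start, start + window


ts = datetime(2025, 3, 4, 10, 47, 12, tzinfo=timezone.utc)
hour_start, hour_end = window_bounds(ts, timedelta(hours=1))
day_start, day_end = window_bounds(ts, timedelta(days=1))
```

Because the bounds depend only on the timestamp and the window length, a restarted worker lands in the same window it left.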

Safety

Decide by risk: deny or escalate sensitive actions.
Kill at smallest safe scope first (session before system).
Treat state-bearing write failures as hard failures.
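Treating state-bearing write failures as hard failures might look like the following. The `emit` function, the `EventWriteError` type, and the choice to let non-state-bearing events fail open are assumptions made for illustration; only the fail-closed rule for state-bearing events comes from this playbook.

```python
class EventWriteError(RuntimeError):
    """Raised when a state-bearing event could not be persisted."""


def emit(event: dict, write, dropped: list) -> bool:
    """Persist an event; fail closed if it is state-bearing and the write fails."""
    try:
        write(event)
        return True
    except OSError as exc:
        if event.get("state_bearing"):
            # Fail closed: the caller must stop rather than continue on unknown state.
            raise EventWriteError(f"state-bearing write failed: {event['type']}") from exc
        # Assumption: plain telemetry may be dropped, but the drop is recorded.
        dropped.append(event["type"])
        return False


def failing_write(event):
    """Simulated storage outage."""
    raise OSError("disk full")


dropped = []
telemetry_ok = emit({"type": "heartbeat", "state_bearing": False}, failing_write, dropped)
try:
    emit({"type": "budget_increment", "state_bearing": True}, failing_write, dropped)
    halted = False
except EventWriteError:
    halted = True
```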

Cost and throughput

Check budget before execution; increment after successful execution.
Tune per-session limits from observed usage.
Alert on rate changes, not just absolute counts.
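To illustrate alerting on rate changes rather than absolute counts, the sketch below flags any window whose count more than doubles relative to the previous window. The function name, the 2.0 ratio, and the sample numbers are illustrative assumptions.

```python
def rate_change_alerts(counts, max_ratio=2.0):
    """Return (index, previous, current) for windows that grew by more than max_ratio."""
    alerts = []
    for i in range(1, len(counts)):
        previous, current = counts[i - 1], counts[i]
        # Skip empty baselines to avoid dividing by zero.
        if previous > 0 and current / previous > max_ratio:
            alerts.append((i, previous, current))
    return alerts


denials_per_window = [10, 11, 12, 30, 31]
alerts = rate_change_alerts(denials_per_window)
# An absolute threshold of 50 never fires on the same data.
absolute_alerts = [c for c in denials_per_window if c > 50]
```

The jump from 12 to 30 is caught even though every value stays well below the absolute threshold.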

Token governance

Create TokenBudgetConfig entries per identity scope (user, team, org) and period (daily, weekly, monthly).
Call ModelGovernor.check_access() before routing to enforce model tier restrictions.
Call TokenBudgetTracker.check_budget() before execution and record_usage() after.
Monitor TOKEN_BUDGET_EXHAUSTED and MODEL_ACCESS_DENIED events for policy tuning.
Review token usage summaries weekly to validate budget thresholds against actual consumption.
Use identity overrides sparingly, because they bypass tier restrictions for specific users.
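The check-before, record-after ordering can be sketched with a local stand-in. `SketchTokenBudget` and `governed_call` are hypothetical and are not the real `TokenBudgetTracker` API; the method names mirror `check_budget()` and `record_usage()` from the list above, but their signatures here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SketchTokenBudget:
    """Hypothetical stand-in; not the real TokenBudgetTracker API."""
    limit: int
    used: int = 0

    def check_budget(self, requested: int) -> bool:
        return self.used + requested <= self.limit

    def record_usage(self, tokens: int) -> None:
        self.used += tokens


def governed_call(budget, estimated_tokens, call):
    """Check before execution; record usage only after the call succeeds."""
    if not budget.check_budget(estimated_tokens):
        return "TOKEN_BUDGET_EXHAUSTED"
    actual_tokens = call()  # if this raises, no usage is recorded
    budget.record_usage(actual_tokens)
    return "OK"


budget = SketchTokenBudget(limit=1000)
first = governed_call(budget, 600, lambda: 580)
second = governed_call(budget, 600, lambda: 590)
```

Recording actual rather than estimated usage keeps the budget from drifting when estimates run high.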

Audit and recovery

Persist the event trail so it can be replayed.
Use periodic checkpoints for fast rollback points.
Keep correlation IDs and idempotency keys in logs.
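Checkpoints and the event trail work together: restore the latest checkpoint, then replay only the events recorded after it. The event shape (`seq`, `key`, `delta`) and the counter-style state below are made up for illustration.

```python
def apply(state: dict, event: dict) -> dict:
    """Pure reducer: return a new state with the event applied."""
    new_state = dict(state)
    new_state[event["key"]] = new_state.get(event["key"], 0) + event["delta"]
    return new_state


def restore(checkpoint: dict, events: list) -> dict:
    """Rebuild state from a checkpoint plus the events recorded after it."""
    state = dict(checkpoint["state"])
    for event in events:
        if event["seq"] > checkpoint["seq"]:
            state = apply(state, event)
    return state


events = [
    {"seq": 1, "key": "calls", "delta": 1},
    {"seq": 2, "key": "calls", "delta": 1},
    {"seq": 3, "key": "tokens", "delta": 250},
    {"seq": 4, "key": "calls", "delta": 1},
]
checkpoint = {"seq": 2, "state": {"calls": 2}}
from_checkpoint = restore(checkpoint, events)
from_scratch = restore({"seq": 0, "state": {}}, events)
```

Both paths reach the same state; the checkpoint only shortens how much of the trail has to be replayed.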

Recommended defaults

Local/demo: SQLite.
Production: Postgres.
Session rotation: start hourly, move to daily once stable.
Checkpoint cadence: every 12-24 cycles.
Approval timeout: deny-by-default on expiration.
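Deny-by-default on expiration means an unanswered approval request resolves to a denial once its timeout passes and never to an implicit approval. In this sketch, `resolve_approval`, the string decisions, and the 15-minute timeout are assumptions.

```python
from datetime import datetime, timedelta, timezone


def resolve_approval(requested_at, decision, now, timeout):
    """Return the effective decision; an unanswered request past its timeout is denied."""
    if decision is not None:
        return decision  # an explicit operator decision always wins
    if now - requested_at >= timeout:
        return "denied"  # expired without an answer: fail closed
    return "pending"


requested_at = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
timeout = timedelta(minutes=15)

still_waiting = resolve_approval(requested_at, None, requested_at + timedelta(minutes=5), timeout)
expired = resolve_approval(requested_at, None, requested_at + timedelta(minutes=15), timeout)
approved = resolve_approval(requested_at, "approved", requested_at + timedelta(minutes=5), timeout)
```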
