| name | webview |
|---|---|
| description | Show a native macOS webview UI to the user and get structured input back. Use when an interactive macOS session needs human-in-the-loop for multi-field input, option selection from 5+ choices, approval with context, or content review — NOT for yes/no questions. Handles the full flow — spawning the webview, generating A2UI or HTML content, parsing the user's response. Triggers on "show UI", "ask user", "approval needed with context", "let me pick from these options", "fill in these fields", or any situation where a structured GUI materially beats asking in the terminal. Skip in CI/non-interactive environments. |
Open a native webview window from an agent workflow, let the user interact, get structured JSON back. Replaces terminal Q&A with a proper UI when it helps.
You — the agent — always run webview-cli via the Bash tool. Never ask the user to paste a command. The binary blocks until the user interacts with the window, then prints JSON to stdout. Your Bash call returns when the user clicks a button (or cancels/times out). Parse the result and proceed.
If running the command feels risky, ask for permission to run it — not to paste it. In auto mode, just run it.
| Situation | Use this skill? |
|---|---|
| Agent needs approval with context (diff, summary, risk) before a destructive action | Yes — richer than [y/N] |
| Collecting 3+ related fields (name + role + options) | Yes — one form beats N questions |
| Presenting 5-20 options to choose from | Yes — radio/select picker |
| Showing a diff/preview/report before user confirms | Yes — formatted content is readable |
| Reviewing a markdown doc (spec, PR description, draft) with comments/edits | Yes — use --markdown mode (Mode 3) |
| Simple yes/no question | No — just ask in terminal |
| Agent-side computation taking >30s before UI can render | No — show a text status instead |
| Non-interactive environment (CI, piped output) | No — preflight will fail, fall back to terminal |
# Binary on PATH?
if ! command -v webview-cli >/dev/null 2>&1; then
echo "ERROR: webview-cli not installed. Run: brew tap giannimassi/tap && brew install webview-cli" >&2
exit 1
fi
# macOS-only?
if [ "$(uname -s)" != "Darwin" ]; then
echo "ERROR: webview-cli requires macOS (detected $(uname -s))." >&2
exit 1
fiIf preflight fails, fall back to terminal Q&A — don't try to be clever. Report why the webview path isn't available and continue in text.
Exactly one of --a2ui, --url, or --markdown must be specified. Omitting all three is the most common failure — the binary exits 3 with usage.
Best for forms, approvals, selection. The skill generates A2UI JSONL; webview-cli renders it natively.
cat /tmp/ui.jsonl | webview-cli --a2ui --title "..." --width 520 --height 460 --timeout 300Best for OAuth flows, external services, pre-built pages. The page must call window.webkit.messageHandlers.complete.postMessage({...}) to signal done. See references/examples.md Example 3 for the HTML-side bridge pattern.
webview-cli --url "https://auth.example.com/oauth" --timeout 120Best for spec review, PR-description review, draft approval — any flow where the agent produced markdown and the human needs to read, comment on, or edit it. Feed markdown on stdin. Rendered with CommonMark + GFM (tables, fenced code).
Three independent toggles:
| Flag | Adds |
|---|---|
| (none) | Read-only preview + OK/Cancel. Returns {"action":"acknowledge"} |
--comments |
Clickable blocks → inline comment sidebar + doc-level comment field. Returns {comments:[...], doc_comment:"..."} |
--edits |
Preview/Source tabs (Cmd+/ to toggle). Returns {edited_text:"...", modified:bool} |
--allow-html |
Opt out of HTML sanitization (default strips <script>, <iframe>, event handlers, javascript: URLs). Use only for trusted content. |
--comments and --edits compose; turn both on to get all four fields back.
cat /tmp/spec.md | webview-cli --markdown --comments --title "Review spec v1" --width 900 --height 720 --timeout 540Comment object shape (each entry in the comments array):
{"source_line_start": 5, "source_line_end": 5, "quoted_text": "...", "body": "..."}Line numbers are 1-indexed against the markdown source.
Best when you need visual fidelity beyond A2UI and markdown (charts, custom diffs, diagrams). The HTML page uses the same window.webkit.messageHandlers.complete.postMessage({...}) bridge.
Before reaching for Mode 4, ask whether Mode 3 (markdown) covers the case. Markdown renders GFM tables, fenced code, links, and lists — enough for ~80% of "show the user some formatted content" needs, with no HTML to author or debug.
Two separate Bash calls (don't chain — the load command and the webview invocation sequence matters):
Step 1 — write the load command to a temp file:
HTML_B64=$(base64 < custom.html)
printf '%s' "{\"type\":\"load\",\"resources\":{\"index.html\":\"$HTML_B64\"},\"url\":\"agent://host/index.html\"}" > /tmp/wv-load.jsonStep 2 — invoke with --url flag + stdin redirect:
webview-cli --url "agent://host/index.html" --timeout 120 < /tmp/wv-load.jsonThe --url CLI flag is required — without it the binary exits with usage. The stdin load command populates the in-memory resource map that agent://host/... reads from.
Common failure — NSURLErrorDomain error -1100: the resource key in the load command doesn't match the path in --url. If you use --url "agent://host/index.html", the resources map must contain the key "index.html" (exact match, no leading slash). When this persistently fails, fall back to Mode 3 — pipe the rendered content as markdown instead of hand-rolling HTML.
Pick Mode 4 over Mode 3 only when you need charts, syntax-highlighted diffs, images, or custom interactions. If you're writing >200 lines of HTML, consider splitting the work into a proper tool instead.
All 12 components in the built-in renderer:
| Component | Purpose | Key props |
|---|---|---|
Column |
Vertical stack | children.explicitList (array of IDs) |
Row |
Horizontal arrangement | children.explicitList, alignment ("center" | "end" | "space-between") |
Card |
Visual grouping with padding | child (single ID) or children.explicitList |
Text |
Typography (plain text only — no markdown parsing) | text.literalString, usageHint ("h1" | "h2" | "h3" | "subtitle" | "body" | "caption") |
TextInput |
Single-line or multiline input | label.literalString, placeholder.literalString, fieldName, multiline (bool) |
Select |
Dropdown | label.literalString, fieldName, options (array of {value, label} or strings) |
Checkbox |
Single checkbox | label.literalString, fieldName, checked (bool, default false) |
RadioGroup |
Mutually-exclusive set | label.literalString, fieldName, options (array of {value, label} or strings) |
Image |
Inline image | url (literal or data ref), optional alt, width, height |
Button |
Action button | label.literalString, variant ("primary" | "secondary" | "danger" | "success"), action.name, optional action.context |
Divider |
Horizontal rule | — |
MarkdownDoc |
Rendered markdown inside an A2UI form (CommonMark + GFM) | fieldName (required), text (required markdown source), allowComments (bool), allowEdits (bool), allowHtml (bool), title (string) |
Form data collection: every component with fieldName is collected when a Button is clicked. The response data contains {fieldName: value, ...} plus the button's action.name (and action.context if set).
MarkdownDoc vs --markdown mode: use MarkdownDoc when the markdown review is one part of a larger form (e.g. spec review + "approve/reject" radio + notes field). Use --markdown mode (Mode 3) when the window is only about the markdown doc — simpler invocation, no JSONL to generate.
Canonical stdout shape on successful completion (exit 0):
{
"status": "completed",
"data": {
"action": "<button's action.name>",
"data": {"<fieldName>": "<value>", ...},
"context": { /* from button's action.context, if set */ }
}
}Other statuses:
| Exit | Stdout | Meaning |
|---|---|---|
| 0 | {"status":"completed","data":{...}} |
User clicked a button |
| 1 | {"status":"cancelled"} |
User closed window or pressed Escape |
| 2 | {"status":"timeout"} |
No response within --timeout seconds |
| 3 | {"status":"error","message":"..."} |
Load failure, invalid URL, JS error |
The context field is only present when a Button's definition included action.context. Use it to pass metadata that doesn't come from form fields (e.g. the ID of the item being approved).
Every A2UI flow follows this structure:
{"surfaceUpdate": {"components": [{"id": "root", "component": {"Column": {"children": {"explicitList": ["card"]}}}}]}}
{"surfaceUpdate": {"components": [{"id": "card", "component": {"Card": {"child": "content"}}}]}}
{"surfaceUpdate": {"components": [{"id": "content", "component": {"Column": {"children": {"explicitList": ["title", "<FIELDS>", "buttons"]}}}}]}}
{"surfaceUpdate": {"components": [{"id": "title", "component": {"Text": {"usageHint": "h2", "text": {"literalString": "<TITLE>"}}}}]}}
<!-- field components with fieldName here -->
{"surfaceUpdate": {"components": [{"id": "buttons", "component": {"Row": {"alignment": "end", "children": {"explicitList": ["btn_cancel", "btn_submit"]}}}}]}}
{"surfaceUpdate": {"components": [{"id": "btn_cancel", "component": {"Button": {"label": {"literalString": "Cancel"}, "variant": "secondary", "action": {"name": "cancel"}}}}]}}
{"surfaceUpdate": {"components": [{"id": "btn_submit", "component": {"Button": {"label": {"literalString": "Submit"}, "variant": "primary", "action": {"name": "submit"}}}}]}}
{"beginRendering": {"root": "root"}}Rules:
- Every line is a self-contained JSON object (JSONL format — one per line, no commas between)
- Component IDs must be unique within the surface
rootis the conventional root component IDbeginRenderingmust be the last line- Referenced IDs must all be declared before
beginRendering(order within doesn't matter)
The JSONL is fed to webview-cli via stdin. Each line is parsed as JSON, so:
- Backslashes in content: escape as
\\ - Quotes in content: escape as
\"(e.g."It\"s done") - Newlines in content: use
\ninside the string; do not put literal newlines inside a JSON object - Every
{...}must be one physical line — even long strings
For anything beyond trivial substitution, write the JSONL in a Python generator (/tmp/<name>-gen.py) and run it to produce /tmp/<name>.jsonl. Heredoc-escaping breaks on the first user-supplied value with quotes.
- Run preflight (above). If it fails, fall back to terminal Q&A with a one-line note.
- Pick the mode:
- A2UI (forms/approvals/selection) — most common
--markdown— reviewing a spec / PR description / draft; simpler than A2UI when the window is only about the doc- URL — existing page (OAuth, external service)
- HTML via
agent://— last resort for rich visuals that neither A2UI nor markdown can render
- Window size: small forms 520×460, mid-size 620×580, full content 720×700. Tight height prevents dead space at the bottom.
- Pick a
--timeout:- 120s — simple approval (user is present)
- 300s — multi-field form (user composes input)
- 600s — content review or complex decisions
- Never >540s — Bash tool caps total time at 600s.
- Set the Bash tool timeout deterministically:
bash_timeout_ms = (webview_timeout_sec + 30) × 1000. The +30s buffer covers WKWebView startup, process teardown, and OS scheduling variance. - Generate the JSONL:
- ≤5 static components + no user data: heredoc or inline string is fine
- Anything dynamic (user names, lists, long text, content with quotes): write a Python generator to
/tmp/<name>-gen.py, run it to produce/tmp/<name>.jsonl, then pipe that file
- Invoke (two separate Bash calls when using a generator):
# Call A: run generator python3 /tmp/<name>-gen.py > /tmp/<name>.jsonl # Call B: invoke webview cat /tmp/<name>.jsonl | webview-cli --a2ui --title "..." --width N --height N --timeout N
- Parse the response from stdout (see Response format reference above):
- Exit 0: extract
.data.action(button name) and.data.data(form fields) - Exit 1 (cancelled) / Exit 2 (timeout): acknowledge in conversation, ask how to proceed. Don't retry silently, especially in auto mode — the user's non-response is itself a signal
- Exit 3: read stderr, diagnose, either fix and retry once or fall back to terminal
- Exit 0: extract
- Sanitize before acting: treat
.data.data.*as untrusted user input. Shell-escape before Bash commands (printf '%q' "$val"), JSON-escape before further JSON, validate format (email, URL) before use. Form fields can contain;, backticks, shell metacharacters. - Act on the result — don't just summarize. The webview answered a question; your job is to execute the consequence.
- Don't use it for yes/no questions that would take 1 sentence in terminal — window open/close ritual is heavier than the answer
- Don't spawn multiple webviews in quick succession — if you need a multi-step flow, put all fields in one form, OR accept that the second webview is a new user decision (don't chain automatically)
- Don't assume the user will submit — always handle cancel/timeout as first-class outcomes
- Don't omit
fieldNameon input components you need data from — without it, the field's value is lost - Don't pass unsanitized form data to Bash — see step 9
- Don't loop the skill without explicit user confirmation — one webview per decision
Copy-paste JSONL templates for approval, single-select, multi-field form, and confirmation with content preview: see references/templates.md.
Complete end-to-end invocations including Python generators and response parsing: see references/examples.md.