MoosicBox

Playbook System Reference
Playbooks are headless, scriptable bmux sessions. A playbook defines a sequence of actions (create sessions, send keystrokes, assert screen content) that bmux executes against an ephemeral sandbox server and reports pass/fail results as structured JSON.
Primary use cases:
LLM-driven validation: generate playbooks from bug descriptions, run them to reproduce and verify fixes without manual screen recordings.
CI regression tests: deterministic, repeatable terminal interaction tests.
Recording conversion: turn a captured bmux session into a re-runnable test.
Execution model: By default, bmux playbook run spawns an isolated sandbox server in a temp directory, executes all steps, reports results, and tears down the server. Use --target-server to run against a live server instead.
Two input formats parse into the same internal representation:
| Format | Extension | Typical use |
| --- | --- | --- |
| Line-oriented DSL | .dsl or stdin | Quick authoring, LLM generation, piping |
| TOML | .playbook.toml | Structured config, version control |
CLI Commands
bmux playbook run
Run a playbook and report results.
bmux playbook run <source> [flags]
| Argument/Flag | Type | Default | Description |
| --- | --- | --- | --- |
| <source> | string | required | Path to playbook file, or - for stdin |
| --json | bool | false | Output results as JSON to stdout |
| --interactive | bool | false | Pause before each step for interactive control |
| --target-server | bool | false | Run against the live server instead of a sandbox |
| --record | bool | false | Record the execution (overrides playbook config) |
| --export-gif <path> | string | none | Export recording as GIF (implies --record) |
| --viewport <COLSxROWS> | string | none | Override viewport dimensions (e.g. 120x40) |
| --timeout <secs> | u64 | none | Override max playbook timeout in seconds |
| --shell <path> | string | none | Override shell binary |
| --var KEY=VALUE | string | none | Define a variable (repeatable, overrides @var) |
| --verbose / -v | bool | false | Print step-by-step progress to stderr |
Note: global recording auto-export settings (recording.auto_export or --recording-auto-export) do not auto-export playbook recordings. Use --export-gif <path> for playbook runs.
Exit codes: 0 = all steps passed, 1 = one or more steps failed or error.
Stdin example:
printf 'new-session\nsend-keys keys="echo hi\\r"\nwait-for pattern="hi"\n' | bmux playbook run - --json
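When playbooks are driven from a script rather than a shell, the flags in the table above can be assembled programmatically. The following is a minimal Python sketch, not part of bmux: the helper names `build_run_command` and `run_playbook` are hypothetical, and it assumes a `bmux` binary on `PATH`. It only uses flags and exit codes documented above.

```python
import subprocess


def build_run_command(source, *, json_output=True, variables=None,
                      viewport=None, timeout_secs=None):
    """Assemble argv for `bmux playbook run` from the documented flags."""
    cmd = ["bmux", "playbook", "run", source]
    if json_output:
        cmd.append("--json")
    if viewport is not None:
        cols, rows = viewport
        cmd += ["--viewport", f"{cols}x{rows}"]  # COLSxROWS, e.g. 120x40
    if timeout_secs is not None:
        cmd += ["--timeout", str(timeout_secs)]
    for key, value in (variables or {}).items():
        cmd += ["--var", f"{key}={value}"]  # repeatable flag
    return cmd


def run_playbook(source, **options):
    """Run a playbook; return (passed, stdout). Exit code 0 means all steps passed."""
    # Assumption: `bmux` is installed and on PATH.
    proc = subprocess.run(build_run_command(source, **options),
                          capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout
```

A caller would typically parse the returned stdout as JSON only when `--json` was requested.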
Interactive live tour:
Use --interactive from a real terminal (TTY) to enter a full-screen live tour that continuously renders pane output while the playbook runs.
The tour starts paused so you can immediately choose step-by-step (n) or switch to live mode (c / l).
space: pause/resume live playback
n: single-step one playbook step (when paused)
c / l: return to live running mode
:<dsl>: run an ad-hoc DSL action at step boundaries
q: abort run (remaining scheduled steps are marked skipped)
?: show control help in the status line
If stdin/stdout are not TTYs (for example piped input in CI), --interactive automatically falls back to the line-prompt controls.
bmux playbook validate
Parse and validate a playbook without executing it.
bmux playbook validate <source> [--json]
Returns validation errors (missing new-session as first step, unknown actions, etc.).
bmux playbook dry-run
Parse, validate, and print the execution plan without running.
bmux playbook dry-run <source> [--json]
| Argument/Flag | Type | Default | Description |
| --- | --- | --- | --- |
| <source> | string | required | Path to playbook file, or - for stdin |
| --json | bool | false | Output as structured JSON |
Exit codes: 0 = playbook is valid, 1 = validation errors found.
JSON output:
{
  "valid": true,
  "config": {
    "name": "my-test",
    "viewport": "80x24",
    "shell": "sh",
    "timeout_ms": 30000,
    "env_mode": "default",
    "record": false
  },
  "steps": [
    { "index": 0, "action": "new-session", "dsl": "new-session" },
    { "index": 1, "action": "send-keys", "dsl": "send-keys keys='echo hi\\r'" },
    { "index": 2, "action": "wait-for", "dsl": "wait-for pattern='hi'" }
  ],
  "step_count": 3,
  "errors": []
}
Each step’s dsl field contains the round-trip DSL serialization of the action, which is valid DSL syntax that can be copy-pasted.
bmux playbook diff
Compare results from two playbook runs. Produces a structured diff covering step status changes, screen text differences, timing comparison, and failure capture comparison.
bmux playbook diff <left.json> <right.json> [flags]
| Argument/Flag | Type | Default | Description |
| --- | --- | --- | --- |
| <left.json> | string | required | Path to baseline/left playbook result JSON |
| <right.json> | string | required | Path to new/right playbook result JSON |
| --json | bool | false | Output diff as structured JSON |
| --timing-threshold <pct> | u64 | 50 | Flag steps that slowed by more than this percent |
Exit codes: 0 = no changes detected, 1 = changes or regressions found.
JSON output includes:
summary – outcome change, step/snapshot counts, total timing delta
step_diffs – per-step status changes, timing deltas, detail/expected/actual on failures
snapshot_diffs – per-snapshot pane text diffs (unified diff format via Myers algorithm)
failure_capture_diffs – screen state diffs from auto-snapshots on failure
timing_regressions – steps that exceeded the timing threshold
Usage pattern for before/after verification:
# Run before fix
bmux playbook run --json test.dsl > before.json
# Apply fix...
bmux playbook run --json test.dsl > after.json
# Compare
bmux playbook diff --json before.json after.json
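To make the --timing-threshold semantics concrete, here is a small Python sketch of a percentage-based slowdown check. It is an illustration under stated assumptions, not bmux's implementation: the function name `timing_regressions` is hypothetical, the inputs are plain lists of per-step elapsed milliseconds rather than the real result JSON, and the handling of zero-duration baseline steps is a guess.

```python
def timing_regressions(before_ms, after_ms, threshold_pct=50):
    """Return indices of steps that slowed by more than threshold_pct percent."""
    flagged = []
    for index, (old, new) in enumerate(zip(before_ms, after_ms)):
        # Assumption: steps with a zero baseline are never flagged.
        if old > 0 and (new - old) * 100 > threshold_pct * old:
            flagged.append(index)
    return flagged
```

With the default threshold of 50, a step going from 100 ms to 151 ms is flagged while one going from 100 ms to 149 ms is not.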
bmux playbook cleanup
Clean up sandbox temp directories from previous playbook runs. Useful after SIGKILL or crashes that prevent normal cleanup.
This command now uses the shared sandbox cleanup engine with source=playbook under the hood, so behavior stays aligned with bmux sandbox cleanup.
bmux playbook cleanup [--dry-run] [--json]
| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --dry-run | bool | false | List orphaned dirs without deleting |
| --json | bool | false | Output as JSON |
For advanced filters (for example --older-than or --failed-only), use:
bmux sandbox cleanup --source playbook [flags]
bmux playbook interactive
Start an interactive playbook session with a socket for agent control.
bmux playbook interactive [flags]
| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --socket <path> | string | auto | Socket path override |
| --record | bool | false | Record the session |
| --viewport <COLSxROWS> | string | 80x24 | Viewport dimensions |
| --shell <path> | string | system default | Shell binary |
| --timeout <secs> | u64 | no limit | Max session lifetime |
See Interactive Mode Protocol for the wire format.
bmux playbook from-recording
Generate a playbook from an existing recording.
bmux playbook from-recording <recording-id-or-name> [--output <path>]
If --output is omitted, writes to stdout. The generated playbook includes wait-for barriers and assert-screen checks derived from the recorded output. See Recording to Playbook Conversion.
DSL Format
Each line is one of:
| Line type | Prefix | Example |
| --- | --- | --- |
| Blank / whitespace | (empty) | Ignored |
| Comment | # | # this is a comment |
| Config directive | @ | @viewport cols=80 rows=24 |
| Action | action name | send-keys keys='echo hi\r' |
Argument Format
Actions and directives use key=value pairs separated by whitespace:
action-name key1=value1 key2='value with spaces' key3="also quoted"
Quoting rules:
| Form | Example | Notes |
| --- | --- | --- |
| Bare | key=value | Terminated by next whitespace |
| Single-quoted | key='hello world' | Supports C-style escapes |
| Double-quoted | key="hello world" | Supports C-style escapes |
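The quoting rules above can be illustrated with a small tokenizer. This is a Python sketch, not the bmux parser: `split_action_line` is a hypothetical helper that assumes a non-empty action line without a `!continue` modifier or trailing comment, and it returns values in raw form (escape sequences are left undecoded).

```python
def split_action_line(line):
    """Split 'action key=value ...' into (action, {key: raw_value})."""
    parts = line.strip().split(None, 1)
    action = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    args = {}
    i, n = 0, len(rest)
    while i < n:
        if rest[i].isspace():
            i += 1
            continue
        eq = rest.index("=", i)          # every argument is key=value
        key = rest[i:eq]
        i = eq + 1
        if i < n and rest[i] in "'\"":   # quoted value: runs to the matching quote
            quote = rest[i]
            i += 1
            start = i
            while i < n and rest[i] != quote:
                i += 2 if rest[i] == "\\" else 1  # skip escaped characters
            value = rest[start:i]
            i += 1                       # consume the closing quote
        else:                            # bare value: runs to next whitespace
            start = i
            while i < n and not rest[i].isspace():
                i += 1
            value = rest[start:i]
        args[key] = value
    return action, args
```

Keeping the raw value separate from escape decoding means the same tokenizer works for both single- and double-quoted forms.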
C-style escape sequences (inside quoted values and send-keys keys=):
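| Escape | Byte | Name |
| --- | --- | --- |
| \r | 0x0D | Carriage return |
| \n | 0x0A | Line feed |
| \t | 0x09 | Tab |
| \0 | 0x00 | Null |
| \a | 0x07 | Bell |
| \b | 0x08 | Backspace |
| \e | 0x1B | Escape (ESC) |
| \\ | 0x5C | Literal backslash |
| \' | 0x27 | Literal single quote |
| \" | 0x22 | Literal double quote |
| \xNN | 0xNN | Arbitrary hex byte |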
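As an illustration of the escape table, the following Python sketch decodes a raw DSL value into the bytes it represents. `decode_escapes` is a hypothetical helper, and one behavior is an assumption: this sketch rejects unknown or truncated escapes, whereas bmux's actual handling of them is not specified here.

```python
import string

# Single-character escapes from the table above.
_SIMPLE = {"r": 0x0D, "n": 0x0A, "t": 0x09, "0": 0x00, "a": 0x07,
           "b": 0x08, "e": 0x1B, "\\": 0x5C, "'": 0x27, '"': 0x22}


def decode_escapes(text):
    """Decode C-style escapes in a raw DSL value into bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        nxt = text[i + 1:i + 2]
        if nxt in _SIMPLE:
            out.append(_SIMPLE[nxt])
            i += 2
            continue
        digits = text[i + 2:i + 4]
        if nxt == "x" and len(digits) == 2 and all(d in string.hexdigits for d in digits):
            out.append(int(digits, 16))  # \xNN: arbitrary hex byte
            i += 4
            continue
        # Assumption: unknown or truncated escapes are treated as errors.
        raise ValueError(f"unsupported escape at offset {i}: {text[i:i + 2]!r}")
    return bytes(out)
```

For example, `\e[A` decodes to the same three bytes as the hex string 1b5b41.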
Config Directives
Directives set playbook-wide configuration. By convention they appear before any action lines, but they may also be interspersed with actions: directives are processed in a first pass, so their order relative to actions does not matter.
| Directive | Syntax | Default | Description |
| --- | --- | --- | --- |
| @viewport | @viewport cols=<u16> rows=<u16> | 80x24 | Terminal viewport dimensions |
| @driver | @driver sandbox\|attach-sim | sandbox | Execution backend; attach-sim runs deterministic attach UI simulation without a server/PTY |
| @shell | @shell <path> | system default | Shell binary for the sandbox |
| @timeout | @timeout <ms> | 30000 | Max playbook execution time in milliseconds |
| @record | @record true\|false | false | Enable recording of the execution |
| @render-trace | @render-trace true\|false | false | Enable per-step normalized render summaries |
| @name | @name <string> | none | Playbook name (included in JSON output) |
| @description | @description <string> | none | Playbook description |
| @plugin | @plugin enable=<id> or @plugin disable=<id> | all enabled | Enable/disable specific plugins |
| @var | @var NAME=VALUE | none | Define a static variable for ${NAME} substitution |
| @env | @env NAME=VALUE | none | Set an environment variable in the sandbox process |
| @env-mode | @env-mode inherit\|clean | inherit | Sandbox environment isolation mode |
| @include | @include <path> | none | Include another playbook file (recursive, max depth 10) |
Environment Modes
| Mode | Behavior |
| --- | --- |
| inherit | Sandbox inherits the full parent environment, then overlays deterministic defaults for TERM (xterm-256color), LANG (C.UTF-8), LC_ALL (C.UTF-8), and HOME (sandbox temp dir). @env entries are applied on top. |
| clean | Sandbox starts with an empty environment. Only PATH, USER, and SHELL are inherited from the parent. All other variables use deterministic defaults or explicit @env entries. |
Resolution chain: @env-mode in playbook (if set) > BMUX_PLAYBOOK_ENV_MODE environment variable (if set) > inherit.
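The resolution chain and the two modes can be sketched in a few lines of Python. This is an illustration only: `resolve_env_mode` and `build_sandbox_env` are hypothetical names, and the sketch assumes that clean mode applies the same deterministic defaults (TERM, LANG, LC_ALL, HOME) that inherit mode overlays.

```python
DETERMINISTIC_DEFAULTS = {
    "TERM": "xterm-256color",
    "LANG": "C.UTF-8",
    "LC_ALL": "C.UTF-8",
}


def resolve_env_mode(playbook_mode, parent_env):
    """@env-mode in the playbook > BMUX_PLAYBOOK_ENV_MODE > inherit."""
    if playbook_mode:
        return playbook_mode
    return parent_env.get("BMUX_PLAYBOOK_ENV_MODE") or "inherit"


def build_sandbox_env(mode, parent_env, sandbox_home, extra_env):
    """Compute the sandbox environment for the given mode."""
    if mode == "clean":
        # Only PATH, USER, and SHELL survive from the parent.
        env = {k: parent_env[k] for k in ("PATH", "USER", "SHELL") if k in parent_env}
    else:
        env = dict(parent_env)
    # Assumption: both modes share the same deterministic defaults.
    env.update(DETERMINISTIC_DEFAULTS)
    env["HOME"] = sandbox_home
    env.update(extra_env)  # @env entries are applied on top
    return env
```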
Actions Reference
Deterministic Attach Simulation
Use @driver attach-sim for lightweight attach UI tests that do not start a server or PTY. The driver feeds normalized terminal events into the same attach UI reducer used by production, applies effects to fake state, and renders with the real status-line renderer.
@driver attach-sim
@viewport cols=100 rows=24

seed-window-list names='one,two,three' active='one'
render
assert-rendered contains='1:one'

locate id='one' text='1:one'
locate id='three' text='3:three'

terminal-event kind=mouse phase=down button=left col='${one.center_col}' row='${one.row}'
terminal-event kind=mouse phase=move button=left col='${three.end_col}' row='${three.row}'
terminal-event kind=mouse phase=up button=left col='${three.end_col}' row='${three.row}'

assert-effect operation='move-window'
assert-state path='windows.names' equals='["two","three","one"]'
Supported attach-sim actions:
| Action | Purpose |
| --- | --- |
| seed-window-list | Seed fake windows: names='one,two' active='one' |
| seed-pane-text | Seed fake focused-pane text for scrollback/selection scenarios: lines='one\|two' cursor_row=2 cursor_col=1 |
| seed-pane-layout | Seed fake pane layout for mouse/layout scenarios, currently split='vertical' or split='floating' |
| set-config | Set supported sim config, currently status_bar.tab_order=mru\|stable and appearance.status_position=top\|bottom |
| render | Re-render fake attach status UI |
| snapshot | Capture the current attach-sim render in the playbook result snapshots |
| locate | Locate rendered text and define ${id.start_col}, ${id.center_col}, ${id.end_col}, ${id.row} |
| terminal-event | Send normalized terminal input; currently mouse events are supported |
| send-attach | Send an attach key chord through the attach keybinding processor in simulation |
| assert-rendered | Assert rendered output contains or matches text |
| assert-effect | Assert an effect such as move-window, resize-pane, focus-pane, or move-floating-pane was emitted |
| assert-no-effect | Assert an effect was not emitted |
| assert-state | Assert fake state; currently supports windows.names, windows.active_name, scrollback.active, scrollback.cursor, selection.active, selection.text, help_overlay.open, help_overlay.scroll, and prompt.active |
This driver is intentionally generic around terminal events, rendering, effects, and state assertions. Feature fixtures are allowed, but the input and assertion primitives should remain reusable for future attach UI behavior.
When adding attach-sim coverage for another UI feature, prefer this pattern:
Put production behavior behind a reducer/effect path that accepts normalized terminal input and explicit geometry/config.
Extend the simulation harness fake state only enough to execute those effects.
Add generic actions or assertions only when the new behavior needs reusable terminal/render/effect/state vocabulary.
Keep feature-specific setup in narrowly named seed/config actions or fixtures, not in duplicated test-only UI logic.
Session Lifecycle
new-session
Create a new session. Must be the first action in a sandbox playbook.
new-session [name=<string>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | no | auto | Session name |
Sets ${SESSION_ID}, ${SESSION_NAME}, ${PANE_COUNT} (=1), ${FOCUSED_PANE} (=1).
kill-session
Kill a session by name.
kill-session name=<string>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Session name |
Pane Management
split-pane
Split the current pane.
split-pane [direction=vertical|horizontal|v|h] [ratio=<f64>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| direction | string | no | vertical | Split direction. v/vertical or h/horizontal |
| ratio | f64 | no | none (server default) | Split ratio (0.0-1.0) |
Increments ${PANE_COUNT}.
focus-pane
Change the focused pane.
focus-pane target=<u32>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| target | u32 | yes | - | Pane index to focus (1-based) |
Updates ${FOCUSED_PANE}.
close-pane
Close a pane.
close-pane [target=<u32>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| target | u32 | no | focused pane | Pane index to close (1-based) |
Decrements ${PANE_COUNT}.
Input
send-keys
Send input bytes to a pane. This is the primary way to type commands.
send-keys keys=<escaped-string> [pane=<u32>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| keys | string | yes | - | Input bytes with C-style escapes. Use \r for Enter. |
| pane | u32 | no | focused pane | Target pane index (1-based). Uses PaneDirectInput for race-free delivery. |
Examples:
send-keys keys='echo hello\r'
send-keys keys='ls -la\r' pane=2
send-keys keys='\x03'                  # Ctrl+C
send-keys keys='\e[A'                  # Up arrow
send-bytes
Send raw bytes specified as a hex string.
send-bytes hex=<hex-string>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| hex | string | yes | - | Hex-encoded bytes (e.g. 1b5b41 for ESC [ A) |
send-attach
Send a key chord through the attach keybinding runtime (same path as interactive attach mode). Use this for UI-mode behaviors like scrollback/copy-mode, keybinding-driven pane focus, and runtime/plugin commands.
send-attach key=<chord>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| key | string | yes | - | Key chord string (e.g. ctrl+a [, k, esc) |
prefix-key
Compatibility alias that sends Ctrl-A plus one key via send-attach.
prefix-key key=<char>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| key | char | yes | - | Single character to send after the prefix |
Do not mix attach UI-mode entry with send-keys for follow-up navigation keys. send-keys writes bytes to the pane shell; send-attach runs attach key handling.
# Bad: enters scrollback, then types into shell
prefix-key key='['
send-keys keys='k\r'

# Good: all UI-mode keys use attach path
send-attach key='ctrl+a ['
send-attach key='k'
send-attach key='enter'
Synchronization
wait-for
Poll the screen until a regex pattern matches. This is the primary synchronization mechanism – use it after send-keys to wait for output before proceeding.
wait-for pattern=<regex> [pane=<u32>] [timeout=<ms>] [retry=<u32>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| pattern | regex | yes | - | Regex pattern to match against screen text |
| pane | u32 | no | focused pane | Pane index (1-based) |
| timeout | u64 | no | 5000 | Max wait time in milliseconds |
| retry | u32 | no | 1 | Number of attempts (1 = no retry) |
Polling behavior: Exponential backoff starting at 10ms, doubling up to 200ms max (10, 20, 40, 80, 160, 200, 200…). Each poll drains output and refreshes the screen.
On timeout: The step fails with an error message that includes the first 200 characters of the current screen text for debugging.
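The polling schedule described above (10 ms doubling up to a 200 ms cap) can be sketched as a generator plus a poll loop. This Python sketch is illustrative, not bmux's code: `backoff_delays_ms` and `wait_for` are hypothetical names, and `read_screen` stands in for whatever drains output and returns the current screen text.

```python
import re
import time
from itertools import islice


def backoff_delays_ms(start=10, cap=200):
    """Yield 10, 20, 40, 80, 160, 200, 200, ... (milliseconds)."""
    delay = start
    while True:
        yield delay
        delay = min(delay * 2, cap)


def wait_for(read_screen, pattern, timeout_ms=5000,
             sleep=time.sleep, clock=time.monotonic):
    """Poll read_screen() until pattern matches or the timeout elapses."""
    regex = re.compile(pattern)
    deadline = clock() + timeout_ms / 1000
    for delay in backoff_delays_ms():
        screen = read_screen()
        if regex.search(screen):
            return True
        if clock() >= deadline:
            # Include the first 200 characters of the screen for debugging.
            raise TimeoutError(
                f"pattern {pattern!r} not matched; screen starts with: {screen[:200]!r}")
        sleep(delay / 1000)


first_seven = list(islice(backoff_delays_ms(), 7))  # [10, 20, 40, 80, 160, 200, 200]
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.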
Pattern tips:
Use \\d+ to match any sequence of digits (PIDs, line numbers, etc.)
Use \\$ to match a literal $ (common in shell prompts)
The pattern is tested against the full visible screen text of the target pane.
sleep
Pause execution for a fixed duration. Prefer wait-for when possible.
sleep ms=<u64>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| ms | u64 | yes | - | Duration in milliseconds |
wait-for-event
Wait for a server-side event.
wait-for-event event=<name> [timeout=<ms>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| event | string | yes | - | Event name (exact match) |
| timeout | u64 | no | 5000 | Max wait time in milliseconds |
Supported event names:
| Event name | Triggered when |
| --- | --- |
| server_started | Server finishes startup |
| server_stopping | Server begins shutdown |
| session_created | A new session is created |
| session_removed | A session is destroyed |
| client_attached | A client attaches to a session |
| client_detached | A client detaches |
| attach_view_changed | The attached view layout changes |
Assertions
assert-screen
Assert conditions on the visible screen text. At least one of contains, not_contains, or matches is required.
assert-screen [pane=<u32>] [contains=<string>] [not_contains=<string>] [matches=<regex>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| pane | u32 | no | focused pane | Pane index (1-based) |
| contains | string | no | - | Substring that must be present |
| not_contains | string | no | - | Substring that must NOT be present |
| matches | regex | no | - | Regex pattern that must match |
Checks are evaluated in order: contains first, then not_contains, then matches. All specified checks must pass.
On failure: The error detail includes the full screen text of the target pane, allowing the caller to see what was actually on screen.
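The evaluation order above (contains, then not_contains, then matches) can be expressed as a short check function. This is a Python sketch with a hypothetical name, `check_screen`; the failure messages are illustrative and do not reproduce bmux's actual error detail.

```python
import re


def check_screen(screen_text, contains=None, not_contains=None, matches=None):
    """Return None if all specified checks pass, else a message for the first failure."""
    if contains is None and not_contains is None and matches is None:
        raise ValueError("at least one of contains, not_contains, or matches is required")
    if contains is not None and contains not in screen_text:
        return f"contains failed: {contains!r} not found"
    if not_contains is not None and not_contains in screen_text:
        return f"not_contains failed: {not_contains!r} is present"
    if matches is not None and re.search(matches, screen_text) is None:
        return f"matches failed: {matches!r} did not match"
    return None
```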
Examples:
assert-screen contains='hello world'
assert-screen not_contains='error' pane=1
assert-screen matches='total \\d+ files'
assert-screen contains='success' not_contains='failure'
assert-layout
Assert the number of panes.
assert-layout pane_count=<u32>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| pane_count | u32 | yes | - | Expected number of panes |
assert-cursor
Assert the cursor position in a pane.
assert-cursor [pane=<u32>] row=<u16> col=<u16>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| pane | u32 | no | focused pane | Pane index (1-based) |
| row | u16 | yes | - | Expected cursor row (0-based) |
| col | u16 | yes | - | Expected cursor column (0-based) |
render-mark / assert-render
When @render-trace true is enabled, playbooks attach a normalized render summary to each step result. Use render-mark to name the current trace position, then assert-render to verify bounded render work since that mark. The step summary is derived from normalized pane/cell deltas and does not store raw ANSI bytes or pane text. Exact trace snapshots use compact semantic ops such as full-frame and pane-row-segment:<pane>:<row>:<start_col>:<cells>; the same compact format also covers actual attach-render trace ops such as status-line, help-overlay, and extension-cached-replay:<surface> for trace-backed summaries.
@render-trace true
render-mark id='baseline'
sleep ms=10
assert-render since='baseline' max_frames=0 max_rows_emitted=0 max_cells_emitted=0 full_frame=false
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| since | str | yes | - | Existing render-mark ID |
| min_frames | u64 | no | - | Minimum observed frames |
| max_frames | u64 | no | - | Maximum observed frames |
| full_frame | bool | no | - | Whether any full-frame render is allowed/expected |
| max_full_frame_frames | u64 | no | - | Maximum full-frame render count |
| max_full_surface_fallbacks | u64 | no | - | Maximum full-surface fallback count |
| max_damage_rects | u64 | no | - | Maximum damage rect count |
| max_damage_area_cells | u64 | no | - | Maximum damaged cell area |
| max_rows_emitted | u64 | no | - | Maximum changed/emitted rows |
| max_row_segments_emitted | u64 | no | - | Maximum emitted row segment count |
| max_cells_emitted | u64 | no | - | Maximum changed/emitted cells |
| max_frame_bytes | u64 | no | - | Maximum estimated frame bytes |
| status_rendered | bool | no | - | Whether status rendering was observed |
| overlay_rendered | bool | no | - | Whether overlay rendering was observed |
| expected_emitted_rows | str | no | - | Exact normalized pane rows as pane:row,pane:row |
| expected_emitted_row_segments | str | no | - | Exact normalized row segments as pane:row:start_col:cells |
| expected_trace_ops | str | no | - | Exact semantic trace ops, comma-separated (full-frame, clear-row:row:cells, pane-row-full:pane:row:cells, pane-row-segment:pane:row:start_col:cells, pane-row-cache-skip:pane:row, pane-rows-sync-deferred:pane:rows, extension-ops:surface:regions:full_surface, extension-cached-replay:surface, extension-imperative:surface:regions:full_surface, status-line, help-overlay, prompt-overlay, damage-overlay:rects:cells, cursor:pane:visible, overlay) |
Inspection
snapshot
Capture the current screen state of all panes. Snapshots are included in the PlaybookResult.snapshots array and in interactive mode responses.
snapshot id=<string>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | yes | - | Label for this snapshot (used to identify it in results) |
Each snapshot captures every pane’s visible text, cursor position, focus state, and index.
screen
Capture and return the current screen state. In batch mode, the step detail contains JSON-serialized pane captures. In interactive mode, the response panes field is populated.
screen
No arguments. Useful for LLM debugging – inspect screen state without asserting.
status
Query the current session status. Returns session ID, pane count, and focused pane index in the step detail.
status
No arguments.
Layout
resize-viewport
Change the terminal viewport dimensions.
resize-viewport cols=<u16> rows=<u16>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| cols | u16 | yes | - | New column count |
| rows | u16 | yes | - | New row count |
Services
invoke-service
Invoke a plugin service.
invoke-service capability=<cap> interface=<id> operation=<op> [kind=query|command] [payload=<json>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| capability | string | yes | - | Plugin capability name |
| interface | string | yes | - | Service interface ID |
| operation | string | yes | - | Operation name |
| kind | string | no | command | query/q or command/cmd |
| payload | string | no | "" | JSON payload string |
Step Modifiers
!continue (Continue on Error)
Append !continue to any action line to prevent the playbook from stopping if that step fails. The step is still recorded as fail in the results, and pass will be false, but execution continues to the next step.
assert-screen contains='optional_check' !continue
assert-screen contains='required_check'
In TOML format, use continue_on_error = true on the step:
[[step]]
action = "assert-screen"
contains = "optional_check"
continue_on_error = true
This is useful for diagnostic playbooks that want to check multiple conditions and report all failures, not just the first one.
Variable Substitution
Playbook values support ${NAME} variable references. Variables are resolved at execution time, not parse time.
Variable Sources
Variables are resolved in this order (first match wins):
Runtime variables – dynamic values set during execution
Static variables – defined via @var directives
Environment variables – from the process environment
Unresolved – if no match, ${NAME} is left as-is (with a warning logged)
Literal ${ Escaping
Use $${...} to produce a literal ${...} without variable expansion:
send-keys keys='echo $${HOME}\r'   # sends literal ${HOME} to the terminal
The first $ acts as an escape character. After resolution, $${HOME} becomes the literal string ${HOME}.
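The resolution order and the $${...} escape can be combined in one substitution pass. The Python sketch below is an illustration, not bmux's resolver: `substitute` is a hypothetical function, the three sources are passed as plain dictionaries, and the warning that bmux logs for unresolved names is omitted.

```python
import re

# Either an escaped reference ($${...}) or a normal one (${NAME}).
_VAR = re.compile(r"\$\$\{([^}]*)\}|\$\{([^}]+)\}")


def substitute(text, runtime, static, env):
    """Expand ${NAME} using runtime > static > environment; first match wins."""
    def replace(match):
        if match.group(1) is not None:
            # $${...} becomes a literal ${...} with no expansion.
            return "${" + match.group(1) + "}"
        name = match.group(2)
        for source in (runtime, static, env):
            if name in source:
                return source[name]
        return match.group(0)  # unresolved references are left as-is
    return _VAR.sub(replace, text)
```

Because the replacement runs in a single pass, the literal ${...} produced by an escape is not expanded a second time.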
Runtime Variables
| Variable | Type | Set by | Description |
| --- | --- | --- | --- |
| ${SESSION_ID} | UUID string | new-session | Current session UUID |
| ${SESSION_NAME} | string | new-session | Current session name |
| ${PANE_COUNT} | integer string | new-session, split-pane, close-pane | Number of panes |
| ${FOCUSED_PANE} | integer string | new-session, focus-pane | Focused pane index |
Static Variables
Defined with @var:
@var BASE_DIR=/tmp/test
@var MARKER=test_marker_42

send-keys keys='cd ${BASE_DIR}\r'
wait-for pattern='${MARKER}'
Static variables take priority over environment variables with the same name.
TOML Format
TOML playbooks use [playbook] for config and [[step]] for actions.
[playbook] Section
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | none | Playbook name |
| description | string | none | Description |
| viewport.cols | u16 | 80 | Viewport columns |
| viewport.rows | u16 | 24 | Viewport rows |
| shell | string | system default | Shell binary |
| timeout_ms | u64 | 30000 | Max execution time in ms |
| record | bool | false | Enable recording |
| plugins.enable | string[] | [] | Plugin IDs to enable |
| plugins.disable | string[] | [] | Plugin IDs to disable |
| vars | table | {} | Static variables (NAME = "VALUE") |
| env | table | {} | Environment variables |
| env_mode | string | none | "inherit" or "clean" |
| include | string[] | [] | Paths to include |
[[step]] Entries
Each step requires an action field. Other fields are action-specific:
[[step]]
action = "new-session"
name = "my-session"

[[step]]
action = "send-keys"
keys = "echo hello\r"
pane = 1

[[step]]
action = "wait-for"
pattern = "hello"
timeout = 5000

[[step]]
action = "wait-for"
pattern = "flaky_output"
retry = 3

[[step]]
action = "assert-screen"
contains = "hello"

[[step]]
action = "assert-screen"
contains = "optional"
continue_on_error = true
TOML Example
Equivalent to the DSL example in Example 1:
[playbook]
name = "echo-test"
viewport = { cols = 80, rows = 24 }
shell = "sh"

[[step]]
action = "new-session"

[[step]]
action = "send-keys"
keys = "echo hello_world\r"

[[step]]
action = "wait-for"
pattern = "hello_world"

[[step]]
action = "assert-screen"
contains = "hello_world"
Sandbox Environment
How It Works
bmux playbook run (without --target-server) creates an ephemeral sandbox:
Creates a temp directory (/tmp/bpb-<hex>) with isolated config, runtime, data, and state subdirectories.
Writes a minimal bmux.toml config with shell and plugin overrides.
Spawns a bmux server start process pointing at the temp directories.
Waits for the server to accept connections (up to 15 seconds).
Executes all playbook steps against the sandbox.
Stops the server and cleans up the temp directory.
Plugin Configuration
By default, all bundled plugins are available. Use @plugin to control this:
@plugin disable=bmux.windows        # disable a specific plugin
@plugin enable=bmux.permissions     # only enable specific plugins
When any enable is specified, all other plugins are implicitly disabled.
Assertions and Synchronization
Best Practices for Deterministic Assertions
Always use wait-for before assert-screen. Output arrives asynchronously – without a sync barrier, assertions may check stale screen content.
Match on distinctive output, not prompts. Shell prompts vary across machines and shells. Match on your command’s output instead:
send-keys keys='echo UNIQUE_MARKER_123\r'
wait-for pattern='UNIQUE_MARKER_123'
Use \d+ (written as \\d+ inside a quoted DSL value) for non-deterministic numbers such as PIDs, line counts, and timestamps:
wait-for pattern='process started, pid=\\d+'
Use @env-mode clean for maximum determinism. This prevents the sandbox from inheriting unpredictable environment variables.
Use @shell sh for portable playbooks. sh behavior is more predictable across systems than bash/zsh.
Prefer contains over matches when possible. Substring matching is simpler and less fragile than regex.
Interactive Mode Protocol
Interactive mode provides a socket-based REPL for LLM agents to control bmux dynamically.
Startup
bmux playbook interactive --viewport 80x24
On startup, bmux prints a JSON ready message to stdout:
{
  "status": "ready",
  "socket": "/tmp/bpb-xxx/r/playbook.sock",
  "sandbox_root": "/tmp/bpb-xxx"
}
The LLM agent connects to the socket path and communicates via line-delimited JSON.
Wire Protocol
Interactive mode is JSON-op only in v2: one JSON object per line (\n-delimited).
JSON op examples:
{"op":"hello","protocol_version":1,"client":"llm-agent"}
{"op":"command","request_id":"r1","dsl":"new-session"}
{"op":"subscribe","event_types":["pane_output","cursor_delta","screen_delta"],"screen_delta_format":"line_ops"}
Response: one JSON object per \n.
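A client for this line-delimited JSON protocol can be quite small. The following Python sketch is an assumption-laden illustration rather than an official client: `JsonLineClient` and `connect` are hypothetical names, the socket is assumed to be a Unix domain stream socket at the path from the ready message, and error handling is minimal.

```python
import json
import socket


class JsonLineClient:
    """Send and receive one JSON object per newline-terminated line."""

    def __init__(self, sock):
        self._sock = sock
        self._reader = sock.makefile("r", encoding="utf-8", newline="\n")

    def send(self, op, **fields):
        message = {"op": op, **fields}
        self._sock.sendall((json.dumps(message) + "\n").encode("utf-8"))

    def recv(self):
        line = self._reader.readline()
        if not line:
            raise ConnectionError("socket closed")
        return json.loads(line)


def connect(ready_line):
    """Connect using the JSON ready message printed on startup."""
    ready = json.loads(ready_line)
    # Assumption: the "socket" field is a Unix domain socket path.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(ready["socket"])
    return JsonLineClient(sock)
```

A session would then look like `client.send("command", request_id="r1", dsl="new-session")` followed by `client.recv()`.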
Response Schema
{
  "type": "response" | "event" | "error",
  "seq": 1,
  "mono_ns": 1000000,
  "request_id": "optional-correlation-id",
  "status": "ok" | "fail" | "error",
  "action": "send-keys",
  "elapsed_ms": 12,
  "detail": "optional detail string",
  "error": "error message on failure",
  "snapshot": { "id": "...", "panes": [...] },
  "panes": [{ "index": 1, "focused": true, "screen_text": "...", "cursor_row": 0, "cursor_col": 5 }],
  "session_id": "uuid-string",
  "pane_count": 2,
  "focused_pane": 1
}
All fields except status, type, seq, and mono_ns are optional and omitted when not applicable.
| Field | Present when | Type |
| --- | --- | --- |
| status | always | "ok", "fail", or "error" |
| action | action executed | string |
| elapsed_ms | action executed | u64 |
| detail | action has detail output | string |
| error | status is "fail" or "error" | string |
| snapshot | snapshot action executed | object |
| panes | screen command executed | array of PaneCapture |
| session_id | status command executed | UUID string |
| pane_count | status command executed | u32 |
| focused_pane | status command executed | u32 |
| type | always | message class (response, event, error) |
| seq | always | monotonic message sequence number |
| mono_ns | always | monotonic nanoseconds since interactive session start |
| request_id | JSON command/op requests | correlation id echoed in response |
Special Commands
| Op | Description |
| --- | --- |
| hello | Optional capability handshake. |
| command | Execute one DSL action line via dsl field (for example new-session, send-keys, assert-screen). |
| status | Return session metadata (session_id, pane_count, focused_pane). |
| hydrate | Hydrate detailed data (screen_full, event_window, incident). |
| subscribe | Start live event streaming with filters and budgets. |
| unsubscribe | Stop live event streaming. |
| set_watchpoint | Register anomaly watchpoint (kind: "event_burst"). |
| clear_watchpoint | Remove a watchpoint by id. |
| quit | End the interactive session. |
Push Output Events
After sending subscribe, the server pushes events as they arrive.
Pane output event:
{
  "type": "event",
  "status": "ok",
  "event_type": "pane_output",
  "pane_index": 1,
  "output_data": "hello world\n"
}
Cursor delta event:
{
  "type": "event",
  "status": "ok",
  "event_type": "cursor_delta",
  "cursor_delta": {
    "pane_index": 1,
    "from": { "row": 10, "col": 1 },
    "to": { "row": 10, "col": 12 },
    "distance": 11
  }
}
Screen delta event (LLM-friendly line ops):
{
  "type": "event",
  "status": "ok",
  "event_type": "screen_delta",
  "screen_delta": {
    "pane_index": 1,
    "format": "line_ops",
    "base_hash": "9f1b2c3d4e5f6a70",
    "new_hash": "4f8e1d3ab2c04910",
    "ops": [
      { "op": "set_line", "row": 12, "text": "fn main() {" },
      { "op": "cursor", "row": 12, "col": 11 }
    ]
  }
}
Screen delta event (human-readable unified diff):
{
  "type": "event",
  "status": "ok",
  "event_type": "screen_delta",
  "screen_delta": {
    "pane_index": 1,
    "format": "unified_diff",
    "base_hash": "9f1b2c3d4e5f6a70",
    "new_hash": "4f8e1d3ab2c04910",
    "diff": "@@ -13,1 +13,1 @@\n-fn mian() {\n+fn main() {\n"
  }
}
Push events have event_type set (e.g. "pane_output"), which distinguishes them from command responses. They may arrive between commands or interleaved with command responses.
| Field | Type | Description |
| --- | --- | --- |
| event_type | string | Push event type (pane_output, pane_input, cursor_delta, screen_delta, server_event, request_lifecycle, watchpoint_hit) |
| pane_index | u32 | The pane that produced the output |
| output_data | string | The new output text (UTF-8, may contain escape sequences) |
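Because push events can be interleaved with command responses, a client needs to separate the two while waiting for a reply. The Python sketch below shows one way to do that using the event_type and request_id fields described above; `classify_message` and `wait_for_response` are hypothetical helpers, and `lines` is any iterable of raw JSON lines read from the socket.

```python
import json


def classify_message(line):
    """Return ("event", msg) for push events, ("response", msg) otherwise."""
    msg = json.loads(line)
    kind = "event" if "event_type" in msg else "response"
    return kind, msg


def wait_for_response(lines, request_id):
    """Collect push events until the response with request_id arrives."""
    events = []
    for line in lines:
        kind, msg = classify_message(line)
        if kind == "event":
            events.append(msg)
        elif msg.get("request_id") == request_id:
            return msg, events
    raise ConnectionError("stream ended before the response arrived")
```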
Watchpoint hit event:
{
  "type": "event",
  "status": "ok",
  "event_type": "watchpoint_hit",
  "watchpoint_hit": {
    "id": "cursor-delta-burst-1",
    "kind": "event_burst",
    "watch_event_type": "cursor_delta",
    "pane_index": 1,
    "summary": "event burst detected: event_type=cursor_delta hits=3 min_hits=3 pane=1",
    "window_ms": 500,
    "min_hits": 3,
    "observed_hits": 3,
    "peak_distance": 12,
    "evidence_seq_start": 42,
    "evidence_seq_end": 42
  }
}
subscribe JSON options:
event_types: array of event names (pane_output, cursor_delta, screen_delta, watchpoint_hit).
pane_indexes: optional pane-index filter.
screen_delta_format: line_ops, unified_diff, or auto. auto resolves to line_ops for machine-readable clients (e.g. client: "llm-agent") and unified_diff otherwise.
max_events_per_sec: optional streaming event budget.
max_bytes_per_sec: optional streaming byte budget.
coalesce_ms: optional per-event-type coalescing interval.
set_watchpoint JSON options:
id: required watchpoint id.
kind: event_burst.
event_type: required watched stream event (pane_output, pane_input, cursor_delta, screen_delta, server_event, request_lifecycle).
pane_index: optional pane scope (defaults to any pane).
window_ms: burst window in milliseconds (default 500).
min_hits: number of hits required inside window_ms (default 3).
contains_regex: optional regex predicate (v1: supported for event_type: "pane_output" only).
Example (only trigger on pane output that matches):
{"op":"set_watchpoint","id":"errors-only","kind":"event_burst","event_type":"pane_output","contains_regex":"(?i)error|panic","min_hits":1,"window_ms":1000}
watchpoint_hit cannot be watched in v1 (recursive watchpoint loops are blocked).
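The window_ms and min_hits options suggest a sliding-window counter. The Python sketch below shows that idea only; it is not bmux's detector. `BurstWatchpoint` is a hypothetical class, and two details are assumptions: events exactly window_ms apart still count as inside the window, and the window is not reset after a hit.

```python
from collections import deque


class BurstWatchpoint:
    """Report a burst when min_hits events fall inside window_ms."""

    def __init__(self, window_ms=500, min_hits=3):
        self.window_ms = window_ms
        self.min_hits = min_hits
        self._hits = deque()

    def observe(self, timestamp_ms):
        """Record one matching event; return True if a burst is detected."""
        self._hits.append(timestamp_ms)
        # Drop events that have fallen out of the window.
        while timestamp_ms - self._hits[0] > self.window_ms:
            self._hits.popleft()
        return len(self._hits) >= self.min_hits
```

With the defaults, three events at 0, 100, and 200 ms trigger on the third, while events spaced 600 ms apart never trigger.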
hydrate JSON options:
kind: "screen_full" for full pane snapshot.
kind: "event_window" with start_seq and end_seq.
kind: "incident" with id (watchpoint id) or around_seq, plus optional window_radius.
Use unsubscribe to stop receiving push events.
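A watchpoint_hit event carries an evidence sequence range, and hydrate accepts an event_window with start_seq and end_seq. The Python sketch below connects the two. It assumes that the evidence_seq_start and evidence_seq_end values can be passed directly as start_seq and end_seq, and that the hydrate options are top-level request fields. The function name hydrate_for_hit is hypothetical.

```python
import json


def hydrate_for_hit(event):
    """Build a hydrate request covering the evidence range of a watchpoint_hit."""
    if event.get("event_type") != "watchpoint_hit":
        raise ValueError("not a watchpoint_hit event")
    hit = event["watchpoint_hit"]
    # Assumption: the evidence range maps one-to-one onto an event_window.
    return json.dumps({
        "op": "hydrate",
        "kind": "event_window",
        "start_seq": hit["evidence_seq_start"],
        "end_seq": hit["evidence_seq_end"],
    })
```

For more surrounding context, the documented kind "incident" with the watchpoint id and an optional window_radius may be the better fit.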
Example Session
→ new-session
← {"status":"ok","action":"new-session","elapsed_ms":150,"detail":"session_id=a1b2c3..."}

→ send-keys keys='echo hello\r'
← {"status":"ok","action":"send-keys","elapsed_ms":5}

→ screen
← {"status":"ok","action":"screen","panes":[{"index":1,"focused":true,"screen_text":"$ echo hello\nhello\n$ ","cursor_row":2,"cursor_col":2}]}

→ assert-screen contains='hello'
← {"status":"ok","action":"assert-screen","elapsed_ms":10}

→ quit
← {"status":"ok","action":"quit"}
Recording to Playbook Conversion
bmux playbook from-recording converts a recorded bmux session into a runnable playbook.
What Gets Generated
| Element | Source | How |
| --- | --- | --- |
| new-session | NewSession request in recording | Direct mapping |
| split-pane | SplitPane request | Direct mapping with direction |
| focus-pane | FocusPane request | Direct mapping with target index |
| send-keys | AttachInput / PaneDirectInput events | Consecutive inputs within 100ms are coalesced. pane=N added when input targets a non-focused pane. |
| wait-for | PaneOutputRaw events after a command | Last non-empty line of structured-grid-parsed output becomes the barrier pattern. Digit sequences are collapsed to \d+. |
| assert-screen | PaneOutputRaw events | Up to 3 distinctive content lines per response window become contains= checks. |
| sleep | Gaps > 200ms with no input/output | Mapped to sleep ms=N |
| @viewport | First AttachSetViewport request | Emitted as a directive |
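The send-keys coalescing rule can be sketched in Python as follows. This is an illustration of the rule as described, not the converter's code. It assumes that the 100ms gap is measured between consecutive inputs (inclusive), that only inputs to the same pane are merged, and that events arrive as (timestamp_ms, pane, text) tuples. The function name is hypothetical.

```python
def coalesce_inputs(events, gap_ms=100):
    """Merge consecutive inputs to the same pane whose gap is within gap_ms."""
    merged = []
    for ts, pane, text in events:
        last = merged[-1] if merged else None
        if last and last["pane"] == pane and ts - last["last_ts"] <= gap_ms:
            # Close enough to the previous input: extend the current group.
            last["text"] += text
            last["last_ts"] = ts
        else:
            # Gap too large or different pane: start a new send-keys group.
            merged.append({"pane": pane, "text": text, "first_ts": ts, "last_ts": ts})
    return merged
```

Each resulting group would correspond to one send-keys step, with the time between groups available for the sleep mapping.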
Pattern Robustness
Generated patterns are made robust to non-deterministic content:
Digit sequences (12345) are replaced with \d+
Regex metacharacters (., *, +, $, etc.) are escaped
Structural text (command names, paths, error messages) is preserved as literal matches
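The three rules above can be sketched in a few lines of Python. This is an approximation of the described behavior, not the converter's implementation. It uses Python's re.escape, which escapes a somewhat larger character set (for example spaces) than the metacharacters listed above. The function name robust_pattern is hypothetical.

```python
import re


def robust_pattern(line):
    r"""Escape regex metacharacters and collapse each digit run to \d+."""
    # Split on digit runs first so the inserted \d+ is never escaped itself.
    literal_parts = re.split(r"\d+", line)
    return r"\d+".join(re.escape(part) for part in literal_parts)
```

A pattern built from "build 123 done in 4.5s" then matches the same line with different numbers, while the dot stays a literal dot.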
Limitations
Multi-client recordings produce playbooks from a single client’s perspective.
Very long outputs (>256KB ring buffer) may have incomplete screen reconstruction.
Some manual editing may be needed for complex workflows (e.g., interactive programs, timing-sensitive sequences).
JSON Output Schema
When using --json, bmux playbook run outputs a PlaybookResult:
PlaybookResult
{
  "playbook_name": "my-test",
  "pass": true,
  "steps": [ ... ],
  "snapshots": [ ... ],
  "recording_id": "uuid-string",
  "recording_path": "/path/to/recording",
  "total_elapsed_ms": 1234,
  "error": "top-level error message"
}
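| Field | Type | Always present | Description |
| --- | --- | --- | --- |
| playbook_name | string or null | yes | |
| pass | bool | yes | true if all steps passed |
| steps | StepResult[] | yes | Per-step results |
| snapshots | SnapshotCapture[] | yes | Captured snapshots (may be empty) |
| recording_id | string or null | no | |
| recording_path | string or null | no | |
| total_elapsed_ms | u64 | yes | Wall-clock execution time |
| error | string or null | no | |
| sandbox_root | string or null | no | |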
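A consumer of the --json output typically needs only a short summary. The Python sketch below reads a PlaybookResult and counts steps by their status field. It treats the fields not marked "always present" as optional. The function name summarize and the shape of the returned dict are assumptions for illustration.

```python
import json
from collections import Counter


def summarize(result_json):
    """Condense a PlaybookResult JSON string into a small summary dict."""
    result = json.loads(result_json)
    # Count steps per status value ("pass", "fail", "skip").
    counts = Counter(step["status"] for step in result["steps"])
    return {
        "name": result.get("playbook_name"),
        "pass": result["pass"],
        "steps": dict(counts),
        "elapsed_ms": result["total_elapsed_ms"],
        # "error" is not always present, so fall back to None.
        "error": result.get("error"),
    }
```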
StepResult
{
  "index": 0,
  "action": "send-keys",
  "status": "pass",
  "elapsed_ms": 5,
  "detail": "optional detail"
}
On failure, additional structured fields are included:
{
  "index": 3,
  "action": "assert-screen",
  "status": "fail",
  "elapsed_ms": 12,
  "detail": "assert-screen: pane 1 does not contain 'expected_output'",
  "expected": "expected_output",
  "actual": "$ echo something_else\nsomething_else\n$ ",
  "failure_captures": [
    {
      "index": 1,
      "focused": true,
      "screen_text": "$ echo something_else\nsomething_else\n$ ",
      "cursor_row": 2,
      "cursor_col": 2
    }
  ]
}
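| Field | Type | Description |
| --- | --- | --- |
| index | u64 | Step index (0-based) |
| action | string | Action name |
| status | string | "pass", "fail", or "skip" |
| elapsed_ms | u64 | Step execution time |
| detail | string or null | |
| expected | string or null | |
| actual | string or null | |
| failure_captures | PaneCapture[] or null | |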
The expected and actual fields allow machine consumers (LLMs) to compare expected vs actual values without parsing the detail string. The failure_captures array provides the full screen state of every pane at the moment of failure, regardless of which pane was being asserted on.
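As an illustration of consuming these structured fields, the Python sketch below collects every failed step together with its expected and actual values and the index of the focused pane taken from failure_captures. It operates on an already parsed PlaybookResult dict. The function name and the report layout are hypothetical.

```python
def failure_report(result):
    """Return one dict per failed step with the fields useful for diagnosis."""
    report = []
    for step in result.get("steps", []):
        if step.get("status") != "fail":
            continue
        # failure_captures may be absent or null on some steps.
        captures = step.get("failure_captures") or []
        focused = next((c["index"] for c in captures if c.get("focused")), None)
        report.append({
            "index": step["index"],
            "action": step["action"],
            "expected": step.get("expected"),
            "actual": step.get("actual"),
            "focused_pane": focused,
        })
    return report
```

Because expected and actual are separate fields, the report can be compared or diffed without parsing the detail message.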
SnapshotCapture
{
  "id": "after_echo",
  "panes": [ ... ]
}
PaneCapture
{
  "index": 1,
  "focused": true,
  "screen_text": "$ echo hello\nhello\n$ ",
  "cursor_row": 2,
  "cursor_col": 2
}
| Field | Type | Description |
| --- | --- | --- |
| index | u32 | Pane index (1-based) |
| focused | bool | Whether this pane has focus |
| screen_text | string | Visible text, trailing whitespace trimmed per line |
| cursor_row | u16 | Cursor row (0-based) |
| cursor_col | u16 | Cursor column (0-based) |
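The 0-based cursor coordinates can be used to find the text the cursor sits on. The Python sketch below assumes that the first line of screen_text corresponds to row 0, which matches the PaneCapture example above but is not stated explicitly. Both function names are hypothetical.

```python
def line_at_cursor(pane):
    """Return the screen line the cursor is on, or "" if the row is past the text."""
    lines = pane["screen_text"].split("\n")
    row = pane["cursor_row"]
    # Trailing blank rows may be missing from screen_text, so guard the index.
    return lines[row] if row < len(lines) else ""


def text_before_cursor(pane):
    """Return the part of the cursor line that lies left of the cursor."""
    return line_at_cursor(pane)[:pane["cursor_col"]]
```

For the example capture, both functions return the prompt "$ ", which is a quick way to check that a shell is idle.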
Examples
Example 1: Basic echo + assert
The simplest useful playbook: run a command, wait for output, verify it.
@viewport cols=80 rows=24
@shell sh
new-session
send-keys keys='echo hello_world\r'
wait-for pattern='hello_world'
assert-screen contains='hello_world'
Example 2: Multi-pane workflow
Split the terminal, send different commands to each pane, verify both.
@viewport cols=120 rows=40
@shell sh
new-session
split-pane direction=vertical
send-keys keys='echo left_pane\r' pane=1
sleep ms=500
assert-screen contains='left_pane' pane=1
send-keys keys='echo right_pane\r' pane=2
sleep ms=500
assert-screen contains='right_pane' pane=2
Example 3: Regex wait-for patterns
Use regex to match output with non-deterministic content.
@shell sh
new-session
send-keys keys='echo "pid=$$, count=42"\r'
wait-for pattern='pid=\d+, count=\d+'
Example 4: Clean environment for determinism
Use @env-mode clean to ensure the sandbox has a predictable environment.
@viewport cols=80 rows=24
@shell sh
@env-mode clean
new-session
send-keys keys='echo $TERM\r'
wait-for pattern='xterm-256color'
assert-screen contains='xterm-256color'
Example 5: Variables and environment overrides
Use @var for playbook-scoped constants and @env for process environment.
@shell sh
@var MARKER=unique_test_id_987
@env MY_APP_MODE=testing
new-session
send-keys keys='echo ${MARKER} $MY_APP_MODE\r'
wait-for pattern='${MARKER}'
assert-screen contains='unique_test_id_987 testing'
Example 6: Snapshot inspection
Capture a named snapshot and inspect its content in the JSON output.
@shell sh
new-session
send-keys keys='ls /etc\r'
wait-for pattern='\$'
snapshot id=etc_listing
Run with --json and inspect result.snapshots[0].panes[0].screen_text to see the directory listing.
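Looking a snapshot up by id is more robust than relying on its position in the array. A small Python sketch over a parsed PlaybookResult follows. The function name snapshot_text and the choice to raise KeyError for an unknown id are assumptions.

```python
def snapshot_text(result, snapshot_id):
    """Return {pane index: screen_text} for the snapshot with the given id."""
    for snap in result.get("snapshots", []):
        if snap.get("id") == snapshot_id:
            return {pane["index"]: pane["screen_text"] for pane in snap.get("panes", [])}
    # Assumption: an unknown id is a caller error rather than an empty result.
    raise KeyError(snapshot_id)
```

For the playbook above, snapshot_text(result, "etc_listing") would return the directory listing keyed by pane index.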
Example 7: Screen and status for debugging
Use screen and status to inspect state mid-playbook. This is useful while developing a playbook, to understand what the terminal shows at each point.
@shell sh
new-session
send-keys keys='echo step1\r'
sleep ms=300
screen
status
send-keys keys='echo step2\r'
sleep ms=300
screen
Each screen step’s detail in the JSON output contains the full pane text at that point in execution.
Example 8: Expected failure testing
Verify that a specific error condition is detected.
@shell sh
new-session
send-keys keys='echo real_output\r'
wait-for pattern='real_output'
assert-screen contains='nonexistent_string'
This playbook is expected to fail. Run with --json and check that result.pass is false, then inspect the failing step’s detail field for the failure message and its actual field for the screen content at the time of failure.
Example 9: Recording conversion workflow
Record a session (manual start/stop):
bmux recording start --name startup-repro
# ... do things in bmux ...
bmux recording stop

# inspect manual recording storage and defaults
bmux recording path
bmux recording status
Or use a rolling capture cut without stopping the hidden rolling recorder:
# ~/.config/bmux/config.toml
[recording]
enabled = true
rolling_window_secs = 300

# rolling capture categories (optional)
rolling_capture_input = true
rolling_capture_output = true
rolling_capture_events = true
rolling_capture_protocol_replies = false
rolling_capture_images = false

# or explicit allowlist (takes precedence over categories when non-empty)
# rolling_event_kinds = ["pane_output_raw", "protocol_reply_raw", "pane_image"]

# default cut window = full rolling window (300s in this example)
bmux recording cut --name startup-snapshot
# optional explicit window
bmux recording cut --last-seconds 90
You can override rolling behavior at server boot:
# force on for this boot (and optionally override window)
bmux server start --rolling-recording --rolling-window-secs 300

# choose exact kinds for this boot
bmux server start --rolling-window-secs 300 --rolling-event-kind-all
bmux server start --rolling-window-secs 300 --rolling-event-kind pane-output-raw --rolling-event-kind protocol-reply-raw

# category overrides for this boot
bmux server start --rolling-capture-input --no-rolling-capture-events --rolling-capture-protocol-replies

# force off for this boot
bmux server start --no-rolling-recording

# kill switch on a running server (and restart with runtime overrides)
bmux server recording stop
bmux server recording start --rolling-window-secs 120 --rolling-event-kind-all

# inspect rolling storage path + status/usage
bmux server recording path
bmux server recording status

# clear rolling data (default: restart if active)
bmux server recording clear
# clear and keep rolling stopped
bmux server recording clear --no-restart
Convert to a playbook:
bmux playbook from-recording <recording-id-or-name> --output repro.dsl
Review and edit the generated playbook. The auto-generated wait-for patterns may need adjustment for your environment.
Run it:
bmux playbook run repro.dsl --json
Example 10: CLI variable overrides
Pass variables from the command line to override @var defaults:
# The playbook uses ${MARKER} which defaults to "test"
bmux playbook run test.dsl --var MARKER=production_check --json
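When the run is launched from a script, the same command line can be built programmatically. The Python sketch below uses only the --var and --json flags documented for bmux playbook run. The function name run_argv is hypothetical.

```python
def run_argv(source, variables=None, json_output=True):
    """Build the argument list for a bmux playbook run invocation."""
    argv = ["bmux", "playbook", "run", source]
    # --var is repeatable: emit one flag per variable.
    for key, value in (variables or {}).items():
        argv += ["--var", f"{key}={value}"]
    if json_output:
        argv.append("--json")
    return argv
```

The resulting list can be passed to subprocess.run(argv, capture_output=True, text=True), with stdout then parsed as a PlaybookResult. Passing a list instead of a shell string avoids quoting problems in variable values.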
Example 11: Retry flaky operations
Use retry= on wait-for for operations that may not succeed immediately:
@shell sh
new-session
send-keys keys='./flaky_server.sh &\r'
wait-for pattern='server ready' timeout=3000 retry=3
Example 12: Continue on error for diagnostics
Use !continue to check multiple conditions and report all failures:
@shell sh
new-session
send-keys keys='run_diagnostics\r'
wait-for pattern='\$'
assert-screen contains='check_1_ok' !continue
assert-screen contains='check_2_ok' !continue
assert-screen contains='check_3_ok' !continue
snapshot id=diagnostic_results
Example 13: Literal variable references
Use $${...} to send literal ${...} to the terminal:
@shell sh
new-session
send-keys keys='echo $${HOME}\r'
wait-for pattern='\$\{HOME\}'
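The interaction between ${NAME} substitution and the $${...} escape can be modeled in Python as below. This is a sketch of the documented behavior only: how bmux handles undefined variables is not covered here, so raising KeyError is an assumption, as is the function name substitute.

```python
import re

# Try the escaped form first so that $${NAME} is never read as "$" + ${NAME}.
_TOKEN = re.compile(r"\$\$\{([^}]*)\}|\$\{([^}]*)\}")


def substitute(text, variables):
    """Expand ${NAME} from variables and turn $${NAME} into a literal ${NAME}."""
    def replace(match):
        if match.group(1) is not None:
            # Escaped reference: drop one "$" and keep the rest verbatim.
            return "${" + match.group(1) + "}"
        # Assumption: an undefined variable is an error (KeyError).
        return variables[match.group(2)]
    return _TOKEN.sub(replace, text)
```

Plain shell references such as $MY_APP_MODE or $$ contain no braces, so they pass through unchanged and are left for the shell to expand.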
Example 14: LLM-generated playbook pattern
An LLM generating a playbook from a bug description should follow this pattern:
# 1. Set up a deterministic environment
@viewport cols=80 rows=24
@shell sh
@env-mode clean

# 2. Create a session
new-session

# 3. For each command:
#    a. send-keys with \r to execute
#    b. wait-for on distinctive output (not the prompt)
#    c. assert-screen to verify expected behavior

send-keys keys='mkdir -p /tmp/test_dir\r'
wait-for pattern='\$'
send-keys keys='ls /tmp/test_dir\r'
wait-for pattern='\$'

# 4. Assert the expected outcome
assert-screen not_contains='No such file'

# 5. Use snapshot for evidence capture
snapshot id=final_state
Key principles:
Always use @env-mode clean and @shell sh for reproducibility.
Always wait-for after send-keys before asserting.
Match on command output, not shell prompts.
Use \d+ in patterns for numbers that may vary.
Capture a snapshot at the end for debugging if the playbook fails.
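Some of these principles can be checked mechanically before a generated playbook is run. The Python sketch below is a hypothetical lint for DSL text, not a bmux feature. It checks for @env-mode clean, an @shell directive, a wait-for between each send-keys and the next assert-screen, and at least one snapshot. It does not try to judge whether patterns match prompts or command output.

```python
def lint_playbook(dsl_text):
    """Return warnings for a playbook that departs from the key principles."""
    lines = [ln.strip() for ln in dsl_text.splitlines()]
    # Ignore blank lines and comments, then tokenize on whitespace.
    tokens = [ln.split() for ln in lines if ln and not ln.startswith("#")]
    warnings = []
    if ["@env-mode", "clean"] not in tokens:
        warnings.append("missing '@env-mode clean'")
    if not any(t[0] == "@shell" for t in tokens):
        warnings.append("missing '@shell'")
    pending_send = False  # True after send-keys until a wait-for is seen
    for t in tokens:
        if t[0] == "send-keys":
            pending_send = True
        elif t[0] == "wait-for":
            pending_send = False
        elif t[0] == "assert-screen" and pending_send:
            warnings.append("assert-screen without wait-for after send-keys")
    if not any(t[0] == "snapshot" for t in tokens):
        warnings.append("no snapshot captured")
    return warnings
```

An empty list means the playbook follows the checked principles. Each warning names the principle that was missed, which is easy to feed back to the generating model.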

