Playbook System Reference

Playbooks are headless, scriptable bmux sessions. A playbook defines a sequence of actions (create sessions, send keystrokes, assert screen content). bmux executes them against an ephemeral sandbox server and reports pass/fail results as structured JSON.
Primary use cases:
  • LLM-driven validation: generate playbooks from bug descriptions, run them to reproduce and verify fixes without manual screen recordings.
  • CI regression tests: deterministic, repeatable terminal interaction tests.
  • Recording conversion: turn a captured bmux session into a re-runnable test.
Execution model: By default, bmux playbook run spawns an isolated sandbox server in a temp directory, executes all steps, reports results, and tears down the server. Use --target-server to run against a live server instead.
Two input formats parse into the same internal representation:
| Format | Extension | Typical use |
| --- | --- | --- |
| Line-oriented DSL | .dsl or stdin | Quick authoring, LLM generation, piping |
| TOML | .playbook.toml | Structured config, version control |

CLI Commands

bmux playbook run

Run a playbook and report results.
bmux playbook run <source> [flags]
| Argument/Flag | Type | Default | Description |
| --- | --- | --- | --- |
| <source> | string | required | Path to playbook file, or - for stdin |
| --json | bool | false | Output results as JSON to stdout |
| --interactive | bool | false | Pause before each step for interactive control |
| --target-server | bool | false | Run against the live server instead of a sandbox |
| --record | bool | false | Record the execution (overrides playbook config) |
| --export-gif <path> | string | none | Export recording as GIF (implies --record) |
| --viewport <COLSxROWS> | string | none | Override viewport dimensions (e.g. 120x40) |
| --timeout <secs> | u64 | none | Override max playbook timeout in seconds |
| --shell <path> | string | none | Override shell binary |
| --var KEY=VALUE | string | none | Define a variable (repeatable, overrides @var) |
| --verbose / -v | bool | false | Print step-by-step progress to stderr |
Note: global recording auto-export settings (recording.auto_export or --recording-auto-export) do not auto-export playbook recordings. Use --export-gif <path> for playbook runs.
Exit codes: 0 = all steps passed, 1 = one or more steps failed or error.
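A minimal sketch of driving this command from a Python test harness, assuming bmux is on PATH. The helper names build_run_argv and run_playbook are illustrative and not part of bmux; only the flags and the exit-code meaning come from the table and note above.

```python
import subprocess


def build_run_argv(source, *, variables=None, viewport=None, timeout_secs=None):
    """Assemble argv for `bmux playbook run --json` using documented flags only."""
    argv = ["bmux", "playbook", "run", source, "--json"]
    if viewport is not None:
        cols, rows = viewport
        argv += ["--viewport", f"{cols}x{rows}"]
    if timeout_secs is not None:
        argv += ["--timeout", str(timeout_secs)]
    for key, value in (variables or {}).items():
        argv += ["--var", f"{key}={value}"]  # --var is repeatable
    return argv


def run_playbook(source, **options):
    """Run the playbook and return (passed, stdout). Exit code 0 means all steps passed."""
    proc = subprocess.run(
        build_run_argv(source, **options), capture_output=True, text=True
    )
    return proc.returncode == 0, proc.stdout
```

Keeping argv construction separate from process execution lets the flag handling be checked without a bmux binary present.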
Stdin example:
printf 'new-session\nsend-keys keys="echo hi\\r"\nwait-for pattern="hi"\n' | bmux playbook run - --json
Interactive live tour:
Use --interactive from a real terminal (TTY) to enter a full-screen live tour that continuously renders pane output while the playbook runs.
The tour starts paused so you can immediately choose step-by-step (n) or switch to live mode (c / l).
  • space: pause/resume live playback
  • n: single-step one playbook step (when paused)
  • c / l: return to live running mode
  • :<dsl>: run an ad-hoc DSL action at step boundaries
  • q: abort run (remaining scheduled steps are marked skipped)
  • ?: show control help in the status line
If stdin/stdout are not TTYs (for example piped input in CI), --interactive automatically falls back to the line-prompt controls.

bmux playbook validate

Parse and validate a playbook without executing it.
bmux playbook validate <source> [--json]
Returns validation errors (missing new-session as first step, unknown actions, etc.).

bmux playbook dry-run

Parse, validate, and print the execution plan without running.
bmux playbook dry-run <source> [--json]
| Argument/Flag | Type | Default | Description |
| --- | --- | --- | --- |
| <source> | string | required | Path to playbook file, or - for stdin |
| --json | bool | false | Output as structured JSON |
Exit codes: 0 = playbook is valid, 1 = validation errors found.
JSON output:
    {
      "valid": true,
      "config": {
        "name": "my-test",
        "viewport": "80x24",
        "shell": "sh",
        "timeout_ms": 30000,
        "env_mode": "default",
        "record": false
      },
      "steps": [
        { "index": 0, "action": "new-session", "dsl": "new-session" },
        { "index": 1, "action": "send-keys", "dsl": "send-keys keys='echo hi\\r'" },
        { "index": 2, "action": "wait-for", "dsl": "wait-for pattern='hi'" }
      ],
      "step_count": 3,
      "errors": []
    }
Each step’s dsl field contains the round-trip DSL serialization of the action, which is valid DSL syntax that can be copy-pasted.

bmux playbook diff

Compare results from two playbook runs. Produces a structured diff covering step status changes, screen text differences, timing comparison, and failure capture comparison.
bmux playbook diff <left.json> <right.json> [flags]
| Argument/Flag | Type | Default | Description |
| --- | --- | --- | --- |
| <left.json> | string | required | Path to baseline/left playbook result JSON |
| <right.json> | string | required | Path to new/right playbook result JSON |
| --json | bool | false | Output diff as structured JSON |
| --timing-threshold <pct> | u64 | 50 | Flag steps that slowed by more than this percent |
Exit codes: 0 = no changes detected, 1 = changes or regressions found.
JSON output includes:
  • summary – outcome change, step/snapshot counts, total timing delta
  • step_diffs – per-step status changes, timing deltas, detail/expected/actual on failures
  • snapshot_diffs – per-snapshot pane text diffs (unified diff format via Myers algorithm)
  • failure_capture_diffs – screen state diffs from auto-snapshots on failure
  • timing_regressions – steps that exceeded the timing threshold
Usage pattern for before/after verification:
    # Run before fix
    bmux playbook run --json test.dsl > before.json
    # Apply fix...
    bmux playbook run --json test.dsl > after.json
    # Compare
    bmux playbook diff --json before.json after.json
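The --timing-threshold check can be pictured with a short sketch. This is an assumption about the arithmetic, not bmux's actual implementation: it treats the slowdown as a percentage of the baseline duration and skips steps whose baseline is zero. The function name and the list-of-durations input shape are hypothetical.

```python
def timing_regressions(before_ms, after_ms, threshold_pct=50):
    """Return (step_index, slowdown_pct) for steps that slowed by more than threshold_pct.

    Assumption: slowdown is measured relative to the baseline duration, and
    steps with a zero baseline are skipped because the ratio is undefined.
    """
    flagged = []
    for index, (old, new) in enumerate(zip(before_ms, after_ms)):
        if old <= 0:
            continue
        slowdown_pct = (new - old) * 100 / old
        if slowdown_pct > threshold_pct:
            flagged.append((index, round(slowdown_pct, 1)))
    return flagged
```

With the default threshold of 50, a step going from 200 ms to 301 ms is flagged, while one going from 100 ms to 149 ms is not.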

bmux playbook cleanup

Clean up sandbox temp directories from previous playbook runs. Useful after SIGKILL or crashes that prevent normal cleanup.
This command uses the shared sandbox cleanup engine with source=playbook under the hood, so its behavior stays aligned with bmux sandbox cleanup.
bmux playbook cleanup [--dry-run] [--json]
| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --dry-run | bool | false | List orphaned dirs without deleting |
| --json | bool | false | Output as JSON |
For advanced filters (for example --older-than or --failed-only), use:
bmux sandbox cleanup --source playbook [flags]

bmux playbook interactive

Start an interactive playbook session with a socket for agent control.
bmux playbook interactive [flags]
| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --socket <path> | string | auto | Socket path override |
| --record | bool | false | Record the session |
| --viewport <COLSxROWS> | string | 80x24 | Viewport dimensions |
| --shell <path> | string | system default | Shell binary |
| --timeout <secs> | u64 | no limit | Max session lifetime |
See Interactive Mode Protocol for the wire format.

bmux playbook from-recording

Generate a playbook from an existing recording.
bmux playbook from-recording <recording-id-or-name> [--output <path>]
If --output is omitted, writes to stdout. The generated playbook includes wait-for barriers and assert-screen checks derived from the recorded output. See Recording to Playbook Conversion.

DSL Format

Each line is one of:
| Line type | Prefix | Example |
| --- | --- | --- |
| Blank / whitespace | (empty) | Ignored |
| Comment | # | # this is a comment |
| Config directive | @ | @viewport cols=80 rows=24 |
| Action | action name | send-keys keys='echo hi\r' |

Argument Format

Actions and directives use key=value pairs separated by whitespace:
action-name key1=value1 key2='value with spaces' key3="also quoted"
Quoting rules:
| Form | Example | Notes |
| --- | --- | --- |
| Bare | key=value | Terminated by next whitespace |
| Single-quoted | key='hello world' | Supports C-style escapes |
| Double-quoted | key="hello world" | Supports C-style escapes |
C-style escape sequences (inside quoted values and send-keys keys=):
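| Escape | Byte | Name |
| --- | --- | --- |
| \r | 0x0D | Carriage return |
| \n | 0x0A | Line feed |
| \t | 0x09 | Tab |
| \0 | 0x00 | Null |
| \a | 0x07 | Bell |
| \b | 0x08 | Backspace |
| \e | 0x1B | Escape (ESC) |
| \\ | 0x5C | Literal backslash |
| \' | 0x27 | Literal single quote |
| \" | 0x22 | Literal double quote |
| \xNN | 0xNN | Arbitrary hex byte |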
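The escape table can be read as a small decoder. The sketch below is illustrative, not bmux's parser: decode_escapes is a hypothetical name, and rejecting unknown or malformed escapes with ValueError is an assumption, since the reference does not say how they are handled.

```python
import string

# Single-character escapes from the table above, mapped to their byte values.
SIMPLE_ESCAPES = {
    "r": 0x0D, "n": 0x0A, "t": 0x09, "0": 0x00, "a": 0x07,
    "b": 0x08, "e": 0x1B, "\\": 0x5C, "'": 0x27, '"': 0x22,
}


def decode_escapes(text):
    """Turn a value such as echo hi\\r into the bytes it denotes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling backslash at end of value")
        code = text[i + 1]
        if code == "x":
            digits = text[i + 2:i + 4]
            if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
                raise ValueError("\\x must be followed by two hex digits")
            out.append(int(digits, 16))
            i += 4
        elif code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        else:
            # Assumption: unknown escapes are errors rather than passed through.
            raise ValueError(f"unknown escape \\{code}")
    return bytes(out)
```

For example, the value \e[A decodes to the three bytes 1b 5b 41, the Up-arrow sequence.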

Config Directives

Directives set playbook-wide configuration. By convention they appear before the action lines, but they may also be interspersed with actions: directives are processed in a first pass, so their order relative to actions does not matter.
| Directive | Syntax | Default | Description |
| --- | --- | --- | --- |
| @viewport | @viewport cols=<u16> rows=<u16> | 80x24 | Terminal viewport dimensions |
| @driver | @driver sandbox|attach-sim | sandbox | Execution backend; attach-sim runs deterministic attach UI simulation without a server/PTY |
| @shell | @shell <path> | system default | Shell binary for the sandbox |
| @timeout | @timeout <ms> | 30000 | Max playbook execution time in milliseconds |
| @record | @record true|false | false | Enable recording of the execution |
| @render-trace | @render-trace true|false | false | Enable per-step normalized render summaries |
| @name | @name <string> | none | Playbook name (included in JSON output) |
| @description | @description <string> | none | Playbook description |
| @plugin | @plugin enable=<id> or @plugin disable=<id> | all enabled | Enable/disable specific plugins |
| @var | @var NAME=VALUE | none | Define a static variable for ${NAME} substitution |
| @env | @env NAME=VALUE | none | Set an environment variable in the sandbox process |
| @env-mode | @env-mode inherit|clean | inherit | Sandbox environment isolation mode |
| @include | @include <path> | none | Include another playbook file (recursive, max depth 10) |

Environment Modes

| Mode | Behavior |
| --- | --- |
| inherit | Sandbox inherits the full parent environment, then overlays deterministic defaults for TERM (xterm-256color), LANG (C.UTF-8), LC_ALL (C.UTF-8), and HOME (sandbox temp dir). @env entries are applied on top. |
| clean | Sandbox starts from an empty environment apart from PATH, USER, and SHELL, which are inherited from the parent. All other variables come from deterministic defaults or explicit @env entries. |
Resolution chain: @env-mode in playbook (if set) > BMUX_PLAYBOOK_ENV_MODE environment variable (if set) > inherit.
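The resolution chain and the two modes can be sketched as follows. This is a model of the documented behavior, not bmux source: the function names are hypothetical, and applying the same TERM/LANG/LC_ALL/HOME defaults in clean mode is an assumption based on the phrase "deterministic defaults".

```python
DETERMINISTIC_DEFAULTS = {
    "TERM": "xterm-256color",
    "LANG": "C.UTF-8",
    "LC_ALL": "C.UTF-8",
}
CLEAN_PASSTHROUGH = ("PATH", "USER", "SHELL")


def resolve_env_mode(playbook_mode, parent_env):
    """@env-mode in the playbook wins, then BMUX_PLAYBOOK_ENV_MODE, then inherit."""
    return playbook_mode or parent_env.get("BMUX_PLAYBOOK_ENV_MODE") or "inherit"


def build_sandbox_env(mode, parent_env, sandbox_home, env_entries):
    """Compose the sandbox environment for the given mode."""
    if mode == "inherit":
        env = dict(parent_env)
    elif mode == "clean":
        env = {k: parent_env[k] for k in CLEAN_PASSTHROUGH if k in parent_env}
    else:
        raise ValueError(f"unknown env mode: {mode}")
    env.update(DETERMINISTIC_DEFAULTS)  # assumption: same defaults in both modes
    env["HOME"] = sandbox_home
    env.update(env_entries)  # explicit @env entries are applied last
    return env
```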

Actions Reference

Deterministic Attach Simulation

Use @driver attach-sim for lightweight attach UI tests that do not start a server or PTY. The driver feeds normalized terminal events into the same attach UI reducer used by production, applies effects to fake state, and renders with the real status-line renderer.
    @driver attach-sim
    @viewport cols=100 rows=24
    seed-window-list names='one,two,three' active='one'
    render
    assert-rendered contains='1:one'
    locate id='one' text='1:one'
    locate id='three' text='3:three'
    terminal-event kind=mouse phase=down button=left col='${one.center_col}' row='${one.row}'
    terminal-event kind=mouse phase=move button=left col='${three.end_col}' row='${three.row}'
    terminal-event kind=mouse phase=up button=left col='${three.end_col}' row='${three.row}'
    assert-effect operation='move-window'
    assert-state path='windows.names' equals='["two","three","one"]'
Supported attach-sim actions:
| Action | Purpose |
| --- | --- |
| seed-window-list | Seed fake windows: names='one,two' active='one' |
| seed-pane-text | Seed fake focused-pane text for scrollback/selection scenarios: lines='one|two' cursor_row=2 cursor_col=1 |
| seed-pane-layout | Seed fake pane layout for mouse/layout scenarios, currently split='vertical' or split='floating' |
| set-config | Set supported sim config, currently status_bar.tab_order=mru|stable and appearance.status_position=top|bottom |
| render | Re-render fake attach status UI |
| snapshot | Capture the current attach-sim render in the playbook result snapshots |
| locate | Locate rendered text and define ${id.start_col}, ${id.center_col}, ${id.end_col}, ${id.row} |
| terminal-event | Send normalized terminal input; currently mouse events are supported |
| send-attach | Send an attach key chord through the attach keybinding processor in simulation |
| assert-rendered | Assert rendered output contains or matches text |
| assert-effect | Assert an effect such as move-window, resize-pane, focus-pane, or move-floating-pane was emitted |
| assert-no-effect | Assert an effect was not emitted |
| assert-state | Assert fake state; currently supports windows.names, windows.active_name, scrollback.active, scrollback.cursor, selection.active, selection.text, help_overlay.open, help_overlay.scroll, and prompt.active |
This driver is intentionally generic around terminal events, rendering, effects, and state assertions. Feature fixtures are allowed, but the input and assertion primitives should remain reusable for future attach UI behavior.
When adding attach-sim coverage for another UI feature, prefer this pattern:
  1. Put production behavior behind a reducer/effect path that accepts normalized terminal input and explicit geometry/config.
  2. Extend the simulation harness fake state only enough to execute those effects.
  3. Add generic actions or assertions only when the new behavior needs reusable terminal/render/effect/state vocabulary.
  4. Keep feature-specific setup in narrowly named seed/config actions or fixtures, not in duplicated test-only UI logic.

Session Lifecycle

new-session

Create a new session. Must be the first action in a sandbox playbook.
new-session [name=<string>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | no | auto | Session name |
Sets ${SESSION_ID}, ${SESSION_NAME}, ${PANE_COUNT} (=1), ${FOCUSED_PANE} (=1).

kill-session

Kill a session by name.
kill-session name=<string>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Session name |

Pane Management

split-pane

Split the current pane.
split-pane [direction=vertical|horizontal|v|h] [ratio=<f64>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| direction | string | no | vertical | Split direction. v/vertical or h/horizontal |
| ratio | f64 | no | none (server default) | Split ratio (0.0-1.0) |
Increments ${PANE_COUNT}.

focus-pane

Change the focused pane.
focus-pane target=<u32>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| target | u32 | yes | - | Pane index to focus (1-based) |
Updates ${FOCUSED_PANE}.

close-pane

Close a pane.
close-pane [target=<u32>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| target | u32 | no | focused pane | Pane index to close (1-based) |
Decrements ${PANE_COUNT}.

Input

send-keys

Send input bytes to a pane. This is the primary way to type commands.
send-keys keys=<escaped-string> [pane=<u32>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| keys | string | yes | - | Input bytes with C-style escapes. Use \r for Enter. |
| pane | u32 | no | focused pane | Target pane index (1-based). Uses PaneDirectInput for race-free delivery. |
Examples:
    send-keys keys='echo hello\r'
    send-keys keys='ls -la\r' pane=2
    send-keys keys='\x03'    # Ctrl+C
    send-keys keys='\e[A'    # Up arrow

send-bytes

Send raw bytes specified as a hex string.
send-bytes hex=<hex-string>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| hex | string | yes | - | Hex-encoded bytes (e.g. 1b5b41 for ESC [ A) |

send-attach

Send a key chord through the attach keybinding runtime (same path as interactive attach mode). Use this for UI-mode behaviors like scrollback/copy-mode, keybinding-driven pane focus, and runtime/plugin commands.
send-attach key=<chord>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| key | string | yes | - | Key chord string (e.g. ctrl+a [, k, esc) |

prefix-key

Compatibility alias that sends Ctrl-A plus one key via send-attach.
prefix-key key=<char>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| key | char | yes | - | Single character to send after the prefix |
Do not mix attach UI-mode entry with send-keys for follow-up navigation keys. send-keys writes bytes to the pane shell; send-attach runs attach key handling.
    # Bad: enters scrollback, then types into shell
    prefix-key key='['
    send-keys keys='k\r'

    # Good: all UI-mode keys use attach path
    send-attach key='ctrl+a ['
    send-attach key='k'
    send-attach key='enter'

Synchronization

wait-for

Poll the screen until a regex pattern matches. This is the primary synchronization mechanism – use it after send-keys to wait for output before proceeding.
wait-for pattern=<regex> [pane=<u32>] [timeout=<ms>] [retry=<u32>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| pattern | regex | yes | - | Regex pattern to match against screen text |
| pane | u32 | no | focused pane | Pane index (1-based) |
| timeout | u64 | no | 5000 | Max wait time in milliseconds |
| retry | u32 | no | 1 | Number of attempts (1 = no retry) |
Polling behavior: Exponential backoff starting at 10ms, doubling up to 200ms max (10, 20, 40, 80, 160, 200, 200…). Each poll drains output and refreshes the screen.
On timeout: The step fails with an error message that includes the first 200 characters of the current screen text for debugging.
Pattern tips:
  • Use \\d+ to match any sequence of digits (PIDs, line numbers, etc.)
  • Use \\$ to match a literal $ (common in shell prompts)
  • The pattern is tested against the full visible screen text of the target pane.
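The polling behavior described above can be sketched in a few lines. This is a model for illustration, not bmux's implementation: read_screen is a hypothetical callable returning the pane's visible text, the retry argument is not modeled, and the sleep and clock parameters exist only so the loop can be exercised without real delays.

```python
import re
import time


def poll_delays_ms(start_ms=10, cap_ms=200):
    """Yield 10, 20, 40, 80, 160, 200, 200, ... (doubling, capped at cap_ms)."""
    delay = start_ms
    while True:
        yield delay
        delay = min(delay * 2, cap_ms)


def wait_for(read_screen, pattern, timeout_ms=5000, sleep=time.sleep, clock=time.monotonic):
    """Poll read_screen() until pattern matches, or raise TimeoutError."""
    regex = re.compile(pattern)
    deadline = clock() + timeout_ms / 1000
    for delay_ms in poll_delays_ms():
        text = read_screen()
        if regex.search(text):
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            # Mirror the documented error: include the first 200 screen characters.
            raise TimeoutError(f"no match for {pattern!r}; screen starts: {text[:200]!r}")
        sleep(min(delay_ms / 1000, remaining))
```

Capping the delay keeps worst-case detection latency near 200 ms while the short early polls catch fast commands quickly.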

sleep

Pause execution for a fixed duration. Prefer wait-for when possible.
sleep ms=<u64>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| ms | u64 | yes | - | Duration in milliseconds |

wait-for-event

Wait for a server-side event.
wait-for-event event=<name> [timeout=<ms>]
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| event | string | yes | - | Event name (exact match) |
| timeout | u64 | no | 5000 | Max wait time in milliseconds |
Supported event names:
| Event name | Triggered when |
| --- | --- |
| server_started | Server finishes startup |
| server_stopping | Server begins shutdown |
| session_created | A new session is created |
| session_removed | A session is destroyed |
| client_attached | A client attaches to a session |
| client_detached | A client detaches |
| attach_view_changed | The attached view layout changes |

Assertions

assert-screen

Assert conditions on the visible screen text. At least one of contains, not_contains, or matches is required.
assert-screen [pane=<u32>] [contains=<string>] [not_contains=<string>] [matches=<regex>]
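| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| pane | u32 | no | focused pane | Pane index (1-based) |
| contains | string | no | - | Substring that must be present |
| not_contains | string | no | - | Substring that must NOT be present |
| matches | regex | no | - | Regex pattern that must match |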
Checks are evaluated in order: contains first, then not_contains, then matches. All specified checks must pass.
On failure: The error detail includes the full screen text of the target pane, allowing the caller to see what was actually on screen.
Examples:
    assert-screen contains='hello world'
    assert-screen not_contains='error' pane=1
    assert-screen matches='total \\d+ files'
    assert-screen contains='success' not_contains='failure'
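The evaluation order can be modeled as a function that returns the first failing check. This is a sketch for reasoning about combined checks, not bmux code; check_screen and its message wording are hypothetical.

```python
import re


def check_screen(screen_text, contains=None, not_contains=None, matches=None):
    """Return None if every specified check passes, else the first failure message."""
    if contains is None and not_contains is None and matches is None:
        raise ValueError("at least one of contains, not_contains, or matches is required")
    # Documented order: contains, then not_contains, then matches.
    if contains is not None and contains not in screen_text:
        return f"expected screen to contain {contains!r}"
    if not_contains is not None and not_contains in screen_text:
        return f"expected screen not to contain {not_contains!r}"
    if matches is not None and re.search(matches, screen_text) is None:
        return f"expected screen to match {matches!r}"
    return None
```

Because checks short-circuit in order, a step with both contains and not_contains reports the missing substring first when both conditions fail.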

assert-layout

Assert the number of panes.
assert-layout pane_count=<u32>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| pane_count | u32 | yes | - | Expected number of panes |

assert-cursor

Assert the cursor position in a pane.
assert-cursor [pane=<u32>] row=<u16> col=<u16>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| pane | u32 | no | focused pane | Pane index (1-based) |
| row | u16 | yes | - | Expected cursor row (0-based) |
| col | u16 | yes | - | Expected cursor column (0-based) |

render-mark / assert-render

When @render-trace true is enabled, playbooks attach a normalized render summary to each step result. Use render-mark to name the current trace position, then assert-render to verify bounded render work since that mark. The step summary is derived from normalized pane/cell deltas and does not store raw ANSI bytes or pane text. Exact trace snapshots use compact semantic ops such as full-frame and pane-row-segment:<pane>:<row>:<start_col>:<cells>; the same compact format also covers actual attach-render trace ops such as status-line, help-overlay, and extension-cached-replay:<surface> for trace-backed summaries.
    @render-trace true
    render-mark id='baseline'
    sleep ms=10
    assert-render since='baseline' max_frames=0 max_rows_emitted=0 max_cells_emitted=0 full_frame=false
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| since | str | yes | - | Existing render-mark ID |
| min_frames | u64 | no | - | Minimum observed frames |
| max_frames | u64 | no | - | Maximum observed frames |
| full_frame | bool | no | - | Whether any full-frame render is allowed/expected |
| max_full_frame_frames | u64 | no | - | Maximum full-frame render count |
| max_full_surface_fallbacks | u64 | no | - | Maximum full-surface fallback count |
| max_damage_rects | u64 | no | - | Maximum damage rect count |
| max_damage_area_cells | u64 | no | - | Maximum damaged cell area |
| max_rows_emitted | u64 | no | - | Maximum changed/emitted rows |
| max_row_segments_emitted | u64 | no | - | Maximum emitted row segment count |
| max_cells_emitted | u64 | no | - | Maximum changed/emitted cells |
| max_frame_bytes | u64 | no | - | Maximum estimated frame bytes |
| status_rendered | bool | no | - | Whether status rendering was observed |
| overlay_rendered | bool | no | - | Whether overlay rendering was observed |
| expected_emitted_rows | str | no | - | Exact normalized pane rows as pane:row,pane:row |
| expected_emitted_row_segments | str | no | - | Exact normalized row segments as pane:row:start_col:cells |
| expected_trace_ops | str | no | - | Exact semantic trace ops, comma-separated (full-frame, clear-row:row:cells, pane-row-full:pane:row:cells, pane-row-segment:pane:row:start_col:cells, pane-row-cache-skip:pane:row, pane-rows-sync-deferred:pane:rows, extension-ops:surface:regions:full_surface, extension-cached-replay:surface, extension-imperative:surface:regions:full_surface, status-line, help-overlay, prompt-overlay, damage-overlay:rects:cells, cursor:pane:visible, overlay) |

Inspection

snapshot

Capture the current screen state of all panes. Snapshots are included in the PlaybookResult.snapshots array and in interactive mode responses.
snapshot id=<string>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | yes | - | Label for this snapshot (used to identify it in results) |
Each snapshot captures every pane’s visible text, cursor position, focus state, and index.

screen

Capture and return the current screen state. In batch mode, the step detail contains JSON-serialized pane captures. In interactive mode, the response panes field is populated.
screen
No arguments. Useful for LLM debugging – inspect screen state without asserting.

status

Query the current session status. Returns session ID, pane count, and focused pane index in the step detail.
status
No arguments.

Layout

resize-viewport

Change the terminal viewport dimensions.
resize-viewport cols=<u16> rows=<u16>
| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| cols | u16 | yes | - | New column count |
| rows | u16 | yes | - | New row count |

Services

invoke-service

Invoke a plugin service.
invoke-service capability=<cap> interface=<id> operation=<op> [kind=query|command] [payload=<json>]
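| Arg | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| capability | string | yes | - | Plugin capability name |
| interface | string | yes | - | Service interface ID |
| operation | string | yes | - | Operation name |
| kind | string | no | command | query/q or command/cmd |
| payload | string | no | "" | JSON payload string |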

Step Modifiers

!continue — Continue on Error

Append !continue to any action line to prevent the playbook from stopping if that step fails. The step is still recorded as fail in the results, and pass will be false, but execution continues to the next step.
    assert-screen contains='optional_check' !continue
    assert-screen contains='required_check'
In TOML format, use continue_on_error = true on the step:
    [[step]]
    action = "assert-screen"
    contains = "optional_check"
    continue_on_error = true
This is useful for diagnostic playbooks that want to check multiple conditions and report all failures, not just the first one.

Variable Substitution

Playbook values support ${NAME} variable references. Variables are resolved at execution time, not parse time.

Variable Sources

Variables are resolved in this order (first match wins):
  1. Runtime variables – dynamic values set during execution
  2. Static variables – defined via @var directives
  3. Environment variables – from the process environment
  4. Unresolved – if no match, ${NAME} is left as-is (with a warning logged)

Literal ${ Escaping

Use $${...} to produce a literal ${...} without variable expansion:
send-keys keys='echo $${HOME}\r' # sends literal ${HOME} to the terminal
The first $ acts as an escape character. After resolution, $${HOME} becomes the literal string ${HOME}.

Runtime Variables

| Variable | Type | Set by | Description |
| --- | --- | --- | --- |
| ${SESSION_ID} | UUID string | new-session | Current session UUID |
| ${SESSION_NAME} | string | new-session | Current session name |
| ${PANE_COUNT} | integer string | new-session, split-pane, close-pane | Number of panes |
| ${FOCUSED_PANE} | integer string | new-session, focus-pane | Focused pane index |

Static Variables

Defined with @var:
    @var BASE_DIR=/tmp/test
    @var MARKER=test_marker_42
    send-keys keys='cd ${BASE_DIR}\r'
    wait-for pattern='${MARKER}'
Static variables take priority over environment variables with the same name.
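The lookup order and the $${...} escape can be sketched together. This is an illustrative model, not bmux's resolver: substitute is a hypothetical name, the three dictionaries stand in for the runtime, static, and environment sources, and the warning logged for unresolved names is omitted.

```python
import re

# Group 1 captures an escaped reference ($${...}); group 2 captures a variable name.
_REFERENCE = re.compile(r"\$(\$\{[^}]*\})|\$\{([^}]+)\}")


def substitute(text, runtime, static, env):
    """Expand ${NAME} using runtime, then static, then environment values."""
    def replace(match):
        if match.group(1) is not None:
            return match.group(1)  # $${HOME} becomes the literal ${HOME}
        name = match.group(2)
        for source in (runtime, static, env):  # first match wins
            if name in source:
                return source[name]
        return match.group(0)  # unresolved references are left as-is
    return _REFERENCE.sub(replace, text)
```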

TOML Format

TOML playbooks use [playbook] for config and [[step]] for actions.

[playbook] Section

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | none | Playbook name |
| description | string | none | Description |
| viewport.cols | u16 | 80 | Viewport columns |
| viewport.rows | u16 | 24 | Viewport rows |
| shell | string | system default | Shell binary |
| timeout_ms | u64 | 30000 | Max execution time in ms |
| record | bool | false | Enable recording |
| plugins.enable | string[] | [] | Plugin IDs to enable |
| plugins.disable | string[] | [] | Plugin IDs to disable |
| vars | table | {} | Static variables (NAME = "VALUE") |
| env | table | {} | Environment variables |
| env_mode | string | none | "inherit" or "clean" |
| include | string[] | [] | Paths to include |

[[step]] Entries

Each step requires an action field. Other fields are action-specific:
    [[step]]
    action = "new-session"
    name = "my-session"

    [[step]]
    action = "send-keys"
    keys = "echo hello\r"
    pane = 1

    [[step]]
    action = "wait-for"
    pattern = "hello"
    timeout = 5000

    [[step]]
    action = "wait-for"
    pattern = "flaky_output"
    retry = 3

    [[step]]
    action = "assert-screen"
    contains = "hello"

    [[step]]
    action = "assert-screen"
    contains = "optional"
    continue_on_error = true

TOML Example

Equivalent to the DSL example in Example 1:
    [playbook]
    name = "echo-test"
    viewport = { cols = 80, rows = 24 }
    shell = "sh"

    [[step]]
    action = "new-session"

    [[step]]
    action = "send-keys"
    keys = "echo hello_world\r"

    [[step]]
    action = "wait-for"
    pattern = "hello_world"

    [[step]]
    action = "assert-screen"
    contains = "hello_world"

Sandbox Environment

How It Works

bmux playbook run (without --target-server) creates an ephemeral sandbox:
  1. Creates a temp directory (/tmp/bpb-<hex>) with isolated config, runtime, data, and state subdirectories.
  2. Writes a minimal bmux.toml config with shell and plugin overrides.
  3. Spawns a bmux server start process pointing at the temp directories.
  4. Waits for the server to accept connections (up to 15 seconds).
  5. Executes all playbook steps against the sandbox.
  6. Stops the server and cleans up the temp directory.
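Steps 1 and 6 of the lifecycle above (create an isolated directory tree, always remove it afterwards) can be sketched with a context manager. This is only an illustration of the pattern: the subdirectory names are hypothetical (the socket path shown elsewhere in this reference suggests bmux uses shorter names), and server startup is left out.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path


@contextlib.contextmanager
def sandbox_root(prefix="bpb-"):
    """Create a temp sandbox directory and remove it on exit, even on error."""
    root = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        # Hypothetical names for the config/runtime/data/state subdirectories.
        for name in ("config", "runtime", "data", "state"):
            (root / name).mkdir()
        yield root
    finally:
        shutil.rmtree(root, ignore_errors=True)
```

A process killed with SIGKILL never reaches the finally block, which is the situation bmux playbook cleanup exists to handle.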

Plugin Configuration

By default, all bundled plugins are available. Use @plugin to control this:
    @plugin disable=bmux.windows      # disable a specific plugin
    @plugin enable=bmux.permissions   # only enable specific plugins
When any enable is specified, all other plugins are implicitly disabled.
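The enable/disable rule can be written as a set computation. This is a sketch under an assumption: the reference does not say what happens when a playbook lists both enable and disable entries, so applying disable after enable is a guess. The function name is hypothetical.

```python
def effective_plugins(bundled, enable=(), disable=()):
    """Return the sorted plugin IDs left active after @plugin directives."""
    # Any enable entry switches from "all bundled" to "only the listed plugins".
    selected = set(enable) if enable else set(bundled)
    # Assumption: disable entries are applied after the enable selection.
    return sorted(selected - set(disable))
```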

Assertions and Synchronization

Best Practices for Deterministic Assertions

  1. Always use wait-for before assert-screen. Output arrives asynchronously – without a sync barrier, assertions may check stale screen content.
  2. Match on distinctive output, not prompts. Shell prompts vary across machines and shells. Match on your command’s output instead:
    send-keys keys='echo UNIQUE_MARKER_123\r' wait-for pattern='UNIQUE_MARKER_123'
  3. Use \\d+ for non-deterministic numbers such as PIDs, line counts, and timestamps. The backslash is doubled inside a quoted DSL value, as in the wait-for pattern tips:
    wait-for pattern='process started, pid=\\d+'
  4. Use @env-mode clean for maximum determinism. This prevents the sandbox from inheriting unpredictable environment variables.
  5. Use @shell sh for portable playbooks. sh behavior is more predictable across systems than bash/zsh.
  6. Prefer contains over matches when possible. Substring matching is simpler and less fragile than regex.

Interactive Mode Protocol

Interactive mode provides a socket-based REPL for LLM agents to control bmux dynamically.

Startup

bmux playbook interactive --viewport 80x24
On startup, bmux prints a JSON ready message to stdout:
    {
      "status": "ready",
      "socket": "/tmp/bpb-xxx/r/playbook.sock",
      "sandbox_root": "/tmp/bpb-xxx"
    }
The LLM agent connects to the socket path and communicates via line-delimited JSON.

Wire Protocol

Interactive mode is JSON-op only in v2: one JSON object per line (\n-delimited).
JSON op examples:
    {"op":"hello","protocol_version":1,"client":"llm-agent"}
    {"op":"command","request_id":"r1","dsl":"new-session"}
    {"op":"subscribe","event_types":["pane_output","cursor_delta","screen_delta"],"screen_delta_format":"line_ops"}
Responses use the same framing: one JSON object per line.
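A client needs little more than line framing on top of a socket. The sketch below is illustrative: it assumes the socket is a Unix domain stream socket (the reference only gives a filesystem path), and the helper names are hypothetical.

```python
import json
import socket


def encode_op(op, **fields):
    """Serialize one request as a single newline-terminated JSON line."""
    message = {"op": op, **fields}
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def split_messages(buffer):
    """Split received bytes into complete JSON messages plus the unfinished tail."""
    *lines, rest = buffer.split(b"\n")
    return [json.loads(line) for line in lines if line.strip()], rest


def connect(socket_path):
    """Open the control socket. Assumption: AF_UNIX stream socket at socket_path."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(socket_path)
    return sock
```

Keeping the unfinished tail and prepending it to the next recv() result handles messages that arrive split across reads.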

Response Schema

    {
      "type": "response" | "event" | "error",
      "seq": 1,
      "mono_ns": 1000000,
      "request_id": "optional-correlation-id",
      "status": "ok" | "fail" | "error",
      "action": "send-keys",
      "elapsed_ms": 12,
      "detail": "optional detail string",
      "error": "error message on failure",
      "snapshot": { "id": "...", "panes": [...] },
      "panes": [
        { "index": 1, "focused": true, "screen_text": "...", "cursor_row": 0, "cursor_col": 5 }
      ],
      "session_id": "uuid-string",
      "pane_count": 2,
      "focused_pane": 1
    }
The status, type, seq, and mono_ns fields are always present. All other fields are optional and omitted when not applicable.
| Field | Present when | Type |
| --- | --- | --- |
| status | always | "ok", "fail", or "error" |
| action | action executed | string |
| elapsed_ms | action executed | u64 |
| detail | action has detail output | string |
| error | status is "fail" or "error" | string |
| snapshot | snapshot action executed | object |
| panes | screen command executed | array of PaneCapture |
| session_id | status command executed | UUID string |
| pane_count | status command executed | u32 |
| focused_pane | status command executed | u32 |
| type | always | message class (response, event, error) |
| seq | always | monotonic message sequence number |
| mono_ns | always | monotonic nanoseconds since interactive session start |
| request_id | JSON command/op requests | correlation id echoed in response |

Special Commands

| Op | Description |
| --- | --- |
| hello | Optional capability handshake. |
| command | Execute one DSL action line via dsl field (for example new-session, send-keys, assert-screen). |
| status | Return session metadata (session_id, pane_count, focused_pane). |
| hydrate | Hydrate detailed data (screen_full, event_window, incident). |
| subscribe | Start live event streaming with filters and budgets. |
| unsubscribe | Stop live event streaming. |
| set_watchpoint | Register anomaly watchpoint (kind: "event_burst"). |
| clear_watchpoint | Remove a watchpoint by id. |
| quit | End the interactive session. |

Push Output Events

After sending subscribe, the server pushes events as they arrive.
Pane output event:
    {
      "type": "event",
      "status": "ok",
      "event_type": "pane_output",
      "pane_index": 1,
      "output_data": "hello world\n"
    }
Cursor delta event:
    {
      "type": "event",
      "status": "ok",
      "event_type": "cursor_delta",
      "cursor_delta": {
        "pane_index": 1,
        "from": { "row": 10, "col": 1 },
        "to": { "row": 10, "col": 12 },
        "distance": 11
      }
    }
Screen delta event (LLM-friendly line ops):
    {
      "type": "event",
      "status": "ok",
      "event_type": "screen_delta",
      "screen_delta": {
        "pane_index": 1,
        "format": "line_ops",
        "base_hash": "9f1b2c3d4e5f6a70",
        "new_hash": "4f8e1d3ab2c04910",
        "ops": [
          { "op": "set_line", "row": 12, "text": "fn main() {" },
          { "op": "cursor", "row": 12, "col": 11 }
        ]
      }
    }
Screen delta event (human-readable unified diff):
    {
      "type": "event",
      "status": "ok",
      "event_type": "screen_delta",
      "screen_delta": {
        "pane_index": 1,
        "format": "unified_diff",
        "base_hash": "9f1b2c3d4e5f6a70",
        "new_hash": "4f8e1d3ab2c04910",
        "diff": "@@ -13,1 +13,1 @@\n-fn mian() {\n+fn main() {\n"
      }
    }
Push events have event_type set (e.g. "pane_output"), which distinguishes them from command responses. They may arrive between commands or interleaved with command responses.
| Field | Type | Description |
| --- | --- | --- |
| event_type | string | Push event type (pane_output, pane_input, cursor_delta, screen_delta, server_event, request_lifecycle, watchpoint_hit) |
| pane_index | u32 | The pane that produced the output |
| output_data | string | The new output text (UTF-8, may contain escape sequences) |
Watchpoint hit event:
    {
      "type": "event",
      "status": "ok",
      "event_type": "watchpoint_hit",
      "watchpoint_hit": {
        "id": "cursor-delta-burst-1",
        "kind": "event_burst",
        "watch_event_type": "cursor_delta",
        "pane_index": 1,
        "summary": "event burst detected: event_type=cursor_delta hits=3 min_hits=3 pane=1",
        "window_ms": 500,
        "min_hits": 3,
        "observed_hits": 3,
        "peak_distance": 12,
        "evidence_seq_start": 42,
        "evidence_seq_end": 42
      }
    }
subscribe JSON options:
  • event_types: array of event names (pane_output, cursor_delta, screen_delta, watchpoint_hit).
  • pane_indexes: optional pane-index filter.
  • screen_delta_format: line_ops, unified_diff, or auto.
    • auto resolves to line_ops for machine-readable clients (e.g. client: "llm-agent") and unified_diff otherwise.
  • max_events_per_sec: optional streaming event budget.
  • max_bytes_per_sec: optional streaming byte budget.
  • coalesce_ms: optional per-event-type coalescing interval.
set_watchpoint JSON options:
  • id: required watchpoint id.
  • kind: event_burst.
  • event_type: required watched stream event (pane_output, pane_input, cursor_delta, screen_delta, server_event, request_lifecycle).
  • pane_index: optional pane scope (defaults to any pane).
  • window_ms: burst window in milliseconds (default 500).
  • min_hits: required hit count inside window_ms (default 3).
  • contains_regex: optional regex predicate (v1: supported for event_type: "pane_output" only).
Example (only trigger on pane output that matches):
{"op":"set_watchpoint","id":"errors-only","kind":"event_burst","event_type":"pane_output","contains_regex":"(?i)error|panic","min_hits":1,"window_ms":1000}
watchpoint_hit cannot be watched in v1 (recursive watchpoint loops are blocked).
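To make the window_ms and min_hits options concrete, here is a sliding-window sketch of what an event_burst check could look like. It illustrates the semantics as described above and is not bmux's implementation. The class name BurstDetector, the inclusive window boundary, and the absence of a reset after firing are all assumptions.

```python
from collections import deque

class BurstDetector:
    """Reports a burst when at least min_hits events fall within window_ms."""

    def __init__(self, window_ms: int = 500, min_hits: int = 3):
        self.window_ms = window_ms
        self.min_hits = min_hits
        self._hits = deque()  # timestamps (ms) still inside the window

    def observe(self, timestamp_ms: int) -> bool:
        """Record one matching event; timestamps must be non-decreasing."""
        self._hits.append(timestamp_ms)
        # Drop hits that fell out of the window ending at this event.
        while self._hits and timestamp_ms - self._hits[0] > self.window_ms:
            self._hits.popleft()
        return len(self._hits) >= self.min_hits

detector = BurstDetector(window_ms=500, min_hits=3)
fired = [detector.observe(t) for t in (0, 100, 900, 1000, 1100)]
print(fired)  # [False, False, False, False, True]
```

With min_hits=1, as in the errors-only example, every matching event would report a hit under this model.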
hydrate JSON options:
  • kind: "screen_full" for full pane snapshot.
  • kind: "event_window" with start_seq and end_seq.
  • kind: "incident" with id (watchpoint id) or around_seq, plus optional window_radius.
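A watchpoint hit carries evidence_seq_start and evidence_seq_end, which map naturally onto an event_window hydrate request. The sketch below builds such a request from a hit event. The "op": "hydrate" key is assumed by analogy with the "op": "set_watchpoint" example and is not confirmed here, and the helper name is hypothetical.

```python
import json

def hydrate_request_for_hit(event: dict) -> str:
    """Build an event_window hydrate request covering a hit's evidence range."""
    hit = event["watchpoint_hit"]
    request = {
        "op": "hydrate",  # ASSUMPTION: mirrors the "op":"set_watchpoint" form
        "kind": "event_window",
        "start_seq": hit["evidence_seq_start"],
        "end_seq": hit["evidence_seq_end"],
    }
    return json.dumps(request)

event = {
    "event_type": "watchpoint_hit",
    "watchpoint_hit": {
        "id": "cursor-delta-burst-1",
        "evidence_seq_start": 42,
        "evidence_seq_end": 42,
    },
}
print(hydrate_request_for_hit(event))
```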
Use unsubscribe to stop receiving push events.

Example Session

→ new-session
← {"status":"ok","action":"new-session","elapsed_ms":150,"detail":"session_id=a1b2c3..."}
→ send-keys keys='echo hello\r'
← {"status":"ok","action":"send-keys","elapsed_ms":5}
→ screen
← {"status":"ok","action":"screen","panes":[{"index":1,"focused":true,"screen_text":"$ echo hello\nhello\n$ ","cursor_row":2,"cursor_col":2}]}
→ assert-screen contains='hello'
← {"status":"ok","action":"assert-screen","elapsed_ms":10}
→ quit
← {"status":"ok","action":"quit"}

Recording to Playbook Conversion

bmux playbook from-recording converts a recorded bmux session into a runnable playbook.

What Gets Generated

Element | Source | How
new-session | NewSession request in recording | Direct mapping
split-pane | SplitPane request | Direct mapping with direction
focus-pane | FocusPane request | Direct mapping with target index
send-keys | AttachInput / PaneDirectInput events | Consecutive inputs within 100ms are coalesced. pane=N added when input targets a non-focused pane.
wait-for | PaneOutputRaw events after a command | Last non-empty line of structured-grid-parsed output becomes the barrier pattern. Digit sequences are collapsed to \d+.
assert-screen | PaneOutputRaw events | Up to 3 distinctive content lines per response window become contains= checks.
sleep | Gaps > 200ms with no input/output | Mapped to sleep ms=N
@viewport | First AttachSetViewport request | Emitted as a directive
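The 100ms coalescing rule for send-keys can be sketched as follows. This is an illustration only. It assumes the gap is measured from the previous input event and that exactly 100ms still counts as "within"; neither detail is specified above. The function name coalesce_inputs is hypothetical.

```python
def coalesce_inputs(events, max_gap_ms=100):
    """Merge consecutive input events whose gap is at most max_gap_ms.

    events: list of (timestamp_ms, text) tuples sorted by timestamp.
    Returns a list of (start_timestamp_ms, merged_text) tuples.
    """
    groups = []
    for ts, text in events:
        if groups and ts - groups[-1]["last_ts"] <= max_gap_ms:
            # Close enough to the previous input: extend the current group.
            groups[-1]["keys"] += text
            groups[-1]["last_ts"] = ts
        else:
            groups.append({"start_ts": ts, "last_ts": ts, "keys": text})
    return [(g["start_ts"], g["keys"]) for g in groups]

events = [(0, "l"), (40, "s"), (90, "\r"),
          (600, "p"), (650, "w"), (700, "d"), (720, "\r")]
print(coalesce_inputs(events))  # [(0, 'ls\r'), (600, 'pwd\r')]
```

Each merged group would correspond to one send-keys step, and the 510ms pause between the two groups is the kind of gap the sleep row describes.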

Pattern Robustness

Generated patterns are made robust to non-deterministic content:
  • Digit sequences (12345) are replaced with \d+
  • Regex metacharacters (., *, +, $, etc.) are escaped
  • Structural text (command names, paths, error messages) is preserved as literal matches
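The three rules above can be approximated in a few lines. This sketch uses Python's re.escape, which may escape a slightly different set of characters than the converter does, so treat it as an approximation of the idea and not the actual algorithm. The function name robust_pattern is hypothetical.

```python
import re

def robust_pattern(line: str) -> str:
    """Escape regex metacharacters and replace digit runs with \\d+."""
    # With a capturing group, re.split returns the digit runs at odd indexes.
    parts = re.split(r"(\d+)", line)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(r"\d+")           # digit sequence -> \d+
        elif part:
            out.append(re.escape(part))  # literal text, metacharacters escaped
    return "".join(out)

pattern = robust_pattern("pid=12345, count=42")
print(bool(re.fullmatch(pattern, "pid=7, count=900")))  # True
```

Escaping before matching is what keeps a literal "." in a path or file name from matching arbitrary characters.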

Limitations

  • Multi-client recordings produce playbooks from a single client’s perspective.
  • Very long outputs (exceeding the 256KB ring buffer) may have incomplete screen reconstruction.
  • Some manual editing may be needed for complex workflows (e.g., interactive programs, timing-sensitive sequences).

JSON Output Schema

When using --json, bmux playbook run outputs a PlaybookResult:

PlaybookResult

{
  "playbook_name": "my-test",
  "pass": true,
  "steps": [ ... ],
  "snapshots": [ ... ],
  "recording_id": "uuid-string",
  "recording_path": "/path/to/recording",
  "total_elapsed_ms": 1234,
  "error": "top-level error message"
}
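Field | Type | Always present | Description
playbook_name | string or null | yes | Playbook name
pass | bool | yes | true if all steps passed
steps | StepResult[] | yes | Per-step results
snapshots | SnapshotCapture[] | yes | Captured snapshots (may be empty)
recording_id | string or null | no | Recording ID (UUID string)
recording_path | string or null | no | Path to the recording
total_elapsed_ms | u64 | yes | Wall-clock execution time
error | string or null | no | Top-level error message
sandbox_root | string or null | no | Sandbox root directory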
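A consumer of --json output should treat the fields that are not always present as optional. The sketch below condenses a PlaybookResult into a one-line summary using only the fields listed above. The function name summarize and the sample payload are illustrative.

```python
import json

def summarize(result_json: str) -> str:
    """Condense a PlaybookResult JSON document into a single line."""
    result = json.loads(result_json)
    name = result.get("playbook_name") or "<unnamed>"
    verdict = "PASS" if result["pass"] else "FAIL"
    summary = (f"{name}: {verdict} in {result['total_elapsed_ms']} ms "
               f"({len(result['steps'])} steps)")
    # "error" is not always present, so read it defensively.
    if result.get("error"):
        summary += f"; error: {result['error']}"
    return summary

sample = json.dumps({
    "playbook_name": "my-test",
    "pass": True,
    "steps": [{"index": 0, "action": "send-keys",
               "status": "pass", "elapsed_ms": 5}],
    "snapshots": [],
    "total_elapsed_ms": 1234,
})
print(summarize(sample))  # my-test: PASS in 1234 ms (1 steps)
```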

StepResult

{ "index": 0, "action": "send-keys", "status": "pass", "elapsed_ms": 5, "detail": "optional detail" }
On failure, additional structured fields are included:
{
  "index": 3,
  "action": "assert-screen",
  "status": "fail",
  "elapsed_ms": 12,
  "detail": "assert-screen: pane 1 does not contain 'expected_output'",
  "expected": "expected_output",
  "actual": "$ echo something_else\nsomething_else\n$ ",
  "failure_captures": [
    {
      "index": 1,
      "focused": true,
      "screen_text": "$ echo something_else\nsomething_else\n$ ",
      "cursor_row": 2,
      "cursor_col": 2
    }
  ]
}
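Field | Type | Description
index | u64 | Step index (0-based)
action | string | Action name
status | string | "pass", "fail", or "skip"
elapsed_ms | u64 | Step execution time
detail | string or null | Optional detail message
expected | string or null | Expected value (included on failure)
actual | string or null | Actual value (included on failure)
failure_captures | PaneCapture[] or null | Screen state of every pane at the moment of failure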
The expected and actual fields allow machine consumers (LLMs) to compare expected vs actual values without parsing the detail string. The failure_captures array provides the full screen state of every pane at the moment of failure, regardless of which pane was being asserted on.
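As an illustration of consuming these fields without parsing detail, the sketch below collects failing steps and formats expected, actual, and every captured pane. The helper names failing_steps and describe_failure are hypothetical, and the sample data mirrors the failure example above.

```python
def failing_steps(result: dict) -> list:
    """Return the StepResult entries whose status is "fail"."""
    return [step for step in result["steps"] if step["status"] == "fail"]

def describe_failure(step: dict) -> str:
    """Format one failed step using its structured fields."""
    lines = [f"step {step['index']} ({step['action']}): "
             f"{step.get('detail') or 'no detail'}"]
    if step.get("expected") is not None:
        lines.append(f"  expected: {step['expected']!r}")
    if step.get("actual") is not None:
        lines.append(f"  actual:   {step['actual']!r}")
    # failure_captures may be absent or null, so fall back to an empty list.
    for pane in step.get("failure_captures") or []:
        marker = "*" if pane["focused"] else " "
        lines.append(f"  pane {pane['index']}{marker}: {pane['screen_text']!r}")
    return "\n".join(lines)

result = {
    "steps": [
        {"index": 0, "action": "send-keys", "status": "pass", "elapsed_ms": 5},
        {"index": 3, "action": "assert-screen", "status": "fail",
         "elapsed_ms": 12,
         "detail": "assert-screen: pane 1 does not contain 'expected_output'",
         "expected": "expected_output",
         "actual": "$ echo something_else\nsomething_else\n$ ",
         "failure_captures": [
             {"index": 1, "focused": True,
              "screen_text": "$ echo something_else\nsomething_else\n$ ",
              "cursor_row": 2, "cursor_col": 2}]},
    ]
}
for step in failing_steps(result):
    print(describe_failure(step))
```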

SnapshotCapture

{ "id": "after_echo", "panes": [ ... ] }

PaneCapture

{ "index": 1, "focused": true, "screen_text": "$ echo hello\nhello\n$ ", "cursor_row": 2, "cursor_col": 2 }
Field | Type | Description
index | u32 | Pane index (1-based)
focused | bool | Whether this pane has focus
screen_text | string | Visible text, trailing whitespace trimmed per line
cursor_row | u16 | Cursor row (0-based)
cursor_col | u16 | Cursor column (0-based)
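The 0-based cursor coordinates index directly into screen_text once it is split on newlines. The sketch below finds the focused pane and returns the text to the left of the cursor. It assumes cursor_col counts characters, which may not hold for wide glyphs. Both helper names are hypothetical.

```python
def focused_pane(panes: list):
    """Return the first pane with focused == True, or None."""
    for pane in panes:
        if pane["focused"]:
            return pane
    return None

def text_before_cursor(pane: dict) -> str:
    """Return the part of the cursor's line that lies left of the cursor."""
    lines = pane["screen_text"].split("\n")
    row, col = pane["cursor_row"], pane["cursor_col"]
    if row >= len(lines):
        return ""  # cursor sits below the last captured line
    return lines[row][:col]

pane = {"index": 1, "focused": True,
        "screen_text": "$ echo hello\nhello\n$ ",
        "cursor_row": 2, "cursor_col": 2}
print(repr(text_before_cursor(focused_pane([pane]))))  # '$ '
```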

Examples

Example 1: Basic echo + assert

The simplest useful playbook: run a command, wait for output, verify it.
@viewport cols=80 rows=24
@shell sh
new-session
send-keys keys='echo hello_world\r'
wait-for pattern='hello_world'
assert-screen contains='hello_world'

Example 2: Multi-pane workflow

Split the terminal, send different commands to each pane, verify both.
@viewport cols=120 rows=40
@shell sh
new-session
split-pane direction=vertical
send-keys keys='echo left_pane\r' pane=1
sleep ms=500
assert-screen contains='left_pane' pane=1
send-keys keys='echo right_pane\r' pane=2
sleep ms=500
assert-screen contains='right_pane' pane=2

Example 3: Regex wait-for patterns

Use regex to match output with non-deterministic content.
@shell sh
new-session
send-keys keys='echo "pid=$$, count=42"\r'
wait-for pattern='pid=\d+, count=\d+'

Example 4: Clean environment for determinism

Use @env-mode clean to ensure the sandbox has a predictable environment.
@viewport cols=80 rows=24
@shell sh
@env-mode clean
new-session
send-keys keys='echo $TERM\r'
wait-for pattern='xterm-256color'
assert-screen contains='xterm-256color'

Example 5: Variables and environment overrides

Use @var for playbook-scoped constants and @env for process environment.
@shell sh
@var MARKER=unique_test_id_987
@env MY_APP_MODE=testing
new-session
send-keys keys='echo ${MARKER} $MY_APP_MODE\r'
wait-for pattern='${MARKER}'
assert-screen contains='unique_test_id_987 testing'

Example 6: Snapshot inspection

Capture a named snapshot and inspect its content in the JSON output.
@shell sh
new-session
send-keys keys='ls /etc\r'
wait-for pattern='\$'
snapshot id=etc_listing
Run with --json and inspect result.snapshots[0].panes[0].screen_text to see the directory listing.

Example 7: Screen and status for debugging

Use screen and status to inspect state mid-playbook. Useful when developing a playbook to understand what the terminal shows.
@shell sh
new-session
send-keys keys='echo step1\r'
sleep ms=300
screen
status
send-keys keys='echo step2\r'
sleep ms=300
screen
Each screen step’s detail in the JSON output contains the full pane text at that point in execution.

Example 8: Expected failure testing

Verify that a specific error condition is detected.
@shell sh
new-session
send-keys keys='echo real_output\r'
wait-for pattern='real_output'
assert-screen contains='nonexistent_string'
This playbook is expected to fail. Run with --json and check that result.pass == false. The failing step's detail field describes the mismatch, and its actual and failure_captures fields contain the screen content.

Example 9: Recording conversion workflow

  1. Record a session (manual start/stop):
    bmux recording start --name startup-repro
    # ... do things in bmux ...
    bmux recording stop
    # inspect manual recording storage and defaults
    bmux recording path
    bmux recording status
    Or use a rolling capture cut without stopping the hidden rolling recorder:
    # ~/.config/bmux/config.toml
    [recording]
    enabled = true
    rolling_window_secs = 300
    # rolling capture categories (optional)
    rolling_capture_input = true
    rolling_capture_output = true
    rolling_capture_events = true
    rolling_capture_protocol_replies = false
    rolling_capture_images = false
    # or explicit allowlist (takes precedence over categories when non-empty)
    # rolling_event_kinds = ["pane_output_raw", "protocol_reply_raw", "pane_image"]

    # default cut window = full rolling window (300s in this example)
    bmux recording cut --name startup-snapshot
    # optional explicit window
    bmux recording cut --last-seconds 90
    You can override rolling behavior at server boot:
    # force on for this boot (and optionally override window)
    bmux server start --rolling-recording --rolling-window-secs 300
    # choose exact kinds for this boot
    bmux server start --rolling-window-secs 300 --rolling-event-kind-all
    bmux server start --rolling-window-secs 300 --rolling-event-kind pane-output-raw --rolling-event-kind protocol-reply-raw
    # category overrides for this boot
    bmux server start --rolling-capture-input --no-rolling-capture-events --rolling-capture-protocol-replies
    # force off for this boot
    bmux server start --no-rolling-recording
    # kill switch on a running server (and restart with runtime overrides)
    bmux server recording stop
    bmux server recording start --rolling-window-secs 120 --rolling-event-kind-all
    # inspect rolling storage path + status/usage
    bmux server recording path
    bmux server recording status
    # clear rolling data (default: restart if active)
    bmux server recording clear
    # clear and keep rolling stopped
    bmux server recording clear --no-restart
  2. Convert to a playbook:
    bmux playbook from-recording <recording-id-or-name> --output repro.dsl
  3. Review and edit the generated playbook. The auto-generated wait-for patterns may need adjustment for your environment.
  4. Run it:
    bmux playbook run repro.dsl --json

Example 10: CLI variable overrides

Pass variables from the command line to override @var defaults:
# The playbook uses ${MARKER} which defaults to "test"
bmux playbook run test.dsl --var MARKER=production_check --json
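When a script or CI job launches playbooks, the repeatable --var flag is easiest to assemble as an argument list. The sketch below builds such a list using only the documented --var and --json flags. The helper name playbook_run_argv is hypothetical.

```python
def playbook_run_argv(source: str, variables=None, json_output: bool = True):
    """Build an argv list for `bmux playbook run` with repeated --var flags."""
    argv = ["bmux", "playbook", "run", source]
    for key, value in (variables or {}).items():
        argv += ["--var", f"{key}={value}"]
    if json_output:
        argv.append("--json")
    return argv

print(playbook_run_argv("test.dsl", {"MARKER": "production_check"}))
# ['bmux', 'playbook', 'run', 'test.dsl', '--var', 'MARKER=production_check', '--json']
```

Passing the list to subprocess.run(argv, capture_output=True, text=True) avoids shell quoting problems with values that contain spaces; the JSON document can then be read from the captured stdout.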

Example 11: Retry flaky operations

Use retry= on wait-for for operations that may not succeed immediately:
@shell sh
new-session
send-keys keys='./flaky_server.sh &\r'
wait-for pattern='server ready' timeout=3000 retry=3

Example 12: Continue on error for diagnostics

Use !continue to check multiple conditions and report all failures:
@shell sh
new-session
send-keys keys='run_diagnostics\r'
wait-for pattern='\$'
assert-screen contains='check_1_ok' !continue
assert-screen contains='check_2_ok' !continue
assert-screen contains='check_3_ok' !continue
snapshot id=diagnostic_results

Example 13: Literal variable references

Use $${...} to send literal ${...} to the terminal:
@shell sh
new-session
send-keys keys='echo $${HOME}\r'
wait-for pattern='\$\{HOME\}'
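The interaction between ${NAME} substitution and the $${...} escape can be modeled with a single left-to-right scan. This is a sketch of the behavior described in the examples, not bmux's parser. Leaving undefined variables untouched is an assumption, and the function name expand is hypothetical.

```python
import re

# Match the escape "$${" first, then a braced variable reference.
_TOKEN = re.compile(r"\$\$\{|\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand(text: str, variables: dict) -> str:
    """Replace ${NAME} with its value and turn $${ into a literal ${."""
    def replace(match):
        if match.group(0) == "$${":
            return "${"
        name = match.group(1)
        # ASSUMPTION: undefined references are left as written.
        return variables.get(name, match.group(0))
    return _TOKEN.sub(replace, text)

variables = {"MARKER": "unique_test_id_987"}
print(expand("echo ${MARKER} $MY_APP_MODE", variables))
# echo unique_test_id_987 $MY_APP_MODE
print(expand("echo $${HOME}", variables))
# echo ${HOME}
```

Unbraced references such as $MY_APP_MODE are not matched, so they reach the shell unchanged, which is consistent with Example 5.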

Example 14: LLM-generated playbook pattern

An LLM generating a playbook from a bug description should follow this pattern:
# 1. Set up a deterministic environment
@viewport cols=80 rows=24
@shell sh
@env-mode clean
# 2. Create a session
new-session
# 3. For each command:
#    a. send-keys with \r to execute
#    b. wait-for on distinctive output (not the prompt)
#    c. assert-screen to verify expected behavior
send-keys keys='mkdir -p /tmp/test_dir\r'
wait-for pattern='\$'
send-keys keys='ls /tmp/test_dir\r'
wait-for pattern='\$'
# 4. Assert the expected outcome
assert-screen not_contains='No such file'
# 5. Use snapshot for evidence capture
snapshot id=final_state
Key principles:
  • Always use @env-mode clean and @shell sh for reproducibility.
  • Always wait-for after send-keys before asserting.
  • Match on command output, not shell prompts.
  • Use \d+ in patterns for numbers that may vary.
  • Capture a snapshot at the end for debugging if the playbook fails.
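Some of these principles can be checked mechanically before a generated playbook is run. The sketch below is a rough linter for the first two. It treats the first whitespace-separated token of each line as the action, which is a simplification of the DSL, and the function name lint_playbook is hypothetical.

```python
def lint_playbook(text: str) -> list:
    """Return warnings for a DSL playbook that ignores the key principles."""
    warnings = []
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]

    if "@env-mode clean" not in lines:
        warnings.append("missing '@env-mode clean'")
    if "@shell sh" not in lines:
        warnings.append("missing '@shell sh'")

    # Track whether a send-keys has been followed by a wait-for yet.
    pending_send = False
    for ln in lines:
        action = ln.split()[0]
        if action == "send-keys":
            pending_send = True
        elif action == "wait-for":
            pending_send = False
        elif action == "assert-screen" and pending_send:
            warnings.append(f"assert-screen without wait-for after send-keys: {ln}")
    return warnings

playbook = """\
@shell sh
new-session
send-keys keys='echo hi\\r'
sleep ms=500
assert-screen contains='hi'
"""
for warning in lint_playbook(playbook):
    print(warning)
```

A check like this flags the sleep-then-assert pattern, which tends to be timing-sensitive, so the author can replace it with a wait-for barrier.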