MCP Server

waydriver-mcp is a standalone binary that exposes the library over the Model Context Protocol, letting AI assistants (Claude Desktop, Claude Code, etc.) drive GTK4 apps in isolated headless sessions. It speaks JSON-RPC over stdio and constructs the Mutter backends internally — clients only see the high-level tools below.
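
To make the stdio transport concrete, here is a minimal Python sketch of the messages a client writes to waydriver-mcp's stdin. The method names (initialize, notifications/initialized, tools/call) and the newline-delimited framing come from the MCP specification. The protocol version string and client name are placeholder assumptions, and the only tool argument shown (command) is the one this page documents for start_session. In practice your MCP client (Claude Desktop, Claude Code) does all of this for you.

```python
import json

# Placeholder protocol revision; a real client negotiates this during initialize.
PROTOCOL_VERSION = "2024-11-05"


def frame(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line.

    The MCP stdio transport delimits messages with newlines, so the JSON must not
    contain raw newlines; json.dumps without indent escapes them inside strings.
    """
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def initialize(req_id: int) -> dict:
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},  # hypothetical
        },
    }


def initialized() -> dict:
    # A notification: it has no "id", so the server sends no response.
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def call_tool(req_id: int, name: str, arguments: dict) -> dict:
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Print the lines a client would write to the server's stdin, in order.
    for msg in (
        initialize(1),
        initialized(),
        call_tool(2, "start_session", {"command": "/workspace/myapp"}),
    ):
        print(frame(msg).decode("utf-8"), end="")
```

Each tools/call response comes back on stdout the same way: one JSON object per line, matched to the request by id.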

| Tool | Purpose |
|---|---|
| start_session | Spawn a headless Mutter session and launch a command inside it. Optional per-session overrides: report_dir, resolution, scale, isolate_settings, gsettings, record_video, video_bitrate, capture_external_effects |
| list_sessions | List active session ids, app names, and Wayland displays |
| kill_session | Tear down a session and clean up all child processes |
| set_setting | Change a GSettings key on the running session live. Rewrites the isolated keyfile in place so the app re-applies the value via its changed handler (cursor, fonts, color-scheme, …) without a restart |
| dump_tree | Dump the AT-SPI accessibility tree as XML; each node carries a _ref you can target with query, click, etc. |
| query | Evaluate an XPath expression over the tree; returns every match’s role, name, attributes, and states |
| click / double_click / right_click | Invoke the element’s primary, secondary, or tertiary AT-SPI Action, respectively. Auto-waits until the element is visible and enabled |
| hover | Move the pointer to an element’s center, driving a real Wayland motion event so hover-state UI repaints |
| drag_to | Press, move to an element’s center, and release: a full Wayland drag gesture |
| drag_to_coords | Like drag_to, but release at raw screen-absolute (x, y) coordinates, to drop onto empty space or outside the source window (libadwaita tab drag-out and other “drop onto nothing” DnD) |
| focus | Give keyboard focus to an element via AT-SPI Component::grab_focus |
| set_text | Replace an editable element’s contents via EditableText (fast, but requires that interface) |
| fill | Focus, clear, then type: a fallback for widgets without EditableText (e.g. GtkTextView/GtkEntry). Tries AT-SPI Component::grab_focus first; widgets whose bridge doesn’t expose Component (the documented GTK4 case) fall back to a pointer click at the widget’s center, driving focus through the input layer the same way a user would. Set assume_focused: true to skip the focus step entirely when the target is already focused. Supports caret_nav and select_all clear modes |
| select_option | Pick an entry from a Selection-interface container (combo box, list, …) by label or by index |
| read_text | Read an element’s text via the Text interface |
| read_value | Read an element’s AT-SPI Value (current/min/max), such as a scrolled view’s offset or a slider, progress bar, or spin button value |
| scroll | Scroll a located area by wheel detents along an axis (parks the pointer over it first); pair with read_value to confirm the offset moved |
| type_text | Type a string into the currently focused element through the input backend |
| press_key | Press a named key or chord (Return, Ctrl+A, Shift+Tab, Escape, …) |
| move_pointer | Move the pointer by a relative offset in logical pixels |
| pointer_click | Press and release a pointer button (defaults to left click) |
| take_screenshot | Capture a PNG via the keepalive ScreenCast stream and return its path |
| compare_element_to_baseline | Crop an element and diff it against a committed reference PNG (perceptual CIEDE2000). Returns a diff score (not a pass/fail verdict) and writes a red-highlighted diff image on mismatch |
| get_captured_effects | Read the desktop notifications and portal open-URI requests the app emitted onto the session bus (mock D-Bus sinks). Requires capture_external_effects: true on start_session; these effects have no AT-SPI projection, so this is the only way to assert on them |
| launch_secondary_instance | Relaunch the app with extra args in the same session environment. A single-instance GApplication forwards the command line to the running primary; observe the primary’s reaction via wait_for_stdout_line/query |
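
compare_element_to_baseline scores differences perceptually with CIEDE2000. This page doesn't say how waydriver converts pixels to CIELAB or how it aggregates per-pixel distances into the returned score, so the Python sketch below covers only the standard per-pixel ΔE00 formula between two L*a*b* colors. It illustrates the metric; it is not waydriver's implementation.

```python
import math


def ciede2000(lab1, lab2, kL=1.0, kC=1.0, kH=1.0):
    """CIEDE2000 color difference between two CIELAB triples (L*, a*, b*)."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2

    # Rescale a* so near-neutral colors get the intended chroma weighting.
    C_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2
    G = 0.5 * (1 - math.sqrt(C_bar**7 / (C_bar**7 + 25**7)))
    a1p, a2p = (1 + G) * a1, (1 + G) * a2
    C1p, C2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360  # atan2(0, 0) == 0 covers achromatic input
    h2p = math.degrees(math.atan2(b2, a2p)) % 360

    # Differences in lightness, chroma, and hue.
    dLp = L2 - L1
    dCp = C2p - C1p
    if C1p * C2p == 0:
        dhp = 0.0
    else:
        dhp = h2p - h1p
        if dhp > 180:
            dhp -= 360
        elif dhp < -180:
            dhp += 360
    dHp = 2 * math.sqrt(C1p * C2p) * math.sin(math.radians(dhp) / 2)

    # Mean lightness, chroma, and hue (hue averaged around the circle).
    Lp_bar = (L1 + L2) / 2
    Cp_bar = (C1p + C2p) / 2
    if C1p * C2p == 0:
        hp_bar = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        hp_bar = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        hp_bar = (h1p + h2p + 360) / 2
    else:
        hp_bar = (h1p + h2p - 360) / 2

    # Weighting functions plus the rotation term for the blue region.
    T = (1
         - 0.17 * math.cos(math.radians(hp_bar - 30))
         + 0.24 * math.cos(math.radians(2 * hp_bar))
         + 0.32 * math.cos(math.radians(3 * hp_bar + 6))
         - 0.20 * math.cos(math.radians(4 * hp_bar - 63)))
    d_theta = 30 * math.exp(-(((hp_bar - 275) / 25) ** 2))
    R_C = 2 * math.sqrt(Cp_bar**7 / (Cp_bar**7 + 25**7))
    S_L = 1 + 0.015 * (Lp_bar - 50) ** 2 / math.sqrt(20 + (Lp_bar - 50) ** 2)
    S_C = 1 + 0.045 * Cp_bar
    S_H = 1 + 0.015 * Cp_bar * T
    R_T = -math.sin(math.radians(2 * d_theta)) * R_C

    l = dLp / (kL * S_L)
    c = dCp / (kC * S_C)
    h = dHp / (kH * S_H)
    return math.sqrt(l * l + c * c + h * h + R_T * c * h)
```

Identical colors score 0. As a sanity check, the first pair of the widely used Sharma, Wu and Dalal CIEDE2000 test data, (50, 2.6772, −79.7751) vs (50, 0, −82.7485), comes out at about 2.0425.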

Selectors use XPath 1.0 against a snapshot of the AT-SPI tree serialized to XML, with role names normalized to PascalCase (e.g. push button → Button). Example XPaths: //Button[@name='OK'], //Text[@name='search'], //MenuItem[contains(@name, 'Mode')], (//Button)[last()].
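
The server evaluates full XPath 1.0, but you can get a feel for how selectors map onto the snapshot with Python's standard library. The snapshot below is hypothetical: its exact attributes and _ref values are assumptions, not real dump_tree output. Note that xml.etree.ElementTree supports only a subset of XPath, so contains() has to be emulated in Python, and its [last()] is evaluated per parent rather than over the whole document like (//Button)[last()].

```python
import xml.etree.ElementTree as ET

# Hypothetical snapshot: element names are normalized roles, names live in @name.
SNAPSHOT = """
<Application name="myapp">
  <Frame name="Main" _ref="r1">
    <Text name="search" _ref="r2"/>
    <MenuItem name="Dark Mode" _ref="r3"/>
    <Button name="Cancel" _ref="r4"/>
    <Button name="OK" _ref="r5"/>
  </Frame>
</Application>
"""

root = ET.fromstring(SNAPSHOT)

# Attribute equality is supported directly: //Button[@name='OK']
ok_buttons = root.findall(".//Button[@name='OK']")

# [last()] picks the last Button among its siblings; here both share one parent,
# so the result matches (//Button)[last()].
last_buttons = root.findall(".//Button[last()]")

# contains() is not supported by ElementTree, so filter in Python instead:
# //MenuItem[contains(@name, 'Mode')]
mode_items = [e for e in root.iter("MenuItem") if "Mode" in e.get("name", "")]

if __name__ == "__main__":
    for e in ok_buttons + last_buttons + mode_items:
        print(e.tag, e.get("name"), e.get("_ref"))
```

For full XPath 1.0 offline, a library such as lxml (element.xpath(...)) accepts the selectors above unchanged.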

Each session produces output under a configurable report directory. Screenshots are written as {report_dir}/{session_id}/{session_id}-{n}.png — each session gets its own subdirectory and n increments per take_screenshot call. The base report_dir defaults to /tmp/waydriver and can be overridden with the --report-dir <PATH> CLI flag or the WAYDRIVER_REPORT_DIR environment variable. Individual start_session calls may also pass a report_dir argument to override the server default for that session.
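
The path rules above can be sketched in Python. This page doesn't spell out the precedence between --report-dir and WAYDRIVER_REPORT_DIR, so the sketch assumes the usual order: per-session report_dir, then the CLI flag, then the environment variable, then the default. It also assumes the viewer's index.html sits in the session subdirectory; all function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_REPORT_DIR = Path("/tmp/waydriver")


def resolve_report_dir(session_report_dir: Optional[str],
                       cli_report_dir: Optional[str],
                       env: Mapping[str, str]) -> Path:
    """Return the base report dir for one session.

    Assumed precedence (not confirmed by the docs): start_session's report_dir,
    then --report-dir, then WAYDRIVER_REPORT_DIR, then /tmp/waydriver.
    """
    for candidate in (session_report_dir, cli_report_dir, env.get("WAYDRIVER_REPORT_DIR")):
        if candidate:
            return Path(candidate)
    return DEFAULT_REPORT_DIR


def screenshot_path(report_dir: Path, session_id: str, n: int) -> Path:
    # {report_dir}/{session_id}/{session_id}-{n}.png; n grows with each take_screenshot call.
    return report_dir / session_id / f"{session_id}-{n}.png"


def viewer_url(report_dir: Path, session_id: str) -> str:
    # Assumption: index.html sits in the session subdirectory next to the screenshots.
    return (report_dir / session_id / "index.html").absolute().as_uri()


if __name__ == "__main__":
    base = resolve_report_dir(None, None, os.environ)
    print(screenshot_path(base, "demo", 1))
    print(viewer_url(base, "demo"))
```

Whether n starts at 0 or 1 isn't stated here, so treat the index in the usage lines as illustrative.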

Alongside the screenshots, each session writes:

  • {session_id}.webm: full-session VP8/WebM recording of the display at 15 fps, finalized with a seekhead on kill_session. On by default; disable it per server with --record-video false / WAYDRIVER_RECORD_VIDEO=false, or per session with start_session’s record_video: false. Set the bitrate with --video-bitrate <bits/sec> / WAYDRIVER_VIDEO_BITRATE (default 2_000_000) or per session with video_bitrate.
  • events.jsonl: append-only audit log of every session-scoped tool call (action, params, ok/err status, timestamp) at {report_dir}/{session_id}/events.jsonl.
  • events.js: the same data, atomically rewritten as window.__events_update([...]) for the viewer to load.
  • index.html: styled viewer (Tailwind via the Play CDN), written once at session start, that embeds the recording in a <video> tag when present. Every 2 s it reloads events.js by swapping in a new <script src> tag (which works over file://, unlike fetch), and it renders append-only so expanded <details> elements stay expanded across refreshes.
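
The events.jsonl / events.js pair follows a common pattern: append to a log, then atomically replace a derived file so a polling reader never sees a partial write. Below is a minimal Python sketch of that pattern, assuming hypothetical event keys (action, status) and a full re-serialization on every call; waydriver's actual writer may well differ.

```python
import json
import os
import tempfile
from pathlib import Path


def append_event(session_dir: Path, event: dict) -> None:
    """Append one event to events.jsonl, then regenerate events.js."""
    with open(session_dir / "events.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
    rewrite_events_js(session_dir)


def rewrite_events_js(session_dir: Path) -> None:
    """Re-serialize the whole log as a JS call the viewer can load via <script src>."""
    text = (session_dir / "events.jsonl").read_text(encoding="utf-8")
    events = [json.loads(line) for line in text.splitlines() if line.strip()]
    payload = "window.__events_update(" + json.dumps(events) + ");\n"

    # Write a temp file in the same directory, then rename it over events.js.
    # os.replace is atomic on POSIX, so the viewer never loads a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=session_dir, prefix=".events-", suffix=".js")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            out.write(payload)
        # mkstemp creates the file as 0600; widen it so other users (e.g. a host
        # user reading a bind mount written by root in a container) can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, session_dir / "events.js")
    except BaseException:
        os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    demo_dir = Path(tempfile.mkdtemp())
    append_event(demo_dir, {"action": "click", "status": "ok"})
    print((demo_dir / "events.js").read_text(encoding="utf-8"))
```

Writing the temp file in the same directory matters: a rename is only atomic within one filesystem.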

start_session’s response includes a file:// URL to the session viewer — open it directly from the filesystem in any browser. No HTTP server, no ports, no network access required. Multiple waydriver-mcp instances (different Claude Code tabs / projects) can run side by side without conflict.

Why Docker?

waydriver-mcp needs around eight system services and libraries at runtime, including mutter, pipewire, wireplumber, dbus, AT-SPI, and gstreamer. Installing these manually is fragile and distro-specific. Docker solves four problems:

  • Security: the MCP server spawns arbitrary processes, interacts with them via D-Bus, and captures their screen. Running it in your host session gives it access to everything your user can do. Inside a container, it only sees what you explicitly mount: no access to your files, browser sessions, or credentials. Add --network none to block network access entirely (the report viewer is purely static file://, so it works without any network).
  • Zero-setup distribution: docker pull and you’re running, with no system packages to install.
  • D-Bus isolation: each container gets its own dbus-daemon, so apps with singleton D-Bus activation don’t interfere across concurrent test sessions.
  • ABI compatibility: apps built inside the container are guaranteed to link against the same libraries the MCP runtime uses.

Prebuilt images are published to GitHub Container Registry for each release:

| Image | Purpose |
|---|---|
| ghcr.io/bohdantkachenko/waydriver-mcp | Runtime: MCP server with all system deps |
| ghcr.io/bohdantkachenko/waydriver-mcp-builder | Build env: Fedora 42 + Rust + gcc/g++ + meson + cmake + GTK4/GLib dev headers |

docker pull ghcr.io/bohdantkachenko/waydriver-mcp:latest
docker pull ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest

Use the builder image to compile your app in a Fedora environment that matches the runtime. The resulting binary is ABI-compatible with the runtime image. See Testing your app below for language-specific build examples.

MCP client config (e.g. .mcp.json for Claude Code):

{
  "mcpServers": {
    "waydriver-mcp": {
      "command": "sh",
      "args": ["-c", "docker run --rm -i --network none -v \"$PWD:/workspace:ro\" -v /tmp/waydriver:/tmp/waydriver ghcr.io/bohdantkachenko/waydriver-mcp:latest"]
    }
  }
}
  • $PWD:/workspace:ro — mounts the project directory so the MCP can launch your app binaries from /workspace/
  • /tmp/waydriver:/tmp/waydriver — makes session reports (screenshots, WebM recordings, events.jsonl, index.html) accessible on the host at /tmp/waydriver/. The mount uses the same path on both sides so the file:// URL that start_session returns is openable as-is on the host
  • --network none — safe to fully isolate: the report viewer is pure static HTML + JS loaded from your local filesystem

For NixOS users, also mount the Nix store so Nix-built binaries work inside the container:

{
  "mcpServers": {
    "waydriver-mcp": {
      "command": "sh",
      "args": ["-c", "docker run --rm -i --network none -v /nix/store:/nix/store:ro -v \"$PWD:/workspace:ro\" -v /tmp/waydriver:/tmp/waydriver ghcr.io/bohdantkachenko/waydriver-mcp:latest"]
    }
  }
}

Or build from source:

docker build -t waydriver-mcp .

Testing your app with waydriver-mcp

The MCP server is persistent: it stays up for the entire AI assistant session. You rebuild your app independently, and each start_session call picks up the latest binary from the mounted directory. No MCP restart is needed between iterations.

Rust apps — build with the builder image, volume-mount the binary:

docker run --rm -v "$PWD:/src:ro" -v "$PWD/build:/out" \
  ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest \
  sh -c "cp -r /src /tmp/build && cd /tmp/build && cargo build --release && cp target/release/myapp /out/"
{
  "mcpServers": {
    "waydriver-mcp": {
      "command": "docker",
      "args": ["run", "--rm", "-i",
        "-v", "/path/to/myapp/build:/workspace:ro",
        "ghcr.io/bohdantkachenko/waydriver-mcp:latest"]
    }
  }
}

Then call start_session with command: "/workspace/myapp".

C/C++ apps — the builder image includes gcc, g++, meson, ninja-build, cmake, and GTK4/GLib dev headers:

docker run --rm -v "$PWD:/src:ro" -v "$PWD/build:/out" \
  ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest \
  sh -c "cp -r /src /tmp/build && cd /tmp/build && meson setup _build && meson compile -C _build && cp _build/myapp /out/"

For extra deps (e.g. libadwaita-devel), extend the builder:

FROM ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest
RUN dnf install -y libadwaita-devel

Node/Python apps — extend the runtime image to add the interpreter, use a named volume for deps:

# Tagged myapp-mcp in the examples below: docker build -t myapp-mcp .
FROM ghcr.io/bohdantkachenko/waydriver-mcp:latest
RUN dnf install -y nodejs && dnf clean all

Install deps into a named volume (re-run only when lockfile changes):

docker volume create myapp-nodemods
docker run --rm \
  -v "$PWD/package.json:/app/package.json:ro" \
  -v "$PWD/package-lock.json:/app/package-lock.json:ro" \
  -v "myapp-nodemods:/app/node_modules" \
  -w /app \
  ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest \
  sh -c "dnf install -y nodejs npm && npm ci --omit=dev"

Mount source + deps — edit source freely, MCP picks up changes on next start_session:

"args": ["run", "--rm", "-i",
  "-v", "/path/to/myapp/src:/app/src:ro",
  "-v", "myapp-nodemods:/app/node_modules:ro",
  "myapp-mcp:latest"]

NixOS users — mount /nix/store so Nix-built binaries just work:

"args": ["run", "--rm", "-i",
  "-v", "/nix/store:/nix/store:ro",
  "-v", "/path/to/myapp:/workspace:ro",
  "ghcr.io/bohdantkachenko/waydriver-mcp:latest"]

Running with Nix

For local development without Docker, the Nix app wraps the binary with the required runtime env vars:

nix run .#mcp

Sessions are kept in an in-memory HashMap keyed by id, so multiple apps can run concurrently within one server process.
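
Here is a rough Python analogue of that design: a dict guarded by a lock maps session ids to session records, so handlers for concurrent requests can start, list, and kill sessions independently. The Session fields, the id format, and the display naming are hypothetical, and the real server's teardown of Mutter and child processes is reduced to a comment.

```python
import threading
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical fields mirroring what list_sessions reports.
    id: str
    app_name: str
    wayland_display: str


@dataclass
class SessionRegistry:
    """In-memory map of session id -> Session, shared by all request handlers."""
    _sessions: dict = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def start_session(self, app_name: str) -> Session:
        sid = uuid.uuid4().hex[:8]
        session = Session(sid, app_name, f"wayland-{sid}")  # hypothetical display name
        with self._lock:
            self._sessions[sid] = session
        return session

    def list_sessions(self) -> list:
        with self._lock:
            return list(self._sessions.values())

    def kill_session(self, sid: str) -> bool:
        # A real server would tear down the compositor and child processes here.
        with self._lock:
            return self._sessions.pop(sid, None) is not None


if __name__ == "__main__":
    registry = SessionRegistry()
    first = registry.start_session("app-a")
    registry.start_session("app-b")
    registry.kill_session(first.id)
    print([s.app_name for s in registry.list_sessions()])
```

Holding the lock only around dict access keeps slow work (spawning or tearing down a compositor) from blocking other sessions.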