MCP Server
waydriver-mcp is a standalone binary that exposes the library over the Model Context Protocol, letting AI assistants (Claude Desktop, Claude Code, etc.) drive GTK4 apps in isolated headless sessions. It speaks JSON-RPC over stdio and constructs the Mutter backends internally — clients only see the high-level tools below.
| Tool | Purpose |
|---|---|
| start_session | Spawn a headless Mutter session and launch a command inside it. Optional per-session overrides: report_dir, resolution, scale, isolate_settings, gsettings, record_video, video_bitrate, capture_external_effects |
| list_sessions | List active session ids, app names, and Wayland displays |
| kill_session | Tear down a session and clean up all child processes |
| set_setting | Change a GSettings key on the running session live: rewrites the isolated keyfile in place so the app re-applies it via its changed handler (cursor, fonts, color-scheme, …) without a restart |
| dump_tree | Dump the AT-SPI accessibility tree as XML; each node carries a _ref you can target with query/click/etc. |
| query | Evaluate an XPath over the tree; returns every match’s role, name, attributes, and states |
| click / double_click / right_click | Invoke an element’s primary / secondary / tertiary AT-SPI Action. Auto-waits for visibility + enablement. |
| hover | Move the pointer to an element’s center; drives a real Wayland motion event so hover-state UI repaints |
| drag_to | Press, move across an element’s center, release: a full Wayland drag gesture |
| drag_to_coords | Like drag_to, but release at raw screen-absolute (x, y): drop onto empty space or off the source window (libadwaita tab drag-out and other “drop onto nothing” DnD) |
| focus | Give keyboard focus to an element via AT-SPI Component::grab_focus |
| set_text | Replace an editable element’s contents via EditableText (fast, requires the interface) |
| fill | Focus + clear + type: fallback for widgets without EditableText (e.g. GtkTextView/GtkEntry). Tries AT-SPI Component::grab_focus first; widgets whose bridge doesn’t expose Component (the documented GTK4 case) fall back to a pointer click at the widget’s center to drive focus through the input layer, the same way a user would. Set assume_focused: true to skip the whole focus step when the target is already focused. Supports caret_nav/select_all clear modes. |
| select_option | Pick an entry from a Selection-interface container (combo box, list, …) by label or by index |
| read_text | Read an element’s text via the Text interface |
| read_value | Read an element’s AT-SPI Value (current/min/max): a scrolled view’s offset, or a slider/progress/spin value |
| scroll | Scroll a located area by wheel detents along an axis (parks the pointer over it first); pair with read_value to confirm the offset moved |
| type_text | Type a string into the currently focused element through the input backend |
| press_key | Press a named key or chord (Return, Ctrl+A, Shift+Tab, Escape, …) |
| move_pointer | Move the pointer by a relative offset in logical pixels |
| pointer_click | Press and release a pointer button (defaults to left click) |
| take_screenshot | Capture a PNG via the keepalive ScreenCast stream and return its path |
| compare_element_to_baseline | Crop an element and diff it against a committed reference PNG (perceptual CIEDE2000); returns a diff score (not a pass/fail verdict) and writes a red-highlighted diff image on mismatch |
| get_captured_effects | Read the desktop notifications and portal open-URI requests the app emitted onto the session bus (mock D-Bus sinks). Requires capture_external_effects: true on start_session; effects have no AT-SPI projection, so this is the only way to assert on them |
| launch_secondary_instance | Relaunch the app with extra args in the same session env; a single-instance GApplication forwards the command line to the running primary. Observe the primary’s reaction via wait_for_stdout_line/query |
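Since the server speaks JSON-RPC over stdio, each tool in the table is reached through an MCP tools/call request. The following is a minimal sketch of building such a message, assuming the standard MCP request shape (method tools/call, params holding name and arguments) and one JSON object per line. The helper name make_tool_call is illustrative; only the tool names and the command and record_video arguments come from this README.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any value unique per request works


def make_tool_call(tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single JSON line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request) + "\n"


# Example: ask the server to launch an app without video recording.
line = make_tool_call(
    "start_session",
    {"command": "/workspace/myapp", "record_video": False},
)
print(line, end="")
```

A real client also performs the MCP initialize handshake before sending tool calls; MCP client libraries and assistants such as Claude Code normally handle both steps for you.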
Selectors use XPath 1.0 against a snapshot of the AT-SPI tree serialized to XML, with role names normalized to PascalCase (e.g. push button → Button). Example XPaths: //Button[@name='OK'], //Text[@name='search'], //MenuItem[contains(@name, 'Mode')], (//Button)[last()].
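To make the selector model concrete, here is a small sketch that runs one of the example XPaths against a hand-written XML snapshot. The XML is hypothetical: real dump_tree output may be laid out differently, and only the _ref and name attributes and the PascalCase role names are taken from this README. Python's xml.etree.ElementTree implements only a subset of XPath 1.0, so expressions such as contains(...) or (//Button)[last()] need the server's query tool or a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical snapshot for illustration; not actual dump_tree output.
SNAPSHOT = """
<Application name="myapp" _ref="0">
  <Window name="Main" _ref="1">
    <Button name="Cancel" _ref="2"/>
    <Button name="OK" _ref="3"/>
    <Text name="search" _ref="4"/>
  </Window>
</Application>
"""

root = ET.fromstring(SNAPSHOT)

# ElementTree wants a leading "." for descendant searches,
# so //Button[@name='OK'] becomes .//Button[@name='OK'].
ok_buttons = root.findall(".//Button[@name='OK']")
refs = [node.get("_ref") for node in ok_buttons]
print(refs)
```

The collected _ref values are what you would pass on to tools such as click once a match has been found.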
Each session produces output under a configurable report directory. Screenshots are written as {report_dir}/{session_id}/{session_id}-{n}.png — each session gets its own subdirectory and n increments per take_screenshot call. The base report_dir defaults to /tmp/waydriver and can be overridden with the --report-dir <PATH> CLI flag or the WAYDRIVER_REPORT_DIR environment variable. Individual start_session calls may also pass a report_dir argument to override the server default for that session.
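The path layout and override order described above can be sketched as follows. This is an illustration, not the server's code: server_default stands for whatever --report-dir or WAYDRIVER_REPORT_DIR resolved to (their relative precedence is not stated here, so it is not modeled), the session id abc123 is made up, and the starting value of n is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional

DEFAULT_REPORT_DIR = "/tmp/waydriver"


def resolve_report_dir(session_override: Optional[str],
                       server_default: Optional[str]) -> str:
    """Per-session report_dir wins, then the server-level value, then the default."""
    return session_override or server_default or DEFAULT_REPORT_DIR


def screenshot_path(report_dir: str, session_id: str, n: int) -> str:
    """Build {report_dir}/{session_id}/{session_id}-{n}.png."""
    return str(PurePosixPath(report_dir) / session_id / f"{session_id}-{n}.png")


report_dir = resolve_report_dir(None, None)
print(screenshot_path(report_dir, "abc123", 1))
# prints /tmp/waydriver/abc123/abc123-1.png
```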
Alongside the screenshots, each session writes:
- {session_id}.webm: full-session VP8/WebM recording of the display at 15 fps, finalized with a seekhead on kill_session. On by default; disable per-server with --record-video false / WAYDRIVER_RECORD_VIDEO=false, or per-session with start_session’s record_video: false. Set the bitrate with --video-bitrate <bits/sec> / WAYDRIVER_VIDEO_BITRATE (default 2_000_000) or the per-session video_bitrate.
- events.jsonl: append-only audit log of every session-scoped tool call (action, params, ok/err status, timestamp) at {report_dir}/{session_id}/events.jsonl.
- events.js: atomic rewrite of the same data as window.__events_update([...]) for consumption by the viewer.
- index.html: styled viewer (Tailwind via the Play CDN) that embeds the recording in a <video> tag when present. It reloads events.js every 2 s via a <script src> swap (which works over file://, unlike fetch) and renders append-only, so expanded <details> stay expanded across refreshes. Written once at session start.
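Because events.jsonl holds one JSON object per line, it is easy to post-process outside the viewer. The sketch below tallies calls per action and status. The key names action and status, the values ok and err, and the sample records are assumptions made for illustration; check a real log for the actual field names before relying on them.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_events(lines: Iterable[str]) -> Counter:
    """Count events per (action, status) pair from JSONL text lines."""
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        event = json.loads(raw)
        counts[(event.get("action"), event.get("status"))] += 1
    return counts


# Hypothetical records; real field names and values may differ.
sample = [
    '{"action": "press_key", "params": {"key": "Return"}, "status": "ok", "timestamp": 1}',
    '{"action": "click", "params": {}, "status": "err", "timestamp": 2}',
    '{"action": "click", "params": {}, "status": "ok", "timestamp": 3}',
]
summary = summarize_events(sample)
print(summary[("click", "err")])
```

To read a real log, pass an open file handle instead of the sample list, since iterating over a text file yields one line at a time.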
start_session’s response includes a file:// URL to the session viewer — open it directly from the filesystem in any browser. No HTTP server, no ports, no network access required. Multiple waydriver-mcp instances (different Claude Code tabs / projects) can run side by side without conflict.
Why Docker?
waydriver-mcp needs about eight system services and libraries at runtime (including mutter, pipewire, wireplumber, dbus, AT-SPI, and gstreamer). Installing these manually is fragile and distro-specific. Docker solves four problems:
- Security: the MCP server spawns arbitrary processes, interacts with them via D-Bus, and captures their screen. Running this on your host session gives it access to everything your user can do. Inside a container, it only sees what you explicitly mount: no access to your files, browser sessions, or credentials. Add --network none to block network access entirely (the report viewer is purely static file://, so it works without any network).
- Zero-setup distribution: docker pull and you’re running, no system packages to install.
- D-Bus isolation: each container gets its own dbus-daemon, so apps with singleton D-Bus activation don’t interfere across concurrent test sessions.
- ABI compatibility: apps built inside the container are guaranteed to link against the same libraries the MCP runtime uses.
Running with Docker (recommended)
Prebuilt images are published to GitHub Container Registry for each release:
| Image | Purpose |
|---|---|
| ghcr.io/bohdantkachenko/waydriver-mcp | Runtime: MCP server with all system deps |
| ghcr.io/bohdantkachenko/waydriver-mcp-builder | Build env: Fedora 42 + Rust + gcc/g++ + meson + cmake + GTK4/GLib dev headers |
docker pull ghcr.io/bohdantkachenko/waydriver-mcp:latest
docker pull ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest
Use the builder image to compile your app in a Fedora environment that matches the runtime. The resulting binary is ABI-compatible with the runtime image. See Testing your app below for language-specific build examples.
MCP client config (e.g. .mcp.json for Claude Code):
{
  "mcpServers": {
    "waydriver-mcp": {
      "command": "sh",
      "args": ["-c", "docker run --rm -i --network none -v \"$PWD:/workspace:ro\" -v /tmp/waydriver:/tmp/waydriver ghcr.io/bohdantkachenko/waydriver-mcp:latest"]
    }
  }
}
- $PWD:/workspace:ro mounts the project directory so the MCP can launch your app binaries from /workspace/.
- /tmp/waydriver:/tmp/waydriver makes session reports (screenshots, WebM recordings, events.jsonl, index.html) accessible on the host at /tmp/waydriver/. The mount uses the same path on both sides so the file:// URL that start_session returns is openable as-is on the host.
- --network none is safe for full isolation: the report viewer is pure static HTML + JS loaded from your local filesystem.
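Hand-escaping the quotes around $PWD inside a JSON string is easy to get wrong, so one option is to generate the config. The sketch below is a hypothetical helper (mcp_config and its parameters are not part of waydriver) that assembles the same docker run command line and lets json.dumps do the escaping.

```python
import json
import shlex
from typing import Sequence


def mcp_config(image: str, extra_mounts: Sequence[str] = ()) -> str:
    """Return .mcp.json text that launches the server via sh -c "docker run ..."."""
    parts = ["docker", "run", "--rm", "-i", "--network", "none"]
    for mount in extra_mounts:
        # Literal host paths are shell-quoted only if they need it.
        parts += ["-v", shlex.quote(mount)]
    # $PWD must be expanded by the shell, so it gets double quotes, not shlex.quote.
    parts += ["-v", '"$PWD:/workspace:ro"']
    parts += ["-v", "/tmp/waydriver:/tmp/waydriver", image]
    config = {
        "mcpServers": {
            "waydriver-mcp": {"command": "sh", "args": ["-c", " ".join(parts)]}
        }
    }
    return json.dumps(config, indent=2)


print(mcp_config("ghcr.io/bohdantkachenko/waydriver-mcp:latest"))
```

Passing extra_mounts=["/nix/store:/nix/store:ro"] adds a further read-only mount ahead of the project directory.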
For NixOS users, also mount the Nix store so Nix-built binaries work inside the container:
{
  "mcpServers": {
    "waydriver-mcp": {
      "command": "sh",
      "args": ["-c", "docker run --rm -i --network none -v /nix/store:/nix/store:ro -v \"$PWD:/workspace:ro\" -v /tmp/waydriver:/tmp/waydriver ghcr.io/bohdantkachenko/waydriver-mcp:latest"]
    }
  }
}
Or build from source:
docker build -t waydriver-mcp .
Testing your app with waydriver-mcp
The MCP server is persistent — it stays up for the entire AI assistant session. You rebuild your app independently, and each start_session call picks up the latest binary from the volume. No MCP restart needed between iterations.
Rust apps — build with the builder image, volume-mount the binary:
docker run --rm -v "$PWD:/src:ro" -v "$PWD/build:/out" \
ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest \
sh -c "cp -r /src /tmp/build && cd /tmp/build && cargo build --release && cp target/release/myapp /out/"
{
  "mcpServers": {
    "waydriver-mcp": {
      "command": "docker",
      "args": ["run", "--rm", "-i",
               "-v", "/path/to/myapp/build:/workspace:ro",
               "ghcr.io/bohdantkachenko/waydriver-mcp:latest"]
    }
  }
}
Then call start_session with command: "/workspace/myapp".
C/C++ apps — the builder image includes gcc, g++, meson, ninja-build, cmake, and GTK4/GLib dev headers:
docker run --rm -v "$PWD:/src:ro" -v "$PWD/build:/out" \
ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest \
sh -c "cp -r /src /tmp/build && cd /tmp/build && meson setup _build && meson compile -C _build && cp _build/myapp /out/"
For extra deps (e.g. libadwaita-devel), extend the builder:
FROM ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest
RUN dnf install -y libadwaita-devel
Node/Python apps: extend the runtime image to add the interpreter (the client config further down assumes you build and tag this image as myapp-mcp:latest), and use a named volume for deps:
FROM ghcr.io/bohdantkachenko/waydriver-mcp:latest
RUN dnf install -y nodejs && dnf clean all
Install deps into a named volume (re-run only when lockfile changes):
docker volume create myapp-nodemods
docker run --rm \
-v "$PWD/package.json:/app/package.json:ro" \
-v "$PWD/package-lock.json:/app/package-lock.json:ro" \
-v "myapp-nodemods:/app/node_modules" \
-w /app \
ghcr.io/bohdantkachenko/waydriver-mcp-builder:latest \
sh -c "dnf install -y nodejs npm && npm ci --omit=dev"
Mount source + deps — edit source freely, MCP picks up changes on next start_session:
"args": ["run", "--rm", "-i",
"-v", "/path/to/myapp/src:/app/src:ro",
"-v", "myapp-nodemods:/app/node_modules:ro",
"myapp-mcp:latest"]
NixOS users — mount /nix/store so Nix-built binaries just work:
"args": ["run", "--rm", "-i",
"-v", "/nix/store:/nix/store:ro",
"-v", "/path/to/myapp:/workspace:ro",
"ghcr.io/bohdantkachenko/waydriver-mcp:latest"]
Running with Nix
For local development without Docker, the Nix app wraps the binary with the required runtime env vars:
nix run .#mcp
Sessions are kept in an in-memory HashMap keyed by id, so multiple apps can run concurrently within one server process.
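As a rough mental model of that registry, here is a Python analogue using a dict keyed by session id. It is only an illustration of the data structure: the server itself is not written in Python, and the class names, the stored fields, and the uuid-based id format are assumptions.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Session:
    app_name: str
    wayland_display: str


class SessionRegistry:
    """In-memory map from session id to session state."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def start(self, app_name: str, wayland_display: str) -> str:
        session_id = uuid.uuid4().hex  # assumed id format
        self._sessions[session_id] = Session(app_name, wayland_display)
        return session_id

    def ids(self) -> List[str]:
        return sorted(self._sessions)

    def kill(self, session_id: str) -> bool:
        """Remove a session; returns False if the id is unknown."""
        return self._sessions.pop(session_id, None) is not None


registry = SessionRegistry()
first = registry.start("calculator", "wayland-1")
second = registry.start("editor", "wayland-2")
print(len(registry.ids()))  # two sessions live side by side
```

Because every entry is independent, removing one session leaves the others untouched, which matches the behavior kill_session is described as having.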