GitHub - droxey/agentadventure: OpenClaw skill that drops AI agents into self-hosted WorkAdventure as real avatars — with movement, proximity chat, and experimental voice.

🎮 AgentAdventure

Quick Start

Prerequisite: A running WorkAdventure instance with LiveKit enabled.

Required API Keys

| Key | Purpose | Required | Where to Get |
|---|---|---|---|
| `WA_URL` | Your WorkAdventure instance URL | Yes | Your self-hosted WA deployment |
| `WA_BOT_NAME` | Display name for the bot avatar | Yes | Any string you choose |
| `ELEVENLABS_API_KEY` | Text-to-speech for voice chat | Voice only | elevenlabs.io |
| `DEEPGRAM_API_KEY` | Speech-to-text for voice chat | Voice only | deepgram.com |

1. Install OpenClaw

```bash
npm install -g openclaw@latest
openclaw gateway start   # starts gateway; creates ~/.openclaw/ on first run
```

2. Install the Skill

```bash
# Option A: From ClawHub (once published)
clawdhub install agentadventure

# Option B: Manual (during development)
mkdir -p ~/.openclaw/skills/agentadventure
# Copy SKILL.md and the .ts sources (runner.ts, bridge.ts, voice.ts, utils.ts) into the folder
cd ~/.openclaw/skills/agentadventure && npx playwright install chromium

# Verify:
openclaw skills list --eligible
```

3. Configure & Run

Add the skill entry to ~/.openclaw/openclaw.json:

```json
{
  "skills": {
    "entries": {
      "agentadventure": {
        "enabled": true,
        "env": {
          "WA_URL": "http://play.workadventure.localhost/",
          "WA_BOT_NAME": "AgentBot"
        }
      }
    }
  }
}
```

For voice support, also add the voice-call skill entry:

```json
{
  "skills": {
    "entries": {
      "agentadventure": { "enabled": true, "env": { "WA_URL": "...", "WA_BOT_NAME": "AgentBot" } },
      "voice-call": {
        "enabled": true,
        "env": {
          "ELEVENLABS_API_KEY": "your-key-here",
          "DEEPGRAM_API_KEY": "your-key-here"
        }
      }
    }
  }
}
```

Then start the gateway:

```bash
openclaw gateway start
```

Verify by joining the WA map — the agent avatar should appear and respond to proximity chat.

Overview

AgentAdventure is an OpenClaw skill that enables AI agents to appear as visible avatars in a self-hosted WorkAdventure virtual office. Each agent runs inside a headless Chromium browser controlled by Playwright, interacting with the WA Scripting API for movement, proximity chat, and experimental voice conversations — all without modifying the WorkAdventure backend.

Agents enter WorkAdventure the same way a human would: through the anonymous login flow (display name → Woka avatar picker → map entry). Once inside, injected scripts bridge WA events back to the OpenClaw gateway, where agent logic generates responses and sends commands. Matrix provides fallback messaging for non-proximity interactions and multi-agent coordination.

The entire skill is a single folder (`~/.openclaw/skills/agentadventure/`), deployable via `clawdhub install agentadventure` or manual placement.

Features

Avatar Presence — Agents appear as real WA users with visible avatars, movement via WA.player.moveTo(), and full participation in proximity bubbles.
Proximity Chat — Bidirectional text chat using WA.chat.sendChatMessage / onChatMessage with `'bubble'` scope and typing indicators.
Player Tracking — Detects nearby players via WA.players.onPlayerEnters / onPlayerLeaves (with configureTracking()), plus bubble lifecycle via proximityMeeting.onJoin().
Voice (Experimental) — STT/TTS pipeline through WA’s listenToAudioStream / startAudioStream APIs, bridged to OpenClaw voice skills (ElevenLabs, Deepgram). Falls back to text on failure.
Matrix Fallback — Leverages WA’s native Matrix bridge for global messaging, room sync, and non-proximity interactions via OpenClaw’s existing Matrix channel.
Error Recovery — Retry wrapper (3 attempts) on all operations, auto-restart on browser crash, voice→text fallback chain. All errors are non-fatal and logged.
No Backend Mods — Pure client-side automation via Playwright. Zero changes to WorkAdventure server code.

Architecture

The skill spawns a Playwright browser session per agent, injects WA Scripting API event listeners, and bridges callbacks to the OpenClaw gateway via page.exposeFunction. Outbound commands (move, chat, voice) flow from agent logic through page.evaluate() calls.

```mermaid
graph TB
    subgraph OpenClaw["OpenClaw Platform"]
        GW[Gateway<br/>Session Mgmt]
        SK[Skill Runner<br/>AgentAdventure]
        VS[Voice Skill<br/>STT/TTS Pipeline]
        MX[Matrix Channel<br/>Chat Fallback]
    end

    subgraph Browser["Playwright Browser (Headless)"]
        PW[Playwright Controller]
        INJ[Injected Scripts<br/>Event Listeners]
    end

    subgraph WA["WorkAdventure v1.28.9"]
        WAC[WA Client<br/>Scripting API]
        AV[Bot Avatar]
        PRX[Proximity Bubble<br/>Chat / Voice]
        LK[LiveKit<br/>Audio Streams]
    end

    GW --> SK
    SK --> PW
    PW --> WAC
    WAC --> AV
    WAC --> PRX
    WAC --> LK
    INJ --> PW
    PW --> GW
    VS <--> SK
    MX <--> GW
```

Command Flow (Outbound)

Agent logic sends a command (move/chat/voice) → OpenClaw gateway routes it to the AgentAdventure skill → Playwright calls page.evaluate() → WA Scripting API executes the action (avatar moves, message appears in bubble).
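The outbound step can be sketched as follows. This is a minimal illustration, not the code in `bridge.ts`: `sendCommand`, the `Command` union, and the `CommandPage` interface are hypothetical names, the `moveTo` speed and the typing delay are arbitrary, and it assumes the `WA` global is reachable from the evaluated page context, as the architecture above describes. `CommandPage` is a narrow structural stand-in for Playwright's `Page` so the sketch can be exercised with a fake.

```typescript
// Narrow structural stand-in for the part of Playwright's Page used here.
interface CommandPage {
  evaluate<R, A>(fn: (arg: A) => R | Promise<R>, arg: A): Promise<R>;
}

// Hypothetical command shapes produced by agent logic.
type Command =
  | { type: "move"; x: number; y: number }
  | { type: "chat"; text: string };

async function sendCommand(page: CommandPage, cmd: Command): Promise<void> {
  // Only serializable data crosses the Node -> browser boundary, so the
  // command travels as the evaluate() argument rather than via a closure.
  await page.evaluate(async (c: Command) => {
    const WA = (globalThis as any).WA; // assumption: WA is defined in this context
    if (c.type === "move") {
      // Walk the avatar to pixel coordinates (x, y); 10 is an arbitrary speed.
      await WA.player.moveTo(c.x, c.y, 10);
      return;
    }
    // Show a typing indicator in the bubble, pause briefly, then post the message.
    WA.chat.startTyping({ scope: "bubble" });
    await new Promise((resolve) => setTimeout(resolve, 500));
    WA.chat.stopTyping({ scope: "bubble" });
    WA.chat.sendChatMessage(c.text, { scope: "bubble" });
  }, cmd);
}
```

In `runner.ts` the live Playwright `Page` would be passed in (its `evaluate` takes the same `(fn, arg)` pair), e.g. `await sendCommand(page, { type: "chat", text: "Hi!" })`. Check the typing-indicator calls against your WA version before relying on them.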

Event Flow (Inbound)

A human enters a proximity bubble → WA fires proximityMeeting.onJoin → injected listener calls window.onWAEvent('join', users) → Playwright bridges the callback to the gateway → agent logic processes and responds. The same pattern applies to chat messages (onChatMessage) and audio buffers (listenToAudioStream).
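A sketch of that inbound wiring, under the same assumptions as the outbound path (the `WA` global is reachable from the evaluated context). `injectListeners`, `WAEvent`, and `BridgePage` are hypothetical names; the event names and `onWAEvent` hook follow the flow described above. Player objects are reduced to names before crossing back to Node, because `exposeFunction` arguments must be serializable.

```typescript
// Narrow structural stand-in for the Playwright Page methods used here.
interface BridgePage {
  exposeFunction(name: string, fn: (...args: any[]) => unknown): Promise<void>;
  evaluate<R>(fn: () => R | Promise<R>): Promise<R>;
}

type WAEvent =
  | { kind: "join"; payload: string[] } // names of players in the bubble
  | { kind: "enter"; payload: string }  // a tracked player entered
  | { kind: "chat"; payload: string };  // bubble chat message text

async function injectListeners(
  page: BridgePage,
  onEvent: (event: WAEvent) => void,
): Promise<void> {
  // Node side: window.onWAEvent(kind, payload) in the page resolves to this callback.
  await page.exposeFunction("onWAEvent", (kind: WAEvent["kind"], payload: unknown) => {
    onEvent({ kind, payload } as WAEvent);
  });

  // Browser side: this function is serialized and runs inside the WA page.
  await page.evaluate(async () => {
    const g = globalThis as any;
    const WA = g.WA; // assumption: WA is defined in this context
    await WA.onInit();
    await WA.players.configureTracking({ players: true, movement: false });
    WA.player.proximityMeeting.onJoin().subscribe((users: { name: string }[]) =>
      g.onWAEvent("join", users.map((u) => u.name)));
    WA.players.onPlayerEnters.subscribe((player: { name: string }) =>
      g.onWAEvent("enter", player.name));
    WA.chat.onChatMessage((message: string) => g.onWAEvent("chat", message), {
      scope: "bubble",
    });
  });
}
```

`exposeFunction` is registered before the listeners are injected so that the first event already has a target; Playwright keeps exposed functions across navigations.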

Voice Pipeline

Incoming audio flows through WA’s listenToAudioStream (Float32Array buffers) → an injected listener collects buffers → STT (Deepgram/ElevenLabs) transcribes → agent LLM generates a response → TTS synthesizes audio → startAudioStream / appendAudioData sends it back to the bubble.

⚠️ The WA blog documents PCM16 audio at 24 kHz, converted to Float32 for the Web Audio API. Verify the actual `sampleRate` parameter in the WA source before hardcoding it.
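If the stream does turn out to be PCM16, converting in either direction is a scale plus a clamp. The helpers below are hypothetical (not from the repository); they assume samples are already decoded into an `Int16Array`, and the 24 kHz constant is only the blog's figure, which the note above says to verify.

```typescript
// Figure from the WA blog; confirm against WA source before relying on it.
const ASSUMED_SAMPLE_RATE = 24_000;

// PCM16 -> Float32 in [-1, 1) for the Web Audio API.
function pcm16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = pcm[i] / 32768;
  return out;
}

// Float32 -> PCM16, clamping out-of-range samples first.
function float32ToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 32768 : s * 32767; // Int16Array truncates toward zero
  }
  return out;
}

// Duration of a buffer in milliseconds at the given (assumed) rate.
function durationMs(samples: { length: number }, sampleRate = ASSUMED_SAMPLE_RATE): number {
  return (samples.length / sampleRate) * 1000;
}
```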

Error & Recovery

Every operation follows a retry → fallback → restart chain. Transient failures retry up to 3 times. Voice failures drop to text chat. Browser crashes trigger an automatic session restart. Non-recoverable errors notify the gateway without crashing the skill.
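`utils.ts` is listed as holding a `retryOp` helper; the following is one plausible shape for it, not the repository's implementation. The exponential backoff delays and the `withFallback` helper (used for the voice → text step) are assumptions.

```typescript
// Run `op` up to `attempts` times, backing off 1x, 2x, 4x... baseDelayMs between tries.
async function retryOp<T>(
  op: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      if (i < attempts) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (i - 1)));
      }
    }
  }
  throw lastErr; // all attempts failed: let the caller fall back or restart
}

// Retry the primary path; if it still fails, run the fallback once (e.g. voice -> text).
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  baseDelayMs = 200,
): Promise<T> {
  try {
    return await retryOp(primary, 3, baseDelayMs);
  } catch {
    return fallback();
  }
}
```

A browser-level failure that survives both layers would then be reported to the gateway and trigger the session restart.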

Matrix Integration

WA’s native Matrix bridge syncs proximity bubbles to Matrix rooms. The OpenClaw Matrix channel handles m.room.message events for fallback/global messaging and multi-agent coordination outside proximity range.
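Handling those events on the agent side is mostly filtering. A minimal sketch using the field names from the Matrix client-server spec (`type`, `sender`, `room_id`, `content.msgtype`, `content.body`); `extractText` and the `selfId` echo check are illustrative assumptions, not OpenClaw's Matrix channel API.

```typescript
interface MatrixEvent {
  type: string;
  sender: string;
  room_id: string;
  content: { msgtype?: string; body?: string };
}

interface IncomingText {
  room: string;
  from: string;
  text: string;
}

// Keep only plain-text room messages from other users.
function extractText(ev: MatrixEvent, selfId: string): IncomingText | null {
  if (ev.type !== "m.room.message") return null;
  if (ev.sender === selfId) return null; // ignore the agent's own echoed messages
  if (ev.content.msgtype !== "m.text" || typeof ev.content.body !== "string") return null;
  return { room: ev.room_id, from: ev.sender, text: ev.content.body };
}
```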

File Structure

```
~/.openclaw/skills/agentadventure/
├── SKILL.md          # Skill definition — YAML frontmatter + usage instructions
├── runner.ts         # Playwright session: launch, anonymous login, lifecycle, retry
├── bridge.ts         # Event bridge: WA Scripting API ↔ OpenClaw agent logic
├── voice.ts          # Voice pipeline: listenToAudioStream → STT → LLM → TTS → startAudioStream
├── utils.ts          # Shared helpers: retryOp, parseCoords, getMessage, rate limiting
└── __tests__/
    ├── runner.test.ts
    ├── bridge.test.ts
    └── voice.test.ts
```

Configuration lives in ~/.openclaw/openclaw.json under skills.entries.agentadventure. OpenClaw skills are SKILL.md folders — there is no plugin.json.
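A skill would typically read those `env` values at startup and fail fast when one is missing. A sketch, assuming the entries under `skills.entries.agentadventure.env` reach the skill as environment variables; `loadConfig` is a hypothetical name.

```typescript
interface AgentAdventureConfig {
  waUrl: URL;
  botName: string;
}

function loadConfig(env: Record<string, string | undefined>): AgentAdventureConfig {
  const rawUrl = env.WA_URL?.trim();
  const botName = env.WA_BOT_NAME?.trim();
  if (!rawUrl) throw new Error("WA_URL is required");
  if (!botName) throw new Error("WA_BOT_NAME is required");
  // new URL() throws a TypeError on malformed input, surfacing config typos early.
  const waUrl = new URL(rawUrl);
  if (waUrl.protocol !== "http:" && waUrl.protocol !== "https:") {
    throw new Error(`WA_URL must be http(s), got ${waUrl.protocol}`);
  }
  return { waUrl, botName };
}
```

Usage: `const config = loadConfig(process.env);` before launching any browser.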

Usage

Once deployed, the agent uses the skill when instructed to join WorkAdventure. The skill handles the full lifecycle:

Session launch — Playwright opens a headless Chromium, navigates to the WA URL, completes anonymous login (enters name, confirms Woka avatar, waits for game canvas).
Event injection — Bridge injects listeners for proximity (onJoin, onPlayerEnters/Leaves), chat (onChatMessage with bubble scope), and optionally voice (listenToAudioStream).
Bidirectional interaction — Inbound WA events are bridged to the agent via page.exposeFunction; outbound commands execute via page.evaluate (move, chat with typing indicators, voice).
Recovery — Failures retry up to 3 times; voice falls back to text; browser crashes trigger auto-restart.
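The session-launch step (name → Woka → map) can be sketched as below. The selectors are placeholders invented for illustration, not taken from WorkAdventure; inspect your WA build in non-headless mode and substitute the real ones. `anonymousLogin` and `LoginPage` are hypothetical names; `LoginPage` mirrors the Playwright `Page` methods `goto`, `fill`, `click`, and `waitForSelector`.

```typescript
// Structural subset of Playwright's Page used by the login flow.
interface LoginPage {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
}

// PLACEHOLDER selectors (hypothetical): replace with the ones from your WA build.
const SELECTORS = {
  nameInput: "input[name='name']",
  nameSubmit: "button[type='submit']",
  wokaConfirm: "button.confirm-woka",
  gameCanvas: "canvas",
};

async function anonymousLogin(
  page: LoginPage,
  waUrl: string,
  botName: string,
  timeoutMs = 30_000,
): Promise<void> {
  await page.goto(waUrl);
  // Step 1: enter the display name.
  await page.waitForSelector(SELECTORS.nameInput, { timeout: timeoutMs });
  await page.fill(SELECTORS.nameInput, botName);
  await page.click(SELECTORS.nameSubmit);
  // Step 2: accept the Woka avatar offered by the picker.
  await page.waitForSelector(SELECTORS.wokaConfirm, { timeout: timeoutMs });
  await page.click(SELECTORS.wokaConfirm);
  // Step 3: the map has loaded once the game canvas is present.
  await page.waitForSelector(SELECTORS.gameCanvas, { timeout: timeoutMs });
}
```

Raising `timeoutMs` is the knob the Troubleshooting table refers to for slow logins.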

Logs are available via `openclaw logs` for the skill and `docker logs` for the WA containers.

For multiple agents, scale with Kubernetes/Helm and limit browsers via environment variables.
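The README does not name the environment variable. The sketch below assumes a hypothetical `AA_MAX_BROWSERS` and gates browser launches behind a small counting semaphore; the default of 2 is arbitrary.

```typescript
// Read the (hypothetical) AA_MAX_BROWSERS limit, falling back on bad or missing values.
function maxBrowsers(env: Record<string, string | undefined>, fallback = 2): number {
  const n = Number.parseInt(env.AA_MAX_BROWSERS ?? "", 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Counting semaphore: at most `permits` holders at once; extra callers wait in FIFO order.
class Semaphore {
  private permits: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    this.permits = permits;
  }

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the permit straight to the next waiter
    else this.permits++;
  }
}
```

Usage: `const slots = new Semaphore(maxBrowsers(process.env));` then wrap each agent session in `await slots.acquire(); try { /* launch and run */ } finally { slots.release(); }`.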

Voice Integration

Voice support is experimental and depends on WA’s startAudioStream / listenToAudioStream APIs.

The pipeline works as follows: incoming audio from the WA bubble arrives as Float32Array buffers via listenToAudioStream. These buffers are collected and sent to an STT provider (Deepgram or ElevenLabs). The transcription feeds into the agent LLM, which generates a response. That response is synthesized via TTS and streamed back through startAudioStream / appendAudioData.
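One utterance through that pipeline, with the text fallback, might look like the sketch below. The provider calls are injected, so `VoiceDeps`, `concatChunks`, and `handleUtterance` are hypothetical names rather than OpenClaw voice-skill APIs. STT errors propagate to the caller (for example into a retry wrapper), because without a transcript there is nothing to answer in text either; TTS or playback errors send the reply as bubble text.

```typescript
// Injected dependencies; real versions would call the STT/TTS providers and WA audio APIs.
interface VoiceDeps {
  stt(audio: Float32Array): Promise<string>;
  llm(transcript: string): Promise<string>;
  tts(text: string): Promise<Float32Array>;
  playAudio(samples: Float32Array): Promise<void>; // e.g. wraps appendAudioData
  sendText(text: string): Promise<void>;            // bubble chat fallback
}

// Join the Float32Array chunks collected from listenToAudioStream.
function concatChunks(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// Returns the channel the reply went out on.
async function handleUtterance(
  chunks: Float32Array[],
  deps: VoiceDeps,
): Promise<"voice" | "text" | "none"> {
  const audio = concatChunks(chunks);
  if (audio.length === 0) return "none";
  const transcript = (await deps.stt(audio)).trim();
  if (!transcript) return "none"; // nothing intelligible to answer
  const reply = await deps.llm(transcript);
  try {
    await deps.playAudio(await deps.tts(reply));
    return "voice";
  } catch {
    // TTS or stream failure: deliver the same reply as bubble text instead.
    await deps.sendText(reply);
    return "text";
  }
}
```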

On any voice failure (STT timeout, TTS error, stream routing issue), the skill automatically drops to text chat. Headless audio routing relies on Chromium's `--use-fake-device-for-media-stream` flag, passed through Playwright's launch `args`; LiveKit handles the WebRTC transport.
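A sketch of the launch options for headless media. The three switches are standard Chromium flags; `mediaLaunchOptions` is a hypothetical helper, and feeding a WAV file as the fake microphone is an assumption about how the voice E2E tests could supply audio.

```typescript
// Options object shaped for Playwright's chromium.launch().
function mediaLaunchOptions(fakeAudioWav?: string): { headless: boolean; args: string[] } {
  const args = [
    "--use-fake-device-for-media-stream", // synthetic mic/camera instead of real hardware
    "--use-fake-ui-for-media-stream",     // auto-accept getUserMedia permission prompts
  ];
  if (fakeAudioWav) {
    // Play this WAV file through the fake microphone.
    args.push(`--use-file-for-fake-audio-capture=${fakeAudioWav}`);
  }
  return { headless: true, args };
}
```

Usage: `const browser = await chromium.launch(mediaLaunchOptions());`, optionally followed by `browser.newContext({ permissions: ["microphone"] })` so the WA page can open the microphone without a prompt.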

Security

Authentication & Authorization

Encrypt API keys at rest via OpenClaw skills.entries.*.env / skills.entries.*.apiKey; rotate every 90 days.
Scope session tokens per agent with JWT claims; expire after 1 hour of inactivity.
Enforce role-based access — deny on agentId mismatch.
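Once a token's signature has been verified (by a JWT library, not shown), the inactivity and agentId rules reduce to a small check. A sketch with hypothetical `Session` fields: `agentId` comes from the verified claims, while `lastActivityMs` is assumed to be tracked by the gateway, since a JWT itself cannot record activity.

```typescript
const IDLE_TIMEOUT_MS = 60 * 60 * 1000; // 1 hour of inactivity

interface Session {
  agentId: string;        // from the verified token's claims
  lastActivityMs: number; // updated by the gateway on each accepted call
}

type Decision = { allow: true } | { allow: false; reason: "agent_mismatch" | "expired" };

function authorize(session: Session, requestedAgentId: string, nowMs: number): Decision {
  // Deny outright when the caller acts for a different agent than the token was issued to.
  if (session.agentId !== requestedAgentId) return { allow: false, reason: "agent_mismatch" };
  if (nowMs - session.lastActivityMs > IDLE_TIMEOUT_MS) return { allow: false, reason: "expired" };
  return { allow: true };
}
```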

API & Network

Rate-limit exposed functions (10 calls/sec per agent); validate with Joi schemas.
Secure WebSocket bridges (wss://, bearer tokens).
Isolate Docker networks; restrict outbound traffic with a firewall allowlist of trusted domains only (e.g. workadventure.io, elevenlabs.io, deepgram.com).
Prometheus monitoring; Grafana dashboards; alert on anomalies >50%.
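The 10 calls/sec budget for exposed functions can be enforced with a per-agent sliding window before a call reaches agent logic. A minimal in-memory sketch: `RateLimiter` is a hypothetical name (the README only says `utils.ts` holds rate-limiting helpers), and schema validation is left out.

```typescript
// Sliding-window limiter: at most `limit` calls per agent within any `windowMs` span.
class RateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, number[]>();

  constructor(limit = 10, windowMs = 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(agentId: string, nowMs: number): boolean {
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(agentId) ?? []).filter((t) => nowMs - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(agentId, recent);
      return false;
    }
    recent.push(nowMs);
    this.hits.set(agentId, recent);
    return true;
  }
}
```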

Dependencies & Compliance

Weekly npm audit / Snyk scans; remediate high-severity vulns within 7 days.
Quarterly API key rotation (ElevenLabs, Deepgram).
ESLint-security + OWASP checks in CI/CD; annual pen testing with Burp Suite.

Risks & Mitigations

Core

| Risk | Mitigation | Verification |
|---|---|---|
| Playwright instability / browser crashes | Docker sandbox; auto-restart sessions | Log “Session restarted after crash” |
| WA Scripting API is client-only (no server bots) | Full browser automation; Matrix fallback | Dry-run script injection; compare manual vs. automated |
| Perf overhead (browser per agent) | Limit agents; lightweight Chromium | Benchmark CPU/mem; prove <20% overhead |
| Credentials exposure | Gateway permissions; encrypt keys | Audit logs; no leaks in tests |

Proximity & Events

| Risk | Mitigation | Verification |
|---|---|---|
| Event drops in automated browser | RxJS subs with retries; websocket bridge | Sim bubble join/leave; 100% capture in logs |
| Bubble scope limits (no history on join) | Agent state tracks context; fetch players on join | Test msg before/after join; agent ignores pre-join |
| Flaky tests/timeouts | Auto-wait assertions; retries on transients | Induce delay → retry logs success |
| WA script load errors (CORS) | Console listener + restart | Sim bad script → log/catch/restart |

Voice

| Risk | Mitigation | Verification |
|---|---|---|
| Headless audio routing fails | Fake streams for tests; LiveKit node SDK bridge | Log stream capture/playback; compare manual vs. agent |
| High latency in STT/TTS | Low-latency providers (Deepgram); cache responses | Measure e2e <500ms vs. WA native (~200ms) |
| Audio leaks | Encrypt streams; scope voice perms | Audit no external sends without consent |
| Experimental voice APIs unstable | Fallback to text; monitor WA docs/GitHub | Test stream start/listen; logs show buffers |

Verification Strategy

Unit: Vitest on runner, bridge, and voice modules.
E2E: Full flow against WA v1.28.9 Docker — human-agent proximity chat roundtrip.
Voice E2E: Fake audio stream → STT → LLM → TTS → verify playback; test fallback on failure.
Error E2E: Kill browser mid-session → verify auto-restart and recovery logs; >95% uptime.
Performance: <20% CPU/mem overhead per agent; voice latency <500ms end-to-end.

Troubleshooting

| Issue | Fix |
|---|---|
| Browser crash | Check Playwright logs; restart the gateway (`openclaw gateway start`) |
| Login failure | WA anonymous login: verify the name input selector; test in non-headless mode; increase timeouts in `runner.ts` |
| Missed proximity events | Inspect the injected script; ensure `configureTracking()` is called; simulate with manual joins; fall back to Matrix |
| Voice latency | Test STT/TTS providers; cache responses; fall back to text on >500ms |
| Matrix sync issues | Confirm WA Matrix bridge config; check OpenClaw channel perms; resync rooms |
| High CPU | Limit to <5 agents per browser; use `headless: false` only for debugging; monitor with `top`/`htop` |
| Skill not eligible | Run `openclaw skills list --eligible`; check that the binaries listed in `requires.bins` are on PATH; restart the gateway |
| General | Enable verbose logging; check WA/OpenClaw docs and GitHub issues |
