Mote - An open-source ESP32-S3 voice companion for Clawd
NEBAURA LABS
Connect to your personal AI powered by clawd.bot through a physical device with an animated face, voice interaction, and tactile controls.
Live Demo: Mote connects to your clawd.bot instance with real-time voice chat using Deepgram for speech-to-text and ElevenLabs for text-to-speech.
The Mote is a voice assistant companion device that brings your personal AI into the physical world. It features:
- Animated Face Display - 2" IPS LCD with expressive character that reacts to conversation
- Real-time Voice Chat - Stream audio to the cloud for instant transcription via Deepgram
- Natural TTS Responses - High-quality voice synthesis via ElevenLabs with buffered playback
- AI-Powered Conversations - Connect to your clawd.bot instance
- Mobile App Configuration - Easy BLE setup via React Native app
- Battery Powered - Portable with LiPo battery and USB-C charging
- WiFi + WebSocket - Direct connection to your gateway server
| Feature | Technology | Description |
|---|---|---|
| Speech-to-Text | Deepgram Nova-2 | Real-time audio streaming with low latency transcription |
| Text-to-Speech | ElevenLabs | Natural voice synthesis (pcm_16000 format) |
| Audio Buffering | PSRAM Ring Buffer | 60-second buffer for smooth playback of long responses |
| Voice Detection | RMS Energy VAD | Automatic silence detection to trigger processing |
- Desk Companion - Desktop device (this firmware)
- Watch Companion - Wearable version (coming soon)
┌─────────────────────────────────────────────────────────────────────────────┐
│                               VOICE CHAT FLOW                               │
└─────────────────────────────────────────────────────────────────────────────┘
┌──────────────────┐      ┌──────────────────┐      ┌─────────────────────────┐
│   Mote Device    │      │  Gateway Server  │      │  clawd.bot + Services   │
│   (ESP32-S3)     │      │   (apps/web)     │      │                         │
├──────────────────┤      ├──────────────────┤      ├─────────────────────────┤
│ • INMP441 Mic    │─────►│ • WebSocket      │─────►│ • Deepgram (STT)        │
│ • MAX98357A Amp  │◄─────│ • Voice Handler  │◄─────│ • ElevenLabs (TTS)      │
│ • ST7789 Display │      │ • Session Mgmt   │─────►│ • clawd.bot (AI)        │
│ • Face Animation │      │                  │      │                         │
└──────────────────┘      └──────────────────┘      └─────────────────────────┘
         │                         │
         │ BLE Config              │ API Keys
         ▼                         ▼
┌──────────────────┐      ┌──────────────────┐
│   Mobile App     │      │   Environment    │
│  (apps/native)   │      │    Variables     │
├──────────────────┤      ├──────────────────┤
│ • WiFi Setup     │      │ • DEEPGRAM_KEY   │
│ • Gateway Config │      │ • ELEVENLABS_KEY │
│ • Device Pairing │      │ • GATEWAY_URL    │
└──────────────────┘      └──────────────────┘
Voice Chat Flow:
- Mote streams audio continuously over WebSocket to Gateway Server
- Gateway pipes audio to Deepgram for real-time transcription
- On wake word detection, server captures user's command
- Voice Activity Detection (VAD) on ESP32 detects end of speech
- Transcribed text sent to your clawd.bot instance for AI response
- Response synthesized via ElevenLabs TTS (pcm_16000 format)
- PCM audio streamed back to Mote over WebSocket
- Mote plays audio from 60-second PSRAM ring buffer
- Face animation updates based on conversation state (idle → listening → thinking → speaking)
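The state progression in the last step can be sketched as a small transition function. This is only an illustration, not the code in mote_face.cpp: the FaceState and VoiceEvent names, and the choice of which event triggers each transition, are assumptions made for this example.

```cpp
#include <cstdint>

// Conversation states named in the flow above.
enum class FaceState : uint8_t { Idle, Listening, Thinking, Speaking };

// Hypothetical events; the real firmware may use different triggers.
enum class VoiceEvent : uint8_t {
    WakeWordDetected,  // gateway reports the wake word
    SpeechEnded,       // on-device VAD saw enough trailing silence
    TtsAudioStarted,   // first PCM chunk of the reply arrived
    PlaybackFinished,  // ring buffer drained after the last chunk
    ConnectionLost     // WebSocket dropped
};

// Returns the next state, or the current one if the event does not apply.
inline FaceState nextState(FaceState current, VoiceEvent event) {
    // Losing the connection always drops back to idle.
    if (event == VoiceEvent::ConnectionLost) return FaceState::Idle;

    switch (current) {
        case FaceState::Idle:
            return event == VoiceEvent::WakeWordDetected ? FaceState::Listening : current;
        case FaceState::Listening:
            return event == VoiceEvent::SpeechEnded ? FaceState::Thinking : current;
        case FaceState::Thinking:
            return event == VoiceEvent::TtsAudioStarted ? FaceState::Speaking : current;
        case FaceState::Speaking:
            return event == VoiceEvent::PlaybackFinished ? FaceState::Idle : current;
    }
    return current;
}
```

Keeping the transitions in one pure function means out-of-order events (for example, a late SpeechEnded while already speaking) are ignored rather than corrupting the displayed state.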
| Component | Description | Link |
|---|---|---|
| ESP32-S3 N16R8 | DevKit with 16MB Flash, 8MB PSRAM | Amazon |
| 2" IPS LCD | ST7789 240×320 display | Amazon |
| INMP441 | I2S MEMS microphone | Amazon |
| MAX98357A | I2S 3W amplifier | Amazon |
| 3W Speakers | 4Ω mini speakers | Amazon |
| LiPo Battery | 3.7V rechargeable | Amazon |
| TP4056 Charger | USB-C LiPo charging module | Amazon |
| Resistor Kit | For voltage divider (100kΩ) | Amazon |
| Breadboards | For prototyping | Amazon |
| PCB Boards | For permanent assembly | Amazon |
| Dupont Wires | Jumper wires for connections | Amazon |
Estimated cost: ~$50-75 for all components
Full wiring guide: See docs/DIAGRAM.md for complete wiring specification
- Install PlatformIO
   # Via VS Code: Install PlatformIO IDE extension
   # Or via CLI:
   pip install platformio
- Clone this Repository
   git clone https://github.com/nebaura-labs/mote-firmware.git
   cd mote-firmware
- Build and Upload
   # Build the firmware
   pio run
   # Upload to your ESP32-S3
   pio run -t upload
   # Monitor serial output
   pio device monitor
   # Or do it all at once
   pio run -t upload && pio device monitor
- Configure WiFi/Bluetooth
- On first boot, Mote advertises over BLE as "Mote" for setup
- Connect via the Expo mobile app
- Link to your clawd.bot instance
This is a monorepo containing all Mote components:
mote/
├── firmware/                    # ESP32-S3 Firmware (PlatformIO, C++)
│   ├── src/
│   │   ├── main.cpp             # Main firmware entry point
│   │   ├── audio.cpp            # I2S audio, ring buffer, playback task
│   │   ├── voice_client.cpp     # WebSocket client for voice chat
│   │   ├── mote_face.cpp        # Animated face display
│   │   └── ble_config.cpp       # BLE configuration service
│   ├── include/                 # Header files
│   ├── docs/                    # Hardware documentation
│   ├── platformio.ini           # PlatformIO configuration
│   └── CLAUDE.md                # Firmware developer docs
│
├── apps/
│   ├── web/                     # Gateway Server (TanStack Start + WebSocket)
│   │   ├── src/
│   │   │   ├── websocket/       # Voice WebSocket handler
│   │   │   │   └── voice-handler.ts  # Deepgram + ElevenLabs integration
│   │   │   └── routes/          # API routes
│   │   └── package.json
│   │
│   └── native/                  # React Native Mobile App (Expo)
│       ├── app/                 # Expo Router screens
│       ├── lib/                 # BLE client, Mote protocol
│       └── package.json
│
├── packages/
│   ├── api/                     # Shared API services
│   │   └── src/services/
│   │       └── elevenlabs.ts    # ElevenLabs TTS with PCM gain
│   └── shared/                  # Shared types and constants
│
├── package.json                 # Root workspace config (bun)
├── turbo.json                   # Turborepo pipeline
└── README.md                    # This file
Each user configures their own API keys through the mobile app settings. You'll need accounts with:
| Service | Purpose | Get Key |
|---|---|---|
| clawd.bot | AI backend for conversations | clawd.bot |
| Deepgram | Real-time speech-to-text | deepgram.com |
| ElevenLabs | Text-to-speech synthesis | elevenlabs.io |
Note: API keys are stored per-user in the database (encrypted), not as environment variables. Users enter their keys in the mobile app settings.
Create a .env file in apps/web/:
# apps/web/.env
# Database (Neon Postgres)
DATABASE_URL=postgresql://user:password@host:5432/database
# Authentication
BETTER_AUTH_SECRET=your-secret-here-at-least-32-characters-long
BETTER_AUTH_URL=http://localhost:3001
CORS_ORIGIN=http://localhost:8081
# Encryption key for storing user API keys securely
ENCRYPTION_KEY=your-encryption-key-here-at-least-32-characters
# Environment
NODE_ENV=development
# Install dependencies (uses bun)
bun install
# Set up environment
cp apps/web/.env.example apps/web/.env
# Edit .env with your API keys
# Run all dev servers (web + native)
bun run dev
# Build all packages
bun run build
# Run web app only
bun run dev:web
# Run native app only
bun run dev:native
See apps/web/README.md for full documentation.
# Start gateway server with WebSocket
bun run dev:web
# Server runs on http://localhost:3000
# WebSocket voice endpoint: ws://localhost:3000/voice
See apps/native/README.md for full documentation.
# Start Expo dev server
bun run dev:native
# Or run directly
cd apps/native
bun start
bun ios # iOS simulator
bun android # Android emulator
See firmware/CLAUDE.md for full documentation.
cd firmware
pio run # Build
pio run -t upload # Upload to device
pio device monitor # Serial monitor
pio run -t upload && pio device monitor # All at once
All GPIO pins are defined in the firmware headers:
| Component | Pins | Notes |
|---|---|---|
| Display (SPI) | MOSI:11, CLK:13, CS:10, DC:9, RST:14, BL:8 | ST7789 240x320 |
| Microphone (I2S) | WS:39, SCK:40, SD:41 | INMP441 3.3V only |
| Amplifier (I2S) | BCLK:16, LRC:17, DIN:18 | MAX98357A 5V |
| Battery ADC | GPIO 2 | 100kΩ voltage divider |
| RGB LED | GPIO 38 | NeoPixel status indicator |
See firmware/CLAUDE.md and docs/DIAGRAM.md for detailed wiring.
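As an illustration of how the battery ADC row might be used, the sketch below scales the voltage seen on GPIO 2 back up to the battery voltage. It assumes the divider is two equal 100kΩ resistors (so the pin sees half the battery voltage) and uses a rough linear 3.3V to 4.2V charge estimate. Both are assumptions for this example, and all names are hypothetical rather than taken from the firmware headers.

```cpp
#include <cstdint>

constexpr int kBatteryAdcPin = 2;                // from the pin table above
constexpr uint32_t kDividerTopOhms = 100000;     // assumed: battery+ to ADC pin
constexpr uint32_t kDividerBottomOhms = 100000;  // assumed: ADC pin to GND

// Scale the voltage measured at the ADC pin back up to the battery voltage.
inline uint32_t batteryMillivolts(uint32_t pinMillivolts) {
    return pinMillivolts * (kDividerTopOhms + kDividerBottomOhms) / kDividerBottomOhms;
}

// Rough linear estimate between an assumed empty (3300 mV) and full (4200 mV)
// cell. Real LiPo discharge curves are not linear, so treat this as a hint.
inline int batteryPercent(uint32_t batteryMv) {
    constexpr uint32_t kEmptyMv = 3300;
    constexpr uint32_t kFullMv = 4200;
    if (batteryMv <= kEmptyMv) return 0;
    if (batteryMv >= kFullMv) return 100;
    return static_cast<int>((batteryMv - kEmptyMv) * 100 / (kFullMv - kEmptyMv));
}
```

On the Arduino-ESP32 core, the pin voltage could come from `analogReadMilliVolts(kBatteryAdcPin)`, so a reading of 1850 mV at the pin would correspond to a 3.7V battery under these assumptions.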
The firmware uses a PSRAM ring buffer for TTS playback:
// Audio buffer configuration (firmware/src/audio.cpp)
#define AUDIO_SAMPLE_RATE 16000 // 16kHz for voice
#define AUDIO_RING_BUFFER_SIZE (16000 * 60) // 60 seconds (~1MB in PSRAM)
#define VAD_THRESHOLD 300 // RMS energy threshold
#define VAD_HOLDOFF_MS 800 // Silence detection delay
The gateway server applies gain to ElevenLabs TTS output:
// TTS gain configuration (apps/web/src/websocket/voice-handler.ts)
const ttsResult = await synthesizeSpeech({
  outputFormat: "pcm_16000", // 16kHz PCM for ESP32
  useSpeakerBoost: true, // ElevenLabs speaker boost
  gain: 1.5, // 1.5x volume boost (prevents clipping)
});
- clawd.bot - Personal AI gateway backend
- Nebaura Labs - Official website and hardware kits
We welcome contributions! However, please note the license restrictions below.
Ways to contribute:
- Report bugs and issues
- Suggest features and improvements
- Improve documentation
- Submit pull requests for bug fixes
- Design new face animations
Before contributing:
- Understand the hardware architecture (see CLAUDE.md)
- Check existing issues and PRs
- Test your changes on real hardware
- Follow the existing code style
Important: This project uses a source-available, non-commercial license.
You are free to:
- Build your own Mote device for personal use
- Modify the code for your own projects
- Share the code with others (under same terms)
- Use it for educational purposes
You may NOT:
- Sell hardware devices with this firmware
- Use this commercially without permission
- Remove license or attribution
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
See LICENSE for full terms.
Interested in selling Mote-based hardware or using this commercially?
Official Hardware: Purchase pre-built Mote devices from Nebaura Labs
Commercial Licensing: Contact us at [your-email@nebaura.studio] for commercial licensing options.
Build your own! All necessary components are listed in the Hardware Requirements section above.
Estimated cost: ~$165-175 for all components
Need help? Join our community or contact us for build support.
Not an engineer? Purchase ready-to-use Mote devices from Nebaura Labs:
- Fully assembled and tested
- Same open firmware (you own the code)
- Supports further development
Upload fails:
- Hold BOOT button while resetting to enter download mode
- Check USB cable supports data (not just charging)
- Verify correct COM port selected
- Close serial monitor before uploading
Display stays blank:
- Check backlight pin (GPIO 8) is HIGH
- Verify SPI wiring matches pin configuration
- Test with simple TFT_eSPI example first
Audio problems:
- Ensure amplifier has 5V power
- Verify I2S pins match configuration
- Check speaker polarity
- Check serial logs for [Audio] Buffer underrun messages
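To make the underrun message above concrete, here is a minimal sketch of a ring buffer that pads with silence and counts an underrun whenever playback asks for more samples than the network has delivered. It is not the code in firmware/src/audio.cpp: the PcmRingBuffer class and its methods are hypothetical, it is single-threaded for clarity, and it allocates from the normal heap rather than PSRAM.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-threaded sketch. A real playback task fed by a network task would
// need a lock or a single-producer/single-consumer scheme.
class PcmRingBuffer {
public:
    explicit PcmRingBuffer(size_t capacitySamples) : data_(capacitySamples) {}

    // Copies in as many samples as fit; returns how many were accepted.
    size_t write(const int16_t* src, size_t count) {
        size_t accepted = 0;
        while (accepted < count && size_ < data_.size()) {
            data_[(head_ + size_) % data_.size()] = src[accepted++];
            ++size_;
        }
        return accepted;
    }

    // Fills dst with `count` samples, padding with silence on underrun.
    // Returns the number of real (non-padding) samples delivered.
    size_t read(int16_t* dst, size_t count) {
        size_t delivered = 0;
        while (delivered < count && size_ > 0) {
            dst[delivered++] = data_[head_];
            head_ = (head_ + 1) % data_.size();
            --size_;
        }
        for (size_t i = delivered; i < count; ++i) dst[i] = 0;
        if (delivered < count) ++underruns_;
        return delivered;
    }

    size_t available() const { return size_; }
    size_t underruns() const { return underruns_; }

private:
    std::vector<int16_t> data_;
    size_t head_ = 0;       // index of the oldest sample
    size_t size_ = 0;       // number of samples currently stored
    size_t underruns_ = 0;  // reads that had to be padded with silence
};
```

For sizing: 60 seconds at 16kHz is 960,000 samples, which is about 1.9MB as 16-bit samples. A buffer of 960,000 bytes (roughly 1MB) would instead hold about 30 seconds, so it is worth checking whether AUDIO_RING_BUFFER_SIZE counts samples or bytes before changing it.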
| Symptom | Cause | Solution |
|---|---|---|
| Loud noise on startup | Uninitialized buffer | Fixed in latest firmware (memset + bufferReady flag) |
| Choppy/stuttering | Buffer underruns | Increase AUDIO_RING_BUFFER_SIZE or check WiFi |
| Audio cuts off early | Buffer overflow | Buffer size is 60 seconds, increase if needed |
| Static/distortion | Gain too high | Reduce gain in voice-handler.ts (default 1.5x) |
| Quiet audio | Gain too low | Increase gain or enable useSpeakerBoost |
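The gain rows above come down to multiplying each 16-bit PCM sample and clamping the result. The gateway does this in TypeScript; the sketch below shows the same idea in C++ for consistency with the firmware examples and is not the project's implementation. The function name applyGain is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Multiply each 16-bit sample by `gain`, clamping to the int16 range so
// loud peaks flatten at full scale instead of wrapping around.
inline void applyGain(int16_t* samples, size_t count, float gain) {
    for (size_t i = 0; i < count; ++i) {
        const float scaled = std::round(static_cast<float>(samples[i]) * gain);
        const float clamped = std::max(-32768.0f, std::min(32767.0f, scaled));
        samples[i] = static_cast<int16_t>(clamped);
    }
}
```

Clamping avoids the harsh wrap-around noise of integer overflow, but any sample that hits the limit is still distorted. That is why lowering the gain is the suggested fix when the output sounds static-like.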
| Symptom | Cause | Solution |
|---|---|---|
| No transcription | Deepgram API key invalid | Check the Deepgram API key saved in the mobile app settings |
| No TTS response | ElevenLabs key/voice invalid | Check the ElevenLabs API key (mobile app settings) and voice ID |
| WebSocket disconnects | Network issues | Check WiFi signal, gateway server logs |
| VAD not triggering | Threshold too high | Reduce VAD_THRESHOLD in audio.cpp |
| VAD always active | Threshold too low | Increase VAD_THRESHOLD (default 300) |
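To show what the two VAD rows are tuning, here is a hedged sketch of RMS-energy end-of-speech detection. It reuses the documented defaults (threshold 300, holdoff 800 ms), but the function and class names are hypothetical and the logic in audio.cpp may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

constexpr float kVadThreshold = 300.0f;  // mirrors VAD_THRESHOLD
constexpr uint32_t kVadHoldoffMs = 800;  // mirrors VAD_HOLDOFF_MS

// Root-mean-square amplitude of one frame of 16-bit samples.
inline float frameRms(const int16_t* samples, size_t count) {
    if (count == 0) return 0.0f;
    double sumSquares = 0.0;
    for (size_t i = 0; i < count; ++i) {
        sumSquares += static_cast<double>(samples[i]) * samples[i];
    }
    return static_cast<float>(std::sqrt(sumSquares / count));
}

// Reports the end of an utterance once the signal has stayed below the
// threshold for the holdoff period after speech was last heard.
class EndOfSpeechDetector {
public:
    // Feed one frame's RMS and a millisecond timestamp.
    // Returns true exactly once per utterance, when it ends.
    bool update(float rms, uint32_t nowMs) {
        if (rms >= kVadThreshold) {
            speaking_ = true;
            lastVoiceMs_ = nowMs;
            return false;
        }
        if (speaking_ && nowMs - lastVoiceMs_ >= kVadHoldoffMs) {
            speaking_ = false;
            return true;
        }
        return false;
    }

private:
    bool speaking_ = false;
    uint32_t lastVoiceMs_ = 0;
};
```

With this shape, a threshold set too high means speech never registers and the detector never fires, while a threshold below the room's noise floor keeps resetting the silence timer so the utterance never ends.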
- Mote advertises as "Mote" when no WiFi config is saved
- Use the mobile app to configure WiFi and gateway settings
- Device restarts automatically after saving configuration
More help: Check the Issues or contact support
Built with:
- ESP-IDF and Arduino Framework by Espressif
- PlatformIO for firmware build system
- TFT_eSPI / LovyanGFX for display rendering
- Deepgram for real-time speech-to-text
- ElevenLabs for natural text-to-speech
- TanStack Start for the gateway server
- Expo and React Native for the mobile app
Inspired by the open hardware community and the vision of personal AI companions that respect privacy and user control.
- Website: nebaura.studio
- GitHub: github.com/nebaura-labs
- Issues: Report bugs here
┌───────────────────────────────────────┐
│                                       │
│     Built with ♥ by Nebaura Labs      │
│            Pittsburgh, PA             │
│                                       │
└───────────────────────────────────────┘
Made with love for the open hardware and personal AI community.