Mote - An open-source ESP32-S3 voice companion for Clawd.bot

Nebaura-Labs/mote

Nebaura Labs Mote

 ███╗   ██╗███████╗██████╗  █████╗ ██╗   ██╗██████╗  █████╗
 ████╗  ██║██╔════╝██╔══██╗██╔══██╗██║   ██║██╔══██╗██╔══██╗
 ██╔██╗ ██║█████╗  ██████╔╝███████║██║   ██║██████╔╝███████║
 ██║╚██╗██║██╔══╝  ██╔══██╗██╔══██║██║   ██║██╔══██╗██╔══██║
 ██║ ╚████║███████╗██████╔╝██║  ██║╚██████╔╝██║  ██║██║  ██║
 ╚═╝  ╚═══╝╚══════╝╚═════╝ ╚═╝  ╚═╝ ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝
                         L A B S

Connect to your personal AI powered by clawd.bot through a physical device with an animated face, voice interaction, and tactile controls.

Live Demo: Mote connects to your clawd.bot instance with real-time voice chat using Deepgram for speech-to-text and ElevenLabs for text-to-speech.


🎯 What is Mote?

The Mote is a voice assistant companion device that brings your personal AI into the physical world. It features:

  • 🎨 Animated Face Display - 2" IPS LCD with an expressive character that reacts to conversation
  • 🎤 Real-time Voice Chat - Stream audio to the cloud for instant transcription via Deepgram
  • 🔊 Natural TTS Responses - High-quality voice synthesis via ElevenLabs with buffered playback
  • 🧠 AI-Powered Conversations - Connect to your clawd.bot instance
  • 📱 Mobile App Configuration - Easy BLE setup via React Native app
  • 🔋 Battery Powered - Portable with LiPo battery and USB-C charging
  • 🌐 WiFi + WebSocket - Direct connection to your gateway server

Voice Chat Features

| Feature | Technology | Description |
| --- | --- | --- |
| Speech-to-Text | Deepgram Nova-2 | Real-time audio streaming with low-latency transcription |
| Text-to-Speech | ElevenLabs | Natural voice synthesis (pcm_16000 format) |
| Audio Buffering | PSRAM Ring Buffer | 60-second buffer for smooth playback of long responses |
| Voice Detection | RMS Energy VAD | Automatic silence detection to trigger processing |

Form Factors

  • Desk Companion - Desktop device (this firmware)
  • Watch Companion - Wearable version (coming soon)

πŸ—οΈ Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              VOICE CHAT FLOW                                │
└─────────────────────────────────────────────────────────────────────────────┘

┌──────────────────┐      ┌──────────────────┐      ┌─────────────────────────┐
│   Mote Device    │      │   Gateway Server │      │   clawd.bot + Services  │
│   (ESP32-S3)     │      │   (apps/web)     │      │                         │
├──────────────────┤      ├──────────────────┤      ├─────────────────────────┤
│ • INMP441 Mic    │─────►│ • WebSocket      │─────►│ • Deepgram (STT)        │
│ • MAX98357A Amp  │◄─────│ • Voice Handler  │◄─────│ • ElevenLabs (TTS)      │
│ • ST7789 Display │      │ • Session Mgmt   │◄────►│ • clawd.bot (AI)        │
│ • Face Animation │      │                  │      │                         │
└──────────────────┘      └──────────────────┘      └─────────────────────────┘
        │                         │
        │  BLE Config             │  API Keys
        ▼                         ▼
┌──────────────────┐      ┌──────────────────┐
│   Mobile App     │      │   Environment    │
│   (apps/native)  │      │   Variables      │
├──────────────────┤      ├──────────────────┤
│ • WiFi Setup     │      │ • DEEPGRAM_KEY   │
│ • Gateway Config │      │ • ELEVENLABS_KEY │
│ • Device Pairing │      │ • GATEWAY_URL    │
└──────────────────┘      └──────────────────┘

Voice Chat Flow:

  1. Mote streams audio continuously over WebSocket to Gateway Server
  2. Gateway pipes audio to Deepgram for real-time transcription
  3. On wake word detection, server captures user's command
  4. Voice Activity Detection (VAD) on ESP32 detects end of speech
  5. Transcribed text sent to your clawd.bot instance for AI response
  6. Response synthesized via ElevenLabs TTS (pcm_16000 format)
  7. PCM audio streamed back to Mote over WebSocket
  8. Mote plays audio from 60-second PSRAM ring buffer
  9. Face animation updates based on conversation state (idle → listening → thinking → speaking)
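
The state progression in step 9 can be pictured as a small state machine. The sketch below is illustrative only: the `FaceState` and `VoiceEvent` names, and the choice of which event triggers each transition, are assumptions made for this example and are not taken from the firmware.

```cpp
#include <cstdint>

// Conversation states named in the flow above. The type and event names are hypothetical.
enum class FaceState : uint8_t { Idle, Listening, Thinking, Speaking };
enum class VoiceEvent : uint8_t { WakeWord, EndOfSpeech, TtsStarted, PlaybackDone };

// Returns the next state. Events that do not apply in the current state are ignored,
// so a stray event cannot skip a step in idle -> listening -> thinking -> speaking.
inline FaceState nextState(FaceState state, VoiceEvent event) {
    switch (state) {
        case FaceState::Idle:
            return event == VoiceEvent::WakeWord ? FaceState::Listening : state;
        case FaceState::Listening:
            return event == VoiceEvent::EndOfSpeech ? FaceState::Thinking : state;
        case FaceState::Thinking:
            return event == VoiceEvent::TtsStarted ? FaceState::Speaking : state;
        case FaceState::Speaking:
            return event == VoiceEvent::PlaybackDone ? FaceState::Idle : state;
    }
    return state;
}
```

Keeping the transition logic in one pure function like this makes it possible to test the face behaviour on a desktop compiler without any display or audio hardware attached.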

🚀 Quick Start

Hardware Requirements

| Component | Description | Link |
| --- | --- | --- |
| ESP32-S3 N16R8 | DevKit with 16MB Flash, 8MB PSRAM | Amazon |
| 2" IPS LCD | ST7789 240×320 display | Amazon |
| INMP441 | I2S MEMS microphone | Amazon |
| MAX98357A | I2S 3W amplifier | Amazon |
| 3W Speakers | 4Ω mini speakers | Amazon |
| LiPo Battery | 3.7V rechargeable | Amazon |
| TP4056 Charger | USB-C LiPo charging module | Amazon |
| Resistor Kit | For voltage divider (100kΩ) | Amazon |
| Breadboards | For prototyping | Amazon |
| PCB Boards | For permanent assembly | Amazon |
| Dupont Wires | Jumper wires for connections | Amazon |

Estimated cost: ~$50-75 for all components

Full wiring guide: See docs/DIAGRAM.md for complete wiring specification

Software Setup

  1. Install PlatformIO

    # Via VS Code: Install PlatformIO IDE extension
    # Or via CLI:
    pip install platformio
  2. Clone this Repository

    git clone https://github.com/Nebaura-Labs/mote.git
    cd mote/firmware
  3. Build and Upload

    # Build the firmware
    pio run
    
    # Upload to your ESP32-S3
    pio run -t upload
    
    # Monitor serial output
    pio device monitor
    
    # Or do it all at once
    pio run -t upload && pio device monitor
  4. Configure WiFi/Bluetooth

    • On first boot, Mote advertises over BLE as "Mote" for setup
    • Connect via the Expo mobile app
    • Link to your clawd.bot instance

📁 Project Structure

This is a monorepo containing all Mote components:

mote/
├── firmware/                 # ESP32-S3 Firmware (PlatformIO, C++)
│   ├── src/
│   │   ├── main.cpp          # Main firmware entry point
│   │   ├── audio.cpp         # I2S audio, ring buffer, playback task
│   │   ├── voice_client.cpp  # WebSocket client for voice chat
│   │   ├── mote_face.cpp     # Animated face display
│   │   └── ble_config.cpp    # BLE configuration service
│   ├── include/              # Header files
│   ├── docs/                 # Hardware documentation
│   ├── platformio.ini        # PlatformIO configuration
│   └── CLAUDE.md             # Firmware developer docs
│
├── apps/
│   ├── web/                  # Gateway Server (TanStack Start + WebSocket)
│   │   ├── src/
│   │   │   ├── websocket/    # Voice WebSocket handler
│   │   │   │   └── voice-handler.ts  # Deepgram + ElevenLabs integration
│   │   │   └── routes/       # API routes
│   │   └── package.json
│   │
│   └── native/               # React Native Mobile App (Expo)
│       ├── app/              # Expo Router screens
│       ├── lib/              # BLE client, Mote protocol
│       └── package.json
│
├── packages/
│   ├── api/                  # Shared API services
│   │   └── src/services/
│   │       └── elevenlabs.ts # ElevenLabs TTS with PCM gain
│   └── shared/               # Shared types and constants
│
├── package.json              # Root workspace config (bun)
├── turbo.json                # Turborepo pipeline
└── README.md                 # This file

🔑 API Keys & Configuration

Each user configures their own API keys through the mobile app settings. You'll need accounts with:

| Service | Purpose | Get Key |
| --- | --- | --- |
| clawd.bot | AI backend for conversations | clawd.bot |
| Deepgram | Real-time speech-to-text | deepgram.com |
| ElevenLabs | Text-to-speech synthesis | elevenlabs.io |

Note: API keys are stored per-user in the database (encrypted), not as environment variables. Users enter their keys in the mobile app settings.

Server Environment Variables

Create a .env file in apps/web/:

# apps/web/.env

# Database (Neon Postgres)
DATABASE_URL=postgresql://user:password@host:5432/database

# Authentication
BETTER_AUTH_SECRET=your-secret-here-at-least-32-characters-long
BETTER_AUTH_URL=http://localhost:3001
CORS_ORIGIN=http://localhost:8081

# Encryption key for storing user API keys securely
ENCRYPTION_KEY=your-encryption-key-here-at-least-32-characters

# Environment
NODE_ENV=development

🔧 Development

Setup

# Install dependencies (uses bun)
bun install

# Set up environment
cp apps/web/.env.example apps/web/.env
# Edit .env with your database URL and secrets (user API keys are entered in the mobile app)

Monorepo Commands

# Run all dev servers (web + native)
bun run dev

# Build all packages
bun run build

# Run web app only
bun run dev:web

# Run native app only
bun run dev:native

Gateway Server (apps/web)

See apps/web/README.md for full documentation.

# Start gateway server with WebSocket
bun run dev:web
# Server runs on http://localhost:3000
# WebSocket voice endpoint: ws://localhost:3000/voice

Mobile App (apps/native)

See apps/native/README.md for full documentation.

# Start Expo dev server
bun run dev:native

# Or run directly
cd apps/native
bun start
bun ios      # iOS simulator
bun android  # Android emulator

Firmware (ESP32-S3)

See firmware/CLAUDE.md for full documentation.

cd firmware
pio run                    # Build
pio run -t upload          # Upload to device
pio device monitor         # Serial monitor
pio run -t upload && pio device monitor  # All at once

Pin Configuration

All GPIO pins are defined in the firmware headers:

| Component | Pins | Notes |
| --- | --- | --- |
| Display (SPI) | MOSI:11, CLK:13, CS:10, DC:9, RST:14, BL:8 | ST7789 240x320 |
| Microphone (I2S) | WS:39, SCK:40, SD:41 | INMP441 3.3V only |
| Amplifier (I2S) | BCLK:16, LRC:17, DIN:18 | MAX98357A 5V |
| Battery ADC | GPIO 2 | 100kΩ voltage divider |
| RGB LED | GPIO 38 | NeoPixel status indicator |

See firmware/CLAUDE.md and docs/DIAGRAM.md for detailed wiring.
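
A fully charged LiPo cell sits near 4.2 V, which is above the 3.3 V range of the ESP32-S3 ADC, so the battery is read through the voltage divider on GPIO 2. The sketch below shows how a raw ADC count could be turned back into a battery voltage. It assumes two equal 100 kΩ resistors (a divide-by-two ratio), a 12-bit reading, and a 3.3 V full scale; none of these values, nor the function names, are confirmed by the firmware, and the real ADC is not perfectly linear.

```cpp
#include <cstdint>

// Assumed values, not taken from the firmware: two equal 100 kOhm resistors
// (the ADC pin sees half the battery voltage), a 12-bit ADC, 3.3 V full scale.
constexpr float kDividerRatio   = 2.0f;
constexpr float kAdcFullScaleMv = 3300.0f;
constexpr int   kAdcMaxCount    = 4095;

// Convert a raw ADC count into battery millivolts by undoing the divider.
inline float batteryMillivolts(int rawCount) {
    float pinMv = (static_cast<float>(rawCount) / kAdcMaxCount) * kAdcFullScaleMv;
    return pinMv * kDividerRatio;
}

// Rough linear charge estimate between an assumed 3.3 V (empty) and 4.2 V (full).
inline int batteryPercent(float batteryMv) {
    float pct = (batteryMv - 3300.0f) / (4200.0f - 3300.0f) * 100.0f;
    if (pct < 0.0f) pct = 0.0f;
    if (pct > 100.0f) pct = 100.0f;
    return static_cast<int>(pct + 0.5f);  // round to the nearest whole percent
}
```

A LiPo discharge curve is not linear, so the percentage is only a coarse indicator; a lookup table calibrated against the actual cell would track it more closely.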

Audio System Configuration

The firmware uses a PSRAM ring buffer for TTS playback:

// Audio buffer configuration (firmware/src/audio.cpp)
#define AUDIO_SAMPLE_RATE       16000   // 16kHz for voice
#define AUDIO_RING_BUFFER_SIZE  (16000 * 60)  // 60 seconds (~1MB in PSRAM)
#define VAD_THRESHOLD           300     // RMS energy threshold
#define VAD_HOLDOFF_MS          800     // Silence detection delay
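
To make the buffering behaviour concrete, here is a minimal ring buffer sketch for 16-bit PCM samples. It is not the code in audio.cpp: the class name and its methods are hypothetical, it is not thread-safe, and `std::vector` stands in for a PSRAM allocation (on the device the storage would presumably come from something like ESP-IDF's `heap_caps_malloc` with `MALLOC_CAP_SPIRAM`). The sizing constants also assume 16-bit mono samples: if `AUDIO_RING_BUFFER_SIZE` counts samples, 60 seconds needs about 1.9 MB, while if it counts bytes the buffer holds 30 seconds.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sizing arithmetic, assuming 16-bit mono samples at 16 kHz.
constexpr size_t kSampleRate  = 16000;
constexpr size_t kSeconds     = 60;
constexpr size_t kSamples     = kSampleRate * kSeconds;      // 960,000 samples
constexpr size_t kBufferBytes = kSamples * sizeof(int16_t);  // 1,920,000 bytes

// Fixed-capacity ring buffer of PCM samples (hypothetical, not thread-safe as written).
class PcmRingBuffer {
public:
    explicit PcmRingBuffer(size_t capacity) : data_(capacity, 0) {}

    // Writes up to n samples and returns how many fit; the rest are dropped on overflow.
    size_t write(const int16_t* src, size_t n) {
        size_t written = 0;
        while (written < n && count_ < data_.size()) {
            data_[head_] = src[written++];
            head_ = (head_ + 1) % data_.size();
            ++count_;
        }
        return written;
    }

    // Reads up to n samples and returns how many were real audio.
    // On underrun the remainder of dst is filled with silence rather than stale data.
    size_t read(int16_t* dst, size_t n) {
        size_t got = 0;
        while (got < n && count_ > 0) {
            dst[got++] = data_[tail_];
            tail_ = (tail_ + 1) % data_.size();
            --count_;
        }
        for (size_t i = got; i < n; ++i) dst[i] = 0;
        return got;
    }

    size_t available() const { return count_; }

private:
    std::vector<int16_t> data_;
    size_t head_ = 0;   // next write position
    size_t tail_ = 0;   // next read position
    size_t count_ = 0;  // samples currently stored
};
```

In a design like this, a short return value from `write` corresponds to overflow (audio cut off) and a short return value from `read` corresponds to underrun (choppy playback).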

The gateway server applies gain to ElevenLabs TTS output:

// TTS gain configuration (apps/web/src/websocket/voice-handler.ts)
const ttsResult = await synthesizeSpeech({
  outputFormat: "pcm_16000",  // 16kHz PCM for ESP32
  useSpeakerBoost: true,      // ElevenLabs speaker boost
  gain: 1.5,                  // 1.5x volume boost (prevents clipping)
});
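
The gain itself is applied on the gateway in TypeScript, but the arithmetic is language-independent. The sketch below shows one way a gain could be applied to 16-bit PCM, written in C++ to match the firmware examples. The function name is hypothetical and the clamping step is an assumption: a gain above 1.0 can push loud samples past the 16-bit range, so this version saturates them instead of letting them wrap around.

```cpp
#include <cstddef>
#include <cstdint>

// Scale 16-bit PCM in place. Samples that would exceed the int16 range are
// clamped, so loud passages saturate instead of wrapping to the opposite sign.
inline void applyGain(int16_t* samples, size_t count, float gain) {
    for (size_t i = 0; i < count; ++i) {
        float scaled = static_cast<float>(samples[i]) * gain;
        if (scaled > 32767.0f)  scaled = 32767.0f;
        if (scaled < -32768.0f) scaled = -32768.0f;
        samples[i] = static_cast<int16_t>(scaled);
    }
}
```

With a gain of 1.5, any sample whose magnitude is above roughly 21,845 hits the clamp, which is one reason a higher gain tends to sound distorted.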

🔗 Related Projects


🛠️ Contributing

We welcome contributions! However, please note the license restrictions below.

Ways to contribute:

  • 🐛 Report bugs and issues
  • 💡 Suggest features and improvements
  • 📝 Improve documentation
  • 🔧 Submit pull requests for bug fixes
  • 🎨 Design new face animations

Before contributing:

  1. Understand the hardware architecture (see firmware/CLAUDE.md)
  2. Check existing issues and PRs
  3. Test your changes on real hardware
  4. Follow the existing code style

📜 License

Important: This project uses a source-available, non-commercial license.

For Personal Use (Free)

You are free to:

  • ✅ Build your own Mote device for personal use
  • ✅ Modify the code for your own projects
  • ✅ Share the code with others (under the same terms)
  • ✅ Use it for educational purposes

Restrictions

You may NOT:

  • ❌ Sell hardware devices with this firmware
  • ❌ Use this commercially without permission
  • ❌ Remove license or attribution

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

See LICENSE for full terms.

Commercial Use

Interested in selling Mote-based hardware or using this commercially?

Official Hardware: Purchase pre-built Mote devices from Nebaura Labs

Commercial Licensing: Contact us at [your-email@nebaura.studio] for commercial licensing options.


🏪 Get Hardware

DIY Builders

Build your own! All necessary components are listed in the Hardware Requirements section above.

Estimated cost: ~$165-175 for all components

Need help? Join our community or contact us for build support.

Pre-Assembled Kits

Not an engineer? Purchase ready-to-use Mote devices from Nebaura Labs:

  • Fully assembled and tested
  • Same open firmware (you own the code)
  • Supports further development

🐛 Troubleshooting

Upload Issues

  • Hold BOOT button while resetting to enter download mode
  • Check USB cable supports data (not just charging)
  • Verify correct COM port selected
  • Close serial monitor before uploading

Display Not Working

  • Check backlight pin (GPIO 8) is HIGH
  • Verify SPI wiring matches pin configuration
  • Test with simple TFT_eSPI example first

No Audio Output

  • Ensure amplifier has 5V power
  • Verify I2S pins match configuration
  • Check speaker polarity
  • Check serial logs for [Audio] Buffer underrun messages

Audio Issues

| Symptom | Cause | Solution |
| --- | --- | --- |
| Loud noise on startup | Uninitialized buffer | Fixed in latest firmware (memset + bufferReady flag) |
| Choppy/stuttering | Buffer underruns | Increase AUDIO_RING_BUFFER_SIZE or check WiFi |
| Audio cuts off early | Buffer overflow | Buffer size is 60 seconds, increase if needed |
| Static/distortion | Gain too high | Reduce gain in voice-handler.ts (default 1.5x) |
| Quiet audio | Gain too low | Increase gain or enable useSpeakerBoost |

Voice Chat Issues

| Symptom | Cause | Solution |
| --- | --- | --- |
| No transcription | Deepgram API key invalid | Check the Deepgram API key in the mobile app settings |
| No TTS response | ElevenLabs key/voice invalid | Check the ElevenLabs API key and voice ID in the mobile app settings |
| WebSocket disconnects | Network issues | Check WiFi signal, gateway server logs |
| VAD not triggering | Threshold too high | Reduce VAD_THRESHOLD in audio.cpp |
| VAD always active | Threshold too low | Increase VAD_THRESHOLD (default 300) |
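
When tuning the VAD, it helps to see how the threshold and the hold-off interact. The sketch below is a minimal RMS energy detector under stated assumptions: the `frameRms` function and `EnergyVad` class are hypothetical names, not the firmware's, and only the two numbers used in the usage note (300 and 800 ms) come from the documented defaults.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// RMS energy of one frame of 16-bit PCM samples.
inline float frameRms(const int16_t* samples, size_t count) {
    if (count == 0) return 0.0f;
    double sumSquares = 0.0;
    for (size_t i = 0; i < count; ++i) {
        sumSquares += static_cast<double>(samples[i]) * samples[i];
    }
    return static_cast<float>(std::sqrt(sumSquares / count));
}

// Reports end of speech once the signal has stayed below the threshold for holdoffMs.
class EnergyVad {
public:
    EnergyVad(float threshold, uint32_t holdoffMs)
        : threshold_(threshold), holdoffMs_(holdoffMs) {}

    // Feed one frame with its timestamp. Returns true exactly once per utterance,
    // at the first quiet frame that arrives holdoffMs or more after the last loud one.
    bool update(const int16_t* samples, size_t count, uint32_t nowMs) {
        if (frameRms(samples, count) >= threshold_) {
            speaking_ = true;
            lastVoiceMs_ = nowMs;
            return false;
        }
        if (speaking_ && nowMs - lastVoiceMs_ >= holdoffMs_) {
            speaking_ = false;
            return true;
        }
        return false;
    }

private:
    float threshold_;
    uint32_t holdoffMs_;
    uint32_t lastVoiceMs_ = 0;
    bool speaking_ = false;
};
```

Constructed as `EnergyVad vad(300.0f, 800)`, a detector like this never starts if the room noise stays under 300 RMS and speech is too quiet to cross it (threshold too high), and never ends if background noise alone stays above 300 (threshold too low), which matches the two VAD symptoms in the table.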

BLE Configuration

  • Mote advertises as "Mote" when no WiFi config is saved
  • Use the mobile app to configure WiFi and gateway settings
  • Device restarts automatically after saving configuration

More help: Check the Issues or contact support


🙏 Acknowledgments

Built with:

  • ESP-IDF and Arduino Framework by Espressif
  • PlatformIO for firmware build system
  • TFT_eSPI / LovyanGFX for display rendering
  • Deepgram for real-time speech-to-text
  • ElevenLabs for natural text-to-speech
  • TanStack Start for the gateway server
  • Expo and React Native for the mobile app

Inspired by the open hardware community and the vision of personal AI companions that respect privacy and user control.


📧 Contact


     ╔═══════════════════════════════════════╗
     ║                                       ║
     ║   Built with ♥ by Nebaura Labs        ║
     ║   Pittsburgh, PA                      ║
     ║                                       ║
     ╚═══════════════════════════════════════╝

Made with love for the open hardware and personal AI community.
