Desktop AI assistant with voice input and chat interface. Multi-service architecture using the OpenAI Responses API.

destorted93/ai-agent-desktop

AI Agent

License: MIT

Desktop AI assistant with voice input, chat interface, encrypted history, and powerful tools.

🚀 Quick Start

Double-click START_CLEAN.bat to launch everything:

  • 🎤 Transcribe Service (voice-to-text)
  • 🤖 Agent Service (AI brain)
  • 💬 Widget (desktop interface)

Everything runs in the background with no terminal windows. The widget appears on your desktop, ready to use.

What it does

A modular AI agent that:

  • Listens to your voice and transcribes it
  • Chats with you using GPT-5
  • Executes tasks through tools
  • Remembers context across sessions
  • Runs as separate services for stability

Architecture

Multi-service architecture: each component runs independently.

Core Services

  • agent-main/ - Main AI agent (CLI + API modes)
  • agent/ - Agent core (OpenAI wrapper with streaming)
  • transcribe/ - Audio → text conversion
  • widget/ - Desktop UI with voice/chat

Data & Tools

  • chat_history/ - Conversation persistence (encrypted)
  • memory/ - User context storage (encrypted)
  • secure_storage/ - Config, keychain secrets, and encrypted JSON helpers
  • tools/ - Agent capabilities (filesystem, web, todos, etc.)

Utilities

  • service-template/ - Boilerplate for new services

Features

🎀 Voice Input

  • Click to record
  • Auto-transcribe using Whisper
  • Multi-language support

💬 Chat Interface

  • Type or speak your messages
  • Real-time streaming responses
  • Color-coded output (thinking, responses, function calls)
  • Screenshot sharing
  • Encrypted, persistent history
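The color-coded output can be pictured as routing each streamed event to a display category. A minimal sketch, assuming the agent consumes OpenAI Responses API streaming events, each carrying a `type` string and, for text events, a `delta` (event type names as documented for Responses API streaming; check them against the current OpenAI reference). `route_events`, `EVENT_CATEGORIES`, and the colors are hypothetical, not the widget's actual code.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical mapping from Responses API stream event types to display categories.
EVENT_CATEGORIES = {
    "response.reasoning_summary_text.delta": "thinking",
    "response.output_text.delta": "response",
    "response.function_call_arguments.delta": "function_call",
}

# Hypothetical colors a chat window might use for each category.
CATEGORY_COLORS = {
    "thinking": "gray",
    "response": "white",
    "function_call": "yellow",
}


def route_events(events: Iterable[object]) -> Iterator[Tuple[str, str]]:
    """Yield (category, text) pairs for the text-bearing events we know how to display."""
    for event in events:
        category = EVENT_CATEGORIES.get(getattr(event, "type", ""))
        if category is None:
            continue  # skip lifecycle events such as response.created or response.completed
        yield category, getattr(event, "delta", "")


if __name__ == "__main__":
    from types import SimpleNamespace as Ev

    # Fake events standing in for client.responses.create(..., stream=True).
    fake_stream = [
        Ev(type="response.created"),
        Ev(type="response.reasoning_summary_text.delta", delta="Checking files..."),
        Ev(type="response.output_text.delta", delta="Hello"),
        Ev(type="response.output_text.delta", delta=" there"),
    ]
    for category, text in route_events(fake_stream):
        print(f"[{CATEGORY_COLORS[category]}] {text}")
```

Keeping the routing separate from rendering means the same stream can feed a terminal, the chat window, or a log without changes.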

🛠️ Agent Capabilities

  • Files: Read, write, search, edit
  • Web: Search and scrape
  • Todos: Task management
  • Memory: Remember user preferences
  • Documents: Create Word files
  • Charts: Generate visualizations
  • Terminal: Run commands
  • Images: AI image generation
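Capabilities like these are typically exposed to the model as named functions and dispatched when the model emits a function call. A minimal sketch of that dispatch pattern, assuming the model returns a tool name plus JSON-encoded arguments (as OpenAI function calls do); `register_tool`, `dispatch`, and the `add_todo` tool are hypothetical and are not the API of the project's `tools/` folder.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry: tool name -> Python callable.
TOOLS: Dict[str, Callable[..., Any]] = {}


def register_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that makes a function callable by name from a model tool call."""
    TOOLS[func.__name__] = func
    return func


# Hypothetical in-memory list standing in for the real Todos tool.
_todos: list = []


@register_tool
def add_todo(title: str) -> str:
    _todos.append(title)
    return f"Added todo #{len(_todos)}: {title}"


def dispatch(name: str, arguments_json: str) -> str:
    """Run one tool call and return its result as a string to send back to the model."""
    func = TOOLS.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json or "{}")
        return str(func(**kwargs))
    except Exception as exc:  # report tool failures to the model instead of crashing the agent
        return json.dumps({"error": str(exc)})


if __name__ == "__main__":
    print(dispatch("add_todo", '{"title": "Buy milk"}'))
    print(dispatch("delete_everything", "{}"))
```

Returning errors as text lets the model see what went wrong and retry, rather than ending the conversation.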

Installation

Option 1: Quick Install

INSTALL.bat

Option 2: Manual Install

pip install -r agent-main/requirements.txt
pip install -r transcribe/requirements.txt
pip install -r widget/requirements.txt

Running

Complete System (Recommended)

1. Install dependencies first:

INSTALL.bat

2. Launch the agent:

START_CLEAN.bat  # Clean launch - runs in background, no terminals

This is the main way to use the agent. Just close the widget to stop everything.

Alternative Launchers

START.bat        # Shows terminal windows (useful for debugging)

Individual Components

# Interactive CLI
python agent-main/app.py --mode interactive

# Agent API
python agent-main/app.py --mode service --port 6002

# Transcribe service
python transcribe/app.py

# Widget only
python widget/widget.py
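The background launch that START_CLEAN.bat performs can be approximated in Python using the individual commands above. A minimal sketch, assuming it is run from the repository root and that closing the widget should stop the other services (as described for START_CLEAN.bat); `build_commands`, `launch`, and `NO_WINDOW` are hypothetical names, and the batch file's real behavior may differ.

```python
import subprocess
import sys
from typing import Dict, List

# CREATE_NO_WINDOW only exists on Windows; 0 means "no special flags" elsewhere.
NO_WINDOW = getattr(subprocess, "CREATE_NO_WINDOW", 0)


def build_commands(python: str = sys.executable) -> Dict[str, List[str]]:
    """Command lines for each component, mirroring the individual commands above."""
    return {
        "transcribe": [python, "transcribe/app.py"],
        "agent": [python, "agent-main/app.py", "--mode", "service", "--port", "6002"],
        "widget": [python, "widget/widget.py"],
    }


def launch() -> None:
    """Start both services in the background, run the widget, then stop the services."""
    commands = build_commands()
    services = [
        subprocess.Popen(commands[name], creationflags=NO_WINDOW)
        for name in ("transcribe", "agent")
    ]
    try:
        # Blocks until the widget process exits (i.e. the widget is closed).
        subprocess.run(commands["widget"], creationflags=NO_WINDOW)
    finally:
        for proc in services:
            proc.terminate()
            proc.wait(timeout=10)
```

Calling `launch()` from the repository root starts the transcribe and agent services without console windows on Windows, and closing the widget terminates both.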

Configuration

Set your OpenAI API key (either method works):

# Windows PowerShell (current session only)
$env:OPENAI_API_KEY = "sk-..."

# Or use the widget Settings (⚙) to save an API Token
# The token is stored in Windows Credential Manager under
#   Service: ai-agent-desktop/api_token, Username: api_token
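A component that needs the key could check both locations in order. A minimal sketch, assuming the environment variable takes precedence over the token saved from Settings (the project's actual precedence is not documented here) and using the third-party `keyring` package, whose `get_password(service, username)` reads Windows Credential Manager on Windows; `resolve_api_key` is a hypothetical helper.

```python
import os
from typing import Optional

try:
    import keyring  # third-party; backed by Windows Credential Manager on Windows
except ImportError:  # keep the sketch usable when keyring is not installed
    keyring = None

# Credential Manager entry documented above.
TOKEN_SERVICE = "ai-agent-desktop/api_token"
TOKEN_USERNAME = "api_token"


def resolve_api_key() -> Optional[str]:
    """Return OPENAI_API_KEY if set, else the token saved from the widget Settings."""
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    if keyring is not None:
        try:
            return keyring.get_password(TOKEN_SERVICE, TOKEN_USERNAME)
        except Exception:  # e.g. no usable keyring backend on this machine
            return None
    return None
```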

Service Ports

  • 6001 - Transcribe service
  • 6002 - Agent service
  • Widget connects to both
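Since the widget depends on both services, it can help to confirm both ports accept connections before using it. A minimal sketch using only the standard library; it assumes the services listen on `127.0.0.1` and only checks that a TCP connection succeeds, not that the expected service answered. `SERVICE_PORTS`, `port_is_open`, and `service_status` are illustrative names.

```python
import socket
from typing import Dict

SERVICE_PORTS: Dict[str, int] = {"transcribe": 6001, "agent": 6002}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def service_status() -> Dict[str, bool]:
    """Map each service name to whether its port is reachable."""
    return {name: port_is_open(port) for name, port in SERVICE_PORTS.items()}


if __name__ == "__main__":
    for name, up in service_status().items():
        print(f"{name:<10} {'up' if up else 'down'}")
```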

Adding Features

Each service has its own README with details:

  • /agent-main/README.md - Main agent docs
  • /transcribe/README.md - Transcription service
  • /widget/README.md - Desktop widget
  • /tools/README.md - Available tools

Project Layout

ai-agent-desktop/
├── INSTALL.bat         # Install all dependencies
├── START_CLEAN.bat     # Launch everything (background, no terminals)
├── START.bat           # Launch with visible terminals
├── agent-main/         # Main AI agent
├── agent/              # Core agent logic
├── transcribe/         # Voice-to-text service
├── widget/             # Desktop interface
├── tools/              # Agent tools
├── chat_history/       # Conversation storage
├── memory/             # User context
├── secure_storage/     # Shared secure storage helpers
└── service-template/   # New service boilerplate

Why Multi-Service Architecture?

  • Isolation: One crash doesn't kill everything
  • Resources: Distribute load across processes
  • Development: Work on parts independently
  • Scaling: Add more services easily
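The isolation point can be demonstrated with plain subprocesses: when each component has its own interpreter, a crash in one leaves the others running. A minimal sketch with stand-in child processes rather than the real services; `run_isolation_demo` is a hypothetical helper.

```python
import subprocess
import sys
from typing import Tuple


def run_isolation_demo() -> Tuple[int, bool]:
    """Start a crashing child and a healthy child; return (crash_code, survivor_alive)."""
    # Stand-in for a service that crashes on startup.
    crasher = subprocess.Popen([sys.executable, "-c", "raise SystemExit(1)"])
    # Stand-in for a long-running service.
    survivor = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        crash_code = crasher.wait(timeout=30)
        survivor_alive = survivor.poll() is None  # None means still running
    finally:
        survivor.terminate()
        survivor.wait(timeout=10)
    return crash_code, survivor_alive


if __name__ == "__main__":
    code, alive = run_isolation_demo()
    print(f"crashed service exit code: {code}; other service still running: {alive}")
```

In a single-process design, the same unhandled failure would have taken the whole application down with it.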

To set your OpenAI API key permanently instead:

# PowerShell (permanent)
[System.Environment]::SetEnvironmentVariable('OPENAI_API_KEY', 'sk-...', 'User')

# Or just enter it when prompted by START.bat

Usage Guide

Getting Started

  1. Install: Run INSTALL.bat to install all dependencies
  2. Launch: Double-click START_CLEAN.bat (or START.bat to see terminal windows) to start all services
  3. Use: Click 💬 on the widget to open chat, or use voice recording

For detailed instructions, see LAUNCH_GUIDE.md

Widget Controls

  • ▶ Start Recording - Record voice input
  • ⏹ Stop Recording - Stop and transcribe
  • 💬 Chat - Open/close chat window
  • ⚙ Settings - Language selection, Base URL, and API Token

Chat Window

  • Type messages or use voice input
  • Real-time streaming responses
  • Persistent chat history
  • Color-coded display for different response types

For detailed chat features, see widget/CHAT_FEATURE.md

Documentation

API Documentation

When services are running, access interactive API docs:

Services Overview

Transcribe Service (Port 6001)

  • Audio transcription using OpenAI Whisper
  • Multi-language support
  • FastAPI-based REST API

Agent Service (Port 6002)

  • Conversational AI with GPT-5
  • Tool execution (file ops, web search, memory, todos)
  • Streaming responses
  • Chat history management

Widget (Desktop App)

  • Always-on-top interface
  • Voice recording with transcription
  • Chat window with encrypted, persistent history
  • Draggable, customizable position

Storage and Security

  • Chat history and memories are stored encrypted at:
    • %APPDATA%/ai-agent-desktop/chat_history.enc
    • %APPDATA%/ai-agent-desktop/memories.enc
  • Encryption key (data_key) is stored in Windows Credential Manager:
    • Service: ai-agent-desktop/data_key, Username: data_key
  • API Token saved from Settings is stored under:
    • Service: ai-agent-desktop/api_token, Username: api_token
  • See secure_storage/README.md for details.
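The storage locations above can be resolved programmatically. A minimal sketch, assuming `%APPDATA%` is read from the environment, with a home-directory fallback for non-Windows testing that is a hypothetical convenience, not documented project behavior. It only locates the encrypted files and credential entries; it does not decrypt anything, since the cipher is described in secure_storage/README.md rather than here. All names are illustrative.

```python
import os
from pathlib import Path
from typing import Dict, Tuple

APP_DIR_NAME = "ai-agent-desktop"

# Credential Manager entries documented above: (service, username).
CREDENTIALS: Dict[str, Tuple[str, str]] = {
    "data_key": ("ai-agent-desktop/data_key", "data_key"),
    "api_token": ("ai-agent-desktop/api_token", "api_token"),
}


def app_data_dir() -> Path:
    """%APPDATA%/ai-agent-desktop; falls back to ~/.ai-agent-desktop (hypothetical) off Windows."""
    appdata = os.environ.get("APPDATA")
    if appdata:
        return Path(appdata) / APP_DIR_NAME
    return Path.home() / f".{APP_DIR_NAME}"


def encrypted_files() -> Dict[str, Path]:
    """Paths of the encrypted chat history and memories files."""
    base = app_data_dir()
    return {
        "chat_history": base / "chat_history.enc",
        "memories": base / "memories.enc",
    }
```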

Contributing

Contributions are welcome! Feel free to:

  • Report bugs or issues
  • Suggest new features
  • Submit pull requests
  • Improve documentation

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

destorted93
