Agents Assemble is an intelligent memory management system that helps users capture, store, and retrieve important life events through natural conversation. The system uses a multi-agent architecture powered by Google's Generative AI to create a personal "life witness" that remembers events, people, and contexts in rich detail.
The Life Witness Agent allows users to:
- Store memories through natural conversation or by uploading photos
- Enrich memories with contextual information from calendar and email
- Ask follow-up questions to add more details to memories
- Query memories later using natural language
- Receive contextually rich responses that recall precise details
The system is designed as a collection of specialized AI agents orchestrated by a central planning and execution mechanism:
┌─────────────────┐       ┌───────────────┐       ┌────────────────┐
│   User Input    │       │ Plan Creation │       │ Plan Execution │
│ ┌─────────────┐ │       │ ┌───────────┐ │       │ ┌────────────┐ │
│ │    Voice    │─┼───────┼─▶ PlannerAgt│─┼───────┼─▶ Orchestrate│ │
│ └─────────────┘ │       │ └───────────┘ │       │ └────────────┘ │
│ ┌─────────────┐ │       │               │       │                │
│ │    Text     │─┼───────┼───────────────┼───────┼─▶              │
│ └─────────────┘ │       │               │       │                │
│ ┌─────────────┐ │       │               │       │ ┌────────────┐ │
│ │   Photos    │─┼───────┼───────────────┼───────┼─▶ Specialized│ │
│ └─────────────┘ │       │               │       │ │   Agents   │ │
└─────────────────┘       └───────────────┘       └────────────────┘
- InputProcessor: Central entry point for all user requests. It handles voice transcription and session management, and drives the overall request flow.
- PlanExecutor: Orchestrates agent execution based on plans created by the PlannerAgent, managing sequential and parallel agent execution.
- SessionManager: Maintains conversation state, handles pending memories, and provides session persistence.
- PlannerAgent: The "brain" of the system that analyzes user intent and creates execution plans.
- MemoryAgent: Manages memory storage, continuation, completion, and retrieval using vector embeddings.
- VisionAgent: Analyzes images to extract visual information and enrich memories.
- VoiceAgent: Processes speech audio and converts it to text using Google Speech-to-Text.
- ContextAgent: Gathers contextual information from calendar and email to enrich memories.
- ResponseAgent: Generates natural language responses based on agent outputs.
- StorageService: Handles persistence of memories, embeddings, and session data.
- GeminiService: Provides access to Google Gemini AI models for various agent operations.
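The sequential and parallel execution that the PlanExecutor is described as managing can be pictured with a small asyncio sketch. This is not the project's implementation: the plan format (a list of stages, each a list of agent names), the `execute_plan` function, and the three stub agents are all assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical agent signature: takes the shared context, returns a result dict.
Agent = Callable[[dict], Awaitable[dict]]


async def execute_plan(plan: List[List[str]], agents: Dict[str, Agent], context: dict) -> dict:
    """Run a plan given as a list of stages (assumed format).

    Stages run one after another; agents inside a stage run concurrently.
    Each result is stored in the context under the agent's name, so later
    stages can read the outputs of earlier ones.
    """
    for stage in plan:
        results = await asyncio.gather(*(agents[name](context) for name in stage))
        for name, result in zip(stage, results):
            context[name] = result
    return context


# Stub agents standing in for the real ones.
async def vision_agent(context: dict) -> dict:
    return {"description": "placeholder image description"}


async def context_agent(context: dict) -> dict:
    return {"events": []}


async def memory_agent(context: dict) -> dict:
    # Runs in a later stage, so it can see which earlier agents produced output.
    return {"stored": True, "sources": sorted(k for k in context if k != "input")}


if __name__ == "__main__":
    agents = {"vision": vision_agent, "context": context_agent, "memory": memory_agent}
    plan = [["vision", "context"], ["memory"]]  # stage 1 in parallel, then stage 2
    final = asyncio.run(execute_plan(plan, agents, {"input": "Dinner with Sam"}))
    print(final["memory"])  # {'stored': True, 'sources': ['context', 'vision']}
```

Grouping agents into stages keeps the ordering explicit: anything that depends on another agent's output goes in a later stage, and everything else can share a stage.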
The PlannerAgent serves as the system's strategic brain, analyzing user input to determine intent and create optimal execution plans.
Key Features:
- Intent classification (store/query/continue/complete memory)
- Dynamic agent selection based on input type and intent
- AI-powered execution planning with fallback mechanisms
- Session state management and context awareness
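To make "AI-powered execution planning with fallback mechanisms" concrete, here is a minimal sketch of intent classification that tries a model first and falls back to keyword rules. The function names, the keyword lists, and the `model_classifier` callable are hypothetical; the real PlannerAgent may work quite differently.

```python
from typing import Callable, Optional

# The four intents named in the feature list above.
INTENTS = ("store", "query", "continue", "complete")


def rule_based_intent(text: str, has_pending_memory: bool) -> str:
    """Keyword fallback (assumed rules) for when the model is unavailable."""
    lowered = text.lower().strip()
    if has_pending_memory:
        # Hypothetical completion phrases; anything else extends the pending memory.
        if lowered in {"done", "that's all", "save it"}:
            return "complete"
        return "continue"
    if lowered.endswith("?") or lowered.startswith(("what", "when", "who", "where")):
        return "query"
    return "store"


def classify_intent(
    text: str,
    has_pending_memory: bool,
    model_classifier: Optional[Callable[[str], str]] = None,
) -> str:
    """Use the model's label if it is valid; otherwise fall back to the rules."""
    if model_classifier is not None:
        try:
            label = model_classifier(text).strip().lower()
            if label in INTENTS:
                return label
        except Exception:
            pass  # Any model failure drops through to the rule-based path.
    return rule_based_intent(text, has_pending_memory)
```

For example, `classify_intent("Who did I meet last week?", False)` returns `"query"` without any model, and a `model_classifier` that raises is ignored in favor of the rules.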
The MemoryAgent manages the lifecycle of memory creation, enrichment, and retrieval using vector embeddings.
Key Features:
- Memory creation and structure generation
- AI-powered entity extraction
- Semantic search using vector embeddings
- Follow-up question generation for memory enhancement
- Memory completion and finalization
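Semantic search over vector embeddings boils down to ranking stored vectors by similarity to a query vector. The sketch below uses plain-Python cosine similarity as a stand-in for the FAISS index listed in the tech stack, and the three-dimensional vectors are toy values rather than real embeddings; the `search` function and its signature are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    memories: List[Tuple[str, Sequence[float]]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k (text, score) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(text, score) for score, text in scored[:top_k]]


if __name__ == "__main__":
    # Toy vectors; a real system would get these from an embedding model.
    memories = [
        ("Birthday dinner with Maya", [0.9, 0.1, 0.0]),
        ("Quarterly planning meeting", [0.0, 0.2, 0.9]),
        ("Hiking trip", [0.1, 0.9, 0.1]),
    ]
    for text, score in search([1.0, 0.0, 0.1], memories):
        print(f"{score:.3f}  {text}")
```

A brute-force scan like this is fine for a few thousand memories; an index such as FAISS becomes worthwhile when the collection grows much larger.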
The VisionAgent analyzes images using Google's Gemini multimodal capabilities to extract visual information.
Key Features:
- Image analysis and description
- Object and person recognition
- Text extraction from images
- Visual context integration with memories
The VoiceAgent handles speech-to-text conversion for voice inputs.
Key Features:
- Audio processing and transcription
- Speaker recognition (planned)
- Emotion detection from voice (planned)
The ContextAgent enriches memories with contextual information from external sources.
Key Features:
- Calendar event integration
- Email context retrieval
- Temporal context analysis
- Location and environment context
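One simple way to think about temporal context is to look for calendar events that overlap the moment a memory was captured. The sketch below assumes events have already been fetched; the `CalendarEvent` type, the `events_near` helper, and the one-hour window are illustrative choices, not the project's API, and no Google Calendar calls are made.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class CalendarEvent:
    """Hypothetical, simplified view of a calendar entry."""
    title: str
    start: datetime
    end: datetime


def events_near(
    timestamp: datetime,
    events: List[CalendarEvent],
    window: timedelta = timedelta(hours=1),
) -> List[CalendarEvent]:
    """Return events whose time span, padded by `window`, contains the timestamp."""
    return [e for e in events if e.start - window <= timestamp <= e.end + window]


if __name__ == "__main__":
    events = [
        CalendarEvent("Team lunch", datetime(2024, 5, 3, 12, 0), datetime(2024, 5, 3, 13, 0)),
        CalendarEvent("Dentist", datetime(2024, 5, 3, 16, 0), datetime(2024, 5, 3, 17, 0)),
    ]
    memory_time = datetime(2024, 5, 3, 13, 30)
    print([e.title for e in events_near(memory_time, events)])  # ['Team lunch']
```

The padding window lets a memory recorded shortly before or after an event still pick it up as context.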
The ResponseAgent generates natural language responses based on the outputs of other agents.
Key Features:
- Context-aware response generation
- Memory-based answer formulation
- Follow-up question generation
- Conversational continuity
   ┌─────────────┐
   │    User     │
   └──────┬──────┘
          │ ▲
          ▼ │
   ┌─────────────┐
   │  InputProc  │
   └──────┬──────┘
          │ ▲
          ▼ │
   ┌─────────────┐     ┌─────────────┐
   │ PlannerAgt  │────►│ PlanExecutor│
   └─────────────┘     └──────┬──────┘
                              │ ▲
                              ▼ │
       ┌────────────────────┬─┴────────────────────┐
       │                    │                      │
       ▼                    ▼                      ▼
┌─────────────┐      ┌─────────────┐        ┌─────────────┐
│  MemoryAgt  │      │  VisionAgt  │        │ ContextAgt  │
└─────────────┘      └─────────────┘        └─────────────┘
       │                    │                      │
       └────────────┬───────┴──────────────┬───────┘
                    │                      │
                    ▼                      ▼
             ┌─────────────┐        ┌─────────────┐
             │ ResponseAgt │        │ SessionMgr  │
             └─────────────┘        └─────────────┘
The frontend provides a user-friendly interface for interacting with the Life Witness Agent:
- VoiceInterface: Handles audio recording and playback
- PhotoUpload: Enables image uploads for memory enrichment
- MemoryTimeline: Displays memories in a chronological view
- MemoryCard: Renders individual memory details
- AgentStatus: Shows real-time agent activity
- Backend: Python, FastAPI
- Frontend: Next.js, React, TypeScript
- AI: Google Gemini API, Google Speech-to-Text
- Vector Storage: FAISS
- External Services: Google Calendar API, Gmail API
- Python 3.9+
- Node.js 18+
- Google Cloud account with Gemini API access
- Clone the repository:
    git clone https://github.com/YourUsername/agents-assemble.git
    cd agents-assemble
- Install backend dependencies:
    cd backend
    pip install -r requirements.txt
- Set up environment variables:
    cp .env.example .env
    # Edit .env with your API keys
- Install frontend dependencies:
    cd ../frontend
    npm install
- Start the backend server:
    cd backend
    python main.py
- Start the frontend development server:
    cd frontend
    npm run dev
- Open your browser and navigate to http://localhost:3000
For detailed system flows and architecture, see the UML diagrams in the project:
- agents-assemble-sequence-diagram.puml: Main sequential flow
- memory-operations-sequence-diagram.puml: Memory operations
- system-architecture-diagram.puml: System components and relationships
This project is licensed under the MIT License - see the LICENSE file for details.
