Agents Assemble - Life Witness Agent

Agents Assemble is a memory management system that helps users capture, store, and retrieve important life events through natural conversation. It uses a multi-agent architecture powered by Google's Generative AI to act as a personal "life witness" that remembers events, people, and context in detail.

Project Overview

The Life Witness Agent allows users to:

Store memories through natural conversation or by uploading photos
Enrich memories with contextual information from calendar and email
Ask follow-up questions to add more details to memories
Query memories later using natural language
Receive contextually rich responses that recall precise details

System Architecture

The system is designed as a collection of specialized AI agents orchestrated by a central planning and execution mechanism:

┌─────────────────┐       ┌───────────────┐       ┌────────────────┐
│   User Input    │       │ Plan Creation │       │ Plan Execution │
│ ┌─────────────┐ │       │ ┌───────────┐ │       │ ┌────────────┐ │
│ │   Voice     │─┼───────┼─▶ PlannerAgt│─┼───────┼─▶ Orchestrate│ │
│ └─────────────┘ │       │ └───────────┘ │       │ └────────────┘ │
│ ┌─────────────┐ │       │               │       │                │
│ │   Text      │─┼───────┼───────────────┼───────┼─▶              │
│ └─────────────┘ │       │               │       │                │
│ ┌─────────────┐ │       │               │       │ ┌────────────┐ │
│ │   Photos    │─┼───────┼───────────────┼───────┼─▶ Specialized│ │
│ └─────────────┘ │       │               │       │ │   Agents   │ │
└─────────────────┘       └───────────────┘       └────────────────┘

Core Modules

Backend Core

InputProcessor: Central entry point for all user requests. It handles voice transcription and session management, and coordinates the overall request flow.
PlanExecutor: Runs the agents named in the plan created by the PlannerAgent, managing both sequential and parallel execution.
SessionManager: Maintains conversation state, handles pending memories, and provides session persistence.
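
The sequential and parallel execution handled by the PlanExecutor can be illustrated with a minimal sketch. This is not the project's implementation: `PlanStep`, `execute_plan`, the agent signature, and the stub agents are all hypothetical names chosen for illustration, and the sketch assumes agents are async callables.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List

# Hypothetical agent signature: (user_input, results_so_far) -> result dict.
Agent = Callable[[str, dict], Awaitable[dict]]


@dataclass
class PlanStep:
    agents: List[str]       # names of the agents to run in this step
    parallel: bool = False  # True: run this step's agents concurrently


async def execute_plan(steps: List[PlanStep], registry: Dict[str, Agent], user_input: str) -> dict:
    """Run steps in order; each agent sees the results of earlier steps."""
    results: dict = {}
    for step in steps:
        if step.parallel:
            # Agents in a parallel step share one snapshot of earlier results.
            snapshot = dict(results)
            outputs = await asyncio.gather(
                *(registry[name](user_input, snapshot) for name in step.agents)
            )
            results.update(zip(step.agents, outputs))
        else:
            for name in step.agents:
                results[name] = await registry[name](user_input, dict(results))
    return results


# Stub agents standing in for the real ones (assumed behavior, for illustration only).
async def vision_agent(user_input: str, prior: dict) -> dict:
    return {"description": "stub image description"}


async def context_agent(user_input: str, prior: dict) -> dict:
    return {"events": []}


async def memory_agent(user_input: str, prior: dict) -> dict:
    # Records which earlier agent results were visible when it ran.
    return {"stored": True, "seen": sorted(prior)}


REGISTRY: Dict[str, Agent] = {
    "vision": vision_agent,
    "context": context_agent,
    "memory": memory_agent,
}

plan = [PlanStep(["vision", "context"], parallel=True), PlanStep(["memory"])]
print(asyncio.run(execute_plan(plan, REGISTRY, "Dinner with Sam")))
```

In this sketch the vision and context stubs run concurrently, and the memory stub runs afterwards with both of their outputs available.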

AI Agents

PlannerAgent: The "brain" of the system that analyzes user intent and creates execution plans.
MemoryAgent: Manages memory storage, continuation, completion, and retrieval using vector embeddings.
VisionAgent: Analyzes images to extract visual information and enrich memories.
VoiceAgent: Processes speech audio and converts it to text using Google Speech-to-Text.
ContextAgent: Gathers contextual information from calendar and email to enrich memories.
ResponseAgent: Generates natural language responses based on agent outputs.

Services & Storage

StorageService: Handles persistence of memories, embeddings, and session data.
GeminiService: Provides access to Google Gemini AI models for various agent operations.

Agent Descriptions

PlannerAgent

The PlannerAgent serves as the system's strategic brain: it analyzes user input to determine intent, then builds an execution plan for the other agents.

Key Features:

Intent classification (store/query/continue/complete memory)
Dynamic agent selection based on input type and intent
AI-powered execution planning with fallback mechanisms
Session state management and context awareness
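
One way to picture intent classification with a fallback mechanism is sketched below. It assumes a model-backed classifier is tried first and a simple keyword rule takes over when that call fails or returns an unknown label. The function names and keyword cues are hypothetical, not taken from the project's code.

```python
from typing import Callable, Optional

INTENTS = ("store", "query", "continue", "complete")

# Hypothetical cue phrases that signal the user is done adding details.
COMPLETION_CUES = ("that's all", "that is all", "save it", "nothing else")
QUESTION_STARTS = ("what", "when", "who", "where", "did", "do you remember")


def keyword_intent(text: str, has_pending_memory: bool) -> str:
    """Rule-based fallback: guess the intent from simple cues in the text."""
    lowered = text.lower().strip()
    if has_pending_memory:
        # A memory is already in progress: either finish it or keep adding to it.
        if any(cue in lowered for cue in COMPLETION_CUES):
            return "complete"
        return "continue"
    if lowered.endswith("?") or lowered.startswith(QUESTION_STARTS):
        return "query"
    return "store"


def classify_intent(
    text: str,
    has_pending_memory: bool,
    model_classifier: Optional[Callable[[str], str]] = None,
) -> str:
    """Try the model-backed classifier first; fall back to keyword rules."""
    if model_classifier is not None:
        try:
            label = model_classifier(text)
            if label in INTENTS:
                return label
        except Exception:
            pass  # any model failure drops through to the fallback
    return keyword_intent(text, has_pending_memory)


print(classify_intent("What did I eat at Sam's birthday?", False))
print(classify_intent("It was at the Italian place", True))
```

The `model_classifier` argument stands in for whatever call the real PlannerAgent makes to Gemini; it is injected here so the sketch runs without network access.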

MemoryAgent

The MemoryAgent manages the lifecycle of memory creation, enrichment, and retrieval using vector embeddings.

Key Features:

Memory creation and structure generation
AI-powered entity extraction
Semantic search using vector embeddings
Follow-up question generation for memory enhancement
Memory completion and finalization
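
Semantic search over vector embeddings comes down to ranking stored vectors by similarity to a query vector. The sketch below shows the idea with cosine similarity in plain Python. It assumes toy three-dimensional vectors written by hand; in the real system the vectors would come from an embedding model and the lookup would go through the FAISS index. The memory IDs and function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    memories: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k memory IDs ranked by similarity to the query."""
    scored = [(mem_id, cosine_similarity(query_vec, vec)) for mem_id, vec in memories.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hand-written toy embeddings (assumed values, not real model output).
MEMORIES = {
    "birthday_dinner": [0.9, 0.1, 0.0],
    "team_offsite": [0.1, 0.9, 0.2],
    "beach_trip": [0.0, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]
for mem_id, score in search(query, MEMORIES):
    print(f"{mem_id}: {score:.3f}")
```

A brute-force scan like this is only practical for small collections; an index such as FAISS serves the same ranking role at larger scale.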

VisionAgent

The VisionAgent analyzes images using Google's Gemini multimodal capabilities to extract visual information.

Key Features:

Image analysis and description
Object and person recognition
Text extraction from images
Visual context integration with memories

VoiceAgent

The VoiceAgent handles speech-to-text conversion for voice inputs.

Key Features:

Audio processing and transcription
Speaker recognition (planned)
Emotion detection from voice (planned)

ContextAgent

The ContextAgent enriches memories with contextual information from external sources.

Key Features:

Calendar event integration
Email context retrieval
Temporal context analysis
Location and environment context
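
As a rough illustration of temporal context analysis, the sketch below picks out calendar events that fall close to a memory's timestamp. It assumes the events have already been fetched from the calendar source and makes no API calls. `CalendarEvent`, `events_near`, and the one-hour window are hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class CalendarEvent:
    title: str
    start: datetime
    end: datetime


def events_near(
    memory_time: datetime,
    events: List[CalendarEvent],
    window: timedelta = timedelta(hours=1),
) -> List[CalendarEvent]:
    """Return events whose span, widened by `window` on both sides, contains memory_time."""
    return [e for e in events if e.start - window <= memory_time <= e.end + window]


# Sample events (made-up data for illustration).
EVENTS = [
    CalendarEvent("Standup", datetime(2024, 5, 3, 9, 0), datetime(2024, 5, 3, 9, 15)),
    CalendarEvent("Dinner with Sam", datetime(2024, 5, 3, 19, 0), datetime(2024, 5, 3, 21, 0)),
]

# A memory recorded half an hour after dinner ended still matches the dinner event.
memory_time = datetime(2024, 5, 3, 21, 30)
print([e.title for e in events_near(memory_time, EVENTS)])
```

Widening each event by a window allows for memories recorded shortly before or after the event itself.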

ResponseAgent

The ResponseAgent generates natural language responses based on the outputs of other agents.

Key Features:

Context-aware response generation
Memory-based answer formulation
Follow-up question generation
Conversational continuity

Component Interactions

┌─────────────┐
│    User     │
└──────┬──────┘
       │ ▲
       ▼ │
┌─────────────┐
│ InputProc   │
└──────┬──────┘
       │ ▲
       ▼ │
┌─────────────┐     ┌─────────────┐
│  PlannerAgt │────►│ PlanExecutor│
└─────────────┘     └──────┬──────┘
                           │ ▲
                           ▼ │
      ┌────────────────────┬─┴────────────────────┐
      │                    │                      │
      ▼                    ▼                      ▼
┌─────────────┐     ┌─────────────┐       ┌─────────────┐
│  MemoryAgt  │     │  VisionAgt  │       │ ContextAgt  │
└─────────────┘     └─────────────┘       └─────────────┘
      │                    │                      │
      └────────────┬───────┴──────────────┬──────┘
                   │                      │
                   ▼                      ▼
            ┌─────────────┐       ┌─────────────┐
            │ ResponseAgt │       │ SessionMgr  │
            └─────────────┘       └─────────────┘

Sequence UML diagram

Frontend Components

The frontend provides a user-friendly interface for interacting with the Life Witness Agent:

VoiceInterface: Handles audio recording and playback
PhotoUpload: Enables image uploads for memory enrichment
MemoryTimeline: Displays memories in a chronological view
MemoryCard: Renders individual memory details
AgentStatus: Shows real-time agent activity

Technology Stack

Backend: Python, FastAPI
Frontend: Next.js, React, TypeScript
AI: Google Gemini API, Google Speech-to-Text
Vector Storage: FAISS
External Services: Google Calendar API, Gmail API

Getting Started

Prerequisites

Python 3.9+
Node.js 18+
Google Cloud account with Gemini API access

Installation

Clone the repository:

```
git clone https://github.com/YourUsername/agents-assemble.git
cd agents-assemble
```

Install backend dependencies:

```
cd backend
pip install -r requirements.txt
```

Set up environment variables:

```
cp .env.example .env
# Edit .env with your API keys
```

Install frontend dependencies:
```
cd ../frontend
npm install
```

Running the Application

Start the backend server:
```
cd backend
python main.py
```
Start the frontend development server:
```
cd frontend
npm run dev
```
Open your browser and navigate to http://localhost:3000

UML Diagrams

For detailed system flows and architecture, see the UML diagrams in the project:

agents-assemble-sequence-diagram.puml: Main sequential flow
memory-operations-sequence-diagram.puml: Memory operations
system-architecture-diagram.puml: System components and relationships

License

This project is licensed under the MIT License - see the LICENSE file for details.
