NightTalkerMY/FinalYearProject

Intelligent Holographic AI for Retail

Python 3.10+ · LLM: Phi-2 · Vector DB: ChromaDB · Vision: ResNet · STT: Whisper · TTS: Coqui · Frontend: React · Media: FFmpeg

This repository contains the core orchestration and microservices for an interactive, AI-powered holographic retail assistant. The system utilizes a distributed microservice architecture, integrating large language models, retrieval-augmented generation, dynamic gesture control, speech processing, and a 3D React-based avatar.

🎥 Product Demonstration

A video demonstration of the Intelligent Holographic AI system in action will be uploaded soon!

🌟 Key Innovations & Contributions

While the foundational architecture builds upon established research, this project introduces system-level optimizations to satisfy the latency, accuracy, and responsiveness constraints of a real-time retail deployment:

RAG & LLM Pipeline Enhancements

  • Length-Aware Reranking: The cross-encoder reranking stage was optimized by introducing length-aware document arrangement prior to inference. This design minimizes padding inefficiencies, reducing overall inference latency while preserving retrieval quality. Performance was benchmarked against MS MARCO and custom retail datasets, maintaining strong Mean Reciprocal Rank (MRR) and Hit Rate metrics.
  • Instruction-Tuned Semantic Routing: Traditional precomputed query matching was replaced with a dynamic, instruction-tuned semantic routing mechanism. Incoming queries are encoded with a task-specific instruction function Φ that prepends an instruction prefix (I_task), and the resulting embeddings are compared directly against raw document embeddings. Evaluation on retail datasets showed measurable improvements in macro recall, F1 score, and precision, enabling more adaptive, context-aware retrieval.
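The length-aware arrangement above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation (which lives in RAG/): candidate passages are sorted by length before batching, so each batch pads only to the length of similar-sized neighbours, while a recorded permutation lets the scores be mapped back without affecting ranking quality.

```python
# Sketch of length-aware batching for cross-encoder reranking.
# Sorting documents by length before batching minimizes wasted padding
# tokens per batch; the returned `order` restores the original indexing.

def length_aware_order(query, docs, batch_size=8):
    """Group (query, doc) pairs into batches of similar length.

    Returns the batches and the permutation of original document
    indices, so scores can be mapped back after inference.
    """
    order = sorted(range(len(docs)), key=lambda i: len(docs[i].split()))
    batches = [
        [(query, docs[i]) for i in order[b:b + batch_size]]
        for b in range(0, len(order), batch_size)
    ]
    return batches, order

# The batches would then be scored by a cross-encoder, e.g. with
# sentence-transformers:
#   scores = cross_encoder.predict([p for batch in batches for p in batch])
# and reordered via `order`, leaving MRR / Hit Rate unchanged.
```

Because only the batching order changes, not the scores themselves, retrieval quality is preserved by construction.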

Dynamic Gesture Control Enhancements

  • Real-Time Boxgate Logic: The baseline gesture capture pipeline was re-architected from a manual, keyboard-triggered termination model to a fully automated, continuous inference loop using custom boxgate logic. This enables real-time segmentation without user intervention.
  • Performance Optimization: By eliminating manual termination overhead, the system achieves higher gesture segmentation purity and lower latency variance, resulting in smoother interaction and improved perceptual continuity for the holographic avatar.
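A hypothetical sketch of the boxgate idea, assuming a simple state machine (the repository's actual logic is in Gesture_System/): the gate opens when the detected hand enters a trigger box, frames accumulate while it stays, and the gesture clip is emitted automatically once the hand has been absent for a few consecutive frames, replacing any keyboard-triggered termination.

```python
# Hypothetical boxgate state machine for automatic gesture segmentation.
# `patience` is the number of consecutive hand-absent frames that closes
# the gate and emits the accumulated clip.

class BoxGate:
    def __init__(self, patience=5):
        self.patience = patience
        self.open = False      # gate state: collecting frames or idle
        self.absent = 0        # consecutive frames without a hand in the box
        self.frames = []       # accumulated frames for the current gesture

    def update(self, hand_in_box, frame):
        """Feed one frame; return a completed gesture clip, or None."""
        if hand_in_box:
            self.open = True
            self.absent = 0
            self.frames.append(frame)
            return None
        if self.open:
            self.absent += 1
            if self.absent >= self.patience:
                clip, self.frames = self.frames, []
                self.open = False
                return clip
        return None
```

In a real loop, `hand_in_box` would come from the ResNet detector per frame, and each emitted clip would be passed to gesture classification.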

📊 Detailed Evaluation & Metrics

For a comprehensive breakdown of the empirical data supporting these improvements—including MS MARCO benchmarks, retail dataset F1/precision scores, and latency tests—please refer to the experiment_metric.md file (coming soon).

🏗️ System Architecture & Microservices

The project is divided into specialized directories. Each acts as an independent microservice with its own virtual environment and dependencies, all communicating with the central main_orchestrator.py.

  • Chatbot_Phi2/: Core LLM engine directory. Contains code for fine-tuning and real-time inference, running as an independent main.py microservice.
  • Gesture_System/: Dynamic hand gesture control system utilizing ResNet. Handles both model training and real-time vision inference via its own main.py.
  • RAG/: Retrieval-Augmented Generation pipeline using ChromaDB for contextual memory and knowledge retrieval.
  • STT/: Speech-to-Text voice transcription layer powered by OpenAI Whisper.
  • TTS/: Text-to-Speech voice generation layer using Coqui TTS.
  • react_avatar/: Frontend 3D avatar rendering layer built with React.
  • mediamtx/: Contains the configuration files for real-time media routing and streaming.
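As a hedged sketch of how main_orchestrator.py might wire these services together (the service directory names are from this repository, but the exact entry points, interpreter paths, and process handling here are assumptions), each service can be launched as a subprocess under its own virtual environment's interpreter:

```python
# Sketch: launch each microservice with its own venv interpreter.
# Directory names match this repo; paths and entry points are assumed.
import subprocess
import sys
from pathlib import Path

SERVICES = ["Chatbot_Phi2", "Gesture_System", "RAG", "STT", "TTS"]

def service_command(service_dir: Path):
    """Build the command that runs a service with its own venv Python."""
    python = service_dir / ("venv/Scripts/python.exe" if sys.platform == "win32"
                            else "venv/bin/python")
    return [str(python), str(service_dir / "main.py")]

def launch_all(root: Path):
    """Start every service as a child process of the orchestrator."""
    return [subprocess.Popen(service_command(root / s), cwd=root / s)
            for s in SERVICES]
```

Keeping each service on its own interpreter is what allows conflicting dependency sets (e.g. Whisper vs. Coqui TTS) to coexist in one deployment.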

📥 Prerequisites & External Dependencies

Before running the system, several external binaries and large model assets must be downloaded.

1. External Binaries

Download the following tools and place them in the root directory (or the respective service folder). Based on the stack described above, these should include the FFmpeg binary (for media processing) and the MediaMTX server binary (whose configuration lives in mediamtx/):

2. Hugging Face Assets (Models, Datasets & 3D Files)

Due to file size limits, datasets, fine-tuned models, and heavy 3D assets are hosted externally on Hugging Face: [INSERT_HUGGINGFACE_PROFILE_LINK]

Please download and place the following assets into their respective directories:

  • Chatbot_Phi2/: Download the specific datasets and model weights.
  • Gesture_System/: Download the ResNet training datasets and inference models.
  • react_avatar/: Download the public/ directory containing the rendered 3D avatar files and place it inside the frontend folder.

⚙️ Installation & Setup

Because this project uses a microservice architecture, each Python directory requires its own separate virtual environment.

Step 1: Setup Python Microservices

For each of the following directories (Chatbot_Phi2, Gesture_System, RAG, STT, TTS), navigate into the folder, create a virtual environment, and install its specific dependencies:

cd [Directory_Name]
python -m venv venv

# Activate the venv (Windows):
venv\Scripts\activate
# OR Activate the venv (Mac/Linux):
source venv/bin/activate

pip install -r requirements.txt
deactivate
cd ..
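For convenience, the loop above can be automated from the repo root with a short script. This is an optional sketch, not part of the repository: it builds the venv-creation and pip-install commands for each service directory listed in this README.

```python
# Optional helper: automate Step 1 by creating each service venv and
# installing its requirements. Run from the repository root.
import subprocess
import sys
from pathlib import Path

SERVICES = ["Chatbot_Phi2", "Gesture_System", "RAG", "STT", "TTS"]

def setup_commands(service_dir: Path):
    """Commands to create a service's venv and install its requirements."""
    venv_python = service_dir / ("venv/Scripts/python.exe" if sys.platform == "win32"
                                 else "venv/bin/python")
    return [
        [sys.executable, "-m", "venv", str(service_dir / "venv")],
        [str(venv_python), "-m", "pip", "install", "-r",
         str(service_dir / "requirements.txt")],
    ]

# Usage (from the repo root):
#   for name in SERVICES:
#       for cmd in setup_commands(Path(name)):
#           subprocess.run(cmd, check=True)
```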

Step 2: Setup the React Avatar

Navigate to the frontend directory and install the Node packages:

cd react_avatar
npm install
cd ..

Step 3: Setup the Main Orchestrator

Finally, set up the root environment that ties everything together:

python -m venv venv

# Activate the venv (Windows):
venv\Scripts\activate
# OR Activate the venv (Mac/Linux):
source venv/bin/activate

pip install -r requirements.txt

🚀 Running the System

The entire microservice architecture is fully automated through the central orchestrator. You do not need to manually start each individual component.

To launch the complete Intelligent Holographic AI system:

  1. Open your terminal in the root directory.
  2. Ensure your root virtual environment is activated.
  3. Run the orchestrator:
python main_orchestrator.py

(Note: dummy_gesture_control.py and dummy_no_mic.py are provided at the root level for testing isolated orchestrator components without full hardware requirements).

📚 Acknowledgements & References

This project builds upon and significantly modifies concepts from the following academic research:

  • RAG & LLM Architecture: The foundational retrieval-augmented generation structure was inspired by TeleOracle: Fine-Tuned Retrieval-Augmented Generation With Long-Context Support for Networks (Alabbasi et al., IEEE Internet of Things Journal, 2025). In this repository, the architecture has been uniquely adapted and improved to support real-time retail microservices using Microsoft Phi-2 and ChromaDB.
  • Dynamic Gesture System: The core vision methodology is based on Skeleton-Based Real-Time Hand Gesture Recognition Using Data Fusion and Ensemble Multi-Stream CNN Architecture (Habib, Yusuf, & Moustafa, MDPI Technologies, 2025). The system has been modified and fine-tuned for specialized, real-time interactive avatar control using ResNet.

About

An intelligent hologram-based AI assistant for retail, developed as a Final Year Project to deliver interactive customer engagement and smart product guidance.
