A retrieval QA solution for the WPVH Eindhoven AI chat, using real-time sensor data. This application is powered by the Mistral LLM served via Ollama.
Check out the video demo: Watch Here
- Real-time Data QA: Chat with an AI that has access to the latest sensor data.
- Local LLM: Uses a locally hosted Mistral model with Ollama for privacy and control.
- Extensible: Built with modular components using Flask and LangChain.
- Vector Storage: Utilizes ChromaDB for efficient similarity searches on sensor data.
- Backend: Python, Flask
- LLM & Tooling: Ollama, LangChain, Mistral
- Database: ChromaDB (for vector storage)
- Deployment: Docker
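As a rough sketch of how these pieces fit together, the snippet below wires an Ollama-served Mistral model, Ollama embeddings, and a ChromaDB vector store into a LangChain retrieval QA chain. The sample readings, embedding choice, and chain setup are illustrative assumptions rather than the repository's exact code, and the import paths depend on your LangChain version:

```python
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Illustrative sensor snippets; the application builds these from live vhub data.
readings = [
    "Room 1.23 temperature: 21.4 C",
    "Room 1.24 CO2 concentration: 612 ppm",
]

# Embed the readings and store them in ChromaDB for similarity search.
vectordb = Chroma.from_texts(readings, embedding=OllamaEmbeddings(model="mistral"))

# Retrieval QA chain backed by the locally served Mistral model.
qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="mistral"),
    retriever=vectordb.as_retriever(),
)

print(qa.invoke({"query": "What is the temperature in room 1.23?"}))
```

In the application itself, the documents come from the live sensor feed rather than hard-coded strings.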
Follow these steps to get the project up and running on your local machine.
- Python 3.x (development was done with Python 3.11)
- Ollama: Install from Ollama.ai.
- For Windows users: Use the Docker version of Ollama. Tutorial here.
Note: This project was developed before Ollama had a native Windows client. While a native client is now available, these instructions are based on the Docker setup.
- Clone the Repository
  `git clone <repository-url>`
  `cd vHUB_QA_withSensorData`
  Replace `<repository-url>` with the actual URL of this repository.
- Install Dependencies
  `pip install -r requirements.txt`
- Set Up Ollama Model (Skip for Windows)
  Create a system prompt for the LLM model:
  `ollama create vhubAgent -f ./modelfile`
- Run Ollama
  - For Docker:
    `docker exec -it ollama ollama run vhubAgent`
  - For Windows (with Docker):
    `docker exec -it ollama ollama run mistral`
- Enter Credentials
  Open `keys.py` and enter your username and password for the vhub delta API.
- Start the Servers
  - Data Server: Open a terminal and run:
    `python dataServerWithRoomsCSVNew.py`
  - Retrieval QA Server: Open a new terminal and run:
    `python ollamaWithDataCSV.py`
🎉 Your chat application is now live at http://localhost:5003.
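To double-check that Ollama is actually serving the agent before chatting, a short Python snippet like the one below can query its REST API directly (this assumes Ollama's default port 11434; on the Windows/Docker setup, use `mistral` as the model name instead of `vhubAgent`):

```python
import requests

# Ask the locally served model a test question via Ollama's REST API.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "vhubAgent", "prompt": "Which sensors can you report on?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```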
- Clear Data: Use the `clear data` button in the UI to prevent overwhelming the data server. The dataframe will need some time to repopulate with new values.
- Threading: The application uses a semaphore to limit concurrent threads to 100 for fetching sensor data. This can be adjusted in `dataServerWithRoomsCSVNew.py` (a minimal sketch of this pattern follows below).
- Sensor Limit: There are 191 sensors in `sensors_list.txt`. Increasing the thread limit in `threading.Semaphore(100)` to `191` to fetch all data concurrently may lead to instability, such as saving incorrect values or injecting empty data.
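The following is a minimal, illustrative sketch of that throttling pattern, not the repository's actual code; `read_sensor` stands in for the real vhub delta API request:

```python
import threading

MAX_CONCURRENT = 100                     # mirrors threading.Semaphore(100) in the data server
fetch_slot = threading.Semaphore(MAX_CONCURRENT)

def read_sensor(sensor_id):
    # Placeholder for the real vhub delta API request.
    return {"sensor": sensor_id, "value": None}

def fetch_sensor(sensor_id, results):
    # Each thread must acquire a slot first, so at most 100 fetches run at once.
    with fetch_slot:
        results[sensor_id] = read_sensor(sensor_id)

results = {}
threads = [threading.Thread(target=fetch_sensor, args=(i, results)) for i in range(191)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Fetched {len(results)} sensor readings")
```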
This document contains information about deployment options and requirements.
Both local deployment options require a local Python installation (development was done with Python 3.11), and Python modules such as LangChain, Flask, and others must be installable.
LLM models can be downloaded from:
- Ollama: https://ollama.ai/library (e.g., mistral)
- LM Studio: https://huggingface.co/TheBloke (e.g., mistral GGUF 7B-13B)
Docker deployment doesn't mean a fully containerised application - Docker is used to run the Ollama LLM server on Windows.
- Requirements: Locally installed Docker application.
- Get Docker: https://www.docker.com/get-started/
- Ollama Image: https://hub.docker.com/r/ollama/ollama
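Once the Ollama container is running (with its API port published, typically 11434), a quick way to confirm it is reachable and to see which models it has pulled is to query its REST API, for example with a sketch like this (assumes the default port mapping):

```python
import requests

# List the models available inside the Dockerized Ollama server.
tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
for model in tags.get("models", []):
    print(model["name"])
```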
The LM Studio alternative will require code modification; a sketch of the kind of change involved follows the list below.
- Requirements: Locally installed LM Studio.
- Get LM Studio: https://lmstudio.ai/
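LM Studio serves local models through an OpenAI-compatible HTTP API (by default on http://localhost:1234/v1), so the Ollama-specific calls would be swapped for something along these lines; the model identifier and payload below are illustrative placeholders:

```python
import requests

# Example chat-completion request against LM Studio's OpenAI-compatible local server.
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder; LM Studio answers with whichever model is loaded
        "messages": [{"role": "user", "content": "Summarise the latest sensor readings."}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```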
Azure deployment offers several options, available either as subscription plans or pay-as-you-go.
Relevant services:
- App Services: For Python Flask app deployment. (Quickstart: Deploy a Python (Django or Flask) web app to Azure App Service)
- AI Studio: To connect to an LLM run by Azure. (How to deploy Llama 2 family of large language models with Azure AI Studio)
- Virtual machine: To deploy the full-stack inside a Linux VM. (Virtual Machines (VMs) for Linux and Windows | Microsoft Azure)
Running the stack on a Linux server or VM requires:
- Server access
- Ollama installed (https://ollama.ai/)
- Python 3.11
- Permission to install Linux packages and Python modules.