This RAG Visualizer offers a comprehensive exploration of a 3D embedding space in an easy-to-distribute Streamlit app. Chunks are clustered with DBSCAN, and each cluster is described by passing a subset of its chunks to the underlying LLM. Users can interactively prompt the LLM through LlamaIndex for naive RAG, with the prompt and retrieved chunks visualized and highlighted. RAG Visualizer serves as a toy example for your LLMOps pipeline: describing clusters reveals meaningful patterns and enhances interpretability.
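The clustering step above can be sketched as follows. This is a minimal illustration, not the app's actual code: the embedding dimensionality, blob data, and DBSCAN parameters are placeholder assumptions standing in for real sentence-transformer vectors.

```python
# Minimal sketch of DBSCAN clustering over chunk embeddings.
# Random blobs stand in for real embedding vectors; eps/min_samples
# values here are illustrative, not the app's settings.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two tight blobs of "chunk embeddings" plus one outlier.
blob_a = rng.normal(loc=0.0, scale=0.05, size=(20, 3))
blob_b = rng.normal(loc=1.0, scale=0.05, size=(20, 3))
outlier = np.array([[5.0, 5.0, 5.0]])
embeddings = np.vstack([blob_a, blob_b, outlier])

# fit_predict assigns a cluster id per point; -1 marks noise.
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(embeddings)
print(sorted(set(labels.tolist())))  # → [-1, 0, 1]
```

Points labeled -1 are noise; every other label identifies a cluster whose member chunks can then be sampled and summarized by the LLM.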
- Clone the repository to your local machine:
git clone https://github.com/your-username/rag-visualizer.git
- Navigate to the project directory:
cd rag-visualizer
- Install the required dependencies using pip:
pip install -r requirements.txt
- In constants.py, set the HF_TOKEN variable to your own token:
HF_TOKEN = 'YOUR_HF_TOKEN'
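For reference, the relevant line in constants.py looks like the fragment below; the value is a placeholder to replace with your actual Hugging Face access token.

```python
# constants.py — replace the placeholder with your Hugging Face access token
HF_TOKEN = "YOUR_HF_TOKEN"
```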
- Run the RAG visualizer script:
streamlit run app.py
By default, Streamlit serves on port 8501. If deploying to the cloud, make sure inbound traffic is explicitly allowed on port 8501 and reach the app via your instance's external IP address in the format external_ip:8501.
The current LLM is meta-llama/Llama-2-7b-chat-hf loaded in 8-bit, and the embedding model is sentence-transformers/all-MiniLM-L6-v2. Your HF token therefore needs access to the Llama 2 models. You can substitute other LLMs, but make sure to configure the system prompt for your chosen model. The query engine is built with LlamaIndex. The app is optimized to run on a single Nvidia T4 GPU with reasonable response times; take these considerations into account for a smooth deployment.
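To illustrate why the system prompt must match the model: Llama-2-chat expects Meta's published [INST]/<<SYS>> template, so swapping in another LLM means swapping this formatting too. The helper below is a hypothetical sketch of that wrapping, not code from this repo.

```python
# Hypothetical sketch: wrap a system prompt and user query in the
# Llama-2-chat template ([INST] / <<SYS>> convention). Other LLMs
# use different templates, so this wrapper must change with the model.
def llama2_chat_prompt(system_prompt: str, user_query: str) -> str:
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_query} [/INST]"
    )

prompt = llama2_chat_prompt(
    "Answer using only the retrieved context.",
    "What does this cluster contain?",
)
print(prompt)
```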
This project is licensed under the MIT license.