RAG-based VectorDB-LLM Query Engine

This project implements an AI-powered document query system using LangChain, ChromaDB, and OpenAI's language models. It enables users to create a searchable database from markdown documents and query it using natural language.

Features

Vector database creation from markdown documents
Embedding and query cost estimation
Similarity searches on the database
AI-powered response generation for user queries
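
To make the similarity-search feature concrete, here is a minimal standard-library sketch of ranking stored text chunks against a query vector. It is an illustration only: the project delegates this work to ChromaDB, the distance metric it actually uses is not stated here (cosine similarity is an assumption), and the names `cosine_similarity`, `top_k`, and `store` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction; treat it as dissimilar.
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=3):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors stand in for real embeddings (assumption:
# real embedding vectors are far longer and come from the embedding model).
store = [
    ("Goroutines are lightweight threads.", [0.9, 0.1, 0.0]),
    ("Channels pass values between goroutines.", [0.7, 0.6, 0.1]),
    ("Go modules manage dependencies.", [0.0, 0.2, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], store, k=2)
```

A vector database does the same ranking at scale, usually with an approximate index instead of the exhaustive scan shown here.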

Architecture Diagram

Requirements

Python 3.7+
Dependencies listed in requirements.txt

Installation

Clone this repository

Create a virtual environment:

```
python -m venv .venv
source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
```

Install required packages:
```
pip install -r requirements.txt
```
Set up your OpenAI API key in a .env file:
```
OPENAI_API_KEY=your_api_key_here
```
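
If you want to confirm that the key is picked up before running the scripts, the sketch below reads `KEY=VALUE` lines from a `.env` file using only the standard library. How the project itself loads `.env` is not stated here; `load_env_file` and `require_api_key` are hypothetical helpers written for illustration.

```python
import os


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from path into os.environ; return the parsed pairs."""
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            loaded[key] = value
            # Do not overwrite a variable already set in the real environment.
            os.environ.setdefault(key, value)
    return loaded


def require_api_key():
    """Return the OpenAI key, or raise a clear error if it is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Calling `load_env_file()` followed by `require_api_key()` fails fast with a readable message instead of a later API authentication error.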

Getting Started

Follow these steps to quickly set up and use the RAG-based VectorDB-LLM Query Engine:

Create a database from your markdown documents:
```
python create_database.py --data_folder data/go-docs --chroma_db_path chroma_go_docs/
```
This command will process the markdown files in the data/go-docs directory and create a vector database in the chroma_go_docs/ folder.

Query the database with a natural language question:

```
python query_data.py --query_text "Explain goroutines in go in a sentence" --chroma_db_path chroma_go_docs/ --prompt_model gpt-3.5-turbo
```

View the AI-generated response:

Goroutines are lightweight, concurrent functions or methods in Go that run independently, managed by the Go runtime, allowing for efficient parallel execution and easy implementation of concurrent programming patterns.
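
A response like the one above is typically produced by placing the retrieved document chunks into a prompt alongside the question. The sketch below shows one way that assembly step could look. The template wording, the character cap, and the `build_prompt` name are assumptions for illustration, not the contents of `query_data.py`.

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
"""


def build_prompt(question, chunks, max_chars=4000):
    """Join retrieved chunks into one context block and fill the template.

    Chunks are taken in order (most relevant first) until adding the next
    one would exceed max_chars of chunk text.
    """
    selected = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break
        selected.append(chunk)
        used += len(chunk)
    context = "\n\n---\n\n".join(selected)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    "Explain goroutines in go in a sentence",
    ["Goroutines are functions that run concurrently.", "Channels connect goroutines."],
)
```

The resulting string is what would be sent to the chat model selected with `--prompt_model`.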

Usage

For more detailed usage instructions, refer to the following sections:

Create the Database

```
python create_database.py --data_folder path/to/your/markdown/files --chroma_db_path path/to/save/database
```

Query the Database

```
python query_data.py --query_text "Your question here" --chroma_db_path path/to/database --prompt_model gpt-3.5-turbo
```
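
For reference, the three flags used by the query command could be declared with `argparse` roughly as follows. This is a sketch, not the script's actual parser: the flag names come from the command above, while making `--query_text` required and defaulting the other two to `chroma/` and `gpt-3.5-turbo` are assumptions based on the Notes section.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags shown in the query command."""
    parser = argparse.ArgumentParser(description="Query a Chroma database.")
    parser.add_argument(
        "--query_text", required=True,
        help="Natural language question to ask.",
    )
    parser.add_argument(
        "--chroma_db_path", default="chroma/",  # assumed default
        help="Directory holding the ChromaDB database.",
    )
    parser.add_argument(
        "--prompt_model", default="gpt-3.5-turbo",  # assumed default
        help="Chat model used to generate the response.",
    )
    return parser


args = build_parser().parse_args(["--query_text", "Your question here"])
```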

File Structure

create_database.py: Database creation script
query_data.py: Database querying script
estimate_cost.py: Cost estimation module
get_token_count.py: Token counting utility
data/: Markdown documents directory
chroma/: ChromaDB database storage (gitignored)
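
As a rough picture of what the cost estimation and token counting modules are for, the sketch below multiplies an approximate token count by a per-token rate. Everything in it is an assumption: the four-characters-per-token heuristic stands in for a real tokenizer, `EXAMPLE_RATE` is a made-up placeholder rather than an actual OpenAI price, and the function names do not come from `estimate_cost.py`.

```python
EXAMPLE_RATE = 1.00  # hypothetical USD per 1M tokens; check current pricing


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token count; a real tokenizer will give different numbers."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def estimate_cost(texts, usd_per_million_tokens):
    """Return (total_tokens, estimated_cost_in_usd) for a list of texts."""
    total_tokens = sum(estimate_tokens(t) for t in texts)
    cost = total_tokens * usd_per_million_tokens / 1_000_000
    return total_tokens, cost


tokens, cost = estimate_cost(["a" * 400, "b" * 600], EXAMPLE_RATE)
```

Running an estimate like this before embedding a large document set gives an order-of-magnitude figure, which is usually enough to decide whether to proceed.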

Notes

Uses OpenAI's text-embedding-3-small for embeddings and gpt-3.5-turbo for responses by default
Place markdown files in data/ or specify a custom path
ChromaDB database stored in chroma/ (gitignored)

Troubleshooting

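If a script fails to start, check that you are running a compatible Python version (3.7+) and that every package in requirements.txt installed without errors
If you see API errors, verify that the OpenAI API key in your .env file is valid and that your account has available credits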

License

This project is licensed under the terms of the MIT License. For more information, please refer to the LICENSE file.

Built with

For questions or issues, please open an issue on the GitHub repository.
