This project helps researchers find answers in a set of research papers with the help of a customized RAG pipeline and a powerful LLM, all offline and free of cost.
For more details, please check out the blog post about this project.
- Download some research papers from Arxiv
- Use LlamaIndex to load, chunk, embed and store these documents in a Qdrant database
- FastAPI endpoint that receives a query/question, searches through our documents and finds the best matching chunks
- Feed these relevant documents into an LLM as context
- Generate an easy-to-understand answer and return it as an API response, citing the sources
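Stitched together with LlamaIndex, that flow looks roughly like the sketch below. It assumes the Qdrant and Ollama integration packages for LlamaIndex are installed; the collection name, paper directory, embedding model and Ollama model name are placeholders, not necessarily the project's actual values.

```python
# Rough sketch of the RAG flow: load PDFs, chunk/embed them with LlamaIndex,
# store the vectors in Qdrant, then answer questions with a local Ollama model.
import qdrant_client
from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Local embedding model so everything stays offline (placeholder choice)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Connect to the local Qdrant instance
client = qdrant_client.QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="research_papers")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load, chunk, embed and store the downloaded papers
documents = SimpleDirectoryReader("data/papers").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Retrieve the best matching chunks and let the local LLM answer with them as context
query_engine = index.as_query_engine(llm=Ollama(model="research_assistant", request_timeout=120.0))
response = query_engine.query("What are recent approaches to LLM quantization?")
print(response)                                          # generated answer
print([n.node.metadata for n in response.source_nodes])  # cited source chunks
```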
This project now uses Docker and Docker Compose for easier setup and reproducibility. To learn more, see this blog article: Dockerizing a RAG Application with FastAPI, LlamaIndex, Qdrant and Ollama.
git clone https://github.com/Otman404/local-rag-llamaindex
cd local-rag-llamaindex
docker compose build
This will:
- Build a rag-base-image containing shared dependencies (LlamaIndex, etc.)
- Build the api service (FastAPI backend) using the Dockerfile in api/
- Build the data_ingestion service, used for downloading and indexing research papers, using the Dockerfile in data/
docker compose up ollama qdrant api
This starts:
- API server (FastAPI on port 8000)
- Qdrant vector DB (port 6333)
- Ollama LLM server (port 11434)
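A quick way to verify that all three services are reachable is to hit them from Python. This is a minimal sketch: /collections and /api/tags are Qdrant's and Ollama's standard HTTP routes, and /docs is FastAPI's default interactive docs page.

```python
# Sanity check that the three containers answer on their default ports.
import requests

print(requests.get("http://localhost:6333/collections").json())  # Qdrant: list collections
print(requests.get("http://localhost:11434/api/tags").json())    # Ollama: list pulled models
print(requests.get("http://localhost:8000/docs").status_code)    # FastAPI: interactive docs (200 = up)
```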
To ingest new research papers into Qdrant:
docker compose run data_ingestion --query "LLM" --max 5 --ingest
- --query: Search keyword for downloading research papers from Arxiv.
- --max: Limits the number of results.
- --ingest: Ingests the downloaded papers into the Qdrant database.
This uses the arxiv package to fetch papers, chunk them with LlamaIndex, and store them in Qdrant.
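Under the hood, the download step boils down to something like this sketch with the arxiv package (the output directory is a placeholder; the chunking and storage then happen as in the LlamaIndex sketch above):

```python
# Sketch of the download step: fetch PDFs from Arxiv with the arxiv package.
import arxiv

search = arxiv.Search(query="LLM", max_results=5)
for result in arxiv.Client().results(search):
    print(result.title)
    result.download_pdf(dirpath="data/papers")  # placeholder directory for the downloaded PDFs
```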
✅ You should now be able to query the API at http://localhost:8000
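For example, from Python (the endpoint path and payload below are illustrative, not the project's exact route; check the FastAPI docs at http://localhost:8000/docs for the actual API):

```python
# Illustrative request against the API; the real route and payload shape
# are listed at http://localhost:8000/docs.
import requests

resp = requests.post(
    "http://localhost:8000/api/search",  # hypothetical endpoint name
    json={"query": "What is retrieval-augmented generation?"},
)
print(resp.json())  # answer plus cited source chunks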
To stop all running containers and networks started by Docker Compose:
docker compose down
To run the project without Docker Compose, first start a standalone Qdrant container:
docker run -p 6333:6333 -v ~/qdrant_storage:/qdrant/storage:z qdrant/qdrant
Inside the project folder, run:
# create virtual env with uv
uv venv
# activate the virtual env
source .venv/bin/activate
# install project dependencies
uv pip install ".[dev]"
uv run data/data.py --query "LLM" --max 5 --ingest
Follow this article for more info on how to run models from Hugging Face locally with Ollama.
Create model from Modelfile
ollama create research_assistant -f ollama/Modelfile
Start the model server
ollama run research_assistant
By default, Ollama runs on http://localhost:11434
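To confirm the model responds, you can call Ollama's standard /api/generate route directly; a minimal sketch:

```python
# Minimal sanity check against Ollama's /api/generate endpoint.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "research_assistant", "prompt": "Say hello.", "stream": False},
)
print(resp.json()["response"])
```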
Finally, start the FastAPI development server:
uv run fastapi dev api/main.py