A production-grade backend system that answers research questions using a combination of Retrieval-Augmented Generation (RAG), vector similarity search (Qdrant), and a HuggingFace transformer-based question answering model. Feedback from users can be collected and used to fine-tune the model via Direct Preference Optimization (DPO).
- For inference, I use Ollama with a quantized Llama 3 model
- Semantic search over research papers using embeddings + Qdrant
- LLM-based answer generation grounded in retrieved context (see the sketch after this list)
- User feedback collection for training preference pairs
- FastAPI backend with modular architecture
- Known limitation: the retrieved context is still not optimal
- Known limitation: how data is stored and indexed needs improvement
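At a high level, answering a query means embedding it, pulling the nearest chunks from Qdrant, and generating an answer grounded in those chunks. A minimal sketch, assuming the default extractive QA model listed in the stack further down; the collection name, top-k, and payload shape are illustrative, not the project's actual values:

```python
# Minimal sketch of the ask pipeline: embed the query, retrieve similar
# chunks from Qdrant, and run extractive QA over the retrieved context.
# Collection name ("papers") and top_k are assumptions for this sketch.
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
from transformers import pipeline

embedder = SentenceTransformer("all-MiniLM-L6-v2")
qdrant = QdrantClient(host="localhost", port=6333)
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def ask(query: str, top_k: int = 5) -> str:
    # 1. Embed the query into the same vector space as the paper chunks.
    query_vector = embedder.encode(query).tolist()
    # 2. Retrieve the most similar chunks.
    hits = qdrant.search(collection_name="papers", query_vector=query_vector, limit=top_k)
    context = " ".join(hit.payload["text"] for hit in hits)
    # 3. Ground the answer in the retrieved context.
    return qa(question=query, context=context)["answer"]
```

In the repo, this logic is split across app/core/retriever.py and app/core/inference.py.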
This project is being built in public. Alongside the codebase, I maintain open documentation of design decisions, ongoing challenges, and planned improvements.
Explore the project:
- 🗺 Roadmap — Planned features and development direction → ROADMAP.md
- 📖 Engineering Journal — Challenges faced and how they were solved → /challenges
LLM-Powered-Research-Assistant/
├── app/
│ ├── main.py # FastAPI entrypoint
│ ├── api/ # API layer
│ │ ├── routes.py # API endpoints
│ │ └── dependencies.py # Input validation and dependency injection
│ ├── core/ # Business logic
│ │ ├── inference.py # QA pipeline
│ │ ├── retriever.py # Qdrant retrieval logic
│ │ ├── feedback.py # Feedback persistence logic
│ │ └── utils.py # Utility functions
├── data/ # Raw arXiv papers
├── models/ # Saved DPO fine-tuned models
├── scripts/ # Ingestion + fine-tuning scripts
│ ├── ingest3.py # Chunking + embedding logic
│ └── train_dpo.py # DPO fine-tuning workflow
├── Dockerfile
├── requirements.txt
└── README.md

```bash
git clone https://github.com/yourusername/LLM-Powered-Research-Assistant.git
cd LLM-Powered-Research-Assistant
```

```bash
python -m venv venv
source venv/bin/activate   # or venv\Scripts\activate on Windows
pip install -r requirements.txt
```

Start Qdrant, then launch the API server:

```bash
docker run -p 6333:6333 qdrant/qdrant
uvicorn app.main:app --reload
```

Access the API at: http://localhost:8000/api/ask
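With Qdrant up, papers need to be chunked, embedded, and upserted before they can be searched — the job of scripts/ingest3.py. A rough sketch of that step; the chunking strategy, vector size, and collection name are assumptions for illustration, not the script's actual code:

```python
# Illustrative ingestion flow: split extracted paper text into chunks,
# embed each chunk, and upsert the vectors into Qdrant.
# Chunk size and collection name are placeholders for this sketch.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
qdrant = QdrantClient(host="localhost", port=6333)

def chunk_text(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; a real pipeline would respect sentence bounds.
    return [text[i:i + size] for i in range(0, len(text), size)]

def ingest(paper_text: str, collection: str = "papers") -> None:
    qdrant.recreate_collection(
        collection_name=collection,
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )
    points = [
        PointStruct(id=i, vector=embedder.encode(chunk).tolist(), payload={"text": chunk})
        for i, chunk in enumerate(chunk_text(paper_text))
    ]
    qdrant.upsert(collection_name=collection, points=points)
```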
POST /api/ask

```json
{
  "query": "What is DPO in LLM training?"
}
```

POST /api/feedback

```json
{
  "query": "What is DPO?",
  "response": "Direct Preference Optimization...",
  "user_feedback": "positive"
}
```
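Once the server is running, both endpoints can be exercised from Python as a quick sanity check; the feedback payload mirrors the example above, and the exact response schema may differ:

```python
# Ask a question, then submit feedback on the returned answer.
# Assumes the API is running locally on the default port.
import requests

BASE = "http://localhost:8000/api"

answer = requests.post(f"{BASE}/ask", json={"query": "What is DPO in LLM training?"}).json()
print(answer)

# Field names follow the example payloads above; adjust if the schema differs.
requests.post(f"{BASE}/feedback", json={
    "query": "What is DPO in LLM training?",
    "response": str(answer),
    "user_feedback": "positive",
})
```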
- LLM: distilbert-base-cased-distilled-squad (default, replaceable)
- Embedding: sentence-transformers (e.g. all-MiniLM-L6-v2)
- VectorDB: Qdrant
- Fine-tuning: HuggingFace TRL (DPO)
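Collected feedback becomes (prompt, chosen, rejected) preference pairs for DPO. A minimal sketch of the fine-tuning step with HuggingFace TRL, assuming the pairs have already been exported to JSONL; the base model and file paths are placeholders, and DPOTrainer's argument names have shifted between TRL versions:

```python
# Sketch of DPO fine-tuning on feedback-derived preference pairs with TRL.
# Dataset path, base model, and hyperparameters are placeholders; check the
# TRL version in use, since DPOTrainer's argument names have changed over time.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "your-base-model"  # placeholder
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Each row: {"prompt": ..., "chosen": ..., "rejected": ...}
pairs = load_dataset("json", data_files="data/preference_pairs.jsonl", split="train")

config = DPOConfig(output_dir="models/dpo", beta=0.1)
trainer = DPOTrainer(model=model, args=config, train_dataset=pairs, processing_class=tokenizer)
trainer.train()
trainer.save_model("models/dpo")
```

In this repo that workflow lives in scripts/train_dpo.py, with checkpoints saved under models/.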
You can run this project in a Docker container for easy deployment and reproducibility.
```bash
docker build -t llm-research-assistant .
docker run -p 8000:8000 llm-research-assistant
```

This will start the FastAPI server and expose it on http://localhost:8000.

If you need to set environment variables (such as `HF_TOKEN` for Hugging Face), you can pass them with the `-e` flag:

```bash
docker run -p 8000:8000 -e HF_TOKEN=your_hf_token llm-research-assistant
```

Or use a `.env` file:

```bash
docker run --env-file .env -p 8000:8000 llm-research-assistant
```

- Make sure your `requirements.txt` is up to date with all dependencies.
- The Dockerfile installs system dependencies needed for PDF extraction (e.g., `poppler-utils`).
- If you want to mount local data or models, use the `-v` flag with `docker run`.
- Integrate LangChain agents
- Add logging and monitoring (Prometheus)
- Deploy to cloud (AWS/GCP/Render)
