A comprehensive system for building a custom academic Q&A assistant using RAG (Retrieval-Augmented Generation) and QLoRA fine-tuning.
- Data Collection: Automated arXiv paper scraping and PDF processing
- RAG Pipeline: Hybrid retrieval combining vector search (FAISS) and keyword search (SQLite FTS5)
- Synthetic Data Generation: GPT-4 powered Q&A pair generation for fine-tuning
- QLoRA Fine-Tuning: Efficient fine-tuning of the Llama 3.1 8B model with 4-bit quantization
- Gradio UI: Interactive web interface for testing and comparing models
- FastAPI Backend: RESTful API for integration with external applications
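The hybrid retrieval step merges the ranked results from FAISS vector search and SQLite FTS5 keyword search. One common way to combine them is reciprocal rank fusion; the sketch below illustrates the idea (the repo's actual fusion logic may differ):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of doc IDs into a single ranking.

    Each document scores 1 / (k + rank + 1) per list it appears in;
    documents ranked highly by both retrievers float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Toy ranked lists standing in for the two retrievers' outputs
vector_hits = ["paper_3", "paper_1", "paper_7"]   # e.g. from FAISS
keyword_hits = ["paper_1", "paper_9", "paper_3"]  # e.g. from SQLite FTS5

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Here `paper_1` and `paper_3` rank first because both retrievers returned them, which is the behavior hybrid search is after.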
- Python 3.10+
- CUDA-capable GPU (recommended)
- OpenAI API key (for synthetic data generation)
```bash
export GRADIO_SERVER_PORT=7861
python gradio-ui.py
```

Then access the UI at http://localhost:7861
Run the full pipeline:

```bash
python pipeline-runner.py
```

Start the API server:

```bash
uvicorn module8-api:app --host 0.0.0.0 --port 8000
```

```
├── config/                  # Configuration files
├── modules/                 # Core modules
│   ├── m1_langchain_llama/  # LLM loading and chain building
│   ├── m2_data_collection/  # arXiv scraping
│   ├── m3_rag_pipeline/     # RAG indexing
│   ├── m4_hybrid_retrieval/ # Hybrid search (FAISS + SQLite)
│   ├── m5_synthetic_data/   # Q&A generation
│   ├── m6_fine_tuning/      # QLoRA training
│   └── m8_api_service/      # FastAPI endpoints
├── storage/                 # Data, indexes, and models
├── gradio-ui.py             # Main Gradio interface
├── module8-api.py           # FastAPI application
└── pipeline-runner.py       # Full pipeline execution
```
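With the API server running, external applications can query it over HTTP. A minimal client sketch using only the standard library (the `/ask` route and JSON payload shape here are assumptions; check module8-api.py for the actual endpoint and schema):

```python
import json
import urllib.request

API_URL = "http://localhost:8000/ask"  # assumed route -- verify in module8-api.py

def build_request(question: str, top_k: int = 5) -> urllib.request.Request:
    """Construct a POST request carrying the question as a JSON body."""
    payload = json.dumps({"question": question, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_request("What is QLoRA?")
# Send with: urllib.request.urlopen(req) once the server is up
```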
Edit config/settings.py to customize:
- Model selection (base model, embedding model)
- Training parameters (LoRA rank, learning rate, etc.)
- RAG settings (top-k retrieval, similarity thresholds)
- Data collection (arXiv categories, number of papers)
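For orientation, a config/settings.py might group those options like this; the option names and values below are illustrative assumptions, so check the actual file for the real ones:

```python
# config/settings.py (illustrative sketch -- option names are assumptions)

# Model selection
BASE_MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

# QLoRA training parameters
LORA_RANK = 16
LORA_ALPHA = 32
LEARNING_RATE = 2e-4

# RAG retrieval settings
TOP_K = 5                   # passages returned per query
SIMILARITY_THRESHOLD = 0.7  # minimum cosine similarity to keep a hit

# Data collection
ARXIV_CATEGORIES = ["cs.CL", "cs.LG"]
MAX_PAPERS = 200
```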
- Setup Guide - Setting up the OpenAI API
- Deployment Guide - Production deployment
- Restart Guide - Troubleshooting Gradio UI
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.