This project implements a modular Retrieval-Augmented Generation (RAG) pipeline using Python. It supports loading data from both PDF and TXT files, chunking, embedding, vector search with FAISS, and answer generation with a language model via DSPy.
- Ingests text from PDF and TXT files (easily extensible to other formats)
- Chunks text for granular retrieval
- Embeds text using HuggingFace Transformers
- Fast similarity search with FAISS
- Modular design for easy extension
- Uses DSPy for LLM orchestration
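To make the embedding and FAISS steps concrete, here is a minimal standalone sketch. The model name and mean pooling shown here are assumptions for illustration and may not match what `embedder.py` actually does.

```python
# Minimal sketch: embed text chunks with a HuggingFace model and search them with FAISS.
# The model name and pooling strategy are illustrative assumptions, not this repo's exact code.
import torch
import faiss
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

def embed(texts):
    # Tokenize, run the encoder, and mean-pool the token embeddings into one vector per text.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    pooled = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
    return pooled.numpy().astype("float32")

chunks = ["RAG combines retrieval with generation.", "FAISS enables fast vector search."]
embeddings = embed(chunks)

index = faiss.IndexFlatL2(embeddings.shape[1])  # exact L2 index over the chunk embeddings
index.add(embeddings)
_, ids = index.search(embed(["What is FAISS for?"]), 1)  # retrieve the single closest chunk
print(chunks[ids[0][0]])
```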
- Clone the repository and navigate to the project directory.
- Create the conda environment:

  ```bash
  conda env create -f environment.yml
  conda activate rag-pipeline
  ```

- (Optional) If you have a GPU, replace `cpuonly` with the appropriate `cudatoolkit` version in `environment.yml` and reinstall.
- Add your HuggingFace token:
  - Edit `main.py` and replace `'YOUR_HF_TOKEN'` with your actual token.
- Add your data:
  - Place your `.pdf` and/or `.txt` files in the project directory.
  - Update the `sources` list in `main.py` to include your files.
- Run the main script:

  ```bash
  python main.py
  ```

Project modules:

- `data_loader.py` — Load text from PDF and TXT files
- `chunker.py` — Chunk text with overlap
- `embedder.py` — Load embedding model and generate embeddings
- `faiss_index.py` — Build and use a FAISS index
- `retriever.py` — Retrieve top-k similar chunks for a query
- `rag_module.py` — DSPy RAG module definition
- `main.py` — Entry point, wiring all modules together
- `environment.yml` — Conda environment specification
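For orientation, the sketch below shows one way these modules could fit together end to end. The function names (`load_txt`, `chunk_text`, `embed_texts`, `build_index`, `retrieve`) are illustrative assumptions rather than the repository's actual signatures; see `main.py` for the real wiring.

```python
# Hypothetical end-to-end wiring; the actual function names and signatures live in main.py.
from data_loader import load_txt        # assumed loader for .txt files
from chunker import chunk_text          # assumed chunking helper
from embedder import embed_texts        # assumed embedding helper
from faiss_index import build_index     # assumed FAISS index builder
from retriever import retrieve          # assumed top-k retrieval helper

text = load_txt("notes.txt")                      # 1. ingest raw text
chunks = chunk_text(text, size=500, overlap=50)   # 2. split into overlapping chunks
embeddings = embed_texts(chunks)                  # 3. embed each chunk
index = build_index(embeddings)                   # 4. build the FAISS index
context = retrieve(index, chunks, "What is this document about?", k=3)  # 5. retrieve context
```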
To add new data sources, extend `data_loader.py` with new loader functions.
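For example, a Markdown loader could look like the sketch below. The function name `load_md` is an assumption; follow whatever interface the existing loaders in `data_loader.py` use.

```python
# Hypothetical addition to data_loader.py: load plain Markdown files as text.
def load_md(path: str) -> str:
    """Read a .md file and return its raw text (assumed loader interface)."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```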
- This pipeline is for prototyping and educational purposes. For production, consider improvements such as better chunking, metadata tracking, and evaluation.
- Requires a HuggingFace account and token for LLM access.
Feel free to open issues or contribute improvements!
```python
# First time setup - this takes time
rag = PersistentRAGManager()
sources = ["your_document1.pdf", "your_document2.txt"]
rag.build_pipeline(sources, "my_corpus_name", hf_token="your_token")
rag.setup_dspy(hf_token="your_token")

# Test it
result = rag.ask("What is this about?")
print(result['answer'])
```

What happens:
- Loads your documents
- Chunks them
- Generates embeddings
- Builds FAISS index
- Saves everything to the `rag_cache/` directory
```python
# In any new session - this is fast!
rag = PersistentRAGManager()
rag.build_pipeline([], "my_corpus_name", hf_token="your_token")  # Loads from cache
rag.setup_dspy(hf_token="your_token")

# Ask unlimited questions
result1 = rag.ask("Question 1?")
result2 = rag.ask("Question 2?")
# ... as many as you want
```

What happens:
- Loads embeddings from disk
- Loads FAISS index from disk
- Loads documents from disk
- No re-processing needed!
- Build Once, Use Forever: Process your documents once, ask questions anytime
- Fast Startup: Subsequent sessions load in seconds, not minutes
- Multiple Corpora: You can have different RAG systems for different topics
- Persistent Storage: Your embeddings and index are saved between sessions
- Memory Efficient: Only loads what you need
```bash
# Session 1: Build your RAG
python session_example.py
# Edit the file to uncomment session_1_build_pipeline()

# Session 2: Use your RAG (new terminal)
python session_example.py
# Edit the file to uncomment session_2_use_pipeline()

# Session 3: Interactive questions (new terminal)
python session_example.py
# Edit the file to uncomment session_3_quick_questions()
```

The system saves these files in `rag_cache/`:

- `my_corpus_name_documents.pkl` - Your chunked documents
- `my_corpus_name_embeddings.npy` - Document embeddings
- `my_corpus_name_index.faiss` - FAISS search index
- `my_corpus_name_metadata.pkl` - Source info and metadata
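Below is a minimal sketch of how a cache layer could persist and restore those four files, assuming standard `pickle`, NumPy, and FAISS serialization; the actual `PersistentRAGManager` implementation may differ.

```python
# Hypothetical persistence helpers matching the cache file layout described above.
import os
import pickle
import numpy as np
import faiss

def save_cache(name, documents, embeddings, index, metadata, cache_dir="rag_cache"):
    # Persist documents/metadata as pickles, embeddings as .npy, and the FAISS index to disk.
    os.makedirs(cache_dir, exist_ok=True)
    with open(f"{cache_dir}/{name}_documents.pkl", "wb") as f:
        pickle.dump(documents, f)
    np.save(f"{cache_dir}/{name}_embeddings.npy", embeddings)
    faiss.write_index(index, f"{cache_dir}/{name}_index.faiss")
    with open(f"{cache_dir}/{name}_metadata.pkl", "wb") as f:
        pickle.dump(metadata, f)

def load_cache(name, cache_dir="rag_cache"):
    # Restore everything from disk, so no re-processing of the source documents is needed.
    with open(f"{cache_dir}/{name}_documents.pkl", "rb") as f:
        documents = pickle.load(f)
    embeddings = np.load(f"{cache_dir}/{name}_embeddings.npy")
    index = faiss.read_index(f"{cache_dir}/{name}_index.faiss")
    with open(f"{cache_dir}/{name}_metadata.pkl", "rb") as f:
        metadata = pickle.load(f)
    return documents, embeddings, index, metadata
```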
You can have different RAG systems for different topics:
```python
# Philosophy RAG
rag1 = PersistentRAGManager()
rag1.build_pipeline(philosophy_files, "philosophy", hf_token=token)

# Science RAG
rag2 = PersistentRAGManager()
rag2.build_pipeline(science_files, "science", hf_token=token)

# Use either one anytime
rag1.ask("What is existentialism?")
rag2.ask("What is quantum physics?")
```

This gives you a persistent, reusable RAG system that you can build once and use across multiple sessions!
```bash
python main.py
```

- Loads your documents once
- Builds the index once
- Then you can ask unlimited questions interactively
- Type `quit` to exit
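An interactive loop of this kind might look roughly like the sketch below; the exact prompt text and setup in `main.py` may differ, and the `RAGManager` API is taken from the programmatic example that follows.

```python
# Hypothetical interactive loop, assuming the RAGManager API shown below.
from rag_manager import RAGManager

rag = RAGManager(["your_document1.pdf"], hf_token="your_token")  # one-time setup

while True:
    question = input("Ask a question (or 'quit' to exit): ")
    if question.strip().lower() == "quit":
        break
    result = rag.ask(question)
    print(result['answer'])
```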
```python
from rag_manager import RAGManager

# Setup once
sources = ["your_document1.pdf", "your_document2.txt"]
rag = RAGManager(sources, hf_token="your_token")

# Ask multiple questions
result1 = rag.ask("What is the main topic?")
result2 = rag.ask("Can you explain concept X?")
result3 = rag.ask("What are the key findings?")

print(result1['answer'])
print(result2['answer'])
print(result3['answer'])
```

```python
questions = [
    "What is the main argument?",
    "What methodology was used?",
    "What are the conclusions?",
    "How does this relate to previous work?"
]

for question in questions:
    result = rag.ask(question)
    print(f"Q: {question}")
    print(f"A: {result['answer']}\n")
```

- One-time Setup: Load documents and build index only once
- Fast Queries: Subsequent questions are much faster
- Reusable: Same pipeline for multiple questions
- Flexible: Can ask follow-up questions, related questions, etc.
- Memory Efficient: Index stays in memory
- Prepare your documents (PDFs, TXTs about your topic)
- Update the sources list in either script
- Add your HuggingFace token
- Run the pipeline (interactive or programmatic)
- Ask unlimited questions about your topic!
The pipeline will remember your documents and can answer any question about the content you've loaded.