BetterRAG helps you find the optimal text chunking strategy for your Retrieval-Augmented Generation pipeline through rigorous, data-driven evaluation. Stop guessing which chunking method works best: measure it!
Compare Strategies | Zero-Code Configuration | Interactive Dashboard
Text chunking can make or break your RAG system's performance. Different strategies yield dramatically different results, but the optimal approach depends on your specific documents and use case. BetterRAG provides:
- Quantitative comparison between chunking strategies
- Visualized metrics to understand performance differences
- Clear recommendations based on real data
- No coding required to evaluate and improve your pipeline
Requirements:
- Python 3.8+
- MongoDB (local or remote)
- API keys for Azure OpenAI and/or Google Gemini
Quick start:

```bash
# 1. Clone the repository
git clone https://github.com/yourusername/betterrag.git
cd betterrag

# 2. Install dependencies
pip install -r requirements.txt

# 3. Set up your configuration
cp config.template.yaml config.yaml
# Edit config.yaml with your API keys and preferences

# 4. Add your documents to data/documents/

# 5. Run the evaluation
python -m app.main

# 6. View the interactive dashboard
# Default: http://127.0.0.1:8050/
```
BetterRAG's dashboard provides clear visual comparisons between chunking strategies.
Based on the collected metrics, BetterRAG recommends the most effective chunking approach for your specific documents and queries.
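How BetterRAG turns its metrics into a recommendation is not spelled out here. One simple approach, sketched below under the assumption that every metric has been normalised so that higher is better, is to average the per-query scores of each strategy and pick the best average. The function names and the scores in the example are made up for illustration; they are not BetterRAG output.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_strategies(scores: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank strategies by mean per-query score, best first.

    `scores` maps a strategy name to per-query metric values where
    higher is better (an assumption of this sketch).
    """
    averages = [(name, mean(values)) for name, values in scores.items() if values]
    # Sort by descending mean; ties are broken alphabetically for determinism.
    return sorted(averages, key=lambda item: (-item[1], item[0]))


def recommend_strategy(scores: Dict[str, List[float]]) -> str:
    """Return the name of the strategy with the highest mean score."""
    ranking = rank_strategies(scores)
    if not ranking:
        raise ValueError("no scores to compare")
    return ranking[0][0]


if __name__ == "__main__":
    # Made-up scores for three queries, purely for illustration.
    example = {
        "fixed_size": [0.62, 0.70, 0.58],
        "recursive": [0.71, 0.74, 0.69],
        "semantic": [0.66, 0.80, 0.61],
    }
    for name, avg in rank_strategies(example):
        print(f"{name}: {avg:.3f}")
    print("Recommended:", recommend_strategy(example))
```

A real comparison would probably combine several metrics and check whether the gap between strategies is larger than the variation from query to query before declaring a winner.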
BetterRAG uses a single YAML configuration file for all settings:
```yaml
# Chunking strategies to evaluate
chunking:
  fixed_size:
    enabled: true
    chunk_size: 500
    chunk_overlap: 50
  recursive:
    enabled: true
    chunk_size: 1000
    separators: ["\n\n", "\n", " ", ""]
  semantic:
    enabled: true
    model: "all-MiniLM-L6-v2"

# API credentials (or use environment variables)
api:
  azure_openai:
    api_key: ${AZURE_OPENAI_API_KEY}
    endpoint: ${AZURE_OPENAI_ENDPOINT}
```
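The `fixed_size` and `recursive` entries name two common splitting techniques. Their exact semantics in BetterRAG are not defined here, so the following is only a minimal sketch, assuming `chunk_size` and `chunk_overlap` count characters (they could equally count tokens) and that `separators` are tried from coarsest to finest, as in LangChain's `RecursiveCharacterTextSplitter`, whose default separator list is the same. The function names are hypothetical, not BetterRAG's chunker classes.

```python
from typing import List, Sequence


def fixed_size_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Cut text into chunk_size-character windows; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def _hard_cut(text: str, chunk_size: int) -> List[str]:
    """Split into consecutive chunk_size-character pieces with no regard for boundaries."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def recursive_chunks(text: str, chunk_size: int = 1000,
                     separators: Sequence[str] = ("\n\n", "\n", " ", "")) -> List[str]:
    """Split on the coarsest separator present, greedily merge pieces up to
    chunk_size, and recurse with finer separators on pieces still too long."""
    if len(text) <= chunk_size:
        return [text] if text else []
    for i, sep in enumerate(separators):
        if sep == "" or sep in text:
            break
    else:
        return _hard_cut(text, chunk_size)  # no separator applies
    if sep == "":
        return _hard_cut(text, chunk_size)  # "" means "split anywhere"
    finer = separators[i + 1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_chunks(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks
```

The recursive version prefers to break at paragraph, then line, then word boundaries, and falls back to hard cuts only for a piece that contains none of the finer separators. For simplicity it carries no overlap between chunks, unlike the fixed-size windows.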
See config_setup.md for detailed configuration instructions.
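The `api` section above uses `${AZURE_OPENAI_API_KEY}`-style placeholders. YAML has no built-in variable substitution, so something must expand these when the file is loaded. How BetterRAG does this is not described here, so the following is a minimal sketch assuming placeholders are filled from the process environment and that an unset variable should fail loudly. `expand_env` is a hypothetical helper name.

```python
import os
import re
from typing import Any

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any) -> Any:
    """Recursively replace ${VAR} placeholders in strings, dicts and lists."""
    if isinstance(value, str):
        def lookup(match: re.Match) -> str:
            name = match.group(1)
            if name not in os.environ:
                raise KeyError(f"environment variable {name} is not set")
            return os.environ[name]
        return _PLACEHOLDER.sub(lookup, value)
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```

With PyYAML installed, the parsed file could be passed straight through, e.g. `config = expand_env(yaml.safe_load(f))`. The standard library's `os.path.expandvars` also understands `${VAR}`, but it silently leaves unset variables in place, which tends to surface much later as an authentication error.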
```bash
# Run dashboard only (using previously processed data)
python -m app.main --dashboard-only

# Reset database before processing
python -m app.main --reset-db

# Use custom config file
python -m app.main --config my_custom_config.yaml
```
To add a new chunking strategy:
- Create a new chunker implementation in `app/chunkers/`.
- Register it in `app/chunkers/__init__.py`.
- Add its configuration parameters to `config.yaml`.
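The registration mechanism in `app/chunkers/__init__.py` is not shown here, so the sketch below assumes one common pattern: a base class with a `chunk(text)` method, a dictionary registry keyed by the name used under `chunking:` in `config.yaml`, and a decorator that adds classes to it. `BaseChunker`, `register`, `CHUNKERS`, `SentenceChunker`, `sentences_per_chunk` and `build_enabled_chunkers` are all hypothetical names, not BetterRAG's actual API.

```python
import re
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Type


class BaseChunker(ABC):
    """Hypothetical common interface: configured via keyword params, returns text chunks."""

    def __init__(self, **params: Any) -> None:
        self.params = params

    @abstractmethod
    def chunk(self, text: str) -> List[str]:
        """Split text into chunks."""


# Hypothetical registry mapping names used under `chunking:` to chunker classes.
CHUNKERS: Dict[str, Type[BaseChunker]] = {}


def register(name: str) -> Callable[[Type[BaseChunker]], Type[BaseChunker]]:
    """Class decorator that records a chunker under a config name."""
    def decorator(cls: Type[BaseChunker]) -> Type[BaseChunker]:
        CHUNKERS[name] = cls
        return cls
    return decorator


@register("sentence")
class SentenceChunker(BaseChunker):
    """Example strategy: group a fixed number of sentences into each chunk."""

    def chunk(self, text: str) -> List[str]:
        n = int(self.params.get("sentences_per_chunk", 3))
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        return [" ".join(sentences[i:i + n]) for i in range(0, len(sentences), n)]


def build_enabled_chunkers(chunking_config: Dict[str, Dict[str, Any]]) -> Dict[str, BaseChunker]:
    """Instantiate every registered strategy whose config section has `enabled: true`."""
    chunkers: Dict[str, BaseChunker] = {}
    for name, section in chunking_config.items():
        params = dict(section)  # copy so the caller's config is not mutated
        if not params.pop("enabled", False):
            continue
        if name not in CHUNKERS:
            raise KeyError(f"no chunker registered for {name!r}")
        chunkers[name] = CHUNKERS[name](**params)
    return chunkers
```

Under this pattern, turning the new strategy on would only need a `sentence:` section with `enabled: true` and `sentences_per_chunk` under `chunking:`; the remaining keys are passed to the class as parameters.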
To add new metrics, extend the `ChunkingEvaluator` class in `app/evaluation/metrics.py`.
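The methods `ChunkingEvaluator` exposes are not documented here, so rather than guess at its interface, the sketch below shows a standalone metric that such an extension could wrap. It assumes each evaluation query comes with a list of keywords that good retrieved context should contain. `keyword_recall` is a hypothetical name, and plain substring matching is a deliberately crude choice made to keep the example self-contained.

```python
from typing import Iterable, List


def keyword_recall(retrieved_chunks: Iterable[str], expected_keywords: List[str]) -> float:
    """Fraction of expected keywords found (case-insensitively) in at least one retrieved chunk."""
    if not expected_keywords:
        raise ValueError("expected_keywords must not be empty")
    lowered = [chunk.lower() for chunk in retrieved_chunks]
    # Check each chunk separately so a keyword cannot "match" across a chunk boundary.
    hits = sum(1 for kw in expected_keywords if any(kw.lower() in chunk for chunk in lowered))
    return hits / len(expected_keywords)
```

Because the score depends on which chunks were retrieved, running it per query for each strategy yields directly comparable numbers; a keyword split across two chunks deliberately does not count as a hit.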
Contributions are welcome! Feel free to:
- Report bugs and issues
- Suggest new features or enhancements
- Add support for additional LLM providers
- Implement new chunking strategies
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ for the RAG community