LocalRAG Forge is a local-first RAG framework for maximizing the value of private, self-hosted, and on-device LLM systems. It helps developers build stronger document intelligence workflows around retrieval, reranking, knowledge ingestion, grounded generation, and evaluation without depending on a cloud-only stack.
The project is designed for teams that want to get more capability out of local models through better retrieval pipelines, higher-quality context construction, reusable dataset workflows, and repeatable evaluation. With LocalRAG Forge, you can improve retrieval quality, structure knowledge ingestion, benchmark responses, and validate local RAG behavior before shipping changes into production.
- Local-first RAG pipeline orchestration
- Retrieval and reranking optimization
- Knowledge ingestion and document processing
- Dataset generation
- LLM evaluation
- Automated AI testing
- Dataset quality analysis
- Retrieval grounding checks
- Workflow-level benchmarking
- Extensible evaluation modules
```
Document / Knowledge Base
           |
           v
      RAG Pipeline
(retrieve, rank, prompt)
           |
           v
      LLM Response
           |
           v
    Evaluation Module
  (relevance, grounding,
   retrieval coverage)
           |
           v
     Quality Metrics
    and Test Reports
```
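The retrieve, rank, and prompt stages in the diagram can be sketched as plain Python. The keyword-overlap scoring and prompt template below are illustrative assumptions for demonstration, not LocalRAG Forge's actual implementation:

```python
# Illustrative sketch of the retrieve -> rank -> prompt flow shown above.
# The overlap scoring and prompt format are assumptions, not framework APIs.

def retrieve(query: str, documents: dict[str, str], top_k: int = 2) -> list[tuple[str, float]]:
    """Rank documents by keyword overlap with the query and keep the top_k."""
    query_terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & set(text.lower().split()))
        scored.append((doc_id, overlap / max(len(query_terms), 1)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def build_prompt(query: str, documents: dict[str, str], ranked: list[tuple[str, float]]) -> str:
    """Assemble the retrieved passages into a grounded prompt for the LLM."""
    context = "\n".join(documents[doc_id] for doc_id, _ in ranked)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = {
    "doc-1": "LocalRAG Forge generates evaluation datasets from source documents.",
    "doc-2": "LocalRAG Forge evaluates grounding, relevance, and retrieval quality.",
}
ranked = retrieve("How are evaluation datasets generated?", docs, top_k=1)
print(ranked[0][0])  # best-matching document id: doc-1
print(build_prompt("How are evaluation datasets generated?", docs, ranked))
```

A real pipeline would swap the keyword overlap for embedding similarity and a reranker, but the data flow between the stages stays the same.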
Install Python dependencies:
```
pip install -r requirements.txt
```

Or install the local framework package in editable mode:

```
pip install -e .
```

Optional local services:

```
docker compose up -d
```

Run the end-to-end demo:
```
python examples/demo_rag_test.py
```

```python
from core.engine import build_default_llm
from dataset.dataset_builder import DatasetBuilder
from evaluation.evaluator import RAGEvaluator
from pipeline.workflow import TestingWorkflow
from rag.rag_pipeline import SimpleRAGPipeline, SourceDocument

documents = [
    SourceDocument(
        document_id="doc-1",
        text="LocalRAG Forge can generate evaluation datasets from source documents and golden answers.",
    ),
    SourceDocument(
        document_id="doc-2",
        text="LocalRAG Forge evaluates RAG systems using grounding, relevance, and retrieval quality metrics.",
    ),
]

pipeline = SimpleRAGPipeline(
    documents=documents,
    llm_client=build_default_llm(),
)

dataset = DatasetBuilder.from_records(
    name="demo-dataset",
    records=[
        {
            "sample_id": "sample-1",
            "question": "What can LocalRAG Forge generate for evaluation?",
            "expected_answer": "LocalRAG Forge can generate evaluation datasets from source documents and golden answers.",
            "relevant_document_ids": ["doc-1"],
        }
    ],
)

workflow = TestingWorkflow(
    pipeline=pipeline,
    evaluator=RAGEvaluator(),
)

report = workflow.run(dataset)
print(report.average_score)
```

```
LocalRAG Forge
├── core
│   └── engine.py
├── rag
│   └── rag_pipeline.py
├── dataset
│   └── dataset_builder.py
├── evaluation
│   └── evaluator.py
├── pipeline
│   └── workflow.py
├── examples
│   └── demo_rag_test.py
├── docs
├── tests
└── cli
```
- `core`: shared runtime components such as LLM adapters, execution primitives, and engine abstractions.
- `rag`: retrieval and generation pipeline implementations used to test AI application behavior.
- `dataset`: tools for building evaluation datasets, golden sets, and sample records.
- `evaluation`: scoring logic for response quality, grounding, retrieval coverage, and benchmark metrics.
- `pipeline`: orchestration workflows that connect datasets, pipelines, and evaluators into repeatable test runs.
- `examples`: runnable demos that show how to test a RAG workflow with minimal setup.
- `docs`: architecture notes, guides, and future public documentation.
- `tests`: automated tests for framework behavior and regression protection.
- `cli`: command-line interfaces for running evaluations and local automation.
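To make the evaluation module's role concrete, here is a minimal grounding metric: the fraction of answer tokens that also appear in the retrieved context. This is an illustrative stand-in for the kind of scoring `evaluation/evaluator.py` performs, not its actual algorithm:

```python
# Toy grounding metric: what fraction of the answer's tokens are supported
# by the retrieved context? An illustrative sketch, not RAGEvaluator's logic.

def grounding_score(answer: str, context: str) -> float:
    answer_tokens = answer.lower().split()
    if not answer_tokens:
        return 0.0
    context_tokens = set(context.lower().split())
    supported = sum(1 for tok in answer_tokens if tok in context_tokens)
    return supported / len(answer_tokens)

context = "localrag forge evaluates grounding relevance and retrieval quality"
print(grounding_score("localrag forge evaluates grounding", context))  # 1.0
print(grounding_score("it hallucinates facts", context))  # 0.0
```

Production evaluators typically use semantic matching (embeddings or an LLM judge) rather than exact token overlap, but the metric shape, a 0-to-1 support score per answer, is the same.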
- Local model orchestration
- Private knowledge workflows
- Agent testing
- Multi-model evaluation
- Dataset synthesis
- AI pipeline benchmarking
- Retrieval regression suites
- CI-integrated evaluation reports
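For retrieval regression suites and CI-integrated reports, a common pattern is to fail the build when the average evaluation score drops below a threshold. A framework-agnostic sketch (the per-sample scores and the 0.8 threshold here are made-up values for illustration):

```python
# Gate a CI run on an aggregate evaluation score. In practice the per-sample
# scores would come from an evaluation run (e.g. a workflow report); they are
# hard-coded here to keep the sketch self-contained.

def check_regression(scores: list[float], threshold: float = 0.8) -> bool:
    """Return True when the average score meets or exceeds the threshold."""
    if not scores:
        return False
    return sum(scores) / len(scores) >= threshold

sample_scores = [0.92, 0.85, 0.78]  # illustrative per-sample scores
average = sum(sample_scores) / len(sample_scores)
print(f"average={average:.2f} pass={check_regression(sample_scores)}")
```

Wrapping a check like this in a test runner lets a pull request fail automatically when a retrieval or prompt change regresses evaluation quality.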
Contributions are welcome. Please open an issue to discuss major changes, submit pull requests for improvements, and include tests or examples when adding new features.
MIT License