LocalRAG Forge

Introduction

LocalRAG Forge is a local-first RAG framework for maximizing the value of private, self-hosted, and on-device LLM systems. It helps developers build stronger document intelligence workflows around retrieval, reranking, knowledge ingestion, grounded generation, and evaluation without depending on a cloud-only stack.

The project is designed for teams that want to get more capability out of local models through better retrieval pipelines, higher-quality context construction, reusable dataset workflows, and repeatable evaluation. With LocalRAG Forge, you can improve retrieval quality, structure knowledge ingestion, benchmark responses, and validate local RAG behavior before shipping changes into production.

Features

  • Local-first RAG pipeline orchestration
  • Retrieval and reranking optimization
  • Knowledge ingestion and document processing
  • Dataset generation
  • LLM evaluation
  • Automated AI testing
  • Dataset quality analysis
  • Retrieval grounding checks
  • Workflow-level benchmarking
  • Extensible evaluation modules

Architecture

Document / Knowledge Base
          |
          v
     RAG Pipeline
  (retrieve, rank, prompt)
          |
          v
      LLM Response
          |
          v
   Evaluation Module
 (relevance, grounding,
   retrieval coverage)
          |
          v
     Quality Metrics
   and Test Reports

Installation

Install Python dependencies:

pip install -r requirements.txt

Or install the local framework package in editable mode:

pip install -e .

Optional local services:

docker compose up -d

Quick Start

Run the end-to-end demo:

python examples/demo_rag_test.py

Usage Example

from core.engine import build_default_llm
from dataset.dataset_builder import DatasetBuilder
from evaluation.evaluator import RAGEvaluator
from pipeline.workflow import TestingWorkflow
from rag.rag_pipeline import SimpleRAGPipeline, SourceDocument

documents = [
    SourceDocument(
        document_id="doc-1",
        text="LocalRAG Forge can generate evaluation datasets from source documents and golden answers.",
    ),
    SourceDocument(
        document_id="doc-2",
        text="LocalRAG Forge evaluates RAG systems using grounding, relevance, and retrieval quality metrics.",
    ),
]

pipeline = SimpleRAGPipeline(
    documents=documents,
    llm_client=build_default_llm(),
)

dataset = DatasetBuilder.from_records(
    name="demo-dataset",
    records=[
        {
            "sample_id": "sample-1",
            "question": "What can LocalRAG Forge generate for evaluation?",
            "expected_answer": "LocalRAG Forge can generate evaluation datasets from source documents and golden answers.",
            "relevant_document_ids": ["doc-1"],
        }
    ],
)

workflow = TestingWorkflow(
    pipeline=pipeline,
    evaluator=RAGEvaluator(),
)

report = workflow.run(dataset)
print(report.average_score)

Project Structure

LocalRAG Forge
├── core
│   └── engine.py
├── rag
│   └── rag_pipeline.py
├── dataset
│   └── dataset_builder.py
├── evaluation
│   └── evaluator.py
├── pipeline
│   └── workflow.py
├── examples
│   └── demo_rag_test.py
├── docs
├── tests
└── cli

  • core: shared runtime components such as LLM adapters, execution primitives, and engine abstractions.
  • rag: retrieval and generation pipeline implementations used to test AI application behavior.
  • dataset: tools for building evaluation datasets, golden sets, and sample records.
  • evaluation: scoring logic for response quality, grounding, retrieval coverage, and benchmark metrics.
  • pipeline: orchestration workflows that connect datasets, pipelines, and evaluators into repeatable test runs.
  • examples: runnable demos that show how to test a RAG workflow with minimal setup.
  • docs: architecture notes, guides, and future public documentation.
  • tests: automated tests for framework behavior and regression protection.
  • cli: command-line interfaces for running evaluations and local automation.
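As a hypothetical illustration of the kind of grounding check the evaluation module performs, the sketch below scores the fraction of answer tokens that appear in the retrieved context. Real grounding metrics (for example, NLI-based faithfulness checks) are more involved; the function name and tokenization here are assumptions for illustration only.

```python
def grounding_score(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the retrieved context."""
    answer_tokens = answer.lower().split()
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    supported = sum(1 for token in answer_tokens if token in context_tokens)
    return supported / len(answer_tokens)

context = "LocalRAG Forge evaluates RAG systems using grounding metrics."
grounded = grounding_score("localrag forge evaluates rag systems", context)
ungrounded = grounding_score("the moon is made of cheese", context)
```

A fully supported answer scores 1.0, while an answer with no overlap with the retrieved context scores 0.0, flagging a likely hallucination.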

Roadmap

  • Local model orchestration
  • Private knowledge workflows
  • Agent testing
  • Multi-model evaluation
  • Dataset synthesis
  • AI pipeline benchmarking
  • Retrieval regression suites
  • CI-integrated evaluation reports

Contributing

Contributions are welcome. Please open an issue to discuss major changes, submit pull requests for improvements, and include tests or examples when adding new features.

License

MIT License
