bostdiek/ragGroundTruthGenerator

🎯 AI Ground Truth Generator

A complete, extensible platform for generating high-quality ground truth data for AI/RAG applications

Transform your subject matter experts' knowledge into training data that actually works. This full-stack application provides a working template that development teams can immediately use and extend for their own AI solutions.

flowchart TD
    A[👤 Subject Matter Expert] --> B[📋 Create Collection]
    B --> C[❓ Add Questions]
    C --> D[🔍 Retrieve Documents]
    D --> E[📄 Source Documents]
    E --> F[🤖 Generate Answers]
    F --> G[📝 AI-Generated Answers]
    G --> H[✅ Expert Review]
    H --> I{Quality Check}
    I -->|✅ Approve| J[📤 Export Training Data]
    I -->|❌ Revise| K[✏️ Edit & Improve]
    K --> H
    I -->|❓ Needs Context| L[📚 Add More Documents]
    L --> D

    style A fill:#e1f5fe
    style J fill:#e8f5e8
    style I fill:#fff3e0
    style K fill:#ffeaa7
    style L fill:#fdcb6e
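The review loop in the flowchart can be read as a small state machine. The sketch below is one possible encoding, not the project's actual model: the `Status` names, the `TRANSITIONS` table, and the `advance` helper are all assumptions made for illustration.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical review states mirroring the flowchart's decision node."""
    GENERATED = "generated"            # AI answer produced, awaiting review
    IN_REVIEW = "in_review"            # expert is reviewing
    NEEDS_REVISION = "needs_revision"  # expert asked for edits
    NEEDS_CONTEXT = "needs_context"    # more documents must be retrieved
    APPROVED = "approved"              # ready for export


# Allowed moves between states; anything else is rejected.
TRANSITIONS = {
    Status.GENERATED: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.APPROVED, Status.NEEDS_REVISION, Status.NEEDS_CONTEXT},
    Status.NEEDS_REVISION: {Status.IN_REVIEW},  # Edit & Improve, then back to review
    Status.NEEDS_CONTEXT: {Status.GENERATED},   # retrieve again, then regenerate
    Status.APPROVED: set(),                     # terminal state
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the flowchart has no such edge."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the allowed edges in one table makes it easy to reject an invalid move, such as exporting an answer that was never reviewed.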

🚀 What You Get: End-to-End Ground Truth Pipeline

This isn't just another demo. It's a working foundation, designed to be extended for production use, that addresses the real challenge of getting quality training data from domain experts:

📋 Collection Management

Organize your ground truth work into logical collections, track progress, and manage team collaboration.

❓ Question Handling

Add questions that need expert answers, categorize by domain, and track them through the entire workflow.

🔍 Document Retrieval

Automatically fetch relevant source documents from your data sources using configurable retrieval providers.

🤖 AI Answer Generation

Generate initial answers using configurable AI models (Azure OpenAI, local models, or custom providers).

✅ Expert Review Process

Enable subject matter experts to review, approve, revise, and provide feedback on generated answers.

📤 Export for Training

Export approved Q&A pairs in formats ready for model training, fine-tuning, or RAG evaluation.
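As an illustration of the export step, here is a minimal sketch that writes approved pairs as JSON Lines, a format commonly used for fine-tuning and evaluation sets. The `QAPair` fields and the output keys are assumptions; the platform's real export schema may differ.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class QAPair:
    # Hypothetical record shape, not the project's actual model.
    question: str
    answer: str
    status: str
    source_document_ids: List[str] = field(default_factory=list)


def export_jsonl(pairs: List[QAPair]) -> str:
    """Serialize only approved pairs, one JSON object per line."""
    lines = [
        json.dumps(
            {"question": p.question, "answer": p.answer, "documents": p.source_document_ids}
        )
        for p in pairs
        if p.status == "approved"
    ]
    return "\n".join(lines)
```

Keeping the source document IDs next to each answer lets the same file serve both retrieval and generation evaluation.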


🎬 Try It Now: 60-Second Demo

Get the complete workflow running in under a minute:

# Clone and start the application
git clone <repository-url>
cd <repository-directory>  # the folder created by git clone
docker-compose up

# Open your browser to http://localhost:3000
# Login with: demo / password
# Explore collections, Q&A generation, and review workflows

That's it! You now have a fully functional ground truth generation platform running locally with demo data.

🔍 What to Explore in the Demo

  1. Login → Use demo / password to access the platform
  2. Collections → See how ground truth work is organized
  3. Add Questions → Experience the question input workflow
  4. Retrieve Documents → Watch automatic document fetching
  5. Generate Answers → See AI-powered answer generation
  6. Review Process → Try the expert review and approval workflow
  7. Export Data → Download training-ready Q&A pairs

The demo uses in-memory storage and mock AI responses, so you can immediately see the complete workflow without any external dependencies.
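To make "in-memory storage and mock AI responses" concrete, the following sketch shows roughly what such demo components could look like. `InMemoryQuestionStore` and `mock_generate_answer` are hypothetical names, not the classes shipped in the repository.

```python
import uuid
from typing import Dict, List


class InMemoryQuestionStore:
    """Keeps questions in a dict; everything is lost when the process exits."""

    def __init__(self) -> None:
        self._questions: Dict[str, dict] = {}

    def add(self, collection_id: str, text: str) -> str:
        """Store a question and return its generated ID."""
        question_id = str(uuid.uuid4())
        self._questions[question_id] = {"collection_id": collection_id, "text": text}
        return question_id

    def list_for_collection(self, collection_id: str) -> List[dict]:
        """Return every stored question that belongs to the given collection."""
        return [q for q in self._questions.values() if q["collection_id"] == collection_id]


def mock_generate_answer(question: str, documents: List[str]) -> str:
    """Stand-in for a model call: returns a canned, deterministic response."""
    return f"[demo answer] '{question}' answered from {len(documents)} document(s)."
```

Because nothing here touches a database or a model endpoint, the whole workflow can run offline, at the cost of losing all data on restart.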


🏗️ Architecture: Built for Extension

This platform uses a provider pattern that makes it easy to swap out components without touching core logic:

graph LR
    subgraph "🖥️ Frontend Layer"
        A[React + TypeScript<br/>📱 Extensible UI<br/>🔐 Auth Providers]
    end

    subgraph "⚙️ Backend Layer"
        B[FastAPI + Python<br/>🔄 Provider Pattern<br/>🛡️ Type Safety<br/>📖 OpenAPI Docs]
    end

    subgraph "🔌 Provider Layer"
        C[🔐 Authentication<br/>Simple Demo Auth<br/>→ Extend for Production Auth]
        D[🗄️ Database<br/>In-Memory Storage → PostgreSQL/MongoDB]
        E[🔍 Data Sources<br/>Memory + Template Providers<br/>→ Azure Search/Elasticsearch/SharePoint]
        F[🤖 AI Models<br/>Demo Response Generator<br/>→ OpenAI/Azure OpenAI/Local Models]
    end

    A <--> B
    B <--> C
    B <--> D
    B <--> E
    B <--> F

    style A fill:#e3f2fd
    style B fill:#f3e5f5
    style C fill:#fff3e0
    style D fill:#e8f5e8
    style E fill:#fce4ec
    style F fill:#f1f8e9

🔧 What's Included and What You Can Extend

| Component | What's Included | Extension Examples |
|---|---|---|
| 🔐 Authentication | Simple demo auth (demo/password) | Azure AD B2C, Auth0, Custom LDAP |
| 🗄️ Database | In-memory storage (demo only) | PostgreSQL, MongoDB, SQL Server |
| 🔍 Data Sources | Mock documents (demo only) | Azure Search, Elasticsearch, SharePoint |
| 🤖 AI Generation | Template responses (demo only) | Azure OpenAI, OpenAI, Local models |

⚠️ Important: The demo includes basic implementations only. Production extensions require additional development following the provider pattern.
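One way to picture the swap from a demo component to a production one is a registry keyed by a configuration value. This is a sketch under assumptions: `BaseGenerator`, `DemoGenerator`, the `GENERATORS` registry, and the `GENERATION_PROVIDER` environment variable are illustrative names, not identifiers confirmed by the repository.

```python
import os
from abc import ABC, abstractmethod
from typing import Callable, Dict


class BaseGenerator(ABC):
    """Hypothetical base interface for the AI generation component."""

    @abstractmethod
    def generate(self, question: str) -> str: ...


class DemoGenerator(BaseGenerator):
    """Demo-only provider that returns a template response."""

    def generate(self, question: str) -> str:
        return f"Template response for: {question}"


# Maps a config value to a constructor; production providers would be added here.
GENERATORS: Dict[str, Callable[[], BaseGenerator]] = {"demo": DemoGenerator}


def get_generator() -> BaseGenerator:
    """Pick the provider named by GENERATION_PROVIDER (assumed variable name)."""
    name = os.environ.get("GENERATION_PROVIDER", "demo")
    try:
        return GENERATORS[name]()
    except KeyError:
        raise ValueError(f"unknown generation provider: {name!r}") from None
```

Core logic only ever calls `get_generator().generate(...)`, so adding a production provider means writing one class and one registry entry.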


🎯 Perfect For Teams Building RAG Applications

The RAG Ground Truth Challenge

Building effective RAG (Retrieval-Augmented Generation) applications requires high-quality ground truth data:

  • Questions your users will actually ask
  • Documents that contain the right information
  • Answers that correctly interpret those documents
  • Human validation from subject matter experts

This creates the foundation for measuring both retrieval metrics (did we find the right documents?) and generation metrics (did we produce accurate answers from those documents?).
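For the retrieval side, ground truth makes standard metrics such as precision@k and recall@k directly computable. The function below is a minimal sketch; the document IDs are made up, and the platform itself may report different metrics.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(
    retrieved: List[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Compare the top-k retrieved document IDs with the expert-approved set."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Example: one of the three relevant documents appears in the top 3 results.
precision, recall = precision_recall_at_k(["a", "b", "c", "d"], {"a", "d", "e"}, k=3)
```

Generation metrics need the approved answer text as well, which is why each ground truth record should keep the question, the documents, and the validated answer together.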

How This Platform Helps

  1. 📝 Capture Real Questions → Collect actual user questions and expert scenarios
  2. 🔍 Test Retrieval → Verify your search finds the right documents
  3. 🤖 Validate Generation → Ensure AI correctly interprets retrieved content
  4. ✅ Expert Review → Get domain expert validation before training
  5. 📊 Measure Quality → Track approval rates and identify improvement areas
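Tracking approval rates, the last step above, can be as simple as counting review outcomes per category. The sketch below assumes reviews arrive as `(category, status)` pairs and that `"approved"` is the status string; both are illustrative choices.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def approval_rates(reviews: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return the fraction of approved answers for each category."""
    totals: Counter = Counter()
    approved: Counter = Counter()
    for category, status in reviews:
        totals[category] += 1
        if status == "approved":
            approved[category] += 1
    return {category: approved[category] / totals[category] for category in totals}
```

A category with a low approval rate is a hint that retrieval or generation needs attention for that domain.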

🛠️ Extension Architecture

Backend: Provider Pattern Ready

The FastAPI backend uses a clean provider pattern designed for easy extension. Each component implements a base interface:

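# Example: Your custom data source
# (BaseDataSourceProvider and Document are the project's base interface and document type)
from typing import List

class SharePointProvider(BaseDataSourceProvider):
    async def retrieve_documents(self, query: str) -> List[Document]:
        # Your SharePoint integration
        raise NotImplementedError

# Register in factory.py, inside the function that selects the provider
# (the function name below is illustrative)
def get_data_source_provider():
    if RETRIEVAL_PROVIDER == "sharepoint":
        return SharePointProvider()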
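For a version that runs on its own, the sketch below defines stand-ins for `BaseDataSourceProvider` and `Document` and implements a tiny keyword matcher over in-memory documents. The stand-in definitions, `KeywordMemoryProvider`, and `run_demo` are assumptions for illustration; the project's real base interface may have more methods or different fields.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    # Stand-in for the project's Document type; real fields may differ.
    id: str
    content: str


class BaseDataSourceProvider(ABC):
    # Stand-in for the project's base interface, reduced to one method.
    @abstractmethod
    async def retrieve_documents(self, query: str) -> List[Document]: ...


class KeywordMemoryProvider(BaseDataSourceProvider):
    """Returns in-memory documents that contain every word of the query."""

    def __init__(self, documents: List[Document]) -> None:
        self._documents = documents

    async def retrieve_documents(self, query: str) -> List[Document]:
        terms = query.lower().split()
        return [d for d in self._documents if all(t in d.content.lower() for t in terms)]


def run_demo() -> List[str]:
    """Run one query against two sample documents and return the matching IDs."""
    provider = KeywordMemoryProvider([
        Document("1", "Refund policy for annual plans"),
        Document("2", "Password reset steps"),
    ])
    results = asyncio.run(provider.retrieve_documents("refund policy"))
    return [d.id for d in results]
```

A real provider would replace the list comprehension with a call to its search service while keeping the same method signature.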

Extension Framework Supports:

  • Authentication: Any identity provider (base interface provided)
  • Databases: Any persistent storage (base interface provided)
  • Data Sources: Any document/search system (base interface provided)
  • AI Models: Any text generation service (base interface provided)

📝 Note: Base interfaces and demo implementations are included. Production providers require implementation following the established patterns.

Frontend: React + Provider Pattern

The React frontend mirrors the backend's extensibility:

// Your authentication provider
export class CustomAuthProvider implements AuthService {
  async signIn(credentials: SignInRequest): Promise<AuthResult> {
    // Your custom authentication integration
    throw new Error("signIn is not implemented yet");
  }
}

# Environment-driven configuration (.env file, not TypeScript)
REACT_APP_AUTH_PROVIDER=custom

Backend: Use Your Framework

While this template uses FastAPI/Python, the architecture translates to any backend framework:

// Node.js/Express example (TypeScript)
interface IDataSourceProvider {
  // Implementations return a Promise, so they can be written as async methods
  retrieveDocuments(query: string): Promise<Document[]>;
}

// Register providers based on config
const dataSourceProvider = createProvider(process.env.DATA_SOURCE_PROVIDER);

Framework Translation:

  • Node.js/Express → Factory pattern with dependency injection
  • .NET Core → Built-in DI container with interface registration
  • Java Spring → Component scanning and autowiring
  • Go → Interface composition and factory functions

Use the API specification and architecture patterns as your implementation guide.

Frontend: Use Your Stack

  • Vue.js/Nuxt → Follow the same API patterns
  • Angular → Implement the service interfaces
  • Svelte → Use the OpenAPI spec for type generation
  • Mobile → React Native, Flutter, or native apps

🚀 Getting Started

Option 1: Quick Demo (Recommended)

git clone git@github.com:bryanostdiek_microsoft/rag_ground_truth_generator.git
cd rag_ground_truth_generator
docker-compose up

Visit http://localhost:3000 and log in with demo / password.

Option 2: Development Setup

# Backend (from the repository root)
cd backend
uv sync  # or pip install -e .
uv run uvicorn app:app --reload

# Frontend (in a second terminal, from the repository root)
cd frontend
npm install
npm start

Option 3: Production Deployment

⚠️ Production Readiness: This demo includes in-memory storage and mock providers only. For production deployment, you'll need to:

  • Implement persistent database providers (PostgreSQL, MongoDB, etc.)
  • Add production authentication (Azure AD B2C, Auth0, etc.)
  • Configure real data sources (Azure Search, Elasticsearch, etc.)
  • Set up actual AI model providers (Azure OpenAI, OpenAI, etc.)

The provider pattern keeps each of these changes confined to its own provider; see the Extension Guides for implementation examples.

Deployment strategies will depend on your specific infrastructure and requirements. The Docker setup provides a foundation that can be adapted for cloud platforms or on-premises deployments.


📚 Documentation

🎯 Quick Start Guides

🏗️ Architecture Deep Dives

🔧 Extension Guides

📖 Component READMEs


🔍 Use Cases

💼 Enterprise RAG Applications

  • Customer support knowledge bases
  • Internal documentation Q&A
  • Compliance and policy guidance
  • Technical troubleshooting systems

🎓 AI Model Training

  • Fine-tuning language models
  • RAG evaluation datasets
  • Chatbot training data
  • Domain-specific AI assistants

🀝 Contributing

This is a template project designed for teams to fork and customize. We welcome:

  • πŸ› Bug Reports β†’ Issues with the demo or documentation
  • πŸ“– Documentation β†’ Improve setup guides and examples
  • πŸ”§ Provider Examples β†’ Additional provider implementations
  • πŸ’‘ Feature Requests β†’ Enhancements that benefit multiple teams

🆘 Support

  • 📖 Documentation Issues → Create an issue in this repository
  • 💬 Implementation Questions → Use GitHub Discussions for this repository
  • 🐛 Bug Reports → Create an issue in this repository
  • 🚀 Feature Requests → Use GitHub Discussions for this repository

Ready to get started? Run docker-compose up and log in with demo / password to see the full workflow in action! 🎯
