A complete, extensible platform for generating high-quality ground truth data for AI/RAG applications
Transform your subject matter experts' knowledge into training data that actually works. This full-stack application provides a working template that development teams can immediately use and extend for their own AI solutions.
flowchart TD
A[Subject Matter Expert] --> B[Create Collection]
B --> C[Add Questions]
C --> D[Retrieve Documents]
D --> E[Source Documents]
E --> F[Generate Answers]
F --> G[AI-Generated Answers]
G --> H[Expert Review]
H --> I{Quality Check}
I -->|Approve| J[Export Training Data]
I -->|Revise| K[Edit & Improve]
K --> H
I -->|Needs Context| L[Add More Documents]
L --> D
style A fill:#e1f5fe
style J fill:#e8f5e8
style I fill:#fff3e0
style K fill:#ffeaa7
style L fill:#fdcb6e
This isn't just another demo: it's a foundation designed to be extended for production, and it addresses the real challenge of getting quality training data from domain experts:
- Organize your ground truth work into logical collections, track progress, and manage team collaboration.
- Add questions that need expert answers, categorize by domain, and track them through the entire workflow.
- Automatically fetch relevant source documents from your data sources using configurable retrieval providers.
- Generate initial answers using configurable AI models (Azure OpenAI, local models, or custom providers).
- Enable subject matter experts to review, approve, revise, and provide feedback on generated answers.
- Export approved Q&A pairs in formats ready for model training, fine-tuning, or RAG evaluation.
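The review loop described above can be pictured as a small state machine. The sketch below is only an illustration of that idea: the `Status` names, the `TRANSITIONS` table, and the `advance` helper are assumptions made for this example and are not taken from the template's code.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical question states, named after the workflow steps."""
    NEW = "new"
    DOCUMENTS_RETRIEVED = "documents_retrieved"
    ANSWER_GENERATED = "answer_generated"
    IN_REVIEW = "in_review"
    NEEDS_REVISION = "needs_revision"
    NEEDS_CONTEXT = "needs_context"
    APPROVED = "approved"


# Allowed moves between states; anything not listed is rejected.
TRANSITIONS = {
    Status.NEW: {Status.DOCUMENTS_RETRIEVED},
    Status.DOCUMENTS_RETRIEVED: {Status.ANSWER_GENERATED},
    Status.ANSWER_GENERATED: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.APPROVED, Status.NEEDS_REVISION, Status.NEEDS_CONTEXT},
    Status.NEEDS_REVISION: {Status.IN_REVIEW},           # edit, then review again
    Status.NEEDS_CONTEXT: {Status.DOCUMENTS_RETRIEVED},  # add documents, retrieve again
    Status.APPROVED: set(),                              # terminal: ready for export
}


def advance(current: Status, target: Status) -> Status:
    """Return the target state if the move is allowed, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    # Walk one question through the happy path.
    state = Status.NEW
    for step in (Status.DOCUMENTS_RETRIEVED, Status.ANSWER_GENERATED,
                 Status.IN_REVIEW, Status.APPROVED):
        state = advance(state, step)
    print(state.value)
```

Keeping the allowed transitions in one table makes it easy to see that "Revise" returns to review while "Needs Context" returns to retrieval.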
Get the complete workflow running in under a minute:
# Clone and start the application
git clone <repository-url>
cd ise-ai-ground-truth-generator
docker-compose up
# Open your browser to http://localhost:3000
# Login with: demo / password
# Explore collections, Q&A generation, and review workflows

That's it! You now have a fully functional ground truth generation platform running locally with demo data.
- Login: Use demo/password to access the platform
- Collections: See how ground truth work is organized
- Add Questions: Experience the question input workflow
- Retrieve Documents: Watch automatic document fetching
- Generate Answers: See AI-powered answer generation
- Review Process: Try the expert review and approval workflow
- Export Data: Download training-ready Q&A pairs
The demo uses in-memory storage and mock AI responses, so you can immediately see the complete workflow without any external dependencies.
This platform uses a provider pattern that makes it easy to swap out components without touching core logic:
graph LR
subgraph "Frontend Layer"
A[React + TypeScript<br/>Extensible UI<br/>Auth Providers]
end
subgraph "Backend Layer"
B[FastAPI + Python<br/>Provider Pattern<br/>Type Safety<br/>OpenAPI Docs]
end
subgraph "Provider Layer"
C[Authentication<br/>Simple Demo Auth<br/>→ Extend for Production Auth]
D[Database<br/>In-Memory Storage → PostgreSQL/MongoDB]
E[Data Sources<br/>Memory + Template Providers<br/>→ Azure Search/Elasticsearch/SharePoint]
F[AI Models<br/>Demo Response Generator<br/>→ OpenAI/Azure OpenAI/Local Models]
end
A <--> B
B <--> C
B <--> D
B <--> E
B <--> F
style A fill:#e3f2fd
style B fill:#f3e5f5
style C fill:#fff3e0
style D fill:#e8f5e8
style E fill:#fce4ec
style F fill:#f1f8e9
| Component | What's Included | Extension Examples |
|---|---|---|
| Authentication | Simple demo auth (demo/password) | Azure AD B2C, Auth0, Custom LDAP |
| Database | In-memory storage (demo only) | PostgreSQL, MongoDB, SQL Server |
| Data Sources | Mock documents (demo only) | Azure Search, Elasticsearch, SharePoint |
| AI Generation | Template responses (demo only) | Azure OpenAI, OpenAI, Local models |
Important: The demo includes basic implementations only. Production extensions require additional development following the provider pattern.
Building effective RAG (Retrieval-Augmented Generation) applications requires high-quality ground truth data:
- Questions your users will actually ask
- Documents that contain the right information
- Answers that correctly interpret those documents
- Human validation from subject matter experts
This creates the foundation for measuring both retrieval metrics (did we find the right documents?) and generation metrics (did we produce accurate answers from those documents?).
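To make the two metric families concrete, here is a minimal sketch of one retrieval metric (recall at k) and one generation metric (token-overlap F1 against the expert-approved answer). Both are common choices rather than metrics this template is known to ship; the function names and the whitespace tokenization are assumptions for illustration.

```python
from collections import Counter
from typing import List, Set


def recall_at_k(retrieved_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of expert-marked relevant documents found in the top k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def token_f1(generated: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and the approved answer."""
    gen_tokens = generated.lower().split()
    ref_tokens = reference.lower().split()
    if not gen_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(gen_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(recall_at_k(["d1", "d2", "d3"], {"d1", "d3"}, k=2))
    print(token_f1("reset the router", "reset router"))
```

Token overlap is a crude proxy for answer quality; it is shown here only because it needs no external dependencies.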
- Capture Real Questions: Collect actual user questions and expert scenarios
- Test Retrieval: Verify your search finds the right documents
- Validate Generation: Ensure AI correctly interprets retrieved content
- Expert Review: Get domain expert validation before training
- Measure Quality: Track approval rates and identify improvement areas
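As a rough illustration of tracking approval rates and exporting only approved pairs, the sketch below works on plain dictionaries. The `status`, `question`, and `answer` field names and the JSONL output shape are assumptions for this example; the template's actual export format may differ.

```python
import json
from typing import Dict, List


def approval_rate(items: List[Dict[str, str]]) -> float:
    """Share of reviewed items that experts approved (0.0 if none were reviewed)."""
    reviewed = [item for item in items if item["status"] != "pending"]
    if not reviewed:
        return 0.0
    approved = [item for item in reviewed if item["status"] == "approved"]
    return len(approved) / len(reviewed)


def export_jsonl(items: List[Dict[str, str]]) -> str:
    """Serialize approved Q&A pairs only, one JSON object per line."""
    rows = [
        json.dumps({"question": item["question"], "answer": item["answer"]})
        for item in items
        if item["status"] == "approved"
    ]
    return "\n".join(rows)


if __name__ == "__main__":
    sample = [
        {"question": "How do I reset my router?", "answer": "Hold the power button.", "status": "approved"},
        {"question": "When are invoices sent?", "answer": "Weekly.", "status": "needs_revision"},
        {"question": "Who approves refunds?", "answer": "", "status": "pending"},
    ]
    print(approval_rate(sample))
    print(export_jsonl(sample))
```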
The FastAPI backend uses a clean provider pattern designed for easy extension. Each component implements a base interface:
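# Example: your custom data source
from typing import List

class SharePointProvider(BaseDataSourceProvider):
    async def retrieve_documents(self, query: str) -> List[Document]:
        # Your SharePoint integration goes here
        raise NotImplementedError

# Register in factory.py (the function name here is illustrative)
def get_data_source_provider() -> BaseDataSourceProvider:
    if RETRIEVAL_PROVIDER == "sharepoint":
        return SharePointProvider()
    raise ValueError(f"Unknown retrieval provider: {RETRIEVAL_PROVIDER}")

Extension Framework Supports: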
- Authentication: Any identity provider (base interface provided)
- Databases: Any persistent storage (base interface provided)
- Data Sources: Any document/search system (base interface provided)
- AI Models: Any text generation service (base interface provided)
Note: Base interfaces and demo implementations are included. Production providers require implementation following the established patterns.
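For readers who want to see the whole pattern in one place, the following self-contained sketch shows a base interface, one in-memory implementation, and a name-based factory. The `Document` fields, the `MemoryProvider` matching logic, the `PROVIDERS` registry, and `create_data_source_provider` are all assumptions written for this example; the template's real base classes and factory may look different.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    """Minimal stand-in for a retrieved document (fields are assumed)."""
    id: str
    content: str


class BaseDataSourceProvider(ABC):
    """Interface every data source provider implements."""

    @abstractmethod
    async def retrieve_documents(self, query: str) -> List[Document]:
        ...


class MemoryProvider(BaseDataSourceProvider):
    """Demo provider: naive keyword match over a hard-coded document list."""

    def __init__(self) -> None:
        self._docs = [
            Document("1", "Reset the router by holding the power button."),
            Document("2", "Invoices are issued monthly."),
        ]

    async def retrieve_documents(self, query: str) -> List[Document]:
        terms = query.lower().split()
        return [d for d in self._docs if any(t in d.content.lower() for t in terms)]


# Registry mapping a configuration value to a provider constructor.
PROVIDERS: Dict[str, Callable[[], BaseDataSourceProvider]] = {
    "memory": MemoryProvider,
}


def create_data_source_provider(name: str) -> BaseDataSourceProvider:
    """Build the provider selected by configuration, or fail loudly."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown data source provider: {name!r}") from None


if __name__ == "__main__":
    provider = create_data_source_provider("memory")
    for doc in asyncio.run(provider.retrieve_documents("router")):
        print(doc.id, doc.content)
```

Adding a new backend under this sketch means writing one class and one registry entry; the calling code only ever sees `BaseDataSourceProvider`.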
The React frontend mirrors the backend's extensibility:
// Your authentication provider
export class CustomAuthProvider implements AuthService {
  async signIn(credentials: SignInRequest): Promise<AuthResult> {
    // Your custom authentication integration
    throw new Error("signIn is not implemented yet");
  }
}

// Environment-driven configuration (set as an environment variable, not in TypeScript)
REACT_APP_AUTH_PROVIDER=custom

While this template uses FastAPI/Python, the architecture translates to any backend framework:
// Node.js/Express example
interface IDataSourceProvider {
  retrieveDocuments(query: string): Promise<Document[]>;
}

// Register providers based on config
const dataSourceProvider = createProvider(process.env.DATA_SOURCE_PROVIDER);

Framework Translation:
- Node.js/Express: Factory pattern with dependency injection
- .NET Core: Built-in DI container with interface registration
- Java Spring: Component scanning and autowiring
- Go: Interface composition and factory functions
Use the API specification and architecture patterns as your implementation guide.
- Vue.js/Nuxt: Follow the same API patterns
- Angular: Implement the service interfaces
- Svelte: Use the OpenAPI spec for type generation
- Mobile: React Native, Flutter, or native apps
git clone git@github.com:bryanostdiek_microsoft/rag_ground_truth_generator.git
cd rag_ground_truth_generator
docker-compose up

Visit http://localhost:3000 and log in with demo / password.
# Backend
cd backend
uv sync # or pip install -e .
uv run uvicorn app:app --reload
# Frontend
cd frontend
npm install
npm start
Production Readiness: This demo includes in-memory storage and mock providers only. For production deployment, you'll need to:
- Implement persistent database providers (PostgreSQL, MongoDB, etc.)
- Add production authentication (Azure AD B2C, Auth0, etc.)
- Configure real data sources (Azure Search, Elasticsearch, etc.)
- Set up actual AI model providers (Azure OpenAI, OpenAI, etc.)
The provider pattern keeps each of these changes isolated to a single provider implementation; see the Extension Guides for implementation examples.
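As one hedged illustration of swapping in-memory storage for something persistent, the sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL or MongoDB. The `BaseDatabaseProvider` interface, its method names, and the `answers` table are hypothetical and exist only for this example; the template's actual database interface may differ, and a FastAPI backend would more likely use an async driver.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import List, Tuple


class BaseDatabaseProvider(ABC):
    """Hypothetical storage interface; the template's real base class may differ."""

    @abstractmethod
    def save_answer(self, question: str, answer: str) -> int:
        ...

    @abstractmethod
    def list_answers(self) -> List[Tuple[int, str, str]]:
        ...


class SQLiteProvider(BaseDatabaseProvider):
    """Stores Q&A pairs in SQLite; pass a file path to persist across restarts."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS answers ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "question TEXT NOT NULL, "
            "answer TEXT NOT NULL)"
        )
        self._conn.commit()

    def save_answer(self, question: str, answer: str) -> int:
        # Parameterized query: values are never interpolated into the SQL string.
        cur = self._conn.execute(
            "INSERT INTO answers (question, answer) VALUES (?, ?)",
            (question, answer),
        )
        self._conn.commit()
        return cur.lastrowid

    def list_answers(self) -> List[Tuple[int, str, str]]:
        rows = self._conn.execute(
            "SELECT id, question, answer FROM answers ORDER BY id"
        )
        return rows.fetchall()


if __name__ == "__main__":
    db = SQLiteProvider()
    db.save_answer("How do I reset my router?", "Hold the power button.")
    print(db.list_answers())
```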
Deployment strategies will depend on your specific infrastructure and requirements. The Docker setup provides a foundation that can be adapted for cloud platforms or on-premises deployments.
- 60-Second Demo: Get running immediately
- Extension Overview: Understand the provider pattern
- Backend Architecture: FastAPI, providers, and extension points
- Frontend Architecture: React patterns and extensibility
- API Reference: Complete endpoint documentation
- Backend Extensions: Add custom providers and integrations
- Frontend Extensions: Customize UI and add features
- Frontend APIs: Service layer and API integration
- Backend Details: FastAPI setup, uv, testing, and development
- Frontend Details: React setup, state management, and testing
- Customer support knowledge bases
- Internal documentation Q&A
- Compliance and policy guidance
- Technical troubleshooting systems
- Fine-tuning language models
- RAG evaluation datasets
- Chatbot training data
- Domain-specific AI assistants
This is a template project designed for teams to fork and customize. We welcome:
- Bug Reports: Issues with the demo or documentation
- Documentation: Improve setup guides and examples
- Provider Examples: Additional provider implementations
- Feature Requests: Enhancements that benefit multiple teams
- Documentation Issues: Create an issue in this repository
- Implementation Questions: Use GitHub Discussions for this repository
- Bug Reports: Create an issue in this repository
- Feature Requests: Use GitHub Discussions for this repository
Ready to get started? Run docker-compose up and log in with demo / password to see the full workflow in action!