Interactive Evaluation Dashboard #23

@sanju234-san

Create an interactive web-based dashboard (using Streamlit or Gradio) that provides real-time insights into the OpenMath model's performance on math problems. This dashboard will allow users to explore evaluation results, analyze error patterns, and understand where the model excels or struggles.

🎯 Key Features

1. Live Problem Solver

Input box for users to enter custom math problems
Real-time step-by-step solution display
Shows intermediate reasoning steps
Displays final answer with confidence score
2. Batch Evaluation Interface

Upload custom test sets (JSON/CSV)
Run evaluation on GSM8K test set (with configurable sample size)
Live progress bar during evaluation
Save evaluation results for later analysis
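
A rough sketch of how the upload and sample-size steps above might look, independent of the UI framework. The `question`/`answer` field names follow the GSM8K convention, but requiring them for custom uploads is an assumption, and `load_test_set` and `sample_problems` are hypothetical helper names, not part of the existing codebase.

```python
import csv
import io
import json
import random


def load_test_set(raw_text, file_format):
    """Parse uploaded text into a list of {"question", "answer"} dicts.

    Assumes JSON uploads are a list of objects and CSV uploads have a
    header row; both must provide 'question' and 'answer' fields.
    """
    if file_format == "json":
        rows = json.loads(raw_text)
    elif file_format == "csv":
        rows = list(csv.DictReader(io.StringIO(raw_text)))
    else:
        raise ValueError(f"unsupported format: {file_format}")

    cleaned = []
    for i, row in enumerate(rows):
        # Reject rows that lack either of the two assumed fields.
        if "question" not in row or "answer" not in row:
            raise ValueError(f"row {i} is missing 'question' or 'answer'")
        cleaned.append({
            "question": str(row["question"]),
            "answer": str(row["answer"]).strip(),
        })
    return cleaned


def sample_problems(problems, sample_size, seed=0):
    """Return a reproducible random subset of at most sample_size problems."""
    if sample_size >= len(problems):
        return list(problems)
    return random.Random(seed).sample(problems, sample_size)
```

In a Streamlit or Gradio front end, the text passed to `load_test_set` would come from the file-upload widget, and `sample_size` from a slider or number input. Fixing the seed keeps repeated runs on the same subset comparable.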
3. Performance Metrics Dashboard

Overall Accuracy (with visual gauge/meter)
Accuracy by Problem Type:

    Addition/Subtraction
    Multiplication/Division
    Multi-step word problems
    Percentage/Ratio problems

Accuracy by Difficulty:

    Easy (1-2 steps)
    Medium (3-4 steps)
    Hard (5+ steps)

Charts/graphs showing performance trends
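
The per-type and per-difficulty breakdowns above could be computed with one grouping helper, sketched below. The result record shape (`problem_type`, `num_steps`, `correct`) is an assumption made for illustration; how each problem gets its type label and step count is left open by this issue.

```python
from collections import defaultdict


def difficulty_bucket(num_steps):
    """Map a step count to the buckets listed above."""
    if num_steps <= 2:
        return "Easy"
    if num_steps <= 4:
        return "Medium"
    return "Hard"


def accuracy_by(results, key_fn):
    """Group results by key_fn(result) and return {group: accuracy}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for result in results:
        group = key_fn(result)
        totals[group] += 1
        correct[group] += int(result["correct"])
    return {group: correct[group] / totals[group] for group in totals}


# Made-up records for illustration only, not real evaluation output.
example_results = [
    {"problem_type": "Addition/Subtraction", "num_steps": 1, "correct": True},
    {"problem_type": "Addition/Subtraction", "num_steps": 2, "correct": False},
    {"problem_type": "Multi-step word problems", "num_steps": 3, "correct": True},
    {"problem_type": "Percentage/Ratio problems", "num_steps": 5, "correct": True},
]

by_type = accuracy_by(example_results, lambda r: r["problem_type"])
by_difficulty = accuracy_by(
    example_results, lambda r: difficulty_bucket(r["num_steps"])
)
```

The resulting dictionaries can be passed directly to a bar chart component, and overall accuracy is the same call with a constant key such as `lambda r: "overall"`.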
