Description
Create an interactive web-based dashboard (using Streamlit or Gradio) that provides real-time insights into the OpenMath model's performance on math problems. This dashboard will allow users to explore evaluation results, analyze error patterns, and understand where the model excels or struggles.
🎯 Key Features
1. Live Problem Solver
Input box for users to enter custom math problems
Real-time step-by-step solution display
Shows intermediate reasoning steps
Displays final answer with confidence score
2. Batch Evaluation Interface
Upload custom test sets (JSON/CSV)
Run evaluation on GSM8K test set (with configurable sample size)
Live progress bar during evaluation
Save evaluation results for later analysis
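The batch evaluation flow above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names `load_test_set` and `evaluate`, the `solve_fn` callable standing in for the model, the `question`/`answer` field names, and exact string matching as the correctness check are all assumptions made for the example.

```python
import csv
import io
import json
from typing import Callable, Dict, List, Optional


def load_test_set(text: str, fmt: str) -> List[Dict[str, str]]:
    """Parse an uploaded test set given as text; fmt is 'json' or 'csv'."""
    if fmt == "json":
        rows = json.loads(text)
    elif fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(text)))
    else:
        raise ValueError(f"unsupported format: {fmt}")
    # Assumed schema: every row carries 'question' and 'answer' fields.
    return [
        {"question": str(r["question"]), "answer": str(r["answer"]).strip()}
        for r in rows
    ]


def evaluate(
    problems: List[Dict[str, str]],
    solve_fn: Callable[[str], str],
    sample_size: Optional[int] = None,
    on_progress: Optional[Callable[[float], None]] = None,
) -> List[Dict[str, object]]:
    """Run solve_fn on the first sample_size problems and record correctness."""
    subset = problems if sample_size is None else problems[:sample_size]
    results = []
    for i, problem in enumerate(subset, start=1):
        predicted = solve_fn(problem["question"]).strip()
        # Assumption: an answer counts as correct only on an exact string match.
        results.append(
            {**problem, "predicted": predicted, "correct": predicted == problem["answer"]}
        )
        if on_progress is not None:
            on_progress(i / len(subset))  # fraction completed, in (0, 1]
    return results
```

In a Streamlit version, `on_progress` could drive a progress widget such as `st.progress`, and the returned list could be written out with `json.dump` to save results for later analysis.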
3. Performance Metrics Dashboard
Overall Accuracy (with visual gauge/meter)
Accuracy by Problem Type:
  Addition/Subtraction
  Multiplication/Division
  Multi-step word problems
  Percentage/Ratio problems
Accuracy by Difficulty:
  Easy (1-2 steps)
  Medium (3-4 steps)
  Hard (5+ steps)
Charts/graphs showing performance trends
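As a rough sketch of how the per-difficulty breakdown might be computed, the snippet below maps a step count to the Easy/Medium/Hard buckets listed above and aggregates accuracy per group. The helper names `difficulty_bucket` and `accuracy_by_group` are hypothetical, the sample records are made up for illustration, and how the number of steps is obtained for each problem is left open here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def difficulty_bucket(num_steps: int) -> str:
    """Map a step count to the buckets above: 1-2 Easy, 3-4 Medium, 5+ Hard."""
    if num_steps <= 2:
        return "Easy"
    if num_steps <= 4:
        return "Medium"
    return "Hard"


def accuracy_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the fraction of correct answers for each group label."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    return {group: hits[group] / totals[group] for group in totals}


# Illustrative records only: (number of solution steps, answer was correct).
sample = [(1, True), (2, False), (3, True), (4, True), (6, False)]
by_difficulty = accuracy_by_group(
    (difficulty_bucket(steps), ok) for steps, ok in sample
)
```

The same `accuracy_by_group` helper would work for the problem-type breakdown if each record is labeled with its type instead of its difficulty bucket.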