An AI-powered desktop app that automatically solves Spellcast word puzzles for you.
Demo video: `Spellcast.Agent.Demo.Video.mp4`
This app can:
- 📸 Take screenshots of your Spellcast game
- 🤖 Use OCR to read the letters in the grid
- 🎯 Find the highest-scoring word using DFS with prefix pruning
- 🖱️ Trace the solution automatically with GUI automation
- 🔄 Play an entire game end to end in agentic mode, without pressing a single button
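The word search can be sketched as a depth-first traversal that abandons any path whose letters are no longer a prefix of some dictionary word. This is a minimal illustration rather than the app's actual implementation; the real solver also scores words, while this sketch only collects them:

```python
def find_words(grid, dictionary):
    """DFS over the letter grid, pruning paths that are not a prefix of any word."""
    # Precompute every prefix of every word so dead-end paths can be cut early.
    prefixes = {word[:i] for word in dictionary for i in range(1, len(word) + 1)}
    rows, cols = len(grid), len(grid[0])
    found = set()

    def dfs(r, c, path, visited):
        word = path + grid[r][c]
        if word not in prefixes:
            return  # no dictionary word starts with this path: prune
        if word in dictionary:
            found.add(word)
        # Explore all 8 neighbors that haven't been visited on this path.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols \
                        and (nr, nc) not in visited:
                    dfs(nr, nc, word, visited | {(nr, nc)})

    for r in range(rows):
        for c in range(cols):
            dfs(r, c, "", {(r, c)})
    return found
```

From the returned set, a solver would then pick the highest-scoring candidate.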
Install and run
```sh
git clone https://github.com/pravinl23/SpellcastSolver.git
cd SpellcastSolver
pip install -r requirements.txt
python main/app.py
```
Use keyboard shortcuts while playing Spellcast:
- `⌘+1`: Set up screen region (first time only)
- `⌘+2`: Take screenshot
- `⌘+3`: Show solution overlay
- `⌘+4`: Auto-solve (moves mouse automatically)
- `⌘+5`: Play agentically
- `⌘+K`: Stop agentic mode
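Internally, each shortcut only needs to dispatch to a handler. A minimal sketch of that mapping (the handler names here are illustrative, and the real app would register these as global hotkeys through a hotkey library):

```python
# Illustrative handlers; the real app's functions do screenshots, OCR, etc.
def setup_region():    return "region set"
def take_screenshot(): return "screenshot taken"
def show_overlay():    return "overlay shown"
def auto_solve():      return "solution traced"
def play_agentic():    return "agent running"
def stop_agent():      return "agent stopped"

# Map each shortcut to its action.
HOTKEYS = {
    "cmd+1": setup_region,
    "cmd+2": take_screenshot,
    "cmd+3": show_overlay,
    "cmd+4": auto_solve,
    "cmd+5": play_agentic,
    "cmd+k": stop_agent,
}

def handle(shortcut):
    """Run the handler bound to a shortcut, ignoring unbound keys."""
    action = HOTKEYS.get(shortcut)
    return action() if action else None
```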
The app combines several technologies:
- Computer Vision: Custom AI model trained to recognize letters
- Word Finding: Smart algorithm to find the best-scoring words
- Automation: Controls your mouse to execute the solution
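The automation step boils down to mapping grid cells to screen pixels inside the calibrated region, then dragging the mouse through those points. A hedged sketch of the coordinate mapping (the `(left, top, width, height)` region layout and 5×5 grid are assumptions here; the actual dragging would be done with a GUI-automation library such as pyautogui):

```python
def cell_centers(path, region, grid_size=5):
    """Convert a path of (row, col) grid cells into pixel coordinates.

    region is the calibrated (left, top, width, height) of the board on screen.
    A real implementation would then drag the mouse through these points.
    """
    left, top, width, height = region
    cw, ch = width / grid_size, height / grid_size  # one cell's pixel size
    return [(left + (c + 0.5) * cw, top + (r + 0.5) * ch) for r, c in path]
```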
Requirements:
- macOS, Windows, or Linux
- Python 3.8+
- Webcam/screen access permissions
This repository documents the training and evaluation of a custom YOLOv8s object detection model for recognizing Spellcast game letters (A-Z + background) from grid screenshots. The model was trained on a hand-labeled dataset and evaluated using a comprehensive suite of metrics and visualizations.
Training this model was a highly iterative process. I curated and labeled a diverse dataset of Spellcast game grids, ensuring a wide variety of backgrounds, lighting conditions, and letter placements. The model was trained for 78 epochs with extensive data augmentation to maximize generalization. Throughout training, I closely monitored loss curves, precision, recall, and mAP metrics to avoid overfitting and ensure robust performance.
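For reference, an Ultralytics-style dataset config for the 27 classes (A-Z plus background) might look like the fragment below; the paths and class ordering are illustrative assumptions, not the actual training files:

```yaml
# spellcast.yaml - illustrative YOLOv8 dataset config (paths are assumptions)
path: datasets/spellcast
train: images/train
val: images/val

# 26 letters plus a background class
names: [A, B, C, D, E, F, G, H, I, J, K, L, M,
        N, O, P, Q, R, S, T, U, V, W, X, Y, Z, background]
```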
- Normalized Confusion Matrix

This matrix shows the proportion of correct and incorrect predictions for each class. Most letters are classified with near-perfect accuracy, with minor confusion between visually similar letters and the background.
- Raw Confusion Matrix

The raw counts highlight the distribution of predictions. The diagonal dominance confirms strong model performance, with only a handful of misclassifications.
- F1-Confidence Curve

The F1 score remains high across a wide range of confidence thresholds, peaking at 0.98, indicating an excellent balance between precision and recall.
- Precision-Confidence Curve

Precision approaches 1.0 at high confidence, showing the model is highly reliable when it is confident in its predictions.
- Recall-Confidence Curve

Recall also remains high, meaning the model rarely misses true positives.
- Precision-Recall Curve

The area under the curve (AUC) is nearly 1.0, with mAP@0.5 reaching 0.995, demonstrating outstanding overall detection quality.
- Label Distribution & Anchor Placement

This plot shows the frequency of each letter class and the spatial distribution of bounding boxes. The dataset is reasonably balanced, and anchors are well-placed, supporting robust detection across the grid.
- Training & Validation Losses, Metrics

Losses decrease smoothly, and metrics (precision, recall, mAP) improve steadily, indicating stable and effective training.
- Training Batches
These images show random training batches with ground truth and predicted bounding boxes, demonstrating the model's ability to localize and classify letters accurately.
- Validation Batch (Labels vs. Predictions)


These side-by-side comparisons on validation data confirm the model's strong generalization to unseen grids.
- Results CSV
The full training log is available in `results.csv`, which records epoch-by-epoch metrics including loss, precision, recall, and mAP.
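Because it is a plain CSV, the log is easy to inspect programmatically. A small sketch using only the standard library (the `metrics/mAP50(B)` column name follows Ultralytics' usual `results.csv` layout, but may vary by version):

```python
import csv

def best_epoch(path, metric="metrics/mAP50(B)"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    with open(path, newline="") as f:
        # Ultralytics pads column names with spaces, so strip the keys.
        rows = [{k.strip(): v for k, v in row.items()}
                for row in csv.DictReader(f)]
    best = max(rows, key=lambda r: float(r[metric]))
    return int(best["epoch"]), float(best[metric])
```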
- High Accuracy: The model achieves near-perfect precision and recall on the validation set, with mAP@0.5 of 0.995.
- Robust Generalization: Consistent performance across batches and minimal confusion between classes.
- Effective Training: Smooth loss curves and steadily improving metrics indicate a well-tuned training process.
Training this model required careful dataset preparation, hyperparameter tuning, and continuous monitoring of metrics. The visualizations provided deep insights into model behavior, helping to identify and address rare misclassifications. The result is a highly reliable letter detector, ready for integration into the Spellcast AI Solver pipeline.
To ensure the OCR model is always available for the solver, I deployed it on AWS EC2 using FastAPI and Uvicorn. The model runs in a persistent tmux session, ensuring 24/7 availability even after SSH disconnections. The desktop application makes HTTP requests to this endpoint whenever it needs to process a new game grid.
The deployment stack includes:
- FastAPI for the web server
- Uvicorn as the production server
- tmux for persistent process management
This setup allows the desktop application to offload the computationally intensive OCR tasks to the cloud, while maintaining low latency for real-time game solving.
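From the desktop side, calling the service is just an HTTP POST of the screenshot bytes. A standard-library sketch of building that request (the host, `/predict` route, and JPEG payload are assumptions about the deployed API, not its actual contract):

```python
import urllib.request

def build_ocr_request(image_bytes, url="http://example-ec2-host:8000/predict"):
    """Build the POST request the desktop app would send to the OCR service.

    The /predict path and JPEG content type are illustrative; the deployed
    FastAPI service defines the real route and payload format.
    """
    return urllib.request.Request(
        url,
        data=image_bytes,                        # raw screenshot bytes
        headers={"Content-Type": "image/jpeg"},  # assumed payload format
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen` (or any HTTP client) and the JSON response parsed into the letter grid.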





