
Conversation


@Sheiphan Sheiphan commented May 26, 2025

I have just started the evaluation. WIP

@sergiopaniego
Collaborator

Nice!
Let's keep the evaluation simple for now. We can add more features in future PRs. This way, it's easier for us to review and we can iterate faster 😄
Let's also add some explanation about the coverage in the PR description. You can open a corresponding issue too.
Feel free to ping me once it's ready for review

@Sheiphan
Author


1. evaluate.py - Basic Evaluation

  • Purpose: Quick check of model performance.

  • Metrics:

    • Precision, Recall, F1-score
    • Avg IoU (overlap between predicted & ground truth boxes)
    • mAP@0.5 (simplified)
    • Per-category Avg IoU (assumes single category)
  • How it works:

    • Runs model on test images
    • Matches predicted boxes to ground truths using IoU ≥ 0.5
    • Computes TP, FP, FN → calculates the metrics (see the matching sketch after this list)

2. evaluate_advanced.py - COCO-style Evaluation

  • Purpose: Detailed, standardized evaluation

  • Metrics:

    • mAP@[0.5:0.05:0.95], mAP@0.5, mAP@0.75
    • Per-category AP@0.5
    • Precision/Recall at IoU 0.5 & 0.75
    • Avg IoU
  • How it works:

    • Evaluates predictions over multiple IoU thresholds
    • Computes precision-recall curves & AP per class
    • Averages APs → final mAP (see the COCO-style sketch after this list)

3. run_evaluation.py - Evaluation Runner

  • Purpose: Run either or both evaluations via CLI

  • Options:

    • --mode: basic, advanced, or both
    • --output: Save results as JSON
  • Example:

    python run_evaluation.py --mode advanced --output results/eval.json
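
To make the matching step in the basic evaluation concrete, here is a minimal sketch of the IoU computation and the greedy prediction-to-ground-truth matching. It is illustrative only, not the exact evaluate.py code: it assumes boxes in [x1, y1, x2, y2] pixel coordinates, a single category, and the function names are placeholders.

# Minimal sketch of the basic matching logic (illustrative, not the exact evaluate.py code).
# Boxes are assumed to be [x1, y1, x2, y2] in absolute pixel coordinates.

def iou(box_a, box_b):
    # Intersection over union of two axis-aligned boxes.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_image(pred_boxes, gt_boxes, iou_thr=0.5):
    # Greedily match each prediction (ideally processed in descending confidence order)
    # to the best still-unmatched ground truth; IoU >= iou_thr counts as a true positive.
    matched_gt, matched_ious, tp = set(), [], 0
    for pred in pred_boxes:
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(gt_boxes):
            if idx in matched_gt:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou >= iou_thr:
            tp += 1
            matched_gt.add(best_idx)
            matched_ious.append(best_iou)
    fp = len(pred_boxes) - tp   # unmatched predictions
    fn = len(gt_boxes) - tp     # missed ground truths
    return tp, fp, fn, matched_ious

def summarize(tp, fp, fn):
    # Precision, recall, F1 from the accumulated counts over all test images.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1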
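
For the COCO-style numbers, one common route is pycocotools; whether evaluate_advanced.py uses it or implements the precision-recall curves directly is an assumption, and the file paths below are placeholders.

# Sketch of a COCO-style evaluation via pycocotools (an assumption about the approach).
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_test.json")      # hypothetical ground-truth file
coco_dt = coco_gt.loadRes("results/predictions.json")  # hypothetical detections in COCO result format

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()    # match detections to ground truth at IoU 0.50:0.05:0.95
coco_eval.accumulate()  # build the precision-recall curves
coco_eval.summarize()   # prints mAP@[0.5:0.95], mAP@0.5, mAP@0.75, plus size-based APs/ARs

# stats[0] = mAP@[0.5:0.95], stats[1] = mAP@0.5, stats[2] = mAP@0.75
map_all, map_50, map_75 = coco_eval.stats[0], coco_eval.stats[1], coco_eval.stats[2]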

@sergiopaniego sergiopaniego left a comment
Collaborator

Thanks for the PR!

Could you explain the purpose of having two different evaluation files? We aim for a simple evaluation (just one file), and we can add more functionality afterwards.

Can we see the results for the current model? We could add those to the documentation and explain how to run the evaluation.

@Sheiphan
Copy link
Author

The idea I had in mind was to have one script for quick evaluation with basic metrics, used during model development and frequent iterations for quick sanity checks. It calculates Precision, Recall, F1-score, and AP at IoU=0.5 (AP50 or mAP@0.5).

And another one for final model validation or for comparing against benchmarks. It calculates mAP across multiple IoU thresholds (0.50–0.95), along with AP@0.5, AP@0.75, and optionally per-category performance.

I can have one evaluate.py script with two modes if that's okay, or combine all the metrics.

It can be used from the CLI like this:

python evaluate.py --mode basic
python evaluate.py --mode advanced
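
A rough sketch of what that single-script, two-mode layout could look like; only --mode and --output come from the examples above, and the evaluation bodies are placeholders rather than the real implementation.

# Rough sketch of a combined evaluate.py with two modes (illustrative only).
import argparse
import json

def run_basic():
    # placeholder: compute precision, recall, F1, mAP@0.5
    return {}

def run_advanced():
    # placeholder: compute COCO-style mAP@[0.5:0.95], AP@0.5, AP@0.75, per-category AP
    return {}

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Object detection evaluation")
    parser.add_argument("--mode", choices=["basic", "advanced", "both"], default="basic")
    parser.add_argument("--output", help="optional path to save results as JSON")
    args = parser.parse_args()

    results = {}
    if args.mode in ("basic", "both"):
        results["basic"] = run_basic()
    if args.mode in ("advanced", "both"):
        results["advanced"] = run_advanced()

    if args.output:
        with open(args.output, "w") as f:
            json.dump(results, f, indent=2)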
