
Conversation


@Sheiphan Sheiphan commented May 26, 2025

I have just started the evaluation. WIP

@sergiopaniego
Collaborator

Nice!
Let's keep the evaluation simple for now. We can add more features in future PRs. This way, it's easier for us to review and we can iterate faster 😄
Let's also add some explanation about the coverage in the PR description. You can open a corresponding issue too.
Feel free to ping me once it's ready for review

@Sheiphan
Author


1. evaluate.py - Basic Evaluation

  • Purpose: Quick check of model performance.

  • Metrics:

    • Precision, Recall, F1-score
    • Avg IoU (overlap between predicted & ground truth boxes)
    • mAP@0.5 (simplified)
    • Per-category Avg IoU (assumes single category)
  • How it works:

    • Runs model on test images
    • Matches predicted boxes to ground truths using IoU ≥ 0.5
    • Computes TP, FP, FN → calculates the metrics (see the matching sketch after this list)

2. evaluate_advanced.py - COCO-style Evaluation

  • Purpose: Detailed, standardized evaluation

  • Metrics:

    • mAP@[0.5:0.05:0.95], mAP@0.5, mAP@0.75
    • Per-category AP@0.5
    • Precision/Recall at IoU 0.5 & 0.75
    • Avg IoU
  • How it works:

    • Evaluates predictions over multiple IoU thresholds
    • Computes precision-recall curves & AP per class
    • Averages APs → final mAP (see the COCO-style sketch after this list)

3. run_evaluation.py - Evaluation Runner

  • Purpose: Run either or both evaluations via CLI

  • Options:

    • --mode: basic, advanced, or both
    • --output: Save results as JSON
  • Example:

    python run_evaluation.py --mode advanced --output results/eval.json
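
To make the matching step in the basic evaluation concrete, here is a minimal sketch of the IoU computation and the greedy prediction-to-ground-truth matching. It is illustrative only, not the exact evaluate.py code: it assumes boxes in [x1, y1, x2, y2] pixel coordinates, a single category, and the function names are placeholders.

# Minimal sketch of the basic matching logic (illustrative, not the exact evaluate.py code).
# Boxes are assumed to be [x1, y1, x2, y2] in absolute pixel coordinates.

def iou(box_a, box_b):
    # Intersection over union of two axis-aligned boxes.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_image(pred_boxes, gt_boxes, iou_thr=0.5):
    # Greedily match each prediction (ideally processed in descending confidence order)
    # to the best still-unmatched ground truth; IoU >= iou_thr counts as a true positive.
    matched_gt, matched_ious, tp = set(), [], 0
    for pred in pred_boxes:
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(gt_boxes):
            if idx in matched_gt:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou >= iou_thr:
            tp += 1
            matched_gt.add(best_idx)
            matched_ious.append(best_iou)
    fp = len(pred_boxes) - tp   # unmatched predictions
    fn = len(gt_boxes) - tp     # missed ground truths
    return tp, fp, fn, matched_ious

def summarize(tp, fp, fn):
    # Precision, recall, F1 from the accumulated counts over all test images.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1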
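
For the COCO-style numbers, one common route is pycocotools; whether evaluate_advanced.py uses it or implements the precision-recall curves directly is an assumption, and the file paths below are placeholders.

# Sketch of a COCO-style evaluation via pycocotools (an assumption about the approach).
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_test.json")      # hypothetical ground-truth file
coco_dt = coco_gt.loadRes("results/predictions.json")  # hypothetical detections in COCO result format

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()    # match detections to ground truth at IoU 0.50:0.05:0.95
coco_eval.accumulate()  # build the precision-recall curves
coco_eval.summarize()   # prints mAP@[0.5:0.95], mAP@0.5, mAP@0.75, plus size-based APs/ARs

# stats[0] = mAP@[0.5:0.95], stats[1] = mAP@0.5, stats[2] = mAP@0.75
map_all, map_50, map_75 = coco_eval.stats[0], coco_eval.stats[1], coco_eval.stats[2]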

@sergiopaniego sergiopaniego left a comment
Collaborator

Thanks for the PR!

Could you explain the purpose of having two different evaluation files? We aim for a simple evaluation (just one file), and we can add more functionality afterwards.

Can we see the results for the current model? We could add those to the documentation and explain how to run the evaluation.

@Sheiphan
Copy link
Author

The idea I had in mind was to have one script for quick evaluation with basic metrics, used during model development and frequent iterations for quick sanity checks. It calculates Precision, Recall, F1-score, and AP at IoU=0.5 (AP50 or mAP@0.5).

And another one for final model validation or for comparing against benchmarks. It calculates mAP across multiple IoU thresholds (0.50–0.95), along with AP@0.5, AP@0.75, and optionally per-category performance.

I can have one evaluate.py script with two modes if that's okay, or combine all the metrics.

It can be used from the CLI like this:

python evaluate.py --mode basic
python evaluate.py --mode advanced
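
A rough sketch of what that single-script, two-mode layout could look like; only --mode and --output come from the examples above, and the evaluation bodies are placeholders rather than the real implementation.

# Rough sketch of a combined evaluate.py with two modes (illustrative only).
import argparse
import json

def run_basic():
    # placeholder: compute precision, recall, F1, mAP@0.5
    return {}

def run_advanced():
    # placeholder: compute COCO-style mAP@[0.5:0.95], AP@0.5, AP@0.75, per-category AP
    return {}

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Object detection evaluation")
    parser.add_argument("--mode", choices=["basic", "advanced", "both"], default="basic")
    parser.add_argument("--output", help="optional path to save results as JSON")
    args = parser.parse_args()

    results = {}
    if args.mode in ("basic", "both"):
        results["basic"] = run_basic()
    if args.mode in ("advanced", "both"):
        results["advanced"] = run_advanced()

    if args.output:
        with open(args.output, "w") as f:
            json.dump(results, f, indent=2)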
