
feat(metrics): ✨ initial mean average recall metrics added #1656

Closed · wants to merge 8 commits

Conversation

onuralpszr (Collaborator)

Description

Initial mean average recall metrics added. Fixes #1583.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or example of how you tested the change.

Docs

  • Docs updated? What were the changes:
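
For context, a hypothetical usage sketch of the proposed metric, assuming it follows the same update()/compute() pattern as existing supervision metrics such as MeanAveragePrecision. The class name MeanAverageRecall, its import path, and the reported fields are assumptions, not confirmed by this PR:

```python
# Hypothetical usage sketch - mirrors the update()/compute() pattern of
# existing supervision metrics. MeanAverageRecall is an assumed name.
import numpy as np
import supervision as sv
from supervision.metrics import MeanAverageRecall  # assumed import path

predictions = sv.Detections(
    xyxy=np.array([[10, 10, 40, 40], [50, 50, 120, 120]], dtype=float),
    class_id=np.array([0, 1]),
    confidence=np.array([0.9, 0.6]),
)
targets = sv.Detections(
    xyxy=np.array([[12, 12, 38, 38], [55, 55, 118, 118]], dtype=float),
    class_id=np.array([0, 1]),
)

mar_metric = MeanAverageRecall()
mar_result = mar_metric.update(predictions, targets).compute()
print(mar_result)  # expected to expose mAR@1, mAR@10, mAR@100
```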

@onuralpszr self-assigned this Nov 6, 2024

LinasKo commented Nov 6, 2024

Thanks for matching the structure of prior metrics; it'll be much easier to review.

- Add max_detections parameter [1, 10, 100] for AR@k evaluation
- Update recall calculation to use standard object detection metrics
- Calculate maximum recall across different k values
- Use standard area thresholds for object size categories (32², 96²)
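
For readers skimming the bullet points above, a standalone sketch of the two COCO-style conventions they refer to (this is not the PR code; the constant and function names are illustrative):

```python
import numpy as np

# AR is evaluated at fixed per-image detection budgets.
MAX_DETECTIONS = (1, 10, 100)  # AR@1, AR@10, AR@100

# Standard COCO area thresholds for object size categories.
AREA_SMALL_MAX = 32**2    # small:  area < 32^2
AREA_MEDIUM_MAX = 96**2   # medium: 32^2 <= area < 96^2; large: area >= 96^2

def size_category(box_xyxy: np.ndarray) -> str:
    """Bucket a single (x1, y1, x2, y2) box by area, COCO style."""
    area = float((box_xyxy[2] - box_xyxy[0]) * (box_xyxy[3] - box_xyxy[1]))
    if area < AREA_SMALL_MAX:
        return "small"
    if area < AREA_MEDIUM_MAX:
        return "medium"
    return "large"
```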


@LinasKo left a comment


Son of a gun, you actually did it from first principles 😄

I made a handful of docs-related changes and removed a bit of dead code. A few things feel slightly off about the algorithm, and I left comments on them. Happy to have a call or discussion to figure out how it should work (I may well be wrong).

pred_mask = (
    predictions.class_id == class_id
    if not self._class_agnostic
    else slice(None)
)

I learnt something today, thank you! This might be a key to a more elegant class_agnostic implementation.

However, I think this isn't needed, since in class-agnostic mode we already set class_id to -1.
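
To make the two approaches concrete, a small standalone illustration (not the PR code): once class_id is overwritten with -1 in class-agnostic mode, the ordinary equality mask already selects every prediction, which is what makes the slice(None) branch redundant.

```python
import numpy as np

class_agnostic = True
pred_class_id = np.array([0, 1, 1, 2])

# Approach in the diff: special-case class-agnostic mode with slice(None).
pred_mask = pred_class_id == 1 if not class_agnostic else slice(None)

# Approach described above: rewrite class_id to -1 up front, then the
# usual equality test selects all predictions anyway.
if class_agnostic:
    pred_class_id = np.full_like(pred_class_id, -1)
pred_mask_alt = pred_class_id == -1  # all True, equivalent to slice(None)
```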


class_iou = iou[target_mask]
if any(pred_mask):
    class_iou = class_iou[:, pred_mask]

General comment: I don't see the sorting of predictions by confidence anywhere. I believe we should evaluate the top-1, top-10, top-100 most confident predictions.
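
To illustrate the suggestion (a standalone sketch with my own naming, not the PR's code): before computing recall at each k, predictions would be sorted by confidence in descending order and truncated to the top k.

```python
import numpy as np

def top_k_by_confidence(xyxy, class_id, confidence, k):
    """Keep only the k most confident predictions (illustrative helper)."""
    order = np.argsort(-confidence)[:k]  # descending confidence, first k
    return xyxy[order], class_id[order], confidence[order]

# Example: keep the top-2 of three predictions.
xyxy = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 8, 8]], dtype=float)
class_id = np.array([0, 1, 0])
confidence = np.array([0.3, 0.9, 0.7])
print(top_k_by_confidence(xyxy, class_id, confidence, k=2))
```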


LinasKo commented Nov 7, 2024

I got an implementation out, and its results are very close to Torchmetrics', typically diverging by ~0.02 and at most ~0.1.

In hindsight, my understanding of max_detections was incorrect yesterday. mAR@1 only considers 1 detection, but does NOT drop any targets; therefore, the recall is almost always very small, and mAR@1 <= mAR@10 <= mAR@100.

Getting an mAR above 0.5 is pretty tough; I suspect the values we saw yesterday indicate that our mAR@1 is too high.
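
A quick numeric illustration of that point, with made-up numbers: if an image has 10 ground-truth objects and only the single most confident prediction is kept, at most 1 of the 10 targets can be matched, so per-image recall at k=1 is capped at 0.1 no matter how good that one prediction is.

```python
# Made-up numbers: 10 ground-truth objects in one image.
num_targets = 10
matched_within_top_k = {1: 1, 10: 6, 100: 8}  # hypothetical match counts
recall_at_k = {k: m / num_targets for k, m in matched_within_top_k.items()}
print(recall_at_k)  # {1: 0.1, 10: 0.6, 100: 0.8} -> AR@1 <= AR@10 <= AR@100
```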


LinasKo commented Nov 7, 2024

My filtering is implemented in #1661, in mean_average_precision.py, _compute_confusion_matrix. Check the max_detections variable.

@onuralpszr closed this Nov 7, 2024
Linked issue: Metrics: Mean Average Recall (mAR)