Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIR_EVAL_392 : added SegmentType, optimized segment.evaluate checks #393

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Mintas
Copy link

@Mintas Mintas commented Nov 11, 2024

  1. Added SegmentType enum, allows to use SegmentType.BOUNDARY and provide list of breakpoints instead of intervals into methods deviation and detection; fixes Segment: make it possible to provide boundaries, not only intervals #392
  2. Refactored some validations to slightly optimize evaluate method
  3. Reduced copy-paste, increasing readability

@craffel
Copy link
Collaborator

craffel commented Nov 11, 2024

@bmcfee as our chief segment officer, any thoughts?

@bmcfee
Copy link
Collaborator

bmcfee commented Nov 15, 2024

A few thoughts:

  1. We did make a conscious decision to promote intervals as the fundamental unit of segmentation, not boundaries. I'm not sure if this is documented in the issues - i couldn't find it at a quick search - but the thinking here is that one can always convert intervals to boundaries, but the reverse is not always possible (without extra information like track duration). Defining all metrics from intervals made it easy to define a clear and consistent API. (As a matter of personal taste, I think time intervals should be the fundamental unit of segmentation, as well as most other tasks in MIR, but that's a very long rant.)
  2. I see why one would use an enum type for this, but it's unusual in the context of mir_eval since no other submodule does this. If we really wanted to support boundaries as input, we could just as well accomplish it by inferring intent from the array shape: ndim=2 would be intervals and ndim=1 would be boundaries. It's obviously less explicit, but a bit more consistent with the rest of the package.
  3. Alternatively, if we did want to support boundary input, we could do so by expanding the function declaration to accept boundaries in addition to / instead of intervals, so that everything is explicit.
  4. One higher level challenge here is that providing only boundaries to the evaluator will mean that only some of the metrics are well defined, so it would have to produce a reduced set of scores. This is of course possible to do, but it's again unusual and not something that other submodules do.

@Mintas
Copy link
Author

Mintas commented Nov 26, 2024

Thank you for your quick response!

It took me some time to come up with a good solution and I decided to split this PR into two independent parts.
The refactoring and optimization of existing logic without any major changes and any additional functionality now reside in this PR: #395

(1) I do agree, that intervals represent the right unit of segmentation, though there are some subtasks of segmentation process that also require proper evaluation, like unsupervised segmentation and boundary detection itself. Some metrics of segment.eval (mostly detection and deviation) come really handy in those applications. Although some (definitely not all) libraries providing algorithm implementations for this kind of tasks expose some api to validate results, there is lack of well-defined standard for them.
In my own research that would be of big help, since I try to build an incremental solution for labeled structured segmentation.
(2) and (3). Nice alternatives! I think, that option with inferring is quite nice, IMO is better than with an extra argument: extra arguments introduce extra complexity for clients of code, and it looks like that library, providing reference solution should stay as user-friendly as possible.
(4) Yes, I don't have a good answer to this challenge yet, but the idea is that with giving just a little more flexibility in library code it is possible to allow users to make their own evaluation pipelines, combining only required metrics with required parameters, may be using library code as a template reference for implementation. Another idea is to implement builder-like classes to handle that templating in the library itself. But that needs more design overview.

P.S.
I am really fond of your, Brian, and Collin's work. Your research and contributions to open source are very inspiring!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segment: make it possible to provide boundaries, not only intervals
3 participants