Intersection F1 calculation code #80

Open
astrocyted opened this issue Sep 4, 2023 · 4 comments

Comments

@astrocyted

Since the PSDS Eval package (and with it, its support) has been removed from GitHub, is there a plan to have separate, standalone code for evaluating this metric in the repo, without having to import from a somewhat obscure psds_eval package that is no longer hosted there?

I was getting NaNs in my per-class F1 scores, so I had to dig through the psds_eval package, only to discover that it is due to the line:

        num_gts = per_class_tp / tp_ratios

where num_gts is calculated in a rather odd way that assumes tp_ratios is never zero(!). As a result, the false negatives and the F1 score of every class with zero TPs become NaN.

    def compute_macro_f_score(self, detections, beta=1.):
        """Computes the macro F_score for the given detection table

        The DTC/GTC/CTTC criteria presented in the ICASSP paper (link above)
        are exploited to compute the confusion matrix. From the latter, class
        dependent F_score metrics are computed. These are further averaged to
        compute the macro F_score.

        It is important to notice that a cross-trigger is also counted as
        false positive.

        Args:
            detections (pandas.DataFrame): A table of system detections
                that has the following columns:
                "filename", "onset", "offset", "event_label".
            beta: coefficient used to put more (beta > 1) or less (beta < 1)
                emphasis on false negatives.

        Returns:
            A tuple with average F_score and dictionary with per-class F_score

        Raises:
            PSDSEvalError: if class instance doesn't have ground truth table
        """
        if self.ground_truth is None:
            raise PSDSEvalError("Ground Truth must be provided before "
                                "adding the first operating point")

        det_t = self._init_det_table(detections)
        counts, tp_ratios, _, _ = self._evaluate_detections(det_t)

        per_class_tp = np.diag(counts)[:-1]
        num_gts = per_class_tp / tp_ratios
        per_class_fp = counts[:-1, -1]
        per_class_fn = num_gts - per_class_tp
        f_per_class = self.compute_f_score(per_class_tp, per_class_fp,
                                           per_class_fn, beta)

        # remove the injected world label
        class_names_no_world = sorted(set(self.class_names
                                          ).difference([WORLD]))
        f_dict = {c: f for c, f in zip(class_names_no_world, f_per_class)}
        f_avg = np.nanmean(f_per_class)

        return f_avg, f_dict 

By the way, this behaviour can easily lead to a significant overestimation of the macro intersection F1: if the model's output for a rare class yields zero TPs, that class's F1 becomes NaN, and the package silently drops it and averages only over the remaining classes.
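To make the effect concrete, here is a minimal NumPy sketch (the class names and all counts are made up for illustration, not taken from psds_eval) of how a zero-TP class turns into NaN and then silently disappears from the macro average:

    import numpy as np

    # Made-up per-class counts: "siren" is a rare class the model never detects.
    per_class_tp = np.array([80.0, 45.0, 0.0])   # classes: speech, dog, siren
    tp_ratios    = np.array([0.8, 0.9, 0.0])     # siren: 0 TPs -> ratio 0
    per_class_fp = np.array([10.0, 5.0, 3.0])

    # The problematic line: 0 / 0 yields NaN for the zero-TP class.
    num_gts = per_class_tp / tp_ratios           # -> [100., 50., nan]
    per_class_fn = num_gts - per_class_tp        # -> [20., 5., nan]

    f1 = 2 * per_class_tp / (2 * per_class_tp + per_class_fp + per_class_fn)
    print(f1)                                    # -> [0.842, 0.900, nan]

    # np.nanmean drops the NaN class entirely, inflating the macro F1.
    print(np.nanmean(f1))                        # ~0.87, instead of ~0.58 if siren counted as 0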

So I think it would be helpful to have a more transparent, clean, standalone implementation of the intersection-based F1 in this repo.

@turpaultn
Collaborator

Hi,
First, the official psds_eval has just been transferred here: https://github.com/DCASE-REPO/psds_eval

This is not an "obscure" clone but the official one, with the people who developed it still answering questions in their free time. It is simply a question of ownership, since it was developed at Audio Analytic. They transferred it to DCASE-REPO so the community could still access it even after their acquisition (without having to go to an 'obscure' repo, as you said).

About your question on the True Positive Ratio, could you please move it to the other repo?
https://github.com/DCASE-REPO/psds_eval/issues

I remember having conversations about your problem, since we ran into the same one; I don't recall what the final word was (I'll ask around).
Contributions are more than welcome, so we could probably add an option to avoid this problem (you could create a pull request if you want 😉).

Best,

@astrocyted
Author

Hi,
Thanks for the reply.
Good to know that the source code for this metric is actively maintained! I could not find the repo on GitHub: the link you mentioned was not linked from the PyPI page (https://pypi.org/project/psds-eval/0.5.3/#description), and in the DESED_task repo the psds_eval package is installed directly from PyPI without any link to the source maintained on DCASE-REPO. So there was no obvious way to know that that is where psds_eval is maintained.

Also, it turns out what I meant to say in the previous post was closer to "stale" than "obscure". I'm not a native speaker, so excuse my English.

Regarding the issue I was facing, replacing the line

        num_gts = per_class_tp / tp_ratios

with

        num_gts = self._get_dataset_counts().values[:-1].astype(float)

seems to resolve the issue.
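For completeness, here is a minimal standalone sketch (made-up counts, not psds_eval internals) showing why taking num_gts from the ground-truth counts keeps zero-TP classes at F1 = 0 instead of NaN:

    import numpy as np

    per_class_tp = np.array([80.0, 45.0, 0.0])
    per_class_fp = np.array([10.0, 5.0, 3.0])
    num_gts      = np.array([100.0, 50.0, 40.0])    # per-class ground-truth counts, not tp / tp_ratio

    per_class_fn = num_gts - per_class_tp            # well-defined even when tp == 0
    f1 = 2 * per_class_tp / np.maximum(2 * per_class_tp + per_class_fp + per_class_fn, 1e-12)

    print(f1)          # -> [0.842, 0.900, 0.0]  the rare class now counts as 0, not NaN
    print(f1.mean())   # macro F1 ~0.58, with no class silently dropped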

Best.

@turpaultn
Collaborator

Hi,
Indeed, that's a good point. I should take some time to update it, since they also gave me ownership of the PyPI project.
I've added the PyPI version on GitHub so it is easier to make the link.
On PyPI, the link to the GitHub repo is already in the "Installation" section.

If we do that, we should probably comment on why we skip the last value (the WORLD label added by PSDS). Maybe writing an explicit "if" is cleaner?

Could you please open the issue on the other repo, so we can discuss it there and open a pull request?

@astrocyted
Author

I just created the issue on the psds_eval repo, where I also mentioned the logic behind the [:-1] slicing.

While we're on the subject of the intersection F1 calculation, another issue that actually concerns this repo is the following line:

    psds_macro_f1 = np.mean(psds_macro_f1) 


I don't really understand why the F1 scores calculated over different threshold values are being averaged here. What would that even mean? Is there, for example, a reference in the literature for this measure?
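To spell out what that line does (a minimal sketch; the thresholds and scores below are made up): psds_macro_f1 arrives as one macro F1 value per threshold and is collapsed into their plain mean, so the result is a threshold-averaged macro F1 rather than the score at a single operating point.

    import numpy as np

    # Hypothetical macro F1 computed at several detection thresholds.
    thresholds = [0.3, 0.5, 0.7]
    psds_macro_f1 = [0.41, 0.47, 0.39]    # one macro F1 per threshold

    # The line in question collapses the per-threshold scores into a single number.
    psds_macro_f1 = np.mean(psds_macro_f1)
    print(psds_macro_f1)                  # ~0.423: a threshold-averaged macro F1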
