Fix handling single class in chunk for CBPE #384

Merged
merged 3 commits into from
May 2, 2024

michael-nml
Collaborator

This PR fixes an error when calculating business value, confusion matrix & specificity for binary classification problems where a chunk only contains 1 class.

Previously this would fail with:

nannyml.exceptions.CalculatorException: failed while fitting nannyml.performance_estimation.confidence_based.cbpe.CBPE.
not enough values to unpack (expected 4, got 1)

This happens because the sklearn.metrics.confusion_matrix function, which NannyML uses internally, sizes its output by the number of distinct classes present in the input. If only a single class is present, it returns a 1x1 matrix holding a single value, whereas we expect 4 values for a binary classification problem. This PR resolves this by explicitly providing the expected classes in the labels argument. These expected classes are currently hard-coded as [0, 1], but we may want to derive them from the input if/when we support string-based classes for binary classification.
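A minimal sketch of the behaviour described above, outside of NannyML. The sample arrays and variable names are made up for illustration; only `sklearn.metrics.confusion_matrix` and its `labels` argument come from the PR description.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical chunk in which only the positive class (1) occurs.
y_true = np.array([1, 1, 1])
y_pred = np.array([1, 1, 1])

# Without `labels`, the matrix is sized by the classes seen in the data,
# so a single-class chunk yields a 1x1 matrix (recent scikit-learn
# versions may also emit a warning here).
single = confusion_matrix(y_true, y_pred)
print(single.shape)

# Unpacking four values from `single.ravel()` would raise
# "not enough values to unpack (expected 4, got 1)".

# With the expected binary labels supplied, the matrix is always 2x2
# and unpacks cleanly in scikit-learn's (tn, fp, fn, tp) order.
full = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = full.ravel()
print(tn, fp, fn, tp)
```

Passing `labels=[0, 1]` keeps the output shape independent of the chunk's contents, which is why the hard-coded values work for now but would need revisiting for string-based classes.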

Additionally, this PR resolves an issue with the F1 sampling error calculation when there are no positive cases present in the input. This previously resulted in a ZeroDivisionError. The sampling error now resolves to NaN in that case.
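The guard pattern can be sketched as follows. This is not NannyML's sampling error code, whose exact expression is not shown in this PR description. The helper names `safe_ratio` and `recall_from_counts` are hypothetical, and recall is used as a stand-in only because it is likewise undefined when a chunk contains no positive cases.

```python
import math


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    if denominator == 0:
        return math.nan
    return numerator / denominator


def recall_from_counts(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); undefined (NaN) when there are no positives."""
    return safe_ratio(tp, tp + fn)


# A chunk with no positive cases: TP + FN == 0, so the result is NaN
# instead of a ZeroDivisionError.
print(recall_from_counts(tp=0, fn=0))

# A chunk with positives behaves as usual.
print(recall_from_counts(tp=3, fn=1))
```

Returning NaN rather than raising lets the remaining chunks be processed while still marking the value as undefined for the affected chunk.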

The `confusion_matrix` function used in various CBPE metrics returns
values for each class/label present in the input. For binary
classification this means we expect 4 values (TP, FP, FN, TN). However
if only one class is represented in the input, the function will only
return one value.

This commit addresses that failure case by explicitly providing the
expected labels to the `confusion_matrix` function. Currently these
values are hard-coded for binary classification, but we may want to
derive them from the input later on if we were to support string-based
pass/fail classes.

codecov bot commented May 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.67%. Comparing base (13ace29) to head (730ae35).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #384      +/-   ##
==========================================
+ Coverage   78.52%   78.67%   +0.15%     
==========================================
  Files         110      110              
  Lines        8562     8567       +5     
  Branches     1522     1523       +1     
==========================================
+ Hits         6723     6740      +17     
+ Misses       1476     1468       -8     
+ Partials      363      359       -4     


@nnansters nnansters merged commit d064916 into main May 2, 2024
7 of 8 checks passed