Fix handling single class in chunk for CBPE #384
Merged
This PR fixes an error when calculating business value, confusion matrix & specificity for binary classification problems where a chunk only contains 1 class.
Previously this would fail with:
This happens because the `sklearn.metrics.confusion_matrix` function NannyML uses internally bases its output on the number of classes present in the input. If only a single class is present, only 1 value is returned where we normally expect 4 for a binary classification problem. This PR resolves this by explicitly providing the expected classes in the `labels` argument. These expected classes are currently hard-coded as `[0, 1]`, but we may want to change this to derive values from the input if/when we support string-based classes for binary classification.

Additionally, this PR resolves an issue with F1 sampling error calculation when there are no positive cases present in the input. This previously resulted in a `ZeroDivisionError`. Now it results in a `NaN` sampling error.
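
For reference, the single-class behaviour of `sklearn.metrics.confusion_matrix` can be reproduced outside NannyML with a minimal sketch like the one below. The arrays `y_true` and `y_pred` are made-up illustration data, not taken from the NannyML code base or its tests; the snippet only shows the effect of passing `labels=[0, 1]`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical chunk in which only the negative class (0) occurs.
y_true = np.array([0, 0, 0, 0])
y_pred = np.array([0, 0, 0, 0])

# Without `labels`, the matrix shape follows the classes found in the data,
# so a single-class chunk yields a 1x1 matrix instead of 2x2.
cm_inferred = confusion_matrix(y_true, y_pred)
print(cm_inferred.shape)  # (1, 1)

# With the expected classes passed explicitly, the matrix is always 2x2
# and can be unpacked into the usual four binary-classification cells.
cm_fixed = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm_fixed.ravel()
print(tn, fp, fn, tp)  # 4 0 0 0
```

Unpacking `cm_inferred.ravel()` into four names would raise a `ValueError` here, which is the kind of failure the explicit `labels` argument avoids.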