Commit aad7edf

Remove chi2 thresholds for analysis & reference (#349)
Chi2 alerting is based on p-values, so the calculated threshold values are irrelevant. Previously, the thresholds were removed as part of alerting. Because alerting is only applied to analysis data, the threshold values were still available on reference data, which led to confusion. This commit removes the chi2 threshold values across the entire dataset, for both reference and analysis data.
1 parent fa38d24 commit aad7edf

1 file changed (+8, -3 lines changed)

nannyml/drift/univariate/methods.py

Lines changed: 8 additions & 3 deletions
@@ -443,6 +443,14 @@ def __init__(self, **kwargs) -> None:
         self._p_value: float
         self._fitted = False
 
+    def fit(self, reference_data: pd.Series, timestamps: Optional[pd.Series] = None) -> Self:
+        super().fit(reference_data, timestamps)
+
+        # Thresholding is based on p-values. Ignoring all custom thresholding and disable plotting a threshold
+        self.lower_threshold_value = None
+        self.upper_threshold_value = None
+        return self
+
     def _fit(self, reference_data: pd.Series, timestamps: Optional[pd.Series] = None) -> Self:
         reference_data = _remove_nans(reference_data)
         self._reference_data_vcs = reference_data.value_counts().loc[lambda v: v != 0]
@@ -462,9 +470,6 @@ def _calculate(self, data: pd.Series):
         return stat
 
     def alert(self, value: float):
-        self.lower_threshold_value = None  # ignoring all custom thresholding, disable plotting a threshold
-        self.upper_threshold_value = None  # ignoring all custom thresholding, disable plotting a threshold
-
         return self._p_value < 0.05
 
     def _calc_chi2(self, data: pd.Series):
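
The effect of moving the reset from `alert()` into `fit()` can be illustrated with a minimal, self-contained sketch. The classes `MethodSketch` and `Chi2Sketch` below are hypothetical stand-ins, not NannyML's actual classes, and the assumption that the base `fit` assigns threshold values is made only for illustration. The point shown is that the thresholds are already `None` straight after fitting, without `alert()` ever being called.

```python
from typing import Optional, Sequence


class MethodSketch:
    """Hypothetical stand-in for a base drift method (not NannyML's real class)."""

    def __init__(self, lower: Optional[float] = None, upper: Optional[float] = None) -> None:
        self._configured = (lower, upper)
        self.lower_threshold_value: Optional[float] = None
        self.upper_threshold_value: Optional[float] = None

    def fit(self, reference_data: Sequence[str]) -> "MethodSketch":
        # Assumption for this sketch: the base class sets threshold values during fit.
        self.lower_threshold_value, self.upper_threshold_value = self._configured
        return self


class Chi2Sketch(MethodSketch):
    """Hypothetical chi2-style method whose alerts depend only on a p-value."""

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self._p_value = 1.0  # placeholder; a real method would compute this

    def fit(self, reference_data: Sequence[str]) -> "Chi2Sketch":
        super().fit(reference_data)
        # Alerts use the p-value only, so clear the thresholds right after fitting.
        # This way they are absent for reference and analysis data alike.
        self.lower_threshold_value = None
        self.upper_threshold_value = None
        return self

    def alert(self, value: float) -> bool:
        # No threshold bookkeeping here any more; just the p-value check.
        return self._p_value < 0.05


method = Chi2Sketch(lower=0.1, upper=0.9).fit(["a", "b", "a"])
thresholds_after_fit = (method.lower_threshold_value, method.upper_threshold_value)
```

With the reset inside `alert()`, as before this commit, the thresholds would only have been cleared once alerting ran, which the commit message says happens on analysis data only.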
