class MedianCDFQuantileScorer(AnomalyScorer):
"""Given an anomalous observation x and samples from the distribution of X, this score represents:
- score(x) = 1 - 2 * min[P(X >= x), P(X <= x)]
+ score(x) = 1 - 2 * min[P(X > x) + P(X = x) / 2, P(X < x) + P(X = x) / 2]
+
+ Two NaN values are considered equal when compared here.
It scores the observation based on the quantile of x with respect to the distribution of X. Here, if the
sample x lies in the tail of the distribution, we want to have a large score. Since we a priori don't know
@@ -27,7 +29,7 @@ class MedianCDFQuantileScorer(AnomalyScorer):
P(X >= x) = 1 / 7
P(X <= x) = 6 / 7
With the end score of:
- 1 - 2 * min[P(X >= x), P(X <= x)] = 1 - 2 / 7 = 0.71
+ 1 - 2 * min[P(X > x) + P(X = x) / 2, P(X < x) + P(X = x) / 2] = 1 - 2 / 7 = 0.71
Note: For equal samples, we contribute half of the count to the left and half of the count to the right side.
@@ -49,7 +51,7 @@ def score(self, X: np.ndarray) -> np.ndarray:
X = shape_into_2d(X)
- equal_samples = np.sum(X == self._distribution_samples, axis=1)
+ equal_samples = np.sum(np.isclose(X, self._distribution_samples, rtol=0, atol=0, equal_nan=True), axis=1)
greater_samples = np.sum(X > self._distribution_samples, axis=1) + equal_samples / 2
smaller_samples = np.sum(X < self._distribution_samples, axis=1) + equal_samples / 2
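To make the effect of the `equal_nan=True` change concrete, here is a minimal standalone sketch of the tie-splitting score from the hunk above. The function name `median_cdf_quantile_score` is hypothetical and is not part of the dowhy API, and the final normalization by the number of samples is an assumption, since that line is not shown in this diff.

```python
import numpy as np


def median_cdf_quantile_score(x, samples):
    """Hypothetical helper mirroring the hunk above; not the dowhy API."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)              # shape (n, 1)
    samples = np.asarray(samples, dtype=float).reshape(1, -1)  # shape (1, m)

    # rtol=0 and atol=0 make this an exact equality check, except that
    # equal_nan=True also counts NaN compared with NaN as equal.
    equal = np.sum(np.isclose(x, samples, rtol=0, atol=0, equal_nan=True), axis=1)

    # Ties contribute half of their count to each side.
    greater = np.sum(x > samples, axis=1) + equal / 2
    smaller = np.sum(x < samples, axis=1) + equal / 2

    # Assumption: probabilities are estimated by dividing by the sample count.
    return 1 - 2 * np.minimum(greater, smaller) / samples.shape[1]


samples = [-3, -2, -1, 0, 1, 2, 3]
print(median_cdf_quantile_score([0.0], samples))             # median of the samples
print(median_cdf_quantile_score([3.0], samples))             # largest sample
print(median_cdf_quantile_score([np.nan], [np.nan, np.nan])) # NaN against NaN
```

With plain `X == samples`, a NaN observation would match nothing and fall outside both the greater and smaller counts; with `equal_nan=True` it is treated as a tie and split evenly between the two sides.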
@@ -60,7 +62,9 @@ def score(self, X: np.ndarray) -> np.ndarray:
class RescaledMedianCDFQuantileScorer(AnomalyScorer):
"""Given an anomalous observation x and samples from the distribution of X, this score represents:
- score(x) = -log(2 * min[P(X >= x), P(X <= x)])
+ score(x) = -log(2 * min[P(X > x) + P(X = x) / 2, P(X < x) + P(X = x) / 2])
+
+ Two NaN values are considered equal when compared here.
This is a rescaled version of the score s obtained by the :class:`~dowhy.gcm.anomaly_scorers.MedianCDFQuantileScorer`
by calculating the negative log-probability -log(1 - s). This has the advantage that small differences in the
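The rescaling described in the `RescaledMedianCDFQuantileScorer` docstring, -log(1 - s), can be sketched in isolation as follows. The function name `rescale_score` is hypothetical, and returning infinity for s = 1 is an assumption of this sketch; how dowhy itself handles that boundary is not visible in this diff.

```python
import math


def rescale_score(s):
    """Hypothetical helper: map a score s in [0, 1] to -log(1 - s)."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    if s == 1.0:
        # Assumption of this sketch: treat -log(0) as infinity.
        return math.inf
    return -math.log(1.0 - s)


for s in (0.0, 0.9, 0.99):
    print(s, rescale_score(s))
```

Scores of 0.9 and 0.99 differ by only 0.09 before rescaling, but map to roughly 2.30 and 4.61 afterwards, so differences near 1 become easier to tell apart.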