
Anomaly detection. Detection period value is included in the calculation of the AVG value for the training period #1560

Open
winkel163 opened this issue Jun 21, 2024 · 1 comment


Describe the bug
We had a case recently where a monitored value increased by around 25%, but the anomaly detection test succeeded with the anomaly_sensitivity parameter set to 2 and anomaly_direction set to both.


It was also unclear where Elementary gets the AVG value from. After some research, I found out that when calculating the AVG value, the detection period value is taken into account along with the training period values.

As a result, the value being examined affects the AVG value itself. I am not sure this is the correct behaviour.

In the picture, the value of TRAINING_AVG, highlighted in yellow, is calculated as the AVG of the 3 previous results PLUS the current one:

[screenshot: query results with the TRAINING_AVG column highlighted in yellow]

There may be an inaccuracy in the window function.
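
A minimal SQL sketch of the suspected off-by-one in the window frame (table and column names are hypothetical; this is not Elementary's actual code):

```sql
-- Hypothetical illustration: a frame ending at CURRENT ROW folds the
-- detection-period value into the training average, while a frame ending
-- at 1 PRECEDING keeps the training period fully separate.
select
    bucket_end,
    metric_value,
    avg(metric_value) over (
        order by bucket_end
        rows between 3 preceding and current row   -- detection value leaks in
    ) as training_avg_with_leak,
    avg(metric_value) over (
        order by bucket_end
        rows between 3 preceding and 1 preceding   -- training rows only
    ) as training_avg_expected
from metrics;
```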
I also found a similar topic in the Slack channel:

[screenshot: Slack thread discussing the same behaviour]

Expected behavior
The Elementary anomaly detection test fails.

Environment (please complete the following information):

  • Elementary dbt package version: 0.15.2
  • Data warehouse: Snowflake

tcassou commented Oct 11, 2024

Hello there, and thanks for providing this great package!

We've run into the same situation and reached the same conclusion about excluding the detection period from the training dataset. From a prediction perspective, training on the detection period can have a big impact on the anomaly score, and therefore affect the results (false positives / false negatives). I believe it's common practice to guarantee that train and test sets are fully separated, so that no test data "leaks" into the model and biases it.
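
To illustrate the bias with a standard z-score formulation (hypothetical table and column names, not Elementary's exact implementation): when the test point is included in its own training statistics, both the mean and the standard deviation shift toward it, which shrinks the resulting anomaly score.

```sql
-- Hypothetical sketch: anomaly score of each point with and without leakage.
select
    bucket_end,
    metric_value,
    (metric_value - avg(metric_value) over (
        order by bucket_end rows between 13 preceding and current row))
      / nullif(stddev(metric_value) over (
        order by bucket_end rows between 13 preceding and current row), 0)
        as z_with_leakage,   -- test point included: score is dampened
    (metric_value - avg(metric_value) over (
        order by bucket_end rows between 13 preceding and 1 preceding))
      / nullif(stddev(metric_value) over (
        order by bucket_end rows between 13 preceding and 1 preceding), 0)
        as z_train_only      -- train/test separated: true deviation
from metrics;
```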

This issue is probably the same as this one BTW: #1491

Are you planning to address this issue in the near future? Many thanks!
