Skip to content

Conversation

@shashank10456
Copy link
Contributor

@shashank10456 shashank10456 commented Sep 10, 2024

Explain what this PR does.

This PR supports Anomaly Detection on fields that use valuesDoubleSketches. We add aggregations and postaggregations which run natively on druid. These sketches are converted to values using these postaggregations and are run on druid.
This would enable us to use anomaly detection for inputs using sketches(https://datasketches.apache.org/). For example, latency based anomaly.

Also, I have made few changes to DockerFile and added a patch for Numalogic 0.9.1 to avoid CVE issues. This is important for the perf-anomaly team to avoid moving to Numaflow 1.2.1 and updating all the UDFs and UDSinks. This would help them save lot of time by just upgrading the ML vertices.

@codecov
Copy link

codecov bot commented Sep 10, 2024

Codecov Report

Attention: Patch coverage is 90.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 92.17%. Comparing base (f29f771) to head (883a32c).
Report is 20 commits behind head on main.

Files with missing lines Patch % Lines
numalogic/connectors/_config.py 90.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #409      +/-   ##
==========================================
- Coverage   93.07%   92.17%   -0.90%     
==========================================
  Files          97       97              
  Lines        4492     4781     +289     
  Branches      387      430      +43     
==========================================
+ Hits         4181     4407     +226     
- Misses        231      276      +45     
- Partials       80       98      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@qhuai
Copy link

qhuai commented Sep 10, 2024

Please replace "Explain what this PR does." with the real description & purpose of this PR.

@shashank10456 shashank10456 changed the title Support trainer perf anomaly 4 Support training for latency based anomalies specifically for perf-anomaly Sep 12, 2024
if not self.post_aggregations:
self.post_aggregations = {
"p90": _post_agg.QuantilesDoublesSketchToQuantile(
output_name="agg_out", field=postaggregator.Field("agg_out"), fraction=0.90
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a library feature, percentile is better to be configurable.

if not self.aggregations:
self.aggregations = {"count": doublesum("count")}
self.aggregations = {
"agg_out": _agg.quantiles_doubles_sketch("valuesDoublesSketch", "agg0", 64)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this value 64 imply? Does it need to be configurable for different latency anomaly use cases?


ARG PYTHON_VERSION=3.11
FROM python:${PYTHON_VERSION}-slim-bookworm AS builder
FROM python:${PYTHON_VERSION}-bookworm AS builder
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -slim-bookworm here replaced with -bookworm. However, the same -slim-bookworm at line 31 remains unchanged. Why is the difference?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants