Metric refactor - 2x perf and allocation free #1989
Conversation
```diff
 }
 }
 }

 impl<T: Number<T>> ValueMap<T> {
-    fn measure(&self, measurement: T, attrs: AttributeSet) {
+    fn measure(&self, measurement: T, attrs: &[KeyValue]) {
```
This is the key: we operate on the incoming ref/slice itself without allocating, copying, sorting, or de-duplicating. If the lookup fails, the "slow" path does everything as before. The net effect is that, for users who use the metrics API normally (i.e. without duplicate attributes and without changing the order of attributes), we get the highest performance and zero heap allocations.
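To make that concrete, here is a minimal, self-contained sketch of the fast-path/slow-path split using only std types (the `KeyValue` struct and the `u64` tracker below are simplified stand-ins, not the SDK's actual types): the incoming `&[KeyValue]` is used directly for the `HashMap` lookup, and only a miss falls back to the locking/allocating slow path.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

// Simplified stand-in for opentelemetry::KeyValue, to keep the sketch self-contained.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct KeyValue {
    key: &'static str,
    value: &'static str,
}

struct ValueMap {
    values: RwLock<HashMap<Vec<KeyValue>, Arc<AtomicU64>>>,
}

impl ValueMap {
    fn measure(&self, measurement: u64, attrs: &[KeyValue]) {
        // Fast path: `Vec<KeyValue>` borrows as `[KeyValue]`, so the incoming
        // slice can be used for the HashMap lookup as-is -- no allocation,
        // copy, sort, or de-dup.
        if let Some(tracker) = self.values.read().unwrap().get(attrs) {
            tracker.fetch_add(measurement, Ordering::Relaxed);
            return;
        }
        // Slow path: first time this exact ordering is seen. The real
        // implementation also sorts/de-duplicates here; this sketch just
        // creates (or reuses) a tracker for the given ordering.
        let mut values = self.values.write().unwrap();
        let tracker = values
            .entry(attrs.to_vec())
            .or_insert_with(|| Arc::new(AtomicU64::new(0)));
        tracker.fetch_add(measurement, Ordering::Relaxed);
    }
}

fn main() {
    let map = ValueMap { values: RwLock::new(HashMap::new()) };
    let attrs = [
        KeyValue { key: "method", value: "GET" },
        KeyValue { key: "status", value: "200" },
    ];
    map.measure(1, &attrs); // slow path (first time this ordering is seen)
    map.measure(1, &attrs); // fast path from now on
    assert_eq!(
        map.values.read().unwrap().get(attrs.as_slice()).unwrap().load(Ordering::Relaxed),
        2
    );
}
```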
```diff
     has_no_value_attribute_value: AtomicBool,
     no_attribute_value: T::AtomicTracker,
+    count: AtomicUsize,
```
count is tracked separately, as the number of entries in the HashMap is no longer accurate: we insert both the original and the sorted attribute orderings into the HashMap.
```diff
 /// The storage for sums.
 struct ValueMap<T: Number<T>> {
-    values: RwLock<HashMap<AttributeSet, T::AtomicTracker>>,
+    values: RwLock<HashMap<Vec<KeyValue>, Arc<T::AtomicTracker>>>,
```
The metric value is Arc-shared to ensure that attributes provided in any order contribute to the same timeseries. Arc pointer equality is also leveraged at export time to de-dup entries.
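A rough sketch of how the Arc sharing (and the separately tracked `count` mentioned above) could fit together; the types are simplified stand-ins, not the SDK's actual internals. The slow path stores the caller's ordering and the sorted ordering as two map keys pointing at the same `Arc`, and bumps `count` only when a genuinely new timeseries is created.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

// Simplified stand-in for opentelemetry::KeyValue.
type KeyValue = (&'static str, &'static str);

struct ValueMap {
    values: RwLock<HashMap<Vec<KeyValue>, Arc<AtomicU64>>>,
    // Distinct timeseries count. `values.len()` can be up to ~2x this number,
    // because one timeseries may be stored under two keys (original + sorted).
    count: AtomicUsize,
}

impl ValueMap {
    fn measure_slow(&self, measurement: u64, attrs: &[KeyValue]) {
        let mut sorted = attrs.to_vec();
        sorted.sort_by_key(|kv| kv.0); // the real code also de-duplicates

        let mut values = self.values.write().unwrap();
        let tracker = if let Some(existing) = values.get(sorted.as_slice()) {
            // Already seen under the canonical (sorted) key: alias this new
            // ordering to the *same* Arc so it feeds the same timeseries.
            Arc::clone(existing)
        } else {
            // Genuinely new timeseries.
            self.count.fetch_add(1, Ordering::SeqCst);
            Arc::new(AtomicU64::new(0))
        };
        values.insert(attrs.to_vec(), Arc::clone(&tracker));
        values.insert(sorted, Arc::clone(&tracker));
        tracker.fetch_add(measurement, Ordering::Relaxed);
    }
}

fn main() {
    let map = ValueMap { values: RwLock::new(HashMap::new()), count: AtomicUsize::new(0) };
    map.measure_slow(1, &[("a", "1"), ("b", "2")]);
    map.measure_slow(1, &[("b", "2"), ("a", "1")]); // reversed order, same series
    assert_eq!(map.count.load(Ordering::SeqCst), 1); // one timeseries...
    assert_eq!(map.values.read().unwrap().len(), 2); // ...stored under two keys
    let values = map.values.read().unwrap();
    let total = values.get([("a", "1"), ("b", "2")].as_slice()).unwrap().load(Ordering::Relaxed);
    assert_eq!(total, 2);
}
```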
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```diff
@@           Coverage Diff           @@
##            main   #1989     +/-  ##
=======================================
  Coverage   75.0%   75.1%
=======================================
  Files        122     122
  Lines      20562   20633     +71
=======================================
+ Hits       15430   15497     +67
- Misses      5132    5136      +4
```

☔ View full report in Codecov by Sentry.
```diff
         value: value.get_value(),
         exemplars: vec![],
     });
+    if seen.insert(Arc::as_ptr(&value)) {
```
De-dup sorted+original attributes!
@utpilla I leveraged the Arc pointer to de-dup. Storing metric points in a separate array was tried as well, but this approach looks easier to me.
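A minimal sketch of that export-time de-dup idea, using simplified types rather than the SDK's `DataPoint`/tracker types: because both keys of a timeseries share one `Arc`, a `HashSet` of `Arc::as_ptr` values is enough to emit each tracker exactly once.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

type KeyValue = (&'static str, &'static str); // simplified stand-in

// Emit each tracker exactly once, even though a tracker may be reachable via
// two keys (original + sorted attribute ordering): de-dup on the Arc pointer.
fn collect(values: &HashMap<Vec<KeyValue>, Arc<AtomicU64>>) -> Vec<u64> {
    let mut seen = HashSet::new();
    let mut data_points = Vec::new();
    for tracker in values.values() {
        // `insert` returns false when the pointer is already in the set,
        // i.e. this tracker was already exported under its other key.
        if seen.insert(Arc::as_ptr(tracker)) {
            data_points.push(tracker.load(Ordering::Relaxed));
        }
    }
    data_points
}

fn main() {
    let shared = Arc::new(AtomicU64::new(7));
    let mut values = HashMap::new();
    values.insert(vec![("a", "1"), ("b", "2")], Arc::clone(&shared));
    values.insert(vec![("b", "2"), ("a", "1")], Arc::clone(&shared));
    assert_eq!(collect(&values), vec![7]); // two keys, one exported data point
}
```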
```diff
@@ -925,6 +925,62 @@ mod tests {
         assert_eq!(data_point1.value, 6);
     }

+    #[tokio::test(flavor = "multi_thread", worker_threads = 1)]
+    async fn counter_aggregation_attribute_ordering() {
```
Added this to investigate an issue that turned out to be a false alarm. Keeping the test for now; it is not that useful as-is, but it can be augmented in subsequent PRs to make sure we cover all scenarios via testing.
```diff
-    let n = values.len() + 1;
     // Max number of data points need to account for the special casing
     // of the no attribute value + overflow attribute.
+    let n = self.value_map.count.load(Ordering::SeqCst) + 2;
```
I'm not sure if this calculation is important for correctness of the export. If it is, then we should do this under the `self.value_map.values.write()` lock. Otherwise, we read a certain value here, and before we reserve capacity for data points, some update thread(s) could enter the write lock and increment the count further.
It does not look important for correctness. Even if it is off by a few, it just results in a `Vec` resize in the middle of collect.
We can revisit this.
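A trivial illustration of why the count only matters as a capacity hint (plain std code, not the SDK's collect path): underestimating `Vec::with_capacity` just causes a reallocation, never incorrect output.

```rust
fn main() {
    let estimated = 2; // imagine this is a slightly stale count
    let mut data_points = Vec::with_capacity(estimated);
    for v in 0..5u32 {
        data_points.push(v); // grows past the initial capacity as needed
    }
    assert_eq!(data_points.len(), 5);
    assert!(data_points.capacity() >= 5);
}
```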
Great work!
Part 2 of #1740, addressing items 1, 2, and 3. Also closes #1954 for SUM.
Refactors the SUM aggregation logic to operate entirely on the user-provided slice of KeyValues. In the hot path, we avoid allocating, copying, sorting, and de-duplicating, leading to 2x or more gains (with 4 attributes; more attributes = much more gains), as long as users do not keep changing the order of attributes.
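A usage sketch of what "not changing the order of attributes" means in practice, assuming the API shape around the time of this PR (instruments built with `.init()`, attributes passed as `&[KeyValue]`); the instrument and attribute names are made up for illustration.

```rust
use opentelemetry::{global, KeyValue};

fn main() {
    let meter = global::meter("example");
    let counter = meter.u64_counter("requests").init();

    // Same attribute order on every call -> after the first recording, every
    // subsequent one stays on the allocation-free fast path.
    counter.add(1, &[KeyValue::new("method", "GET"), KeyValue::new("status", "200")]);
    counter.add(1, &[KeyValue::new("method", "GET"), KeyValue::new("status", "200")]);

    // A different order still contributes to the same timeseries, but each
    // previously unseen ordering pays a one-time "slow path" insert.
    counter.add(1, &[KeyValue::new("status", "200"), KeyValue::new("method", "GET")]);
}
```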
This is mostly based on prototypes done outside this repo in https://github.com/cijothomas/metrics-mini/tree/main/metrics.
Once the approach is settled, we can replicate it to all aggregations. (This PR only fixes SUM.)
Benchmarks:
The stress test also shows small gains (17M/sec to 20M/sec). #1833 fixed the major contention, so this PR has limited impact on the stress test, but the raw speed is significantly improved.
Stress test with memory stats:

|                      | Main                      | PR                        |
|----------------------|---------------------------|---------------------------|
| Throughput           | 17,842,800 iterations/sec | 18,869,600 iterations/sec |
| Memory usage         | 5.06 MB                   | 4.23 MB                   |
| CPU usage            | 98.61875%                 | 99.2016%                  |
| Virtual memory usage | 2249.73 MB                | 2249.55 MB                |

Visible memory reduction as well, from ~5 MB to ~4.2 MB.
Note: overflow perf regressed, and that is acceptable. The main goal of the overflow feature is to protect the SDK from consuming unlimited memory, so it is acceptable that this path is slower. Overflow is a rare case, and it indicates that the user needs to adjust cardinality.
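For context, a rough, illustrative sketch of the overflow idea (not the SDK's actual code or configuration; the cap value below is made up, while the `otel.metric.overflow` attribute name comes from the OpenTelemetry spec): once the cardinality cap is reached, unseen attribute sets are folded into a single overflow series so memory stays bounded.

```rust
use std::collections::HashMap;

// Illustrative cap; the real SDK has its own configured/default limit.
const CARDINALITY_CAP: usize = 2000;

type Attrs = Vec<(String, String)>;

// Pick the series to record against: known attribute sets, and anything while
// we are below the cap, get their own series; everything else collapses into
// the single overflow series, keeping memory bounded regardless of input.
fn series_key(existing: &HashMap<Attrs, u64>, attrs: Attrs) -> Attrs {
    if existing.contains_key(&attrs) || existing.len() < CARDINALITY_CAP {
        attrs
    } else {
        // Per the OpenTelemetry spec, the overflow series carries the single
        // attribute otel.metric.overflow = true (stringified here).
        vec![("otel.metric.overflow".to_string(), "true".to_string())]
    }
}

fn main() {
    let mut sums: HashMap<Attrs, u64> = HashMap::new();
    for i in 0..3000u32 {
        let attrs = vec![("user_id".to_string(), i.to_string())];
        let key = series_key(&sums, attrs);
        *sums.entry(key).or_insert(0) += 1;
    }
    // At most CARDINALITY_CAP regular series plus the overflow series.
    assert!(sums.len() <= CARDINALITY_CAP + 1);
}
```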