Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[perf] Update Stress Test to avoid false sharing #5985

Merged
merged 3 commits into from
Nov 18, 2024

Conversation

utpilla
Copy link
Contributor

@utpilla utpilla commented Nov 18, 2024

Motivation for this PR:

  • The current implementation of Stress Test involves each worker thread to update the long[] statistics array
  • The worker threads operate on different memory location as each of them have a dedicated index that they use to update the statistics array
  • However, this suffers from false sharing. Even though the individual threads never operate on the same memory location, the memory locations they access are close enough to each other to end up in the same cache line

Changes in the PR:

  • Introduce a padded struct MeasurementData to record updates from each worker thread
  • MeasurementData struct has a size of 128 bytes (16 fields * 8 bytes) which ensures that when we create an array of MeasurementData the individual items end up on different cache lines (most systems would have a cache line size <= 128 bytes)

Stress Test Results

Machine details:

Windows 11 (10.0.26100.2314) (Hyper-V)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100

I ran the stress test for two scenarios to compare the difference.

  1. When stress testing an empty no-op function, the throughput improved 1.5 times (went up from ~4.5 B to ~7 B loops per second)
  2. When stress testing counter metrics, the throughput improved 1.45 times (went up from ~40 M to ~58 M loops per second)

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • Changes in public API reviewed (if applicable)

@utpilla utpilla requested a review from a team as a code owner November 18, 2024 07:21
@github-actions github-actions bot added perf Performance related pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package labels Nov 18, 2024
Copy link

codecov bot commented Nov 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.56%. Comparing base (2ae01a7) to head (893da78).
Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5985      +/-   ##
==========================================
+ Coverage   84.50%   84.56%   +0.05%     
==========================================
  Files         271      271              
  Lines       12355    12355              
==========================================
+ Hits        10441    10448       +7     
+ Misses       1914     1907       -7     
Flag Coverage Δ
unittests-Project-Experimental 84.53% <ø> (+0.21%) ⬆️
unittests-Project-Stable 84.53% <ø> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 2 files with indirect coverage changes

---- 🚨 Try these New Features:

Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

(OTel Rust's stress tests already used padding to achieve this goal, but still it has lower throughput. More work for OTel Rust folks to improve perf🤣 )

I'll run stress tests and share results for main vs this PR, for any easy ref in future!

@cijothomas
Copy link
Member

.NET 8, Windows

Main
Logs - 52
Traces - 1.9
Metrics - 35

PR
Logs - 58
Traces - 1.8
Metrics - 38

@CodeBlanch CodeBlanch changed the title Update Stress Test to avoid false sharing [perf] Update Stress Test to avoid false sharing Nov 18, 2024
@CodeBlanch CodeBlanch merged commit 913bccf into open-telemetry:main Nov 18, 2024
30 checks passed
@utpilla
Copy link
Contributor Author

utpilla commented Nov 18, 2024

.NET 8, Windows

Main Logs - 52 Traces - 1.9 Metrics - 35

PR Logs - 58 Traces - 1.8 Metrics - 38

@cijothomas Could you also mention if the numbers posted for Metrics were for the Counter or Histogram test? By default, the stress test for metrics is run for histogram.

@cijothomas
Copy link
Member

.NET 8, Windows
Main Logs - 52 Traces - 1.9 Metrics - 35
PR Logs - 58 Traces - 1.8 Metrics - 38

@cijothomas Could you also mention if the numbers posted for Metrics were for the Counter or Histogram test? By default, the stress test for metrics is run for histogram.

Histogram. I ran the stress test as default only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf Performance related pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants