VReplication: Add throttler stats #15221

Merged
merged 11 commits on Feb 19, 2024

Conversation

mattlord
Contributor

@mattlord mattlord commented Feb 13, 2024

Description

This pairs with #15223 to make it easier for users to observe how the tablet throttler and vreplication interact over time, so that they can understand the system and explain or debug the behavior and results they observe.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation: Document new vreplication throttler stats website#1692

Contributor

vitess-bot bot commented Feb 13, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes). New features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test. Enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator.
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the labels NeedsBackportReason (If backport labels have been applied to a PR, a justification is required), NeedsDescriptionUpdate (The description is not clear or comprehensive enough, and needs work), NeedsIssue (A linked issue is missing for this Pull Request), and NeedsWebsiteDocsUpdate (What it says) on Feb 13, 2024
@github-actions github-actions bot added this to the v20.0.0 milestone Feb 13, 2024
@mattlord mattlord force-pushed the vrepl_throttler_stats branch from feaaecb to d83a882 on February 13, 2024 19:09

codecov bot commented Feb 13, 2024

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (696fe0e) 67.41% compared to head (14168b2) 67.46%.
Report is 26 commits behind head on main.

Files                                                    Patch %   Lines
go/vt/vttablet/tabletmanager/vreplication/stats.go       47.61%    11 Missing ⚠️
...vttablet/tabletmanager/vreplication/vreplicator.go    0.00%     4 Missing ⚠️
go/vt/binlog/binlogplayer/binlog_player.go               50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15221      +/-   ##
==========================================
+ Coverage   67.41%   67.46%   +0.04%     
==========================================
  Files        1560     1561       +1     
  Lines      192752   193219     +467     
==========================================
+ Hits       129952   130352     +400     
- Misses      62800    62867      +67     

@mattlord mattlord force-pushed the vrepl_throttler_stats branch 3 times, most recently from 2dad697 to fcd5240 on February 13, 2024 19:55
Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord force-pushed the vrepl_throttler_stats branch from fcd5240 to 2b89132 on February 13, 2024 23:39
Contributor

@shlomi-noach shlomi-noach left a comment

Looks like a good approach!

func (vr *vreplicator) updateTimeThrottled(appThrottled throttlerapp.Name) error {
	at := appThrottled.String()
	vr.stats.ThrottledCounts.Add([]string{"tablet", at}, 1)
Contributor

I believe we want to always update stats in a goroutine so as not to make the metrics themselves affect the code's flow (the metric introduces an atomic write).

Contributor Author

@mattlord mattlord Feb 14, 2024

We don't do that in vreplication today. I would expect this to be faster and lighter than a log message, and we don't typically write log messages from a goroutine (which has its own overhead). I think your usage of the function that creates the underlying stat resources if needed (in the tablet throttler) makes more sense to do in a goroutine, as you are doing.

Contributor

Yeah, I'm about to get rid of that usage 😛

go/vt/vttablet/tabletmanager/vreplication/vreplicator.go (outdated; resolved)
@mattlord mattlord added the labels Component: VReplication, Component: Observability (Pull requests that touch tracing/metrics/monitoring), NeedsBackportReason (If backport labels have been applied to a PR, a justification is required), Component: Throttler, and Type: Enhancement (Logical improvement, somewhere between a bug and feature), and removed the label NeedsBackportReason (If backport labels have been applied to a PR, a justification is required) on Feb 14, 2024
@mattlord mattlord force-pushed the vrepl_throttler_stats branch from be6f198 to 9857538 on February 16, 2024 04:32
@mattlord mattlord changed the title Add vreplication throttler stats VReplication: Add throttler stats Feb 16, 2024
@mattlord mattlord force-pushed the vrepl_throttler_stats branch from b2b7429 to aa6e67a on February 16, 2024 06:05
@mattlord mattlord force-pushed the vrepl_throttler_stats branch from aa6e67a to 8850009 on February 16, 2024 06:07
@mattlord mattlord force-pushed the vrepl_throttler_stats branch from 8850009 to ac6f772 on February 16, 2024 06:12
@mattlord mattlord removed the labels NeedsIssue (A linked issue is missing for this Pull Request) and NeedsDescriptionUpdate (The description is not clear or comprehensive enough, and needs work) on Feb 16, 2024
@mattlord mattlord marked this pull request as ready for review February 16, 2024 06:18
mattlord added a commit to vitessio/website that referenced this pull request Feb 16, 2024
From: vitessio/vitess#15221

Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord removed the NeedsWebsiteDocsUpdate What it says label Feb 16, 2024
Contributor Author

Unrelated fix for flaky tests seen in the code coverage workflow:

--- FAIL: TestMinimalMode (60.00s)
    main_flaky_test.go:79: 
        	Error Trace:	/home/runner/work/vitess/vitess/go/vt/vttablet/tabletserver/vstreamer/main_flaky_test.go:79
        	            				/home/runner/work/vitess/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer_test.go:1855
        	Error:      	Received unexpected error:
        	            	could not launch mysql: signal: killed
        	Test:       	TestMinimalMode

@mattlord mattlord merged commit af38099 into vitessio:main Feb 19, 2024
102 checks passed
@mattlord mattlord deleted the vrepl_throttler_stats branch February 19, 2024 19:25
mattlord added a commit to vitessio/website that referenced this pull request Feb 19, 2024
From: vitessio/vitess#15221

Signed-off-by: Matt Lord <mattalord@gmail.com>
Labels
  • Component: Observability (Pull requests that touch tracing/metrics/monitoring)
  • Component: Throttler
  • Component: VReplication
  • Type: Enhancement (Logical improvement, somewhere between a bug and feature)
4 participants