Fix Go routine leaks in streaming calls #15293

Merged on Feb 20, 2024 (3 commits)

Conversation

GuptaManan100
Member

@GuptaManan100 GuptaManan100 commented Feb 20, 2024

Description

This PR fixes the issue described in #14201.

Following the steps listed in the issue confirmed that we were spawning goroutines that never exited:

goroutine profile: total 102
31 @ 0x102aaf7e8 0x102ac2c38 0x103033f0c 0x102ae9dc4
#	0x103033f0b	google.golang.org/grpc.newClientStreamWithParams.func4+0x8b	google.golang.org/grpc@v1.61.1/stream.go:393

Note the count of goroutines blocked on the same line. It is 31 here, and running the queries in a loop increases it by 1 on every iteration.

Each leaked goroutine is never reclaimed, so the leak will eventually lead to an OOM. The fix, however, was pretty straightforward.
In the StreamExecute RPC code, we have the following comment:

// All streaming clients should follow the code pattern below.
// The first part of the function starts the stream while holding
// a lock on conn.mu. The second part receives the data and calls
// callback.
// A new cancelable context is needed because there's currently
// no direct API to end a stream from the client side. If callback
// returns an error, we return from the function. The deferred
// cancel will then cause the stream to be terminated.
ctx, cancel := context.WithCancel(ctx)
defer cancel()

However, this pattern wasn't being followed in the other streaming RPCs, such as BeginStreamExecute, which caused the goroutines to leak.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

…there are no go-routine leaks

Signed-off-by: Manan Gupta <manan@planetscale.com>
Contributor

vitess-bot bot commented Feb 20, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes). New features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test. Enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the NeedsBackportReason, NeedsDescriptionUpdate, NeedsIssue, and NeedsWebsiteDocsUpdate labels Feb 20, 2024
@github-actions github-actions bot added this to the v20.0.0 milestone Feb 20, 2024
Signed-off-by: Manan Gupta <manan@planetscale.com>
@GuptaManan100 GuptaManan100 added the Type: Bug and Component: Query Serving labels and removed the NeedsDescriptionUpdate, NeedsWebsiteDocsUpdate, NeedsIssue, and NeedsBackportReason labels Feb 20, 2024
@GuptaManan100 GuptaManan100 marked this pull request as ready for review February 20, 2024 10:24

codecov bot commented Feb 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (696fe0e) 67.41% compared to head (316de39) 67.54%.
Report is 34 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15293      +/-   ##
==========================================
+ Coverage   67.41%   67.54%   +0.12%     
==========================================
  Files        1560     1561       +1     
  Lines      192752   193371     +619     
==========================================
+ Hits       129952   130613     +661     
+ Misses      62800    62758      -42     


@GuptaManan100 GuptaManan100 added Backport to: release-16.0 Backport to: release-19.0 Needs to be back ported to release-19.0 labels Feb 20, 2024
@GuptaManan100
Member Author

Leaking go routines is a serious issue, and therefore the fix is being backported.

@harshit-gangal
Member

There are other methods as well: VStream, VStreamRows, VStreamTables, VStreamResults and GetSchema should also have the same cancelable context code block.

Signed-off-by: Manan Gupta <manan@planetscale.com>
@GuptaManan100
Member Author

@harshit-gangal Super catch! I've fixed them too 💯

@GuptaManan100 GuptaManan100 merged commit c1a176c into vitessio:main Feb 20, 2024
102 checks passed
@GuptaManan100 GuptaManan100 deleted the grpc-go-routine-leak branch February 20, 2024 14:10
vitess-bot pushed a commit that referenced this pull request Feb 20, 2024
Signed-off-by: Manan Gupta <manan@planetscale.com>
vitess-bot pushed a commit that referenced this pull request Feb 20, 2024
Signed-off-by: Manan Gupta <manan@planetscale.com>
dbussink pushed a commit that referenced this pull request Feb 20, 2024
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
GuptaManan100 added a commit that referenced this pull request Feb 20, 2024
Signed-off-by: Manan Gupta <manan@planetscale.com>
GuptaManan100 pushed a commit that referenced this pull request Feb 20, 2024
Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
GuptaManan100 pushed a commit that referenced this pull request Feb 20, 2024
Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
GuptaManan100 added a commit that referenced this pull request Feb 20, 2024
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
dbussink pushed a commit that referenced this pull request Feb 20, 2024
Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
Co-authored-by: Manan Gupta <manan@planetscale.com>

Successfully merging this pull request may close these issues.

Bug Report: Potential leak of non-stopping goroutines
5 participants