fix: delete leftover heartbeat connections #1033

@df-wg df-wg commented Jan 17, 2025

Users reported seeing the following segfault when a connection to a subgraph is interrupted over multipart:

cosmo-router     | 21:07:16 PM ERROR core/graphql_handler.go:380 Unable to write error response {"hostname": "ad1991cdcfd2", "pid": 1, "component": "@wundergraph/router", "service_version": "0.158.0", "request_id": "ad1991cdcfd2/RTnCLg1J8M-000022", "trace_id": "77476c6fe295cd7dafeb71e00183879b", "error": "context canceled"}
cosmo-router     | github.com/wundergraph/cosmo/router/core.(*GraphQLHandler).WriteError
cosmo-router     |      github.com/wundergraph/cosmo/router/core/graphql_handler.go:380
cosmo-router     | github.com/wundergraph/graphql-go-tools/v2/pkg/engine/resolve.(*Resolver).handleHeartbeat.func1
cosmo-router     |      github.com/wundergraph/graphql-go-tools/v2@v2.0.0-rc.136/pkg/engine/resolve/resolve.go:431

After investigating, the cause appears to be several code paths where we deleted the subscription trigger but did not clean up its heartbeat (which runs in a separate goroutine), so the heartbeat kept writing to a context that had already been canceled. This PR cleans those up.


@StarpTech StarpTech left a comment


Great catch, looks reasonable to me. Can we add a test for it here or in the router?

@df-wg df-wg requested review from Noroth and StarpTech January 20, 2025 19:38

@Noroth Noroth left a comment


Looks good. Do you think we should have one more test with 50 subscriptions at once and make sure that all of them are deleted properly?

e.g.:
require.Eventually

@df-wg df-wg requested a review from Noroth January 21, 2025 19:01

df-wg commented Jan 21, 2025

Good point @Noroth, added a test like that.

@df-wg df-wg merged commit f7492d3 into master Jan 23, 2025
9 checks passed
@df-wg df-wg deleted the dave/eng-6234-multipart-write-error-when-router-loses-its-connection-with branch January 23, 2025 07:20
df-wg pushed a commit that referenced this pull request Jan 23, 2025
🤖 I have created a release *beep* *boop*
---


## [2.0.0-rc.143](v2.0.0-rc.142...v2.0.0-rc.143) (2025-01-23)


### Bug Fixes

* delete leftover heartbeat connections ([#1033](#1033)) ([f7492d3](f7492d3))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).