-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Map NO_ERROR
h2 code to Code.Internal
#1113
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -317,8 +317,28 @@ function h2Request( | |
} | ||
sentinel.reject(e); | ||
}); | ||
|
||
let receivedData = false; | ||
stream.on("response", () => { | ||
receivedData = true; | ||
}); | ||
stream.on("close", function h2StreamClose() { | ||
// Node APIs don't differentiate between a `RST_STREAM` with `NO_ERROR` and | ||
// a regular close. | ||
// | ||
// To check if this is an error we see if any data was received, if received | ||
// this is not an error. | ||
if ( | ||
stream.rstCode === http2.constants.NGHTTP2_NO_ERROR && | ||
!receivedData | ||
Comment on lines
+331
to
+332
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens if we've received data but no status with the gRPC protocol (which requires a "grpc-status" trailer)? Ideally, that would also result in an internal, so that if we receive message data but the stream is aborted before we get the final trailers, we can still report an error. When we were both looking at this, we spent a good bit of time also looking at the "GOAWAY" case. IIUC, this doesn't address that -- if there is a "GOAWAY" frame with a "NO_ERROR" status, any streams that were created before the frame was received but with a frame ID higher than the server-indicated max ID will just hang until the connection is closed (in which case they get a network error), right? I know that's a separate bug that doesn't need to be addressed here. Just confirming if my understanding is correct and that is indeed a separate bug. I think where we left it is that it needed some more testing to see exactly what the Node HTTP/2 would do. If there is indeed another semi-related issue with that, we should make sure there is an issue to track it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the GOAWAY case we need testing to confirm the behavior.
If we already started receiving data will the server send a NO_ERROR? I understand it can do to lame duck in which case it is not an error for the current stream but to indicate no new connections? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The lame ducking uses GOAWAY frames, which are connection-wide, not stream-specific. I thought this chunk of code was called in the face of a RST_STREAM frame, which is for terminating a single stream/operation, not the whole connection. A server could send a RST_STREAM frame with NO_ERROR at any time, even if data has already been sent. It may not be common/expected in practice, but I thought that was actually the reported bug -- that it does happen and the client doesn't handle it correctly. I think the whole reason it gets classified as "internal" in the gRPC spec is because it is unexpected and shouldn't really happen in practice. There is one "expected" scenario where this happens which shouldn't (hopefully) trigger this code. It should actually be ignored by the client because the client should have already reported a status from the trailers/end of body. It happens if the server ends the response stream (and, in gRPC, sends back trailers with a status), but the client never half-closed the request side of the stream. In such an instance, the RST_STREAM frame tells the client that the stream is complete and to not bother trying to send anything else for the stream, even to end/close the request side. A Google search suggests that some HTTP/2 proxies may also do this to selectively cancel streams for which the downstream connection (to backend) has been closed or experienced some network failure (though they should really use INTERNAL_ERROR instead; but either way it would get reported to the client as an "internal" RPC error). So with the current logic, where we have to just ignore the error if data has been sent (since we can't distinguish the framework doing this to signal the stream is done vs. an actual RST_STREAM frame with NO_ERROR), we would hopefully detect that this is an error because we'd observe that the response was not complete (either incomplete/malformed body and/or missing trailers/status). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
To clarify: For a graceful shutdown (GOAWAY frame with NO_ERROR), the client keeps any open streams open until they finish, and opens a new connection for new streams. So any unfinished requests continue as normal, and new requests work as normal from the user's perspective. But you observed a different behavior with a frame ID that does not include one or more open streams, and the streams in question hang - as in they don't receive or transmit data anymore? If this is a situation we can expect in regular operation, this does indeed look like a bug. It would be helpful to have a reproducible example. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This isn't entirely correct. The GOAWAY frame includes a max stream ID. So any stream created by the client whose ID is greater than that should not be allowed to finish but should instead be retried on a different connection. With the current implementation in connect-es, the max stream ID is not inspected and nothing is done with those streams, so it is possible/likely that they will simply hang until the connection is actually closed. However, this warrants further testing since it is unclear whether the Node HTTP/2 implementation does some amount of handling of GOAWAY transparently 🤷. Seems unlikely, but we should try to repro first. It is likely that fixing this in the connect-es code could require a backwards-incompatible change to this transport interface, so we should verify whether it's a bug before finalizing v2. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
FWIW, this only happens when there is a race condition. For example, the client creates a stream and sends the headers frame. The client is responsible for assigning stream IDs, so it assigns ID X. The server sends the GOAWAY before processing the headers frame for stream X, so it may indicate a max stream ID of X - 2 (minus two, instead of one, since client-initiated streams are always odd). |
||
) { | ||
sentinel.reject( | ||
new ConnectError( | ||
`http/2 stream closed with error code NO_ERROR (0x0)`, | ||
Code.Internal, | ||
), | ||
); | ||
return; | ||
} | ||
const err = connectErrorFromH2ResetCode(stream.rstCode); | ||
if (err) { | ||
sentinel.reject(err); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's sensible to raise an error for a RST_STREAM with NO_ERROR received before response headers. But I don't think it fully addresses #1108. The issue mentions an open streaming RPC, so the situation is much more likely to occur after response headers have already been received.
Ideally, the logic would be to raise an error for any RST_STREAM with NO_ERROR, unless we already received a status in headers or trailers.
This is impossible to implement in the universal client, since it doesn't know the status (which could be encoded in the body, or in trailers, depending on the protocol).
But can't we verify that we receive a trailer with a grpc-status field here? https://github.com/connectrpc/connect-es/blob/v1.4.0/packages/connect/src/protocol-grpc/transport.ts#L248 It looks like we currently only raise an error from the function
validateTrailer
if we receive an error code, not if we're missing the field.This will be more complex to test (we should also check unary, gRPC-web, and Connect), but I think this should address the issue exhaustively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So should we raise an error on each case of a missing trailer irrespective of RST stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely. And we should add conformance test cases to verify this since it's possible that we have similar issues in other client implementations.
Looking at the specs for error codes, I think this would fall into the category of "Error parsing returned status", which would map to an UNKNOWN code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grpcurl (as stated in the issue by the user) reports a code internal, grpc-js also uses internal (grpc/grpc-node#2569). I wonder what go does.