Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map NO_ERROR h2 code to Code.Internal #1113

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 28 additions & 2 deletions packages/connect-node/src/node-universal-client.spec.ts
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ describe("node http/2 client closing with RST_STREAM with code CANCEL", function
}),
);
it("should send RST_STREAM frame to the server", async function () {
return new Promise<void>((resolve) => {
await new Promise<void>((resolve) => {
http2.connect(server.getUrl(), (session: http2.ClientHttp2Session) => {
const stream = session.request(
{
Expand Down Expand Up @@ -87,7 +87,7 @@ describe("universal node http client", function () {
});

describe("against a server that closes immediately", function () {
describe("over http/2", function () {
describe("over http/2 with code CANCEL", function () {
let serverReceivedRequest = false;
const server = useNodeServer(() =>
http2.createServer((request, response) => {
Expand All @@ -113,6 +113,32 @@ describe("universal node http client", function () {
expect(serverReceivedRequest).toBeTrue();
});
});
describe("over http/2 with code NO_ERROR", function () {
let serverReceivedRequest = false;
const server = useNodeServer(() =>
http2.createServer((request, response) => {
serverReceivedRequest = true;
response.stream.close(http2.constants.NGHTTP2_NO_ERROR);
}),
);
it("should reject the response promise with Code.Internal", async function () {
const client = server.getClient();
try {
await client({
url: server.getUrl(),
method: "POST",
header: new Headers(),
});
fail("expected error");
} catch (e) {
expect(e).toBeInstanceOf(ConnectError);
expect(ConnectError.from(e).message).toBe(
"[internal] http/2 stream closed with error code NO_ERROR (0x0)",
);
}
expect(serverReceivedRequest).toBeTrue();
});
});
describe("over http/1.1", function () {
let serverReceivedRequest = false;
const server = useNodeServer(() =>
Expand Down
22 changes: 21 additions & 1 deletion packages/connect-node/src/node-universal-client.ts
Original file line number Diff line number Diff line change
Expand Up @@ -317,8 +317,28 @@ function h2Request(
}
sentinel.reject(e);
});

let receivedData = false;
stream.on("response", () => {
receivedData = true;
});
Comment on lines +321 to +323
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's sensible to raise an error for a RST_STREAM with NO_ERROR received before response headers. But I don't think it fully addresses #1108. The issue mentions an open streaming RPC, so the situation is much more likely to occur after response headers have already been received.

Ideally, the logic would be to raise an error for any RST_STREAM with NO_ERROR, unless we already received a status in headers or trailers.

This is impossible to implement in the universal client, since it doesn't know the status (which could be encoded in the body, or in trailers, depending on the protocol).

But can't we verify that we receive a trailer with a grpc-status field here? https://github.com/connectrpc/connect-es/blob/v1.4.0/packages/connect/src/protocol-grpc/transport.ts#L248 It looks like we currently only raise an error from the function validateTrailer if we receive an error code, not if we're missing the field.

This will be more complex to test (we should also check unary, gRPC-web, and Connect), but I think this should address the issue exhaustively.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should we raise an error on each case of a missing trailer irrespective of RST stream?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely. And we should add conformance test cases to verify this since it's possible that we have similar issues in other client implementations.

Looking at the specs for error codes, I think this would fall into the category of "Error parsing returned status", which would map to an UNKNOWN code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grpcurl (as stated in the issue by the user) reports a code internal, grpc-js also uses internal (grpc/grpc-node#2569). I wonder what go does.

stream.on("close", function h2StreamClose() {
// Node APIs don't differentiate between a `RST_STREAM` with `NO_ERROR` and
// a regular close.
//
// To check if this is an error we see if any data was received, if received
// this is not an error.
if (
stream.rstCode === http2.constants.NGHTTP2_NO_ERROR &&
!receivedData
Comment on lines +331 to +332
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we've received data but no status with the gRPC protocol (which requires a "grpc-status" trailer)? Ideally, that would also result in an internal, so that if we receive message data but the stream is aborted before we get the final trailers, we can still report an error.

When we were both looking at this, we spent a good bit of time also looking at the "GOAWAY" case. IIUC, this doesn't address that -- if there is a "GOAWAY" frame with a "NO_ERROR" status, any streams that were created before the frame was received but with a frame ID higher than the server-indicated max ID will just hang until the connection is closed (in which case they get a network error), right? I know that's a separate bug that doesn't need to be addressed here. Just confirming if my understanding is correct and that is indeed a separate bug. I think where we left it is that it needed some more testing to see exactly what the Node HTTP/2 would do. If there is indeed another semi-related issue with that, we should make sure there is an issue to track it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the GOAWAY case we need testing to confirm the behavior.

What happens if we've received data but no status with the gRPC protocol (which requires a "grpc-status" trailer)?

If we already started receiving data will the server send a NO_ERROR? I understand it can do to lame duck in which case it is not an error for the current stream but to indicate no new connections?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lame ducking uses GOAWAY frames, which are connection-wide, not stream-specific. I thought this chunk of code was called in the face of a RST_STREAM frame, which is for terminating a single stream/operation, not the whole connection.

A server could send a RST_STREAM frame with NO_ERROR at any time, even if data has already been sent. It may not be common/expected in practice, but I thought that was actually the reported bug -- that it does happen and the client doesn't handle it correctly. I think the whole reason it gets classified as "internal" in the gRPC spec is because it is unexpected and shouldn't really happen in practice.

There is one "expected" scenario where this happens which shouldn't (hopefully) trigger this code. It should actually be ignored by the client because the client should have already reported a status from the trailers/end of body. It happens if the server ends the response stream (and, in gRPC, sends back trailers with a status), but the client never half-closed the request side of the stream. In such an instance, the RST_STREAM frame tells the client that the stream is complete and to not bother trying to send anything else for the stream, even to end/close the request side.

A Google search suggests that some HTTP/2 proxies may also do this to selectively cancel streams for which the downstream connection (to backend) has been closed or experienced some network failure (though they should really use INTERNAL_ERROR instead; but either way it would get reported to the client as an "internal" RPC error).

So with the current logic, where we have to just ignore the error if data has been sent (since we can't distinguish the framework doing this to signal the stream is done vs. an actual RST_STREAM frame with NO_ERROR), we would hopefully detect that this is an error because we'd observe that the response was not complete (either incomplete/malformed body and/or missing trailers/status).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there is a "GOAWAY" frame with a "NO_ERROR" status, any streams that were created before the frame was received but with a frame ID higher than the server-indicated max ID will just hang until the connection is closed (in which case they get a network error), right?

To clarify: For a graceful shutdown (GOAWAY frame with NO_ERROR), the client keeps any open streams open until they finish, and opens a new connection for new streams. So any unfinished requests continue as normal, and new requests work as normal from the user's perspective.

But you observed a different behavior with a frame ID that does not include one or more open streams, and the streams in question hang - as in they don't receive or transmit data anymore?

If this is a situation we can expect in regular operation, this does indeed look like a bug. It would be helpful to have a reproducible example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: For a graceful shutdown (GOAWAY frame with NO_ERROR), the client keeps any open streams open until they finish, and opens a new connection for new streams. So any unfinished requests continue as normal, and new requests work as normal from the user's perspective.

This isn't entirely correct. The GOAWAY frame includes a max stream ID. So any stream created by the client whose ID is greater than that should not be allowed to finish but should instead be retried on a different connection. With the current implementation in connect-es, the max stream ID is not inspected and nothing is done with those streams, so it is possible/likely that they will simply hang until the connection is actually closed. However, this warrants further testing since it is unclear whether the Node HTTP/2 implementation does some amount of handling of GOAWAY transparently 🤷. Seems unlikely, but we should try to repro first. It is likely that fixing this in the connect-es code could require a backwards-incompatible change to this transport interface, so we should verify whether it's a bug before finalizing v2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So any stream created by the client whose ID is greater than that

FWIW, this only happens when there is a race condition. For example, the client creates a stream and sends the headers frame. The client is responsible for assigning stream IDs, so it assigns ID X. The server sends the GOAWAY before processing the headers frame for stream X, so it may indicate a max stream ID of X - 2 (minus two, instead of one, since client-initiated streams are always odd).

) {
sentinel.reject(
new ConnectError(
`http/2 stream closed with error code NO_ERROR (0x0)`,
Code.Internal,
),
);
return;
}
const err = connectErrorFromH2ResetCode(stream.rstCode);
if (err) {
sentinel.reject(err);
Expand Down