Skip to content

IA-4893: don't materialize the entire response #62

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 14, 2024

Conversation

davidangb
Copy link
Contributor

@davidangb davidangb commented May 14, 2024

Jira Ticket: https://broadworkbench.atlassian.net/browse/IA-4893

Summary of changes:

Eliminate code that materializes the entire HTTP response body into a Java variable; hopefully this eliminates OutOfMemory errors in the listener.

What

When writing the HTTP response to the caller, the previous code used InputStream.readAllBytes(), which creates a single byte array containing the entire HTTP response. In this PR, we use StreamUtils.copy instead, which works in 4096-byte buffers. Each buffer is read from the response, written to the caller, and then discarded.

Why

We have seen multiple instances of the listener crashing with OOMs, especially when WDS has a large response such as a TSV file. This code change is a hopeful fix for that.

Testing strategy

  • tested by kubectl edit-ing these changes into a running WDS, downloading a largeish TSV, and watching listener logs

when(targetHttpResponse.getStatusCode()).thenReturn(200);
when(context.getResponse()).thenReturn(listenerResponse);
when(targetHttpResponse.getCallerResponseOutputStream()).thenReturn(responseStream);

processor.writeTargetResponseOnCaller(targetHttpResponse);

verify(responseStream).close();
verify(mockBody).close();
verify(spyBody).close();
Copy link
Contributor Author

@davidangb davidangb May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what was going on with this test. It previous used mockBody, which was a @Mock InputStream … but at no point was that mock set up to contain any content or have its .read() etc. methods give any responses. This is also the only test that used mockBody.

So, I changed it to use a spy with some actual content, and made the spy local to the test.

EDIT: I should note that before the spy-related changes, this test failed in the presence of the runtime StreamUtils.copy change


assertThat(new String(responseData.getValue()), equalTo(BODY_CONTENT));
// StreamUtils.copy buffers its writes in arrays of 4096 bytes (by default). Since this
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naive question, but would it be possible / useful to set up a test response that is on the order of size of what triggered the OOM?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added new unit tests for this.

@mlilynolting
Copy link
Contributor

This approach looks good to me.

Copy link

logging:
level:
com.microsoft.azure.relay.RelayLogger: WARN

Copy link
Contributor Author

@davidangb davidangb May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewers: how do you feel about this change?

Before this PR, downloading my test TSV generated ~24 lines of logs:

-05-14 15:06:54.496   INFO 1 --- [t@1995042520-31] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:f815e59d-af78-4777-979c-06d0a0651e6e, Address:sb://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): received text from remote. Total length: 2900
2024-05-14 15:06:54.496   INFO 1 --- [t@1995042520-31] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): request initializing.
2024-05-14 15:06:54.498   INFO 1 --- [      Thread-35] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): Request method: GET.
2024-05-14 15:06:54.498   INFO 1 --- [      Thread-35] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): Invoking user RequestHandler.
2024-05-14 15:06:54.499   INFO 1 --- [oundedElastic-2] o.b.l.r.inspectors.InspectorsProcessor   : Inspection result for the HTTP request. Result: true, URI:sb://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net/wds-fb5608c8-66ee-47f6-9ce2-e663fec12442-fb5608c8-66ee-47f6-9ce2-e663fec12442/fb5608c8-66ee-47f6-9ce2-e663fec12442/tsv/v0.2/sample
2024-05-14 15:06:54.542   INFO 1 --- [oundedElastic-5] o.b.l.relay.inspectors.RequestLogger     : RELAY_REQUEST_RESPONSE - 70.16.76.24 "9ec01dd3-ddec-4273-8e34-39fdd2a480ab" "-" "davidan.dev@gmail.com" "-" [2024-05-14T15:06:54.540322994Z] "GET sb://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net/wds-fb5608c8-66ee-47f6-9ce2-e663fec12442-fb5608c8-66ee-47f6-9ce2-e663fec12442/fb5608c8-66ee-47f6-9ce2-e663fec12442/tsv/v0.2/sample" 200 "https://bvdp-saturn-dev.appspot.com/" "https://bvdp-saturn-dev.appspot.com" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
2024-05-14 15:06:55.007   INFO 1 --- [oundedElastic-5] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net)+ResponseStream+ResponseStream: WriteAsync(count=1963561)
2024-05-14 15:06:55.007   INFO 1 --- [oundedElastic-5] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net)+ResponseStream+ResponseStream: FlushCoreAsync(reason=BUFFER_FULL)
2024-05-14 15:06:55.007   INFO 1 --- [oundedElastic-5] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): Creating the rendezvous connection.
2024-05-14 15:06:55.009   INFO 1 --- [ool-1-thread-16] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net) is connecting.
2024-05-14 15:06:55.406   INFO 1 --- [t@263943838-114] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net) is connected.
2024-05-14 15:06:55.407   INFO 1 --- [      Thread-44] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): Sending the response command on the rendezvous connection, status: 200.
2024-05-14 15:06:55.407   INFO 1 --- [      Thread-44] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): starting to write to remote. Writemode: TEXT
2024-05-14 15:06:55.408   INFO 1 --- [ool-1-thread-18] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): finished writing 774 bytes to remote.
2024-05-14 15:06:55.504   INFO 1 --- [ool-1-thread-18] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): Finished sending the response command on the rendezvous connection, status: 200.
2024-05-14 15:06:55.504   INFO 1 --- [ool-1-thread-18] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): starting to write to remote. Writemode: BINARY
2024-05-14 15:06:56.010   INFO 1 --- [ool-1-thread-18] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): finished writing 1963561 bytes to remote.
2024-05-14 15:06:56.010   INFO 1 --- [ool-1-thread-18] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): Sending 0 bytes on the rendezvous connection
2024-05-14 15:06:56.010   INFO 1 --- [oundedElastic-5] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net)+ResponseStream is closing.
2024-05-14 15:06:56.010   INFO 1 --- [oundedElastic-5] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net) is closing.
2024-05-14 15:06:56.010   INFO 1 --- [oundedElastic-5] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): is closing. Close reason: NormalClosure
2024-05-14 15:06:57.425   INFO 1 --- [t@263943838-112] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): is closed. Close reason: Normal Closure
2024-05-14 15:06:57.425   INFO 1 --- [t@263943838-112] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:384de61a-5f6a-42ab-a838-dba2ed2da761_G3, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net) is closed.
2024-05-14 15:06:57.426   INFO 1 --- [oundedElastic-5] o.b.l.r.t.RelayedRequestPipeline         : Processed request with the following result: SUCCESS

After this PR, since we buffer writes to the caller, my test TSV exploded into 8359 lines of logs, with a ton of repetition like:

2024-05-14 15:03:27.710   INFO 1 --- [pool-1-thread-8] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): Sending 0 bytes on the rendezvous connection
2024-05-14 15:03:27.710   INFO 1 --- [oundedElastic-3] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net)+ResponseStream+ResponseStream: WriteAsync(count=975)
2024-05-14 15:03:27.710   INFO 1 --- [oundedElastic-3] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): starting to write to remote. Writemode: BINARY
2024-05-14 15:03:27.710   INFO 1 --- [pool-1-thread-7] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): finished writing 975 bytes to remote.
2024-05-14 15:03:27.710   INFO 1 --- [pool-1-thread-7] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): Sending 0 bytes on the rendezvous connection
2024-05-14 15:03:27.710   INFO 1 --- [oundedElastic-3] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net)+ResponseStream+ResponseStream: WriteAsync(count=973)
2024-05-14 15:03:27.710   INFO 1 --- [oundedElastic-3] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): starting to write to remote. Writemode: BINARY
2024-05-14 15:03:27.710   INFO 1 --- [pool-1-thread-8] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): finished writing 973 bytes to remote.
2024-05-14 15:03:27.710   INFO 1 --- [pool-1-thread-8] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): Sending 0 bytes on the rendezvous connection
2024-05-14 15:03:27.710   INFO 1 --- [oundedElastic-3] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net)+ResponseStream+ResponseStream: WriteAsync(count=975)
2024-05-14 15:03:27.710   INFO 1 --- [oundedElastic-3] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): starting to write to remote. Writemode: BINARY
2024-05-14 15:03:27.804   INFO 1 --- [pool-1-thread-7] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): finished writing 975 bytes to remote.
2024-05-14 15:03:27.804   INFO 1 --- [pool-1-thread-7] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): Sending 0 bytes on the rendezvous connection
2024-05-14 15:03:27.804   INFO 1 --- [oundedElastic-3] com.microsoft.azure.relay.RelayLogger    : HybridHttpConnection(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net)+ResponseStream+ResponseStream: WriteAsync(count=452)
2024-05-14 15:03:27.804   INFO 1 --- [oundedElastic-3] com.microsoft.azure.relay.RelayLogger    : ClientWebSocket(TrackingId:1e926a8c-88ad-443f-ac2d-9d13828cda7f_G26, Address:https://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net): starting to write to remote. Writemode: BINARY

But with this change to logging level, it is now 3 lines:

2024-05-14 16:46:32.015   INFO 1 --- [oundedElastic-2] o.b.l.r.inspectors.InspectorsProcessor   : Inspection result for the HTTP request. Result: true, URI:sb://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net/wds-fb5608c8-66ee-47f6-9ce2-e663fec12442-fb5608c8-66ee-47f6-9ce2-e663fec12442/fb5608c8-66ee-47f6-9ce2-e663fec12442/tsv/v0.2/sample
2024-05-14 16:46:32.049   INFO 1 --- [oundedElastic-4] o.b.l.relay.inspectors.RequestLogger     : RELAY_REQUEST_RESPONSE - 70.16.76.24 "9ec01dd3-ddec-4273-8e34-39fdd2a480ab" "-" "davidan.dev@gmail.com" "-" [2024-05-14T16:46:32.047737387Z] "GET sb://lz44191250ee6bd7010acb08e426ed275d637ed9148c147a2e.servicebus.windows.net/wds-fb5608c8-66ee-47f6-9ce2-e663fec12442-fb5608c8-66ee-47f6-9ce2-e663fec12442/fb5608c8-66ee-47f6-9ce2-e663fec12442/tsv/v0.2/sample" 200 "https://bvdp-saturn-dev.appspot.com/" "https://bvdp-saturn-dev.appspot.com" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
2024-05-14 16:46:38.367   INFO 1 --- [oundedElastic-4] o.b.l.r.t.RelayedRequestPipeline         : Processed request with the following result: SUCCESS

We have less info for debugging, but also we reduce log ingestion cost to the end user and keep our logs less cluttered.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the other recent issues around log costs, this seems like a good idea.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me, there is no way that 8000+ lines of repetitive logs will be useful for debugging anyways

@davidangb davidangb requested a review from LizBaldo May 14, 2024 17:33
Copy link
Collaborator

@LizBaldo LizBaldo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for adding the couple of test cases based on the size of the request, and the change to only log warnings make sense to me too.

@davidangb davidangb merged commit 76d982c into main May 14, 2024
@davidangb davidangb deleted the da_IA-4893_responseOOM branch May 14, 2024 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants