storage: adapt Netty Reactor HTTP client as GCS storage client #410
Conversation
Notes:
* Uses direct memory buffers. Recommend running Diskless with `-Dio.netty.maxDirectMemory=0` so that the Netty cleaner runs.
* Has a static pool with a maximum of 96 connections.
* Has a static pool of 32 worker threads.
* `SO_KEEPALIVE` is set on sockets, and the keep-alive header is set for HTTP.
* Compression is disabled; producer compression is recommended, and compressing again is unlikely to be beneficial.
* The GCS client handles redirects, so redirect following is disabled in the Netty Reactor client.
* Can use the static BoringSSL library to offload SSL to OpenSSL.
* Zero-copy until response handling, where direct memory buffer bytes are copied to a heap-managed byte array.
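The settings listed above could be sketched with the Reactor Netty API roughly as follows. This is a hedged sketch, not the PR's actual code; the BoringSSL offload assumes a `netty-tcnative-boringssl-static` artifact on the classpath, and the names `gcs`/`gcs-worker` are illustrative only.

```java
import io.netty.channel.ChannelOption;
import io.netty.handler.ssl.OpenSsl;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;
import reactor.netty.resources.LoopResources;

public class GcsNettyClientSketch {
    public static HttpClient create() throws Exception {
        // Static pool: at most 96 connections.
        ConnectionProvider pool = ConnectionProvider.builder("gcs")
                .maxConnections(96)
                .build();

        // Static event-loop group with 32 daemon worker threads.
        LoopResources loops = LoopResources.create("gcs-worker", 32, true);

        // Offload TLS to OpenSSL/BoringSSL when the native library is available.
        SslContext sslContext = SslContextBuilder.forClient()
                .sslProvider(OpenSsl.isAvailable() ? SslProvider.OPENSSL : SslProvider.JDK)
                .build();

        return HttpClient.create(pool)
                .runOn(loops)
                .option(ChannelOption.SO_KEEPALIVE, true) // TCP keep-alive on sockets
                .keepAlive(true)                          // HTTP keep-alive header
                .compress(false)                          // compression disabled
                .followRedirect(false)                    // GCS client handles redirects itself
                .secure(spec -> spec.sslContext(sslContext));
    }
}
```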
agrawal-siddharth
left a comment
Here's some feedback and a suggested approach from the Google GCS SDK team.
GCS has a gRPC-based API[1], which the GCS Java SDK supports out of the box. My recommendation would be to use that if they want an async transport core: we've already done the work there to bridge async to streaming (and sometimes non-blocking) channels, and it uses less CPU than the default NetHttpTransport.
I'm curious what they hope to gain by using Netty as the lower-level transport layer. Netty can be a great HTTP client, but the GCS Java SDK is not async in the vast majority of operations and will block the invoking thread as necessary. Additionally, we've already done a notable amount of work over the past few years to reduce memory usage and unnecessary heap allocations.
That said, if I were to attempt something like this, I would try to get it working in the test suite[2] we already have for the GCS Java SDK: create a branch of the repo, then modify HttpStorageOptions.HttpStorageDefault#getDefaultTransportOptions()[3] to return the Netty-based transport, then run the integration test suite with `mvn -Dmaven.test.skip.exec=true -Penable-integration-tests clean verify`.
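Concretely, that override could look roughly like the sketch below. `NettyReactorHttpTransport` is a hypothetical name standing in for the PR's transport; the real extension point in google-http-client is subclassing `HttpTransport` and implementing `buildRequest`.

```java
import java.io.IOException;

import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.LowLevelHttpRequest;
import com.google.cloud.http.HttpTransportOptions;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

// Hypothetical adapter bridging google-http-client to the Netty Reactor client.
class NettyReactorHttpTransport extends HttpTransport {
    @Override
    protected LowLevelHttpRequest buildRequest(String method, String url) throws IOException {
        // Delegate to the Reactor Netty HttpClient here (omitted in this sketch).
        throw new UnsupportedOperationException("sketch only");
    }
}

public class TransportOverrideSketch {
    public static Storage storage() {
        HttpTransportOptions transportOptions = HttpTransportOptions.newBuilder()
                .setHttpTransportFactory(NettyReactorHttpTransport::new)
                .build();
        return StorageOptions.newBuilder()
                .setTransportOptions(transportOptions)
                .build()
                .getService();
    }
}
```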
From a superficial standpoint, one challenge they will likely run into is the need to stream large amounts of bytes. The GCS Java SDK has no client-side limits on how large objects or their streams can be, which can be a challenge for Netty, since Netty operates primarily on ByteBufs. I know of users who upload multi-gigabyte objects on a single stream, and similarly read many gigabytes in streaming mode, often with application backpressure.
[1] https://cloud.google.com/storage/docs/enable-grpc-api
[2] https://github.com/googleapis/java-storage/tree/main/google-cloud-storage/src/test/java/com/google/cloud/storage
[3] https://github.com/googleapis/java-storage/blob/main/google-cloud-storage/src/main/java/com/google/cloud/storage/HttpStorageOptions.java#L341-L343
@agrawal-siddharth Thank you for the comments. The intent here is to reduce CPU usage and byte copying when loading from GCS, so the main change is the ability to offload SSL handling to OpenSSL; SSL handling is dominant in the CPU usage graphs. Also, the Java HTTP client used by the GCS client by default is not easy to control, so this provides better control. I'll definitely look into the gRPC option, which seems to be generally available now and would provide a single socket carrying multiple streams. The files are not very large: in the benchmarks I have run they vary between 2 MiB and 8 MiB, with an upper limit of 16 MiB. But I agree that for the general use case this must be able to handle large files.
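For reference, switching to the gRPC transport mentioned above is a small change in the Java SDK. A sketch, assuming the google-cloud-storage gRPC artifacts are on the classpath:

```java
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class GrpcStorageSketch {
    public static Storage storage() {
        // gRPC transport: a single HTTP/2 connection multiplexing many streams,
        // instead of a pool of HTTP/1.1 connections.
        return StorageOptions.grpc().build().getService();
    }
}
```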
This PR is being marked as stale since it has not had any activity in 90 days. If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.