Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to configure SocketOptions on the AwsCrtAsyncHttpClient #3457

Merged
merged 2 commits into from
Oct 6, 2022

Conversation

nikp
Copy link
Contributor

@nikp nikp commented Oct 3, 2022

Motivation and Context

This is necessary for any clients with long-running connections that exceed default socket timeouts of services along the call path, and need to enable keep-alive settings which the CRT client supports, but the Java client wasn't exposing to callers

Modifications

Adds a new configuration parameter object to encapsulate SocketOptions to the AwsCrtAsyncHttpClient DefaultBuilder

Testing

Unit testing - the change is very isolated from the rest of the package

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read the CONTRIBUTING document
  • [] Local run of mvn install succeeds
  • My mvn install does not succeed due to [ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:3.1.0:shade (default) on project third-party-jackson-core: Error creating shaded jar: duplicate entry: META-INF/services/software.amazon.awssdk.thirdparty.jackson.core.JsonFactory, however the code I added is independent and I verified that the new code unit tests pass
  • My code follows the code style of this project
  • My change requires a change to the Javadoc documentation
  • I have updated the Javadoc documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed
  • I have added a changelog entry. Adding a new entry must be accomplished by running the scripts/new-change script and following the instructions. Commit the new file created by the script in .changes/next-release with your changes.
  • My change is to implement 1.11 parity feature and I have updated LaunchChangelog

^ I'm actually not sure about the last one - @JonathanHenson guided us to this change but I do see the LaunchChangelog references TCP Keep-Alive. Is it related?

License

  • I confirm that this pull request can be released under the Apache 2 license

@nikp nikp requested a review from a team as a code owner October 3, 2022 21:47
Copy link
Contributor

@JonathanHenson JonathanHenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!, someone from the Java SDK team could you have a look?

@L-Applin
Copy link
Contributor

L-Applin commented Oct 4, 2022

Just noting that this PR would also resolve resolve #3251, and make #3456 obsolete as this supersedes it.

Comment on lines 143 to 145
private int connectTimeoutMs = 3000;
private int keepAliveIntervalSecs = 0;
private int keepAliveTimeoutSecs = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of defining defaults here, I think we should put the configured values into the standard options map like below. The SDK will use the global defaults automatically for any options that are not configured

public Builder connectionTimeout(Duration timeout) {
Validate.isPositive(timeout, "connectionTimeout");
standardOptions.put(SdkHttpConfigurationOption.CONNECTION_TIMEOUT, timeout);
return this;

public static final AttributeMap GLOBAL_HTTP_DEFAULTS = AttributeMap
.builder()
.put(READ_TIMEOUT, DEFAULT_SOCKET_READ_TIMEOUT)
.put(WRITE_TIMEOUT, DEFAULT_SOCKET_WRITE_TIMEOUT)
.put(CONNECTION_TIMEOUT, DEFAULT_CONNECTION_TIMEOUT)
.put(CONNECTION_ACQUIRE_TIMEOUT, DEFAULT_CONNECTION_ACQUIRE_TIMEOUT)
.put(CONNECTION_MAX_IDLE_TIMEOUT, DEFAULT_CONNECTION_MAX_IDLE_TIMEOUT)
.put(CONNECTION_TIME_TO_LIVE, DEFAULT_CONNECTION_TIME_TO_LIVE)
.put(MAX_CONNECTIONS, DEFAULT_MAX_CONNECTIONS)
.put(MAX_PENDING_CONNECTION_ACQUIRES, DEFAULT_MAX_CONNECTION_ACQUIRES)
.put(PROTOCOL, DEFAULT_PROTOCOL)
.put(TRUST_ALL_CERTIFICATES, DEFAULT_TRUST_ALL_CERTIFICATES)
.put(REAP_IDLE_CONNECTIONS, DEFAULT_REAP_IDLE_CONNECTIONS)
.put(TCP_KEEPALIVE, DEFAULT_TCP_KEEPALIVE)
.put(TLS_KEY_MANAGERS_PROVIDER, DEFAULT_TLS_KEY_MANAGERS_PROVIDER)
.put(TLS_TRUST_MANAGERS_PROVIDER, DEFAULT_TLS_TRUST_MANAGERS_PROVIDER)
.put(TLS_NEGOTIATION_TIMEOUT, DEFAULT_TLS_NEGOTIATION_TIMEOUT)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can move those there, sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion in #3457 (comment), i think it doesn't make sense to list defaults for these

* @param domain socket domain
* @return Builder
*/
Builder domain(SocketOptions.SocketDomain domain);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need domain and type? If I understand correctly, there's no use-case for configuring these two with AWS services, right? We can always add them in the future if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They probably need to be kept for later http/3, and other alternative transports, but yes assuming this is for building an http client, it's currently always TCP

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up keeping them for the advanced users, since it's all going to the same place in the CRT, but decoupled from both connection timeouts, and the keep-alive configuration so that SDK users discovering this can get their iteratively

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should only expose public APIs if needed to keep it a 2-way door because it'd be a lot harder to remove/change them in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying I should remove SocketOptionsConfiguration from this change so we can decide later how to expose them? I'm fine with that

Comment on lines 124 to 131
Builder keepAliveIntervalSecs(int keepAliveIntervalSecs);

/**
* Sets the number of seconds to wait for a keepalive response before considering the connection timed out
* @param keepAliveTimeoutSecs number of seconds to wait for a keepalive response before considering the connection timed out
* @return Builder
*/
Builder keepAliveTimeoutSecs(int keepAliveTimeoutSecs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect customers to configure keepAliveIntervalSecs and keepAliveTimeoutSecs? If not, should we just introduce tcpKeepAlive(Boolean tcpKeepAlive) and pass default values defined in the SDK to CRT? Because exposing too many knobs may make the SDK more complex to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will default to your judgement there. I suspect just being able to enable keepAlive should be enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, keep alive is worthless without setting those values. The kernel defaults are never what anyone needs for these use-cases. So, it needs to be here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that these settings may be too "advanced", so some customers may just set some random values w/o doing any research. I'm fine with adding these two, but I think we should still introduce a boolean tcpKeepAlive and define some "safe" defaults for tcpInterval and tcpKeepAliveIntervalSecs if they are not configured. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO I think the right way to do this would be to separate KeepAliveConfiguration into a dedicated object.

If it's not set, that means customers want the default, disabled. If they DO set it, then they should be setting both the interval and the timeout, and they're "opting-in" to knowing what they're doing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's either add these values or don't bother. The kernel defaults will not meet anyone's needs. It's on the scale of hours before the first probe gets sent. It IS too advanced, but the answer isn't to do it for the user. By the time you're tuning keep alive, both of these values need to be a conscious decision by the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikp I think a separate class is fine. Can we name it TcpKeepAliveConfiguration though? because people may confuse it with HTTP keep alive (we've had customers asking for clarification before)

@JonathanHenson I think that's fair. We'd probably need to provide some guidance on how to tune them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna submit a new revision with that approach - I think it'll work nicely :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I submitted TcpKeepAliveConfiguration, please take a look @zoewangg

@nikp
Copy link
Contributor Author

nikp commented Oct 4, 2022

OK I think I can address the requested comments but there are 3 questions for @zoewangg in my mind that are not clear and worth leaving a record of for future extension.

  1. @L-Applin added connection timeout setting. I added keep-alive setting. Should they be top-level fields on AwsCrtAsyncHttpClient.Builder (peers to connectionMaxIdleTime), or should they be in a sub-configuration object.
  2. Connection timeout already has a default in SdkHttpConfigurationOption of 2 seconds, however the underlying SocketOptions that sets the CRT options defaults to 3. Since it looks like today the DEFAULT_CONNECTION_TIMEOUT configuration isn't being applied, that means the effective default is 3 seconds. What would be considered the safe non-breaking way to resolve this inconsistency?
  3. You mentioned that NettyNioAsyncHttpClient already supports configuration of both connectionTimeout and enableTcpKeepAlive and that the CRT client could share a common interface or base calss. Do you want to extract this interface as part of this CR (I assume to SdkAsyncHttpClient/SdkAsyncHttpClient.Builder, or leave it for a future change?

@zoewangg
Copy link
Contributor

zoewangg commented Oct 4, 2022

@L-Applin added connection timeout setting. I added keep-alive setting. Should they be top-level fields on AwsCrtAsyncHttpClient.Builder (peers to connectionMaxIdleTime), or should they be in a sub-configuration object.

I think they should be top-level fields. To be clear, we are going to just add connectionTimeout(Duration connectionTimeout) and tcpKeepAlive(Boolean tcpKeepAlive) right?

Connection timeout already has a default in SdkHttpConfigurationOption of 2 seconds, however the underlying SocketOptions that sets the CRT options defaults to 3. Since it looks like today the DEFAULT_CONNECTION_TIMEOUT configuration isn't being applied, that means the effective default is 3 seconds. What would be considered the safe non-breaking way to resolve this inconsistency?

I think this should be fine. CRT HTTP Client is still in Developer Preview, so breaking change is expected

You mentioned that NettyNioAsyncHttpClient already supports configuration of both connectionTimeout and enableTcpKeepAlive and that the CRT client could share a common interface or base calss. Do you want to extract this interface as part of this CR (I assume to SdkAsyncHttpClient/SdkAsyncHttpClient.Builder, or leave it for a future change?

They should not share a common interface actually. It's by design because not every HTTP client supports the same client options.

@nikp
Copy link
Contributor Author

nikp commented Oct 4, 2022

I think they should be top-level fields. To be clear, we are going to just add connectionTimeout(Duration connectionTimeout) and tcpKeepAlive(Boolean tcpKeepAlive) right?

Maybe, I'm starting to second-guess it. Your reference to NettyNioAsyncHttpClient having a boolean field made me think that we already have configured defaults somewhere for keepAliveInterval and keepAliveTimeout but we do not - actually it looks like https://github.com/aws/aws-sdk-java-v2/blob/master/http-clients/netty-nio-client/src/main/java/software/amazon/awssdk/http/nio/netty/NettyNioAsyncHttpClient.java lets tcpKeepAlive be set on the Builder but isn't actually being forwarded to the client.

Which means that we either need to add sensible defaults for those values, or we let customers configure them. Since a sensible keep-alive interval very much depends on the underlying connection handler and their timeouts, I think it makes more sense to let customers configure them. Which creates weird validation edge-cases. Would it be better to just have the 2 Duration fields and if they're not set we treat it as tcpKeepAlive=false, and if they're set, true?

I think this should be fine. CRT HTTP Client is still in Developer Preview, so breaking change is expected

That's fine, but we should be explicit. We can't change the defaults for the underlying SocketOptions since that's in the CRT, and would affect other languages. But is DEFAULT_CONNECTION_TIMEOUT in the Java SDK used for other purposes or in other packages?

They should not share a common interface actually. It's be design because not every HTTP client supports the same client options.

An exercise for a future reader then.

@zoewangg
Copy link
Contributor

zoewangg commented Oct 4, 2022

Your reference to NettyNioAsyncHttpClient having a boolean field made me think that we already have configured defaults somewhere for keepAliveInterval and keepAliveTimeout but we do not

Yes, we do not have defaults for keepAliveInterval and keepAliveTimeout because the existing HTTP clients don't support them.

actually it looks like https://github.com/aws/aws-sdk-java-v2/blob/master/http-clients/netty-nio-client/src/main/java/software/amazon/awssdk/http/nio/netty/NettyNioAsyncHttpClient.java lets tcpKeepAlive be set on the Builder but isn't actually being forwarded to the client.

It it forwarded to the client through AttributeMap

But is DEFAULT_CONNECTION_TIMEOUT in the Java SDK used for other purposes or in other packages?

Yes, it's used by all HTTP clients that support connect timeout, i.e., Netty and Apache HTTP client.

Which means that we either need to add sensible defaults for those values, or we let customers configure them. Since a sensible keep-alive interval very much depends on the underlying connection handler and their timeouts, I think it makes more sense to let customers configure them. Which creates weird validation edge-cases. Would it be better to just have the 2 Duration fields and if they're not set we treat it as tcpKeepAlive=false, and if they're set, true?

I'd vote for adding sensible defaults and keeping tcpKeepAlive. Moved this conversion to #3457 (comment)

@zoewangg
Copy link
Contributor

zoewangg commented Oct 4, 2022

What would be considered the safe non-breaking way to resolve this inconsistency?

We could define an CRT HTTP defaults attribute map that has connect timeout of 3 seconds and merge the map in the build method. Example in netty:

private static final AttributeMap NETTY_HTTP_DEFAULTS =
AttributeMap.builder()
.put(SdkHttpConfigurationOption.CONNECTION_MAX_IDLE_TIMEOUT, Duration.ofSeconds(5))
.build();

return new NettyNioAsyncHttpClient(this, standardOptions.build()
.merge(serviceDefaults)
.merge(NETTY_HTTP_DEFAULTS)
.merge(SdkHttpConfigurationOption.GLOBAL_HTTP_DEFAULTS));

@nikp nikp force-pushed the master branch 3 times, most recently from 3f0ec56 to 1f8981e Compare October 5, 2022 03:31
@nikp
Copy link
Contributor Author

nikp commented Oct 5, 2022

Submitted new revision:

  • Standard SDK configuration now brought from SdkHttpConfigurationOption defaults
  • Consistent interface with Netty client for Duration connectionTimeout and Boolean tcpKeepAlive
  • Additional configuration objects to emphasize more advanced functionality:
    • TcpKeepAliveConfiguration for callers that want to enable keep-alive in the client so need to specify the interval/timeout
    • SocketOptionsConfiguration for callers that want to specify the advanced socket domain/type

@nikp
Copy link
Contributor Author

nikp commented Oct 5, 2022

OK I've removed SocketOptionsConfiguration to leave that for a future change, and applied the rest of the feedback.

@nikp nikp force-pushed the master branch 2 times, most recently from 5afb2ee to 252993f Compare October 5, 2022 21:08
@zoewangg
Copy link
Contributor

zoewangg commented Oct 5, 2022

LGTM. Running the builds now

@zoewangg
Copy link
Contributor

zoewangg commented Oct 5, 2022

@nikp looks like there are some checkstyle errors. We can also take it over here if that works better

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:3.1.2:check (checkstyle) on project aws-crt-client: Failed during checkstyle execution: 
There are 4 errors reported by Checkstyle 8.42 with software/amazon/awssdk/checkstyle.xml ruleset. -> [Help 1]

You can try locally using the following commands:

mvn clean install -pl :aws-crt-client -P quick --am
mvn clean install -pl :bom-internal
mvn clean install -pl :aws-crt-client

@nikp nikp force-pushed the master branch 2 times, most recently from 7533fee to 9d25519 Compare October 6, 2022 04:55
… and advanced SocketOptions on the AwsCrtAsyncHttpClient. This is necessary for any clients with long-running connections that exceed default socket timeouts of services along the call path, and need to enable keep-alive settings which the CRT client supports, but the Java client wasn't exposing to callers
Copy link
Contributor

@zoewangg zoewangg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution!

@sonarqubecloud
Copy link

sonarqubecloud bot commented Oct 6, 2022

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 6 Code Smells

65.9% 65.9% Coverage
0.0% 0.0% Duplication

@zoewangg zoewangg merged commit 8cd6f5b into aws:master Oct 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants