Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: Proxy Configuration from Environment Variables breaks custom proxy support #4728

Open
abatkin opened this issue Nov 30, 2023 · 8 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue proxy This issue is related to a proxy configuration

Comments

@abatkin
Copy link

abatkin commented Nov 30, 2023

Describe the bug

We have code to customize all AWS SDK Clients to set proxies appropriately. For example, we can use VPC Endpoints for some services (which means ensuring that a proxy server is not used), whereas for other services, we must use a proxy server. To further complicate matters, for S3, we can use a VPC Endpoint (no-proxy) from some network locations but not from others (because it is a Gateway endpoint). Until recently, this worked fine, but with #4467, this no longer works, since the default behavior is to use environment variables (which are always set, since Java isn't the only language in our environment).

Expected Behavior

The expected behavior would be for the AWS Java SDK to default to not using environment variables for configuring proxies, as anything else is a breaking change.

Current Behavior

The AWS Java SDK now defaults to using environment variables, which overrides our configuration (which explicitly does not set a proxy for some services).

Reproduction Steps

First, set HTTP_PROXY and HTTPS_PROXY with the expectation that they will not be used by Java (since they weren't used by Java up until a few weks ago).

Then, customize an AWS SDK client to explicitly not use a proxy:

ProxyConfiguration proxyConfiguration = ProxyConfiguration.builder()
  .useSystemPropertyValues(false)
  .build(); // This proxy configuration (previously) told the SDK to NOT use a proxy, no matter what
ApacheHttpClient.Builder httpClientBuilder = ...
httpClientBuilder.proxyConfiguration(proxyConfiguration);
...
// construct SDK Client using the httpClientBuilder

When that client is used, notice that it will use the proxy, even though it should not have. Now re-run with an older AWS SDK version and notice that it will not use the proxy.

Possible Solution

Change the default to not use environment variables, and allow users to opt-in.

Additional Information/Context

Note that this is a breaking change. But I think it is reasonable:

AWS Java SDK version used

2.21.23

JDK version used

Any

Operating System and version

Any

@abatkin abatkin added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 30, 2023
@joviegas
Copy link
Contributor

joviegas commented Nov 30, 2023

@abatkin
Looking at a generic side, the application running in an environment should automatically pick up environment settings for that environment if that ENV is supported. This behavior is in line with the JAVA V1 behaviour

Could you please set 'useEnvironmentVariableValues' to 'false' in order to disable this?"

ProxyConfiguration proxyConfiguration = ProxyConfiguration.builder()
  .useSystemPropertyValues(false)
  .useEnvironmentVariableValues(false)
  .build();

Disabling it by default may inconvenience users relying on environment variables. For instance, in a production scenario, users can effortlessly modify proxies for specific hosts by simply adding/Deleting environment settings to impacted or target host without requiring any code changes.

@abatkin
Copy link
Author

abatkin commented Nov 30, 2023

I'm well aware of how to fix it (after all, we do the same thing with the system properties). My point is that for anyone that was trying to programmatically disable proxy usage, this change is a regression. Before the change, they would not use a proxy, after the change they would use a proxy.

And since the mechanism being used requires a new Java API, it also means that we can't write code that is forwards and backwards compatible between pre and post change. Our proxy configuration code lives in a shared library that is used by hundreds of applications.

In other words, we are forced to tell our users that they must pin to an old SDK version or update to our new library version together with the new AWS SDK version. Whereas (at least in our environment) we only required new AWS SDK versions when a new (end-user facing) feature was required (and also, people could update their AWS SDK versions at their leisure, without fear of it suddenly breaking).

This is a regression. It certainly makes the behavior more consistent with other programming languages (and maybe even AWS Java SDK V1) but it is still a regression.

@abatkin
Copy link
Author

abatkin commented Dec 1, 2023

This is also a potential regression for existing applications that are not doing any custom proxy configuration and are relying on VPC Endpoints for security/compliance/cost (or whatever) reasons. Those applications might stop functioning. Or worse, an application that was previously tested and confirmed to be using a VPC Endpoint might suddenly start sending traffic through a proxy server (and out over the internet), without anyone even noticing.

@joviegas
Copy link
Contributor

joviegas commented Dec 1, 2023

Thank you for the information, @abatkin. We do understand this is regression, will review this with the team and take appropriate action.

  • Could you please specify the HTTP client used in this case? The CRT HTTP client, it traditionally utilizes environment variables by default. We will be maintaining this default behavior at-least for CRT http clients

  • To better understand the use case, could you elaborate on it further? For instance, in the scenario code to customize all AWS SDK Clients to set proxies appropriately, if proxy settings differ for each client, why use a global environment variable instead of configuring it within individual clients? Is the proxy environment variable set once per host or does each individual processes override the environment variable ?

@abatkin
Copy link
Author

abatkin commented Dec 1, 2023

We use the Apache HTTP Client.

I'm not sure what you mean in the second question. Due to the networking topology (i.e. Interface VPC Endpoints, Gateway VPC Endpoint for S3 and a dedicated proxy server just for S3, plus a "regular" proxy server for other AWS services/general internet traffic) it is impossible to use environment variables (for example, there is no mechanism to specify different proxy servers depending on destination). Therefore, we customize all AWS SDK Clients to configure proxy servers, as well as a few other things (including authenticating using our federated IdP). For ease-of-use we use the SdkBuilder.applyMutation() function and pass in a single "customizer" that can be used across any AWS SDK Client.

Proxy environment variables are mostly managed and pushed out systematically/automatically for all workstations, servers, docker images, etc.

@joviegas
Copy link
Contributor

joviegas commented Dec 1, 2023

We talked as a team and we really want honoring the proxy environment variables to be the SDK’s default. That matches a lot of standards, including those of many other AWS SDKs and the AWS SDK for Java 1.x.
Are there any other knobs or configuration we could provide in the SDK to make it an easier transition?

@abatkin
Copy link
Author

abatkin commented Dec 1, 2023

I think the best knob (and the one that would prevent the least amount of surprise for everyone, including us) would be to at least fix #4703 and follow the standard for parsing NO_PROXY (since you can't argue that you want Java to "match a lot of standards" if you deliberately don't do that).

Something else (but a bigger change) would be if there was some way to express "don't use a proxy here" since as of now there is no way to do that (instead the current process consists of not setting a proxy, disabling the use of system properties, and now disabling the use of environment variables). This may not be worth it, unless you think there will be other things in the future that may override the user's intent to not use a proxy.

Thinking out loud: It might have been nice if the "use environment variables" feature could have defaulted to off for a period of time (6 months) before switching to "on". Then we could have a better transition window where we can provide semi-backward-compatible tools to our users. Given the current state of things, this may not be worth it either.

@debora-ito debora-ito removed the needs-triage This issue or PR still needs to be triaged. label Dec 1, 2023
@joviegas
Copy link
Contributor

joviegas commented Dec 2, 2023

Thanks @abatkin for the recommendation.
Apologies for NO_PROXY bug , somehow this went completely out of my radar.
Added the fix for NO_PROXY with #4735.

@debora-ito debora-ito added the p2 This is a standard priority issue label Dec 23, 2023
@debora-ito debora-ito added the proxy This issue is related to a proxy configuration label Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. p2 This is a standard priority issue proxy This issue is related to a proxy configuration
Projects
None yet
Development

No branches or pull requests

3 participants