Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(aws): add tls ca provision #5097

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

hjoshi123
Copy link
Contributor

@hjoshi123 hjoshi123 commented Feb 15, 2025

Description

Adds TLS CA, TLS Client Cert and key to AWS Provider.

Uses custom http transport to feed to aws config default options if tls config is provided.

Fixes #5026

Checklist

  • Unit tests updated
  • End user documentation updated

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign szuecs for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 15, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @hjoshi123. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 15, 2025
@mloiseleur
Copy link
Contributor

/retitle feat(aws): add tls ca provision
/ok-to-test
@hjoshi123 Would you please update documentation explaining users that it exists and how to use it ?

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Feb 15, 2025
@k8s-ci-robot k8s-ci-robot changed the title feat: added tls ca provision to aws provider feat(aws): add tls ca provision Feb 15, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 15, 2025
provider/aws/config_test.go Show resolved Hide resolved
pkg/tlsutils/tlsconfig.go Outdated Show resolved Hide resolved
provider/aws/config.go Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 15, 2025
@hjoshi123
Copy link
Contributor Author

@ivankatliarchuk @mloiseleur not sure what is up with the lint job. In my local make go-lint is working without any errors. This is the error in CI. The tests are passing now

jsonschema: "linters-settings" does not validate with "/properties/linters-settings/additionalProperties": additional properties 'maligned' not allowed
jsonschema: "run" does not validate with "/properties/run/additionalProperties": additional properties 'exclude-files' not allowed
Failed executing command with error: the configuration contains invalid elements
, Error: Command failed: /home/runner/golangci-lint-1.63.4-linux-amd64/golangci-lint config verify
jsonschema: "linters-settings" does not validate with "/properties/linters-settings/additionalProperties": additional properties 'maligned' not allowed
jsonschema: "run" does not validate with "/properties/run/additionalProperties": additional properties 'exclude-files' not allowed
Failed executing command with error: the configuration contains invalid elements

    at genericNodeError (node:internal/errors:984:15)
    at wrappedFn (node:internal/errors:5[38](https://github.com/kubernetes-sigs/external-dns/actions/runs/13348358593/job/37281685858?pr=5097#step:7:41):14)
    at ChildProcess.exithandler (node:child_process:422:12)
    at ChildProcess.emit (node:events:519:28)
    at maybeClose (node:internal/child_process:1105:16)
    at ChildProcess._handle.onexit (node:internal/child_process:305:5)
Error: Command failed: /home/runner/golangci-lint-1.63.4-linux-amd64/golangci-lint config verify
jsonschema: "linters-settings" does not validate with "/properties/linters-settings/additionalProperties": additional properties 'maligned' not allowed
jsonschema: "run" does not validate with "/properties/run/additionalProperties": additional properties 'exclude-files' not allowed
Failed executing command with error: the configuration contains invalid elements

@ivankatliarchuk
Copy link
Contributor

Apologies there going to be a lot of text, but I would suggest to extract roots, _ = x509.SystemCertPool() logic into separate pull request, as I could not approve it, and this require approval/review from project owners.

So what most likely should go in documentation in some form as a recomendation

  • The recommended approach is not to package certicates with the docker/code/system unless it's a close-source software.
  • In scope of work, the open source service external-dns is going to make a call to a hyperscaler via a proxy, and external-dns know nothing about certicate that is required for proxy, so in theory it should not be available on system level. I made this up, not sure how to explain correctly.
  • To alligh with Industry standards is to issue a short lived intermediate certificates and mount them in the same mount or split them by TTL lifetime
  • ^ one could re-package application, if there is a requirement to package application and certificate instead of mounting them at runtime
  • It very nice to support multiple mount point location for certificates, for cases with multiple issuers, or different TTL, but let's not overcomplicate this, so we could ignore that

@ivankatliarchuk
Copy link
Contributor

I would reference here The Adobe Common Controls Framework (CCF), here is a link https://www.adobe.com/trust/compliance/adobe-ccf.html, there are
around 4000+ controls, but we only interested in few very relevant

The Go code involving x509.CertPool relates to several key security domains within CCF.

Relevant CCF Controls for x509.CertPool Security

1. Cryptographic Controls & Certificate Management

Related CCF Domains:

  • CCF-CM-1: Use of Strong Cryptographic Standards
  • CCF-CM-2: Certificate and Key Management
  • CCF-CM-3: Trusted Certificate Authorities

How It Relates to the Code:

  • Using x509.SystemCertPool() ensures system-trusted certificates are used, aligning with CCF’s requirement to rely on approved CAs.
  • Using an empty x509.CertPool and adding/mounting CAs aligns with stricter cryptographic controls by allowing explicit trust settings.

Recommendation:

  • Ensure certificates are sourced from a trusted and validated CA.
  • Regularly update and rotate CA certificates to comply with lifecycle management policies.

2. System & Network Security

Related CCF Domains:

  • CCF-NS-3: Secure TLS Configurations
  • CCF-NS-5: Prevent Unauthorized Certificate Authorities

How It Relates to Your Code:

  • If the system CA store is compromised, x509.SystemCertPool() could trust unverified CAs.
  • A custom CertPool prevents unauthorized certificates, improving security.

Recommendation:

  • Use TLS best practices (e.g., enforcing strong ciphers and TLS 1.2+).
  • Monitor system CA stores for unauthorized modifications.
  • Prefer explicit trust models (Option 2) in high-security environments.

3. Error Handling & Secure Coding Practices

Related CCF Domains:

  • CCF-SD-1: Secure Software Development Lifecycle (SDLC)
  • CCF-SD-3: Proper Error Handling

How It Relates to Code:

  • Ignoring errors (_ = x509.SystemCertPool()) is a bad security practice.
  • If x509.SystemCertPool() fails (e.g., Windows lacks support), a nil roots could cause panics or security failures.

Recommendation:

  • Always check for errors when calling x509.SystemCertPool().
  • Implement logging and alerting for failures.
Option Security Implication CCF Compliance Alignment
Option 1 (SystemCertPool()) Trusts system CAs (convenient but can be risky) ✅ Meets CCF-CM-3 (Trusted CAs) but ❌ may not meet CCF-NS-5 (Unauthorized CAs)
Option 2 (Empty CertPool) Only explicitly added CAs are trusted ✅ Stronger security, aligns with CCF-NS-5 and CCF-CM-2

@hjoshi123
Copy link
Contributor Author

@ivankatliarchuk sure I can change it back to start form empty cert pool no issues.. We can make that as part of a separate discussion

@hjoshi123
Copy link
Contributor Author

@ivankatliarchuk do you have any idea about the ci lint job error?

@ivankatliarchuk
Copy link
Contributor

I'll try to find a solution for linter. We recently added few things, looks like a bug was introduced

@hjoshi123
Copy link
Contributor Author

@ivankatliarchuk I can debug and try to fix it in a different PR if its fine by you?

@ivankatliarchuk
Copy link
Contributor

A fix for this issue appears to be available in another pull request (https://github.com/kubernetes-sigs/external-dns/pull/5085/files). Hopefully, it will be reviewed and approved soon.

@hjoshi123
Copy link
Contributor Author

A fix for this issue appears to be available in another pull request (https://github.com/kubernetes-sigs/external-dns/pull/5085/files). Hopefully, it will be reviewed and approved soon.

Ah I see.. Let's wait for it then.. otherwise does the PR look good to you?

@ivankatliarchuk
Copy link
Contributor

ivankatliarchuk commented Feb 15, 2025

A fix for this issue appears to be available in another pull request (https://github.com/kubernetes-sigs/external-dns/pull/5085/files). Hopefully, it will be reviewed and approved soon.

Ah I see.. Let's wait for it then.. otherwise does the PR look good to you?

Few things to consider, and something like a test on real-cluster/local-cluster will speed up review.

As @mloiseleur pointed out, you need to consider what to add to a documentation, maybe as well provide an example setup with a proxy of your choice in the docs on top of the description/recomendations.

@hjoshi123
Copy link
Contributor Author

A fix for this issue appears to be available in another pull request (https://github.com/kubernetes-sigs/external-dns/pull/5085/files). Hopefully, it will be reviewed and approved soon.

Ah I see.. Let's wait for it then.. otherwise does the PR look good to you?

Few things to consider, and something like a test on real-cluster/local-cluster will speed up review.

As @mloiseleur pointed out, you need to consider what to add to a documentation, maybe as well provide an example setup with a proxy of your choice in the docs on top of the description/recomendations.

Hmm.. I am not sure how to test it in real world since I dont have a proxy working or any real world ca-certs. I can work on the documentation though.
My unit test actually tries to mimic the real world scenario that the issue was talking about.. it creates a root CA first and then uses the rootCA to issue child certs and child keys which is then fed to aws config through http transport.. but if you have better suggestions might help. Meanwhile I can check on the documentation

@ivankatliarchuk
Copy link
Contributor

I'll throw some ideas

  • There is an issue, so worth to ask a requestor to share his setup, he may be able to point out or explain things around how to configure proxy and certificates.
  • How AWS credentials provided to external-dns when it's expected to run behind a proxy
  • Instrument a local proxy with certificates in minikube or some other cluster of you choice
  • Review this code there is an example how to instrument aws-sdk-go-v2 behind a proxy. Not as useful code, but go with proxy example
  • The unit tests should have a proxy setup and aws stubs something like test -> HTTP proxy -> calls to fake AWS provider

@hjoshi123
Copy link
Contributor Author

hjoshi123 commented Feb 15, 2025

@ivankatliarchuk I didnt think it needed to be that detailed since the idea is similar to the rfc2136 provider where similar kind of tests are being done for TLS. @mloiseleur what do you think?
Definitely worth asking the requestor if he can share his setup

From the credentials perspective I dont think anything changes since we are just providing TLS certs for the traffic to flow.. the environmental values or the way AWS picks up creds should be same

@ivankatliarchuk
Copy link
Contributor

ivankatliarchuk commented Feb 15, 2025

Np, I think I shared enough information to consider. Could be that I did not spend enough time to understand the original issue, and this is my understanding of client with forward proxy

Screenshot 2025-02-15 at 22 31 44

I found this code for older go-sdk version, that nicely describes go client behind a proxy case https://gist.github.com/jakexks/2f876697dfca1fe15b92f7bb6032780d

@hjoshi123
Copy link
Contributor Author

@ivankatliarchuk that makes sense what you sent to check if the proxy is working and if the request coming from the proxy with the TLS CA Cert is trusted by AWS to generate the config.
Imo getting the raw credentials through profile or environmental variables wouldnt change since that is dictated by the actual client the only thing that changes is how the request is made.

In summary, I agree we need a better way to test the use case of proxy, but I feel from a unit test perspective that is beyond the scope of unit test. @mloiseleur What do you suggest?

It would be nice if @cavdhut could give us more insights on how he uses the setup and if the changes made accommodate the use case he spoke about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add client TLS flags for AWS provider
4 participants