Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Create a aws.Config with region to be able to work different AWS partition (like gov cloud or china AWS partition) #588

Merged
merged 3 commits into from
Mar 7, 2024

Conversation

calvix
Copy link

@calvix calvix commented Feb 29, 2024

towards giantswarm/roadmap#3254

if this makes sense to you I will make a PR upstream

test in china galaxy and on golem work as expected

@calvix calvix changed the title 🐛 Create a aws.Config with region to be able to use different AWS partition (like gov or china AWS partition) 🐛 Create a aws.Config with region to be able to work different AWS partition (like gov or china AWS partition) Feb 29, 2024
@calvix calvix changed the title 🐛 Create a aws.Config with region to be able to work different AWS partition (like gov or china AWS partition) 🐛 Create a aws.Config with region to be able to work different AWS partition (like gov cloud or china AWS partition) Feb 29, 2024
@calvix calvix marked this pull request as ready for review February 29, 2024 10:27
@calvix calvix self-assigned this Feb 29, 2024
@calvix calvix requested a review from a team February 29, 2024 10:28
@AndiDog
Copy link

AndiDog commented Mar 5, 2024

Where was the bug here? You're now supplying the region explicitly, but I guess that should be provided by CAPA to every AWS "create {instance,NAT gateway,...}" request. What's the relation to the partition (which is aws or aws-cn, for instance)?

@calvix
Copy link
Author

calvix commented Mar 6, 2024

Where was the bug here? You're now supplying the region explicitly, but I guess that should be provided by CAPA to every AWS "create {instance,NAT gateway,...}" request. What's the relation to the partition (which is aws or aws-cn, for instance)?

The bug is not providing the region to the AWS client because if you do not do that on the client creation it will fail with INVALID credentials, as it cannot validate the credentials - since it is connecting to the AWS global and not connecting to the AWS CHINA partition

Access keys generated in the China partition are not valid access keys in AWS global and by trying to check the validity of keys without using the region(which will tell the AWS client which partition to use) it will always use AWS global for and in our case it will fail to validate them. Resulting in non-recoverable error which stops any reconcilation.

@calvix
Copy link
Author

calvix commented Mar 6, 2024

here is the error log, after lot of debugging and injecting a debug output I localized it into that function where the PR fixes it

 > controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="org-giantswarm/galaxy" namespace="org-giantswarm" name="galaxy" reconcileID="e4da8234-95e0-476b-868f-d7c4aeda498c"
E0306 10:02:50.109642       1 controller.go:324] "Reconciler error" err=<
	error getting infra provider cluster or control plane object: failed to create aws session: Failed to retrieve identity credentials: InvalidClientTokenId: The security token included in the request is invalid.
		status code: 403, request id: 391bd0ff-52b9-412d-8d03-82aa8d720bd5
 > controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="org-giantswarm/galaxy-control-plane-12c432c9-qrtwx" namespace="org-giantswarm" name="galaxy-control-plane-12c432c9-qrtwx" reconcileID="74834d11-ff72-464a-b63c-df993020ad38"
E0306 10:02:50.187222       1 controller.go:324] "Reconciler error" err=<
	error getting infra provider cluster or control plane object: failed to create aws session: Failed to retrieve identity credentials: InvalidClientTokenId: The security token included in the request is invalid.
		status code: 403, request id: 63ba0c98-c34e-42c4-a276-4fbccc855908
 > controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="org-giantswarm/galaxy-control-plane-12c432c9-hpgm5" namespace="org-giantswarm" name="galaxy-control-plane-12c432c9-hpgm5" reconcileID="9246fa92-2b5f-40ab-8e82-5ca5cc464133"
E0306 10:02:50.202148       1 controller.go:324] "Reconciler error" err=<
	error getting infra provider cluster or control plane object: failed to create aws session: Failed to retrieve identity credentials: InvalidClientTokenId: The security token included in the request is invalid.
		status code: 403, request id: 2ff4274b-1075-454f-ad2a-9002de92f501
 > controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="org-giantswarm/galaxy-control-plane-12c432c9-bdtfw" namespace="org-giantswarm" name="galaxy-control-plane-12c432c9-bdtfw" reconcileID="80073767-a7ff-44b0-b866-7ef1d16c676d"
E0306 10:08:18.496207       1 controller.go:324] "Reconciler error" err=<
	failed to create scope: failed to create aws session: Failed to retrieve identity credentials: InvalidClientTokenId: The security token included in the request is invalid.
		status code: 403, request id: b1c30992-fdce-48e9-a66b-f951a4fd223d
	sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/scope.NewClusterScope
		/workspace/pkg/cloud/scope/cluster.go:76
	sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSClusterReconciler).Reconcile
		/workspace/controllers/awscluster_controller.go:173
	sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
		/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.1/pkg/internal/controller/controller.go:118
	sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
		/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.1/pkg/internal/controller/controller.go:314
	sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
		/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.1/pkg/internal/controller/controller.go:265
	sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
		/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.1/pkg/internal/controller/controller.go:226
	runtime.goexit
		/usr/local/go/src/runtime/asm_amd64.s:1650

@@ -140,6 +140,7 @@ github.com/ahmetb/gen-crd-api-reference-docs v0.3.0/go.mod h1:TdjdkYhlOifCQWPs1U
github.com/alcortesm/tgz v0.0.0-20161220082320-9c5fe88206d7 h1:uSoVVbwJiQipAclBbw+8quDsfcvFjOpI5iCf4p/cqCs=
github.com/alcortesm/tgz v0.0.0-20161220082320-9c5fe88206d7/go.mod h1:6zEj6s6u/ghQa61ZWa/C2Aw3RkjiTBOix7dkqa1VLIs=
github.com/alecthomas/assert/v2 v2.3.0 h1:mAsH2wmvjsuvyBvAmCtm7zFsBlb8mIHx5ySLVdDZXL0=
github.com/alecthomas/participle/v2 v2.1.0/go.mod h1:Y1+hAs8DHPmc3YUFzqllV+eSQ9ljPTk0ZkPMtEdAx2c=
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this smells like a go mod tidy is needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually run go mod tidy and it did nothing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will just remove this , maybe my IDE does something fishy

Copy link

@whites11 whites11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds legit

@calvix calvix requested a review from AndiDog March 7, 2024 08:37
Copy link

@AndiDog AndiDog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me. I think this is hard to test. Let's include the hotfix in our fork, and then please try to upstream it. Maybe a maintainer has more experience with this part of the code and can tell if it's problematic.

Did you test this change with cluster creation in China and non-China? That should be the minimum we do before merging.

@calvix
Copy link
Author

calvix commented Mar 7, 2024

Sounds reasonable to me. I think this is hard to test. Let's include the hotfix in our fork, and then please try to upstream it. Maybe a maintainer has more experience with this part of the code and can tell if it's problematic.

Did you test this change with cluster creation in China and non-China? That should be the minimum we do before merging.

Yes, as stated in the first message I tested in China and on golem

@calvix calvix merged commit 84dfbd6 into release-2.3 Mar 7, 2024
5 checks passed
@calvix calvix deleted the create-aws-client-with-region branch March 7, 2024 08:47
fiunchinho pushed a commit that referenced this pull request Jul 1, 2024
…partition (like gov cloud or china AWS partition) (#588)

* create-aws-client-with-region
fiunchinho added a commit that referenced this pull request Jul 4, 2024
* Add Giant Swarm fork modifications

* Push to Azure registry

* aws-cni-deleted-helm-managed-resources

* import-order

* Filter CNI subnets when creating EKS NodeGroup

* add godoc

* 🐛 Create a `aws.Config` with region to be able to work different AWS partition (like gov cloud or china AWS partition) (#588)

* create-aws-client-with-region

* 🐛 Add ID to secondary subnets (#589)

* give name to secondary subnets

* make linter happy

* Add non root volumes to AWSMachineTemplate

* Support adding custom secondary VPC CIDR blocks in `AWSCluster` (backport) (#590)

* S3 user data support for `AWSMachinePool` (#592)

* Delete machine pool user data files that did not get deleted yet by the lifecycle policy (#593)

* Delete machine pool user data files that did not get deleted yet by the lifecycle policy

* Use paging for S3 results

* Log S3 list operation

* Handle NotFound

* Remove duplicated argument

* Add `make test` to Circle CI build, S3 test fixes (#596)

* Cancel instance refresh on any relevant change to ASG instead of blocking until previous one is finished (which may have led to failing nodes due to outdated join token) (#598)

* Use feature gate for S3 storage (#599)

* Fixes after cherry-pick our customizations

---------

Co-authored-by: Andreas Sommer <andreas@giantswarm.io>
Co-authored-by: calvix <vaclav@giantswarm.io>
Co-authored-by: Mario Nitchev <marionitchev@gmail.com>
Co-authored-by: calvix <rozsypalek.vaclav@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants