Contributor

@mikedosborne mikedosborne commented Feb 10, 2025

[sc-269517]
https://app.shortcut.com/dataiku/story/269517/eks-make-the-registry-a-parameter-for-private-cluster-in-case-autoscaling-is-required

Certain images, such as the autoscaling images, are pulled from the official Kubernetes registry.
However, fully private (or air-gapped) clusters have no access to the internet and require the use of an ECR within the VPC.
Since the autoscaling image registry is hard-coded, we need to make it customisable (falling back to the official K8s registry by default).

The path of the images within the registry must also be documented. For autoscaling, it will be /autoscaling/cluster-autoscaler:%(autoscalerimageversion)s, where autoscalerimageversion depends on the k8s version according to the following map:

  • ≤ 1.24 ⇒ "v1.24.3"
  • 1.25 ⇒ "v1.25.3"
  • 1.26 ⇒ "v1.26.4"
  • 1.27 ⇒ "v1.27.3"
  • ≥ 1.28 ⇒ "v1.28.0"
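
The map above can be sketched in Python as follows. This is a minimal illustration, not the plugin's actual code: the function names and the DEFAULT_REGISTRY constant are hypothetical, and the default of registry.k8s.io is assumed from the registry used in the review below. Trailing slashes on the registry URL are stripped so that both "registry.k8s.io" and "registry.k8s.io/" give the same image reference.

```python
# Hypothetical helper names; only the version map and the image path
# come from the PR description.
DEFAULT_REGISTRY = "registry.k8s.io"  # assumed default registry


def autoscaler_image_version(k8s_version):
    """Map a Kubernetes version such as "1.27" to the autoscaler image tag."""
    major, minor = (int(part) for part in k8s_version.lstrip("v").split(".")[:2])
    version = (major, minor)
    if version <= (1, 24):
        return "v1.24.3"
    if version >= (1, 28):
        return "v1.28.0"
    return {(1, 25): "v1.25.3", (1, 26): "v1.26.4", (1, 27): "v1.27.3"}[version]


def autoscaler_image(registry_url, k8s_version):
    """Build the full image reference, falling back to the default registry."""
    registry = (registry_url or DEFAULT_REGISTRY).rstrip("/")
    return "%(registry)s/autoscaling/cluster-autoscaler:%(autoscalerimageversion)s" % {
        "registry": registry,
        "autoscalerimageversion": autoscaler_image_version(k8s_version),
    }


if __name__ == "__main__":
    print(autoscaler_image(None, "1.26"))
    print(autoscaler_image("registry.k8s.io/", "1.29"))
```

With a custom ECR prefix as the registry URL, the same function yields the private image reference that the cluster would need to be able to pull.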

@vrutz vrutz self-assigned this Oct 2, 2025
@vrutz vrutz added this to the 14.2.2 milestone Oct 2, 2025
@vrutz vrutz marked this pull request as draft October 2, 2025 16:42
@vrutz vrutz changed the title from "Feature/custom autoscaling registry url" to "Add custom image registry URL to the configuration for fully private clusters" Oct 2, 2025
@vrutz vrutz requested a review from a team October 2, 2025 16:56
@vrutz vrutz marked this pull request as ready for review October 6, 2025 08:14
@pjestin-dku pjestin-dku self-requested a review October 8, 2025 15:57
@amandineslx amandineslx self-requested a review October 9, 2025 14:52
Contributor

@pjestin-dku pjestin-dku left a comment

Code LGTM!
I tried with the camel case change: I was able to provision a fully private EKS cluster with no access to the internet. Here is what I did:

  • Created a separate VPC with 2 private subnets
  • Created peering between my fleet and the EKS VPC
  • Set the correct routes
  • Created an SG to allow only private traffic
  • Pulled this image and pushed it to a custom ECR repo: https://gallery.ecr.aws/bitnami/cluster-autoscaler
  • Created a cluster with the plugin, setting the above SG in "Security Groups" and "Control plane SG", setting the cluster as fully private (without skipping the endpoint creation) and setting the URL for the ECR repo created above
  • Once provisioned, I was able to run jobs on the cluster
  • I checked the autoscaler pod, and it is able to pull the image from the custom ECR repo:
$ kubectl get po -n kube-system cluster-autoscaler-6cc8754b6b-dkzch
NAME                                  READY   STATUS    RESTARTS   AGE
cluster-autoscaler-6cc8754b6b-dkzch   1/1     Running   0          11m
  • I confirmed that the pod's image is the custom one:
$ kubectl get po -n kube-system cluster-autoscaler-6cc8754b6b-dkzch -o yaml | grep -C 5 image
    - --stderrthreshold=info
    - --cloud-provider=aws
    - --skip-nodes-with-local-storage=false
    - --expander=least-waste
    - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/pjestin-as
    image: 236706865914.dkr.ecr.eu-west-1.amazonaws.com/pjestin/autoscaling/cluster-autoscaler:v1.28.0
    imagePullPolicy: Always
    name: cluster-autoscaler
    resources:
      limits:
        cpu: 100m
        memory: 600Mi
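
The manual grep above could also be automated. The following is a rough sketch, assuming the pod description is obtained as JSON (for example with kubectl get po ... -o json) and loaded into a dict; the function name is hypothetical and the sample pod only mirrors the fields shown in the output above.

```python
def images_outside_registry(pod, registry_url):
    """Return the container images of a pod that do not come from registry_url."""
    prefix = registry_url.rstrip("/") + "/"
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    return [c["image"] for c in containers if not c["image"].startswith(prefix)]


# Trimmed-down pod description mirroring the kubectl output above.
SAMPLE_POD = {
    "spec": {
        "containers": [
            {
                "name": "cluster-autoscaler",
                "image": "236706865914.dkr.ecr.eu-west-1.amazonaws.com/pjestin/autoscaling/cluster-autoscaler:v1.28.0",
            }
        ]
    }
}

if __name__ == "__main__":
    registry = "236706865914.dkr.ecr.eu-west-1.amazonaws.com/pjestin"
    # An empty list means every container image comes from the custom registry.
    print(images_outside_registry(SAMPLE_POD, registry))
```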

Contributor

@amandineslx amandineslx left a comment

OK for me

Review

  • deploy DSS 14.2.0-beta5 in elastic design fleet through FM in AWS
  • upgrade EKS plugin with the one built from the PR (commit 889f384e4aa6d5ecd5599b42802fc957c258e0e2)
  • switch plugin to dev mode and change default value for the autoscaler registry to toto
  • create and start cluster with autoscaler and autoscalerRegistryURL=registry.k8s.io/
    • ✅ autoscaler is correctly created from the right registry
  • stop cluster
  • ✅ start cluster without autoscaler
    • no autoscaler installed
  • ✅ add autoscaler with the macro and autoscalerRegistryURL=registry.k8s.io/
    • ✅ autoscaler is correctly created from the right registry
  • stop cluster
  • ✅ start cluster without autoscaler
    • no autoscaler installed
  • ✅ add a node pool with autoscaling and autoscalerRegistryURL=registry.k8s.io/
    • ✅ autoscaler is correctly created from the right registry
  • stop and delete cluster

@vrutz vrutz merged commit d17c6bf into master Oct 10, 2025
@vrutz vrutz deleted the feature/custom_autoscaling_registry_url branch October 10, 2025 07:14

4 participants