
No ability to set dynamic value for kubeReserved cpu #6665

Open
jukie opened this issue Aug 6, 2024 · 8 comments
Labels
feature New feature or request needs-triage Issues that need to be triaged

Comments


jukie commented Aug 6, 2024

Description

What problem are you trying to solve?
I'd like the ability to set common kubelet configuration values for maxPods and kubeReserved.memory while retaining the dynamic CPU configuration used in the EKS bootstrap.sh. This is because setting kubelet.maxPods alone doesn't lead to any changes in kubelet.kubeReserved; those values will still be generated based on /etc/eks/eni-max-pods.txt.

If I only set kubeReserved.memory, kubelet will fail to start due to an invalid config, and if I override the value in bootstrap.sh during user data execution, Karpenter is then unaware of the actual values when making scheduling decisions.

Is there a way to configure a NodePool in a way that would allow for setting a static value for kubeReserved.memory while also allowing for the dynamic CPU configuration from bootstrap.sh?

```bash
# Calculates the amount of CPU to reserve for kubeReserved in millicores from the total number of vCPUs available on the instance.
# From the total core capacity of this worker node, we calculate the CPU resources to reserve by reserving a percentage
# of the available cores in each range up to the total number of cores available on the instance.
# We are using these CPU ranges from GKE (https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#node_allocatable):
# 6% of the first core
# 1% of the next core (up to 2 cores)
# 0.5% of the next 2 cores (up to 4 cores)
# 0.25% of any cores above 4 cores
# Return:
#   CPU resources to reserve in millicores (m)
get_cpu_millicores_to_reserve() {
  local total_cpu_on_instance=$(($(nproc) * 1000))
  local cpu_ranges=(0 1000 2000 4000 $total_cpu_on_instance)
  local cpu_percentage_reserved_for_ranges=(600 100 50 25)
  cpu_to_reserve="0"
  for i in "${!cpu_percentage_reserved_for_ranges[@]}"; do
    local start_range=${cpu_ranges[$i]}
    local end_range=${cpu_ranges[$((i + 1))]}
    local percentage_to_reserve_for_range=${cpu_percentage_reserved_for_ranges[$i]}
    cpu_to_reserve=$(($cpu_to_reserve + $(get_resource_to_reserve_in_range $total_cpu_on_instance $start_range $end_range $percentage_to_reserve_for_range)))
  done
  echo $cpu_to_reserve
}
```
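For context, the function above depends on `get_resource_to_reserve_in_range`, which is defined elsewhere in bootstrap.sh. Below is a minimal self-contained sketch of the whole calculation — the helper is a reconstruction rather than verbatim bootstrap.sh code, and the outer function is modified to take the vCPU count as an argument (instead of calling `nproc`) so it can be exercised standalone:

```bash
#!/usr/bin/env bash
# Sketch reconstruction of the bootstrap.sh CPU reservation logic.
# Percentages are expressed in hundredths of a percent (600 = 6%).

# Reserve a percentage of the slice of total capacity falling in [start, end).
get_resource_to_reserve_in_range() {
  local total=$1 start=$2 end=$3 pct=$4
  local reserved=0
  if ((total > start)); then
    # Clamp the range end to the instance's total capacity, then apply the percentage.
    reserved=$(( ((total < end ? total : end) - start) * pct / 100 / 100 ))
  fi
  echo "$reserved"
}

# Takes the vCPU count as $1 (illustrative change from bootstrap.sh, which uses nproc).
get_cpu_millicores_to_reserve() {
  local total_cpu=$(( $1 * 1000 ))   # vCPUs -> millicores
  local cpu_ranges=(0 1000 2000 4000 "$total_cpu")
  local pcts=(600 100 50 25)
  local reserve=0
  for i in "${!pcts[@]}"; do
    reserve=$(( reserve + $(get_resource_to_reserve_in_range "$total_cpu" "${cpu_ranges[$i]}" "${cpu_ranges[$((i + 1))]}" "${pcts[$i]}") ))
  done
  echo "$reserve"
}

# e.g. 4 vCPUs: 6% of 1000m + 1% of 1000m + 0.5% of 2000m = 60 + 10 + 10
get_cpu_millicores_to_reserve 4   # -> 80
```

This is why kubeReserved CPU is "dynamic": the result depends on the instance's vCPU count, which Karpenter cannot know ahead of time for a static NodePool-level value.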

How important is this feature to you?
Very important and would be willing to contribute this feature

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@jukie jukie added feature New feature or request needs-triage Issues that need to be triaged labels Aug 6, 2024

jukie commented Aug 6, 2024

This probably should've been opened under karpenter-core instead so I've created kubernetes-sigs/karpenter#1518 and can close whichever one makes most sense to keep.

@jukie jukie closed this as completed Aug 7, 2024
@jukie jukie reopened this Aug 7, 2024

jukie commented Aug 7, 2024

Reopening, as this may be doable within the aws provider. This function and/or the logic above it could be adjusted to consider each resource individually. It currently just checks whether kubelet.kubeReserved is non-null and requires everything to be set; if it instead checked each resource, that could give the desired result.

Looking further, that seems to be happening already, so maybe we just need to explicitly pass each kubeReserved flag?

```go
KubeReserved: kubeReservedResources(cpu(info), pods(ctx, info, amiFamily, maxPods, podsPerCore), ENILimitedPods(ctx, info), amiFamily, kubeReserved),
```


njtran commented Aug 12, 2024

As discussed in the upstream issue, looking to discuss this in just the provider. Can you give more details on why you want to override one part of the kube-reserved resources, but not all? Please correct me if that's not your request.


jukie commented Aug 12, 2024

As you mentioned in kubernetes-sigs/karpenter#1518 (comment), memory scales with pod density, and that's what I'd like to change: the default kubelet.kubeReserved.memory value isn't derived from kubelet.maxPods but is instead applied by bootstrap.sh, which always uses the max_pods value in /etc/eks/eni-max-pods.txt:
https://github.com/awslabs/amazon-eks-ami/blob/62c3103c9e282b6f67fbd5865512c962f93ae784/templates/al2/runtime/bootstrap.sh#L510

Ideally, setting maxPods would lead to a properly evaluated value for kubeReserved.memory. I think the easiest way to implement this would be for Karpenter to always provide --kube-reserved in KUBELET_EXTRA_ARGS during user data execution, merging user-provided values with the computed defaults for the instance type.
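Concretely, the proposal would have Karpenter render something like the following into the node's user data (cluster name and resource quantities here are illustrative, not values Karpenter emits today):

```bash
# Hypothetical user data fragment: Karpenter would compute these quantities
# per instance type and merge in any user-provided NodePool overrides.
KUBELET_EXTRA_ARGS="--kube-reserved=cpu=80m,memory=893Mi,ephemeral-storage=1Gi"
/etc/eks/bootstrap.sh my-cluster --kubelet-extra-args "$KUBELET_EXTRA_ARGS"
```

Because kubelet flags passed via --kubelet-extra-args take precedence, this would keep the values kubelet actually uses in sync with the values Karpenter assumed when scheduling.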

@stevehipwell
Contributor

I think this is a duplicate of #1803.


jukie commented Aug 18, 2024

Slightly different, but depending on the solution it could solve both. Technically my issue is that in its current form the kube-reserved config is all or nothing, and I want Karpenter to provide computed defaults for any empty fields.

I don't think solving this through user data or changes in bootstrap.sh is ideal, as there should be a common way to configure this across cloud providers, or even when using custom AMIs without bootstrap.sh. Additionally, having it computed by Karpenter is preferable for node scheduling, to make sure the perceived allocatable resources match the actual values. That's a separate discussion, but I think the VM overhead calculation logic could be tweaked as well.

Karpenter already has the logic for calculating kube-reserved, so I think it should simply always pass these values in the kubelet args.

@stevehipwell
Contributor

@jukie kube reserved is a kubelet setting so you have to integrate it via user data.


jukie commented Aug 26, 2024

Correct, and when a user provides kube-reserved values these are already handled that way by Karpenter. What I'm suggesting is that when only some fields are provided, the empty ones should have defaults injected, or, taking it further, that Karpenter always provide them.
The main point is that in certain scenarios the values Karpenter perceives will differ from the values bootstrap.sh generates, and when using a custom AMI without bootstrap.sh these values will go unconfigured entirely.
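The merge semantics being proposed can be sketched in shell parameter-expansion terms (the function name and values are illustrative, not Karpenter code):

```bash
# Sketch of the proposed merge: any kubeReserved field the user sets is kept
# verbatim; any field left empty falls back to the value Karpenter computes
# for the instance type.
merge_kube_reserved() {
  local user_value=$1 computed_default=$2
  # ${var:-default} substitutes the default only when the user value is empty.
  echo "${user_value:-$computed_default}"
}

merge_kube_reserved "1Gi" "893Mi"   # user set memory  -> 1Gi
merge_kube_reserved ""    "80m"     # cpu left empty   -> 80m
```

Under this scheme a NodePool setting only kubeReserved.memory would still get a dynamically computed kubeReserved.cpu, rather than failing kubelet validation.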
