This guide helps you calculate and apply appropriate CPU and memory requests/limits for workloads in our Kubernetes clusters based on actual usage patterns.
The calculate-resources.rb script:
- Collects current resource configurations from the cluster
- Queries Prometheus for 7-day usage metrics (95th and 99th percentiles)
- Calculates recommendations based on actual usage + safety multipliers + minimum thresholds
- Outputs a CSV with current vs recommended values and diff calculations
Recommendations are based on:
- CPU Requests: 95th percentile usage × 1.3
- CPU Limits: 99th percentile usage × 1.5
- Memory Requests: 95th percentile usage × 1.2
- Memory Limits: 99th percentile usage × 1.3
With category-specific minimums:
- Rails apps (web, worker, cable): 250m-1000m CPU, 2Gi-4Gi memory
- Java apps (FITS, Solr, Elasticsearch): 500m-1000m CPU, 1Gi-2Gi memory
- Fcrepo: 250m-1000m CPU, 3Gi-4Gi memory (Java heap + non-heap)
- Databases (PostgreSQL, MySQL): 100m-500m CPU, 512Mi-1Gi memory
- Cache (Redis, Memcached): 50m-250m CPU, 256Mi-512Mi memory
- Utility: 50m-100m CPU, 128Mi-256Mi memory
All values are rounded to sensible increments for easier comparison and maintenance.
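As an illustration, the multiplier-plus-minimum logic above can be sketched in Ruby. Method and constant names here are hypothetical, not necessarily the script's actual helpers:

```ruby
# Sketch of the recommendation logic: usage percentile x multiplier,
# rounded up to a sensible increment, floored at the category minimum.
MULTIPLIERS = { cpu_request: 1.3, cpu_limit: 1.5, mem_request: 1.2, mem_limit: 1.3 }

# Category minimums (CPU in millicores, memory in Mi), per the table above.
MINIMUMS = {
  rails_app: { cpu_request: 250, cpu_limit: 1000, mem_request: 2048, mem_limit: 4096 },
  cache:     { cpu_request: 50,  cpu_limit: 250,  mem_request: 256,  mem_limit: 512 },
}

# Round up to the next increment (e.g. 50m CPU, 128Mi memory).
def round_up(value, increment)
  (value / increment.to_f).ceil * increment
end

def recommend(metric, usage, category, increment)
  raw = usage * MULTIPLIERS.fetch(metric)
  [round_up(raw, increment), MINIMUMS.dig(category, metric)].compact.max
end

# A Rails web container using 180m CPU at p95: 180 x 1.3 = 234m,
# rounded up to 250m, which also meets the rails_app minimum.
recommend(:cpu_request, 180, :rails_app, 50) # => 250
```

The minimum floor is what keeps low-traffic Rails apps at 2Gi even when steady-state usage is far lower, covering startup spikes.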
- `kubectl` configured with cluster access
- `kubectx` for context management
- Ruby 3.x with `json` and `csv` gems (built-in)
- `curl` for Prometheus queries
- Read access to all namespaces in the target cluster
- Access to Prometheus in the `cattle-monitoring-system` namespace
Edit the script to set your target cluster:

```ruby
CLUSTER = 'r2-friends' # or 'r2-besties'
```

Set your kubectl context:

```sh
kubectx r2-friends
```

The script will verify you're in the correct context before running.
In a separate terminal, create a port-forward to Prometheus:

```sh
kubectl port-forward -n cattle-monitoring-system \
  svc/rancher-monitoring-prometheus 9090:9090
```

Important: Keep this running while the script executes.
```sh
ruby calculate-resources.rb
```

Expected runtime: 2-3 minutes
Output:
- `r2-friends-resources-with-recommendations.csv` - Main results
- `r2-friends-cpu-p95.csv` - 95th percentile CPU metrics
- `r2-friends-cpu-p99.csv` - 99th percentile CPU metrics
- `r2-friends-mem-p95.csv` - 95th percentile memory metrics
- `r2-friends-mem-p99.csv` - 99th percentile memory metrics
The CSV contains these key columns:
- `namespace` - Kubernetes namespace
- `owner_type` - Deployment or StatefulSet
- `owner_name` - Name of the deployment/statefulset
- `container` - Container name within the pod
- `container_type` - Detected category (rails_app, java_app, etc.)
- `cpu_request_current` - Current CPU request
- `cpu_limit_current` - Current CPU limit
- `mem_request_current` - Current memory request
- `mem_limit_current` - Current memory limit
- `cpu_p95_m` - 95th percentile CPU usage (millicores)
- `cpu_p99_m` - 99th percentile CPU usage (millicores)
- `mem_p95_mi` - 95th percentile memory usage (Mi)
- `mem_p99_mi` - 99th percentile memory usage (Mi)
- `cpu_request_recommended` - Recommended CPU request
- `cpu_limit_recommended` - Recommended CPU limit
- `mem_request_recommended` - Recommended memory request
- `mem_limit_recommended` - Recommended memory limit
- `cpu_request_diff` - Change needed for CPU request (+/- values)
- `cpu_limit_diff` - Change needed for CPU limit (+/- values)
- `mem_request_diff` - Change needed for memory request (+/- values)
- `mem_limit_diff` - Change needed for memory limit (+/- values)
- `stanza` - Ready-to-paste YAML for Helm values files
Interpreting diffs:
- Positive values (e.g., `+512Mi`): Need to increase resources
- Negative values (e.g., `-1Gi`): Can reduce resources (over-provisioned)
- Empty: No current value set
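To act on the diffs programmatically, you can filter the CSV for over-provisioned containers. A minimal sketch, assuming diffs are formatted as shown above (`+512Mi`, `-1Gi`); the helper names are illustrative:

```ruby
require 'csv'

# Parse a diff like "+512Mi" or "-1Gi" into Mi; returns nil for empty
# cells (no current value set) or unrecognized formats.
def diff_to_mi(value)
  return nil if value.nil? || value.strip.empty?
  match = value.match(/\A([+-]?\d+(?:\.\d+)?)(Mi|Gi)\z/) or return nil
  number, unit = match.captures
  number.to_f * (unit == 'Gi' ? 1024 : 1)
end

# List containers with a negative memory-limit diff (over-provisioned).
def over_provisioned(csv_path)
  CSV.foreach(csv_path, headers: true).select do |row|
    diff = diff_to_mi(row['mem_limit_diff'])
    diff && diff < 0
  end.map { |row| "#{row['namespace']}/#{row['owner_name']}/#{row['container']}" }
end
```

A list like this is a good starting point for deciding which applications to revisit first.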
Most applications are deployed via Helm charts with values in `ops/*-deploy.tmpl.yaml` files.
- Open the application's values file:

  ```sh
  # Example: crash-world-cake
  cd crash_world_cake
  vim ops/friends-deploy.tmpl.yaml
  ```
- Update resource values

  Copy the `stanza` column value for each container and paste into the appropriate section:
  ```yaml
  # Main web application
  resources:
    limits:
      memory: "4Gi"
      cpu: "1000m"
    requests:
      memory: "2Gi"
      cpu: "350m"

  # Worker
  worker:
    resources:
      limits:
        memory: "4Gi"
        cpu: "1000m"
      requests:
        memory: "2Gi"
        cpu: "250m"

  # Redis (Bitnami chart - note the master. prefix)
  redis:
    master:
      resources:
        limits:
          memory: "512Mi"
          cpu: "250m"
        requests:
          memory: "256Mi"
          cpu: "50m"

  # Fcrepo PostgreSQL (Bitnami subchart - note the primary. prefix)
  fcrepo:
    postgresql:
      primary:
        resources:
          limits:
            memory: "1Gi"
            cpu: "500m"
          requests:
            memory: "512Mi"
            cpu: "100m"
  ```

- Commit and push changes:

  ```sh
  git add ops/friends-deploy.tmpl.yaml
  git commit -m "Update resource limits based on usage analysis"
  git push
  ```

- Deploy via CI/CD

  Your GitHub Actions workflow will build and deploy with the new values.
For quick testing without rebuilding images:

```sh
NAMESPACE="your-namespace"
CURRENT_TAG=$(kubectl get deployment your-deployment -n $NAMESPACE \
  -o jsonpath='{.spec.template.spec.containers[0].image}' | cut -d':' -f2)

kubectl patch deployment your-deployment -n $NAMESPACE --type strategic --patch "
spec:
  template:
    spec:
      containers:
      - name: your-container
        image: your-image:$CURRENT_TAG
        resources:
          limits:
            memory: \"4Gi\"
            cpu: \"1000m\"
          requests:
            memory: \"2Gi\"
            cpu: \"350m\"
"
```

Note: This is for testing only. Changes made via kubectl will be overwritten by the next Helm deployment.
Bitnami charts (Redis, PostgreSQL) use nested resource paths:

```yaml
# Redis
redis:
  master:       # Not just 'resources:'
    resources:
      limits: ...

# PostgreSQL
postgresql:
  primary:      # Not just 'resources:'
    resources:
      limits: ...
```

Java apps (fcrepo, FITS, Solr) need memory limits that account for:
- JVM heap size (`-Xmx`)
- Non-heap memory (metaspace, thread stacks, etc.)

Rule of thumb: Memory limit should be 1.3-1.5x the heap size.
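A quick sketch of that rule of thumb, with a hypothetical helper that rounds up to 256Mi increments:

```ruby
# Derive a container memory limit from the JVM heap size (-Xmx, in Mi),
# using the 1.3-1.5x factor above. Helper name is illustrative.
def heap_to_limit_mi(heap_mi, factor: 1.4)
  raise ArgumentError, 'factor should be between 1.3 and 1.5' unless (1.3..1.5).cover?(factor)
  # Round up to the next 256Mi so values stay easy to compare.
  ((heap_mi * factor) / 256.0).ceil * 256
end

# An fcrepo container with -Xmx2g (2048Mi heap) lands at 3072Mi (3Gi),
# matching the Fcrepo minimum in the table above.
heap_to_limit_mi(2048) # => 3072
```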
The recommendations are based on steady-state usage. Rails applications may need:
- 2-3x more memory during startup (gem loading, asset compilation)
- Higher CPU during startup (initialization, migrations)
Our minimums (2Gi-4Gi for Rails) account for this, but monitor startup behavior after applying changes.
Edit the `MINIMUMS` hash in the script to adjust category minimums:

```ruby
MINIMUMS = {
  rails_app: { cpu_request: 250, cpu_limit: 1000, mem_request: 2048, mem_limit: 4096 },
  # ... other categories
}
```

All memory values are in Mi, CPU values are in millicores.
Cause: kubectl context doesn't match the `CLUSTER` variable in the script.

Solution:

```sh
kubectx r2-friends # or r2-besties
```

Cause: Prometheus port-forward is not running or has disconnected.

Solution:

```sh
# In a separate terminal
kubectl port-forward -n cattle-monitoring-system \
  svc/rancher-monitoring-prometheus 9090:9090
```

Cause: Invalid Prometheus query or connectivity issue.

Solution:
- Verify Prometheus is accessible: `curl http://localhost:9090/-/healthy`
- Check Prometheus logs: `kubectl logs -n cattle-monitoring-system -l app=prometheus`
Symptom: A container is categorized incorrectly (e.g., Redis as rails_app).

Cause: The pattern matching in `container_type()` needs adjustment.

Solution: Update the pattern matching order in the script. Specific patterns should come before generic ones:

```ruby
def container_type(pod_name, container_name, owner_name)
  combined = "#{pod_name} #{container_name} #{owner_name}".downcase

  # Specific services FIRST
  if combined =~ /redis|memcached/
    return :cache
  # ...
  # Generic patterns LAST
  elsif combined =~ /hyrax|hyku|rails/
    return :rails_app
  # ...
  end
end
```

Symptom: `cpu_p95_m` and similar columns are empty.
Cause: Deployment is less than 7 days old or Prometheus hasn't collected metrics yet.
Solution:
- Wait for more data to accumulate (ideally 7 days)
- Rely on minimums for the container type
- Monitor and adjust after deployment based on actual usage
Symptom: Recommended values don't match your expectations.
Investigation:
- Check the raw metrics columns (`cpu_p95_m`, `mem_p95_mi`)
- Verify the `container_type` is correct
- Review the multipliers (1.3x for requests, 1.5x for limits)
- Consider if the 7-day period captured unusual load
Solution:
- Adjust minimums if needed for specific categories
- Re-run during a more representative time period
- Test recommendations on staging first
Symptom: Cannot deploy because Helm shows pending-upgrade or pending-rollback.

Solution:

```sh
NAMESPACE="your-namespace"
RELEASE_SECRET=$(kubectl get secrets -n $NAMESPACE \
  -l name=your-release,owner=helm \
  --sort-by=.metadata.creationTimestamp | tail -1 | awk '{print $1}')
kubectl delete secret $RELEASE_SECRET -n $NAMESPACE
helm list -n $NAMESPACE # Verify status is now "failed" or previous revision
```

- Update the cluster name:

  ```ruby
  CLUSTER = 'r2-besties'
  ```

- Switch context:

  ```sh
  kubectx r2-besties
  ```

- Run the script:

  ```sh
  ruby calculate-resources.rb
  ```

- Apply recommendations following the same process as above.
Setting appropriate resource limits:
- Reduces waste from over-provisioned workloads
- Prevents OOMKills and restarts from under-provisioning
- Improves cluster efficiency and bin-packing
- Enables cost attribution per client/application
Review the diff columns to estimate total resource impact:
- Sum positive diffs to see total increase needed
- Sum negative diffs to see potential savings
- Use this to inform capacity planning
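A rough sketch of that tally, assuming the diff formatting shown earlier (`+512Mi` / `-1Gi`) and using the `mem_limit_diff` column; helper names are illustrative:

```ruby
require 'csv'

# Total the memory-limit diffs across the output CSV to estimate
# cluster-wide impact: total increase needed vs potential savings.
UNIT_MI = { 'Mi' => 1, 'Gi' => 1024 }

def parse_mi(cell)
  m = cell.to_s.match(/\A([+-]?\d+(?:\.\d+)?)(Mi|Gi)\z/)
  m ? m[1].to_f * UNIT_MI[m[2]] : nil
end

def mem_diff_totals(csv_path)
  diffs = CSV.foreach(csv_path, headers: true)
             .filter_map { |row| parse_mi(row['mem_limit_diff']) }
  { increase_mi: diffs.select(&:positive?).sum,
    savings_mi:  diffs.select(&:negative?).sum.abs }
end
```

The same pattern applies to the CPU diff columns (values in millicores rather than Mi).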
- Review classifications - Always check the `container_type` column for accuracy
- Test on staging first - Apply to r2-friends before r2-besties
- Monitor after changes - Watch for OOMKills, restarts, or performance issues
- Iterate gradually - Don't change all applications at once
- Document decisions - Note why you deviated from recommendations in commit messages
- Re-run periodically - Usage patterns change over time (quarterly recommended)
- Validate startup - Restart a few pods after applying to ensure startup succeeds
After applying recommendations to r2-friends and r2-besties:
- Document client costs - Use resource allocations for cost attribution
- Set up monitoring - Alert on containers approaching limits
- Implement LimitRanges - Set cluster-wide defaults for new deployments
- Consider HPA - For variable workloads, use Horizontal Pod Autoscaler