[Kubernetes Integration] Investigate Elastic Agent API calls and check memory consumption #4122

Closed
Tracked by #3801
MichaelKatsoulis opened this issue Jan 23, 2024 · 4 comments
MichaelKatsoulis commented Jan 23, 2024

Leverage APM Tracing (#2612 (comment)) in order to investigate:

  • Elastic Agent API calls towards the k8s API
  • check memory consumption

Update

Background

Get the baseline of current resource usage (8.12.x) and the method to measure it, just for internal understanding.

Goals

Understand internally how to get the metrics.
Measure:
a) Memory (we are not splitting out sub-processes for now, just elastic-agent as a whole)
b) CPU
c) API calls (from both the agent and the underlying beats)

Actions

  1. How to get all the required measurements:
  • are we required to use the system integration? - yes, we want to get the info over time

  • but how to get memory info for the elastic-agent itself? - via the k8s-related providers

  • cluster with only an agent with an empty policy - check the baseline of resource usage + API calls (with audit logs; see the kind sketch after this list)

  • enable the k8s integration - check the memory/CPU change

  2. The scenarios to measure memory, CPU usage and API calls:
  • 1 node cluster with 50 pods:
    a) elastic-agent with default metrics (leave the system integration enabled)
    b) elastic-agent with logs: X rate of logs
  • repeat the above with a 5 node cluster with 50 pods:
    a) elastic-agent with default metrics
    b) elastic-agent with logs: X rate of logs
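
For the audit-log based API-call measurements, the cluster itself needs kube-apiserver audit logging enabled. A minimal sketch, assuming kind is used (the Metadata-level policy and file names are illustrative; the log path matches the one grepped in the comments below):

# Hypothetical catch-all audit policy (Metadata level keeps the log small).
cat <<'EOF' > audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
EOF

# kind cluster config that mounts the policy and turns on audit logging.
cat <<'EOF' > kind-audit.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: ClusterConfiguration
        apiServer:
          extraArgs:
            audit-log-path: /var/log/kubernetes/kube-apiserver-audit.log
            audit-policy-file: /etc/kubernetes/policies/audit-policy.yaml
          extraVolumes:
            - name: audit-policies
              hostPath: /etc/kubernetes/policies
              mountPath: /etc/kubernetes/policies
              readOnly: true
              pathType: DirectoryOrCreate
            - name: audit-logs
              hostPath: /var/log/kubernetes
              mountPath: /var/log/kubernetes
              readOnly: false
              pathType: DirectoryOrCreate
    extraMounts:
      - hostPath: ./audit-policy.yaml
        containerPath: /etc/kubernetes/policies/audit-policy.yaml
        readOnly: true
EOF

kind create cluster --config kind-audit.yaml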
@MichaelKatsoulis MichaelKatsoulis added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Jan 23, 2024
@tetianakravchenko tetianakravchenko changed the title [Kubernetes Integration] Leverage APM Tracing in order to investigate Elastic Agent API calls and check memory consumption [Kubernetes Integration] Investigate Elastic Agent API calls and check memory consumption Feb 12, 2024
tetianakravchenko commented Feb 12, 2024

Elastic Agent API calls towards k8s API:

  1. Leverage APM Tracing

Some information on this:

  2. Universal Profiling

On kind (1.29) we get this error:

time="2024-02-12T16:40:34.235861244Z" level=error msg="Failed to load eBPF tracer: failed to read kernel modules: unexpected line in modules: 'selfowner 36864 - - Live 0xffffffffc05cd000 (O)'"

Tested on EKS:
(screenshot: Universal Profiling view of the EKS test, 2024-02-12)

  • possible to filter by the container name
  • but how to filter a specific metricbeat process?
  • Universal Profiling does not yet cover or provide memory usage, disk I/O or network bandwidth
  3. Audit logs analysis
    @gsantoro is working on it

check memory consumption

  • Universal Profiling does not yet cover or provide memory usage

Memory consumption consists of 2 parts:

  • metricbeat process - all kubernetes-related data streams run as a dedicated process:
root          40  0.2  5.1 1502252 202888 ?      Sl   16:46   0:08 /usr/share/elastic-agent/data/elastic-agent-9db552/components/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E management.restart_on_output_change=true -E logging.level=info -E logging.to_stderr=true -E gc_percent=${METRICBEAT_GOGC:100} -E metricbeat.config.modules.enabled=false -E http.enabled=true -E http.host=unix:///etc/elastic-agent/data/tmp/wyZuDAx8vsqXD4cvx932213NPR-EMC2J.sock -E http.pprof.enabled=true -E path.data=/etc/elastic-agent/data/run/kubernetes/metrics-default

so top -p <pid>, ps -p <pid> -o %mem,%cpu, or the system integration can be used (needs to be checked)

For more info, a heap profile can be fetched via pprof (see the analysis sketch at the end of this comment):

curl -v -o mem.pprof.gz --unix-socket /etc/elastic-agent/data/tmp/wyZuDAx8vsqXD4cvx932213NPR-EMC2J.sock http://localhost/debug/pprof/heap
  • elastic-agent - I believe there is no other way except the agent's pprof

FYI:

  • it is also possible to get the cpu pprof for elastic-agent:
elastic-agent diagnostics --cpu-profile
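
The heap and CPU profiles collected above can then be inspected locally. A sketch, assuming a Go toolchain is available on the workstation:

# top entries by in-use memory
go tool pprof -top mem.pprof.gz

# interactive view (flame graph etc.) served in the browser
go tool pprof -http=:8081 mem.pprof.gz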

axw commented Feb 14, 2024

> one of the main goals of #3223 was to propagate the APM tracing configuration to sub-processes (the beats processes) via the control protocol - but the instrumentation must be added to the beats (it seems that instrumentation is only available for the Elasticsearch output - https://www.elastic.co/guide/en/beats/metricbeat/current/configuration-instrumentation.html)

Right, so there are a couple of other things we need beyond #3223:

tetianakravchenko commented:

Test scenario 1:
1 node k8s cluster
stack - 8.12.1
Additionally installed:

  • KSM
  • metrics-server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl patch -n kube-system deployment metrics-server --type=json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
  • kubernetes-dashboard:
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard --set=service.externalPort=8080,resources.limits.cpu=200m,metricsScraper.enabled=true
(a token also needs to be created for access; see the sketch after this list)
  • deploy extra pods: ./stress_test_k8s --deployments=4 --namespaces=10 --podlabels=4 --podannotations=4
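
For the dashboard token, something along these lines should work (a sketch; the service account name assumes the chart defaults):

kubectl -n kubernetes-dashboard create token kubernetes-dashboard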
  1. Empty policy:
  • API calls:
date; cat /var/log/kubernetes/kube-apiserver-audit.log | grep -a '"stage":"ResponseComplete"' | grep '"username":"system:serviceaccount:kube-system:elastic-agent"' | grep -v "/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/elastic-agent-cluster-leader" | wc -l
Fri Mar  8 13:07:19 UTC 2024 - 47     <---- after the start
Fri Mar  8 13:10:05 UTC 2024 - 47
Fri Mar  8 13:12:18 UTC 2024 - 47
Fri Mar  8 13:12:49 UTC 2024 - 49
Fri Mar  8 13:13:10 UTC 2024 - 51
Fri Mar  8 13:17:54 UTC 2024 - 60

Note: leader-election API calls are skipped - they cause 1 API call/sec
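
To turn these cumulative counts into a rate, the same count can be sampled in a loop; an illustrative sketch reusing the exact filters from above (the first printed delta is just the running total):

prev=0
while sleep 60; do
  cur=$(grep -a '"stage":"ResponseComplete"' /var/log/kubernetes/kube-apiserver-audit.log \
        | grep '"username":"system:serviceaccount:kube-system:elastic-agent"' \
        | grep -vc "/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/elastic-agent-cluster-leader")
  echo "$(date -u) - $((cur - prev)) calls/min"
  prev=$cur
done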

  • cpu/memory within: 3m CPU, 224Mi-230Mi memory
(screenshot: ea-empty-policy resource usage)
  2. Add the Kubernetes integration
  • api calls
date; cat /var/log/kubernetes/kube-apiserver-audit.log | grep -a '"stage":"ResponseComplete"' | grep '"username":"system:serviceaccount:kube-system:elastic-agent"' | grep -v "/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/elastic-agent-cluster-leader" | wc -l
Fri Mar  8 13:26:36 UTC 2024 - 205
Fri Mar  8 13:27:35 UTC 2024 - 207
Fri Mar  8 13:28:50 UTC 2024 - 209
Fri Mar  8 13:50:13 UTC 2024 - 455
Fri Mar  8 14:20:20 UTC 2024 - 797
  • cpu/mem: 46m-206m CPU, 592Mi-665Mi memory
(screenshot: resource usage with the Kubernetes integration, 2024-03-08)


tetianakravchenko commented Apr 16, 2024

Logs only:
1 node k8s cluster
stack - 8.13.1
same setup as above

  1. API calls:
root@kind-control-plane:/# date; cat /var/log/kubernetes/kube-apiserver-audit.log | grep -a '"stage":"ResponseComplete"' | grep '"username":"system:serviceaccount:kube-system:elastic-agent"' | grep -v "/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/elastic-agent-cluster-leader" | wc -l
Tue Apr 16 08:09:13 UTC 2024 - 42 <---- after the start
Tue Apr 16 08:15:41 UTC 2024 - 47
Tue Apr 16 08:24:25 UTC 2024 - 69
Tue Apr 16 08:46:46 UTC 2024 - 116   <-- more traffic enabled
Tue Apr 16 08:54:48 UTC 2024 - 135
Tue Apr 16 09:06:54 UTC 2024 - 158
Tue Apr 16 09:16:42 UTC 2024 - 180
kubectl top pod elastic-agent-crbdp
NAME                  CPU(cores)   MEMORY(bytes)   
elastic-agent-crbdp   16m          379Mi           (about 100-150 logs/min)

NAME                  CPU(cores)   MEMORY(bytes)   
elastic-agent-crbdp   463m-600m         473Mi-530Mi           (~180,000 logs/min)
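
One way to record such a series over time (illustrative; the pod name is taken from this test):

while sleep 30; do
  echo "$(date -u) $(kubectl top pod elastic-agent-crbdp --no-headers)"
done | tee agent-usage.log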

After enabling more traffic / increasing the amount of logs:
(screenshots: CPU and memory usage after increasing the log volume, 2024-04-16)
