
High "mapped" memory usage and disk IO when tail-based sampling is enabled #13463

Closed
carsonip opened this issue Jun 20, 2024 · 2 comments · Fixed by #13464
Comments

@carsonip (Member)

APM Server version (apm-server version): confirmed on 8.13.2 but affects all versions including the latest 8.14.1

Description of the problem including expected versus actual behavior:

When tail-based sampling (TBS) is enabled, memory usage grows as high as the local TBS database storage size. In /proc/meminfo, most of this memory shows up as "Mapped". This is particularly noticeable in setups that run multiple apm-servers and receive high load.

Steps to reproduce:


  1. Start 2 separate apm-servers (call them A and B) and send load to them independently using apmsoak. Wait until their local TBS database sizes grow to >1GB.
  2. Stop A, wait for 10 minutes, then restart A without sending load to it.
  3. Observe that A's memory usage increases to approximately its local TBS database size.
@carsonip carsonip added the bug label Jun 20, 2024
@carsonip carsonip self-assigned this Jun 20, 2024
@carsonip (Member, Author)

Upon investigation, this is likely related to the prefetch behavior of the local TBS badger database iterator, triggered by ReadTraceEvents, which is called for every sampling decision received (both local and remote decisions). ReadTraceEvents cannot use the table's bloom filter because it searches for events by trace ID while a full key consists of both the trace ID and the transaction/span ID, so it has to use an iterator with a prefix. Prefetching is enabled by default with a prefetch size of 100 values, and it fetches those values from the value log (vlog) when the iterator is used. Unfortunately, the prefetch implementation does not respect the prefix: even when the prefix does not match, it still fetches 100 values from the vlog.
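
For illustration, here is a minimal sketch of the access pattern (not apm-server code; it assumes badger's v2 iterator API, and `readTraceEvents` is a hypothetical stand-in for the real `ReadTraceEvents`):

```go
// Illustrative sketch only: a badger prefix scan using the default iterator
// options, which is the pattern that triggers the prefetch behavior above.
package sketch

import (
	badger "github.com/dgraph-io/badger/v2"
)

func readTraceEvents(db *badger.DB, traceID []byte) error {
	return db.View(func(txn *badger.Txn) error {
		// DefaultIteratorOptions has PrefetchValues=true and PrefetchSize=100,
		// so the iterator eagerly reads values from the value log (vlog).
		opts := badger.DefaultIteratorOptions
		opts.Prefix = traceID // keys are traceID + txn/span ID, so iterate by prefix

		it := txn.NewIterator(opts)
		defer it.Close()

		for it.Rewind(); it.Valid(); it.Next() {
			// As described above, prefetch does not respect the prefix:
			// values for the next PrefetchSize keys are read from the vlog
			// even when those keys do not match.
			err := it.Item().Value(func(val []byte) error {
				// decode the stored event here
				return nil
			})
			if err != nil {
				return err
			}
		}
		return nil
	})
}
```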

This mostly affects setups with multiple apm-servers because, for example, apm-server A receives sampling decisions made by a remote apm-server B, and such a decision is likely for a trace that A does not know about and does not store. The right thing to do here is to scan the in-memory LSM tree to see if there is a prefix match, but in the current implementation, due to prefetching, it still scans the vlogs for 100 values of irrelevant keys. As vlog files are mmap-ed, these scans cause lots of page faults and read IO, and the touched pages stay in memory until the OS evicts them. In a busy environment, and because trace IDs are randomly distributed, receiving many remote sampling decisions will likely scan all the vlogs and keep all of them resident in memory, hence memory usage that approximates the size of the local TBS database.
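
Continuing the sketch above, this is roughly what the mitigation could look like, assuming the same badger API; `traceEventsExist` is a hypothetical helper and not necessarily the exact change shipped in the fix:

```go
// Hedged sketch of the mitigation: with PrefetchValues disabled, the prefix
// scan stays on the LSM tree keys, and the vlog is only read for keys that
// actually match the trace ID prefix.
func traceEventsExist(db *badger.DB, traceID []byte) (bool, error) {
	var found bool
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.Prefix = traceID
		opts.PrefetchValues = false // key-only iteration; no eager vlog reads

		it := txn.NewIterator(opts)
		defer it.Close()

		it.Rewind()
		// Any key under the prefix means the trace is stored locally; values
		// can then be read lazily via it.Item().Value(...) when needed.
		found = it.Valid()
		return nil
	})
	return found, err
}
```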

@carsonip (Member, Author)

Here's a minimal reproducible example of the issue, with memory and disk IO measurements: https://github.com/carsonip/tbs-badger-playground/tree/main/prefetch

@carsonip carsonip changed the title High memory usage and disk IO when tail-based smapling is enabled High memory usage and disk IO when tail-based sampling is enabled Jun 24, 2024
@carsonip carsonip changed the title High memory usage and disk IO when tail-based sampling is enabled High "mapped" memory usage and disk IO when tail-based sampling is enabled Jun 24, 2024