Description
BUG REPORT
Describe the bug
A critical "Large Object Retention" (Type I) memory leak exists in `LegacyHierarchicalLedgerManager.java`. The class uses a static `ThreadLocal<StringBuilder>` named `threadLocalNodeBuilder` to build ZooKeeper paths to avoid object allocation.
Inside `getLedgerRangeByLevel()`, the code clears the builder using `nodeBuilder.setLength(0)` before reusing it. However, `setLength(0)` only resets the character counter; it does not shrink or release the underlying `char[]` backing array. If a worker thread processes an abnormally long path or experiences a burst of data, the internal array dynamically expands to a "high-water mark".
Because `ThreadLocal.remove()` is never invoked, these bloated `char[]` arrays are permanently anchored to long-lived worker threads. Over time, they promote to the Old Generation, causing severe heap fragmentation and unbounded memory expansion.
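The retention mechanism can be reproduced in isolation. The following is a minimal standalone sketch, not BookKeeper code; the field name simply mirrors the `threadLocalNodeBuilder` pattern described above:

```java
public class StringBuilderRetentionDemo {
    // Same caching pattern as the report: one builder cached per thread.
    private static final ThreadLocal<StringBuilder> threadLocalNodeBuilder =
            ThreadLocal.withInitial(() -> new StringBuilder(64));

    public static void main(String[] args) {
        StringBuilder sb = threadLocalNodeBuilder.get();
        int initialCapacity = sb.capacity(); // 64

        // Simulate one abnormally long path: the backing char[] grows.
        for (int i = 0; i < 100_000; i++) {
            sb.append('x');
        }
        int peakCapacity = sb.capacity();

        // setLength(0) resets the logical length but keeps the big array.
        sb.setLength(0);
        System.out.println(sb.length());                    // 0
        System.out.println(sb.capacity() == peakCapacity);  // true: memory retained
        System.out.println(peakCapacity > initialCapacity); // true: high-water mark
    }
}
```

As long as the worker thread stays alive and `ThreadLocal.remove()` is never called, the peak-sized array remains strongly reachable even though the builder is logically empty.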
To Reproduce
Steps to reproduce the behavior:
- Run Apache BookKeeper under high metadata load involving exceptionally long Znode paths.
- Trigger operations that continuously invoke `getLedgerRangeByLevel(level1, level2)` across the thread pool.
- The `threadLocalNodeBuilder` expands its internal array to accommodate the largest processed path.
- Take a JVM heap dump and observe large, mostly empty `char[]` instances lingering in the Old Generation, strongly referenced by the `ThreadLocalMap` of the I/O worker threads.
Expected behavior
Memory should not be permanently retained by thread pools. To fix this, the system should either:
- Allocate a local `StringBuilder` inside the method. Modern JVM escape analysis handles short-lived object allocation extremely efficiently.
- If caching is strictly necessary, enforce a maximum capacity limit: if the `StringBuilder`'s capacity exceeds a safe threshold (e.g., 4 KB) after use, it should be discarded and replaced with a new instance rather than blindly re-cached.
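The second option could look like the following capacity-capped cache. This is a hypothetical illustration of the fix, not BookKeeper's actual API; the `acquire`/`release` helper names and the 4 KB threshold are assumptions:

```java
public final class BoundedBuilderCache {
    // Hypothetical threshold per the suggestion above: never re-cache
    // a builder whose backing array has grown past 4 KB.
    private static final int MAX_RETAINED_CAPACITY = 4096;

    private static final ThreadLocal<StringBuilder> cached =
            ThreadLocal.withInitial(() -> new StringBuilder(64));

    private BoundedBuilderCache() {}

    /** Returns the thread-local builder, reset and ready for use. */
    public static StringBuilder acquire() {
        StringBuilder sb = cached.get();
        sb.setLength(0);
        return sb;
    }

    /** After use: drop the cached instance if it grew past the threshold. */
    public static void release(StringBuilder sb) {
        if (sb.capacity() > MAX_RETAINED_CAPACITY) {
            // Replace the bloated builder so the large char[] becomes
            // garbage-collectable instead of being pinned by the thread.
            cached.set(new StringBuilder(64));
        }
    }
}
```

In the common case (short paths) the small cached instance is reused with zero allocation; an occasional oversized path costs only one replacement allocation instead of permanently pinning a peak-sized array.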
Screenshots
Not applicable (Heap dump footprint analysis describes the issue).
Additional context
This is a well-known anti-pattern in high-throughput environments. Relying on `ThreadLocal` to cache dynamically expanding buffers without a strict shrinkage policy inevitably leads to persistent, unreclaimable memory retention in long-lived thread pools.