Skip to content

[Cleanup]: Remove unnecessary ThreadLocal StringBuilder in LegacyHierarchicalLedgerManager #4714

@QiuYucheng2003

Description

@QiuYucheng2003

BUG REPORT

Describe the bug
A critical "Large Object Retention" (Type I) memory leak exists in LegacyHierarchicalLedgerManager.java. The class uses a static ThreadLocal<StringBuilder> named threadLocalNodeBuilder to build Zookeeper paths to avoid object allocation.

Inside getLedgerRangeByLevel(), the code clears the builder using nodeBuilder.setLength(0) before reusing it. However, setLength(0) only resets the character counter; it does not shrink or release the underlying char[] backing array. If a worker thread processes an abnormally long path or experiences a burst of data, the internal array dynamically expands to a "high-water mark".

Because ThreadLocal.remove() is never invoked, these bloated char[] arrays are permanently anchored to long-lived worker threads. Over time, they promote to the Old Generation, causing severe heap fragmentation and unbounded memory expansion.

To Reproduce
Steps to reproduce the behavior:

  1. Run Apache BookKeeper under high metadata load involving exceptionally long Znode paths.
  2. Trigger operations that continuously invoke getLedgerRangeByLevel(level1, level2) across the thread pool.
  3. The threadLocalNodeBuilder expands its internal array to accommodate the largest processed path.
  4. Take a JVM Heap Dump and observe large, mostly empty char[] instances lingering in the Old Generation, strongly referenced by the ThreadLocalMap of the I/O worker threads.

Expected behavior
Memory should not be permanently retained by thread pools. To fix this, the system should either:

  1. Allocate a local StringBuilder inside the method. Modern JVM Escape Analysis handles short-lived object allocation extremely efficiently.
  2. If caching is strictly necessary, enforce a maximum capacity limit. If the StringBuilder's capacity exceeds a safe threshold (e.g., 4KB) after use, it should be discarded and replaced with a new instance rather than blindly re-queued.

Screenshots
Not applicable (Heap dump footprint analysis describes the issue).

Additional context
This is a well-known anti-pattern in high-throughput environments. Relying on ThreadLocal to cache dynamically expanding buffers without a strict shrinkage policy inevitably leads to physical memory deadlocks in long-lived thread pools.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions