@luoluoyuyu luoluoyuyu commented Dec 3, 2025

Description

When loading multiple TsFiles, all file resources were loaded into memory simultaneously, causing excessive memory consumption and frequent GC pauses.

This commit introduces batch execution for multi-file loading scenarios:

  1. Split LoadTsFileStatement/LoadTsFile into sub-statements, each handling one TsFile, to avoid loading all file resources at once
  2. Refactor duplicated code in ClientRPCServiceImpl by extracting helper methods for both the tree model and the table model
  3. Add progress logging to track the loading status of each file
  4. Support both synchronous and asynchronous loading modes
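The splitting in step 1 can be pictured with a minimal sketch. This is not the code from the PR: `MultiFileLoadSketch` and `splitPerFile()` are hypothetical names, and the sketch assumes a load statement is essentially a list of TsFile paths. The actual `getSubStatement()` signature is not shown in this description, so it is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a multi-file load statement; not an IoTDB class.
final class MultiFileLoadSketch {
  private final List<String> tsFilePaths;

  MultiFileLoadSketch(List<String> tsFilePaths) {
    // Defensive, read-only copy so sub-statements cannot alter the parent.
    this.tsFilePaths = Collections.unmodifiableList(new ArrayList<>(tsFilePaths));
  }

  List<String> getTsFilePaths() {
    return tsFilePaths;
  }

  // Assumed shape of the split: one sub-statement per file, in the original order.
  List<MultiFileLoadSketch> splitPerFile() {
    List<MultiFileLoadSketch> subStatements = new ArrayList<>(tsFilePaths.size());
    for (String path : tsFilePaths) {
      subStatements.add(new MultiFileLoadSketch(Collections.singletonList(path)));
    }
    return subStatements;
  }
}
```

Each sub-statement carries a single path, so whatever executes it only needs the resources of that one file at a time.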

Changes:

  • Added getSubStatement() method to LoadTsFileStatement and LoadTsFile for splitting multi-file statements
  • Extracted shouldSplitLoadTsFileStatement() and shouldSplitTableLoadTsFile() to determine if splitting is needed
  • Extracted executeBatchLoadTsFile() and executeBatchTableLoadTsFile() to handle batch execution with progress logging
  • Applied the optimization to all four execution paths: tree model and table model, each with synchronous and asynchronous loading
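As a rough illustration of the batch execution and progress logging listed above, the following sketch loads files one at a time and records a progress line per file. It is an assumption-laden outline, not the PR's implementation: `BatchLoadSketch`, `shouldSplit`, `executeBatch`, the "more than one file" split criterion, the stop-on-first-failure behavior, and the progress line format are all hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper; the real logic lives in ClientRPCServiceImpl and may differ.
final class BatchLoadSketch {
  private BatchLoadSketch() {}

  // Assumption: splitting is only worthwhile when more than one file is named.
  static boolean shouldSplit(List<String> tsFilePaths) {
    return tsFilePaths.size() > 1;
  }

  // Loads files one by one and returns how many were loaded successfully.
  // loadOne stands in for executing a single-file sub-statement.
  static int executeBatch(
      List<String> tsFilePaths, Predicate<String> loadOne, List<String> progress) {
    int total = tsFilePaths.size();
    int loaded = 0;
    for (int i = 0; i < total; i++) {
      String path = tsFilePaths.get(i);
      boolean ok = loadOne.test(path);
      // One progress line per file, e.g. "[2/3] b.tsfile loaded".
      progress.add("[" + (i + 1) + "/" + total + "] " + path + (ok ? " loaded" : " failed"));
      if (!ok) {
        break; // Assumption: stop at the first failure instead of continuing.
      }
      loaded++;
    }
    return loaded;
  }
}
```

Collecting progress lines in a list keeps the sketch easy to test; a real service would more likely write them to its logger.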

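Because each sub-statement loads a single TsFile, file resources are no longer all held in memory at once, which reduces memory pressure and GC pauses and improves system stability when loading large numbers of TsFiles.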


This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious
    to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, meeting the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

@jt2594838 jt2594838 merged commit bc4f8e9 into apache:master Dec 9, 2025
28 checks passed
jt2594838 pushed a commit that referenced this pull request Dec 10, 2025
…s at once (#16853) (#16867)

* Fix excessive GC caused by loading too many TsFiles at once

# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/protocol/thrift/impl/ClientRPCServiceImpl.java
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/sql/ast/LoadTsFile.java

* fix

# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/audit/AuditLogOperation.java
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/conf/IoTDBDescriptor.java
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/protocol/thrift/impl/ClientRPCServiceImpl.java
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/sql/ast/LoadTsFile.java
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/statement/crud/LoadTsFileStatement.java

* update

* update
JackieTien97 pushed a commit that referenced this pull request Dec 10, 2025
…6853)

* Fix excessive GC caused by loading too many TsFiles at once

* fix

* update

(cherry picked from commit bc4f8e9)