Bug Description
What happened:
HiveIncrementalPuller.executeIncrementalSQL opens a Scanner on the incremental SQL file without closing it (new Scanner(new File(...)).useDelimiter("\\Z").next()), leaking a file
handle on every invocation. Additionally, SQL validation (checking that the file references the correct source table and contains the _hoodie_commit_time predicate) is embedded
inside executeIncrementalSQL, which is only called after a JDBC connection has already been established. A misconfigured SQL file therefore causes the connection to be opened and a
temp-table drop to be issued before the error is caught. The method is also private, making it impossible to unit-test the SQL rendering logic without a live Hive server.
What you expected:
- The Scanner should be closed after reading the SQL file (try-with-resources).
- SQL validation should happen eagerly — before any JDBC connection is opened — so a bad config file is caught at the cheapest possible point.
- executeIncrementalSQL should be testable in isolation (mocked Statement) without requiring a live Hive connection.
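The expected ordering above can be sketched as a standalone, eagerly-called validation step. This is a hypothetical helper under the assumptions in this report (table-reference and _hoodie_commit_time checks); the method name and messages are illustrative, not Hudi's actual API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EagerSqlValidation {

  // Hypothetical eager check: read and validate the SQL file before any
  // JDBC connection is opened, so a bad config fails at the cheapest point.
  static String validateIncrementalSQL(String sqlFile, String sourceDb,
      String sourceTable) throws IOException {
    String sql = new String(Files.readAllBytes(Paths.get(sqlFile)));
    if (!sql.contains(sourceDb + "." + sourceTable)) {
      throw new IllegalArgumentException(
          "Incremental SQL does not reference source table "
              + sourceDb + "." + sourceTable);
    }
    if (!sql.contains("_hoodie_commit_time")) {
      throw new IllegalArgumentException(
          "Incremental SQL is missing the _hoodie_commit_time predicate");
    }
    return sql;
  }

  public static void main(String[] args) {
    // Intended call order (connection details elided):
    // String sql = validateIncrementalSQL(incrementalSQLFile, sourceDb, sourceTable);
    // try (Connection conn = DriverManager.getConnection(hiveJdbcUrl)) {
    //   ... only now issue DROP TABLE / CREATE TABLE AS ...
    // }
  }
}
```

Because the helper takes plain strings and returns the rendered SQL, it can be unit-tested (and executeIncrementalSQL exercised with a mocked Statement) without a live Hive server.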
Steps to reproduce:
- Configure HiveIncrementalPuller with an incrementalSQLFile that references the wrong source table (e.g., wrong sourceDb.sourceTable).
- Call saveDelta().
- Observe that a JDBC connection is opened and DROP TABLE IF EXISTS is executed before the validation exception is thrown, and that the Scanner opened on the SQL file is never closed.
Environment
Hudi version: master
Query engine: (Spark/Flink/Trino etc): hive
Relevant configs:
Logs and Stack Trace
No response