[BUG] HiveIncrementalPuller leaks Scanner file handle and validates SQL after opening JDBC connection #18440

@linliu-code

Bug Description

What happened:
HiveIncrementalPuller.executeIncrementalSQL opens a Scanner on the incremental SQL file and never closes it (new Scanner(new File(...)).useDelimiter("\\Z").next()), leaking a file
handle on every invocation. In addition, SQL validation (checking that the file references the correct source table and contains the _hoodie_commit_time predicate) is embedded
inside executeIncrementalSQL, which is called only after a JDBC connection has been established. A misconfigured SQL file therefore causes the connection to be opened and a
temp table drop to be issued before the error is caught. The method is also private, so the SQL rendering logic cannot be unit-tested without a live Hive server.

What you expected:

  1. The Scanner should be closed after reading the SQL file (try-with-resources).
  2. SQL validation should happen eagerly, before any JDBC connection is opened, so that a bad config file is caught at the cheapest possible point.
  3. executeIncrementalSQL should be testable in isolation (mocked Statement) without requiring a live Hive connection.
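
A minimal sketch of points 1 and 2, not the actual Hudi patch. The class IncrementalSqlFile and its methods read and validate are hypothetical names introduced here for illustration, and the validation checks are simplified to plain substring matches. The idea is that both calls run before any JDBC connection is requested.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical helper (not part of Hudi): reads and validates the incremental
// SQL file so that both steps can run before any JDBC connection is opened.
final class IncrementalSqlFile {

  private IncrementalSqlFile() {
  }

  // Reads the whole file as one token. try-with-resources closes the Scanner,
  // and with it the underlying file handle, even if next() throws.
  static String read(File sqlFile) throws FileNotFoundException {
    try (Scanner scanner = new Scanner(sqlFile)) {
      scanner.useDelimiter("\\Z");
      return scanner.hasNext() ? scanner.next() : "";
    }
  }

  // Simplified checks (assumption): the SQL must mention sourceDb.sourceTable
  // and the _hoodie_commit_time column. Throws before any connection exists.
  static void validate(String sql, String sourceDb, String sourceTable) {
    String qualifiedTable = sourceDb + "." + sourceTable;
    if (!sql.contains(qualifiedTable)) {
      throw new IllegalArgumentException(
          "Incremental SQL does not reference " + qualifiedTable);
    }
    if (!sql.contains("_hoodie_commit_time")) {
      throw new IllegalArgumentException(
          "Incremental SQL has no _hoodie_commit_time predicate");
    }
  }
}
```

With this split, saveDelta() could call read and validate first and only then open the connection, and both helpers can be unit-tested with a temp file and plain strings.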

Steps to reproduce:

  1. Configure HiveIncrementalPuller with an incrementalSQLFile that references the wrong source table (e.g., wrong sourceDb.sourceTable).
  2. Call saveDelta().
  3. Observe that a JDBC connection is opened and DROP TABLE IF EXISTS is executed before the validation exception is thrown, and that the Scanner opened on the SQL file is never closed.
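
The ordering problem in the steps above can be pinned down in a test without a Hive server. The sketch below is an assumption-laden stand-in, not Hudi code: EagerValidationDemo.pull is a hypothetical replacement for saveDelta(), and the connection is modeled as a Supplier<AutoCloseable> so that a test can count how many times it is opened.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the saveDelta() flow (not Hudi code): validation
// runs first, and the connection factory is only invoked if it passes.
final class EagerValidationDemo {

  private EagerValidationDemo() {
  }

  static void pull(String sql, String qualifiedTable,
                   Supplier<AutoCloseable> connectionFactory) throws Exception {
    // Cheap check first: a bad SQL file fails here, before any connection.
    if (!sql.contains(qualifiedTable)) {
      throw new IllegalArgumentException(
          "Incremental SQL does not reference " + qualifiedTable);
    }
    // Only now is the (possibly expensive) connection requested.
    try (AutoCloseable connection = connectionFactory.get()) {
      // The temp table drop and the incremental query would run here.
    }
  }
}
```

A test can pass a factory that increments a counter and assert that the counter is still zero after pull rejects a SQL string naming the wrong table.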

Environment

Hudi version: master
Query engine: Hive
Relevant configs:

Logs and Stack Trace

No response

Labels: type:bug