HIVE-28578: Concurrency issue in updateTableColumnStatistics #5929

dengzhhu653 · 2025-07-04T10:36:46Z

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested using the Repro.java against MySQL and Postgres, I didn't see the issue any more.

...e-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java

sonarqubecloud · 2025-07-04T13:07:33Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

deniskuzZ · 2025-07-04T22:12:32Z

There was another PR for the same: #5567
cc @InvisibleProgrammer
I lean toward using retries rather than introducing locking. Non-blocking, better throughput under normal conditions.

SqlRetryHandler sqlRetryHandler = new SqlRetryHandler(conf, dbType);
...
return sqlRetryHandler.executeWithRetry(
    new SqlRetryCallProperties()
        .withCallerId("updateTableColumnStatistics")
        .withRetryOnDuplicateKey(true),
    () -> updateTableColumnStatisticsInternal(
            colStats, validWriteIds, writeId, catName, dbName, tableName));

And since HMS provides RetryingMetaStoreClient, this might be redundant.

-1 on the locking approach

dengzhhu653 · 2025-07-07T04:49:46Z

Why locking approach isn't good?
Retries might bring another problem: it might cause the param key COLUMN_STATS_ACCURATE inaccurate or missing the marker for some columns, for example:
step 1: client1 has col1 updated, and set COLUMN_STATS_ACCURATE to col1: true
step 2: client2 for col2 and client3 for col3 both throw the duplicate key exception, then retry...
step 3: client2 and client 3 both read COLUMN_STATS_ACCURATE as col1: true, then update the param key separately.
step 4: After step 3 either col2 : true or col3: true is missing in COLUMN_STATS_ACCURATE.

We can use Optimistic Locking instead - higher throughput and concurrency

UPDATE TBS
SET last_updated = NOW()
WHERE TBL_ID = <tblId> AND last_updated = '<timestamp>'; -- The exact timestamp you originally read

and retry if update affects 0 rows

The same strategy is used in NextCompactionFunction

dengzhhu653 · 2025-07-11T02:12:38Z

UPDATE TBS
SET last_updated = NOW()
WHERE TBL_ID = AND last_updated = ''; -- The exact timestamp you originally read

As far as I know in MySQL this will block others for the same row(the same for update), if the 0 row returns, means we have acquired the X tblId row lock, why do we need to retry again?

But this inspires me we can alter the table during the updating the column statistics, so we don't ask for the row lock
explicitly, I will try and test this approach. Thank you!

dengzhhu653 · 2025-07-11T04:33:39Z

UPDATE TBS
SET last_updated = NOW()
WHERE TBL_ID = AND last_updated = ''; -- The exact timestamp you originally read

As far as I know in MySQL this will block others for the same row(the same for update), if the 0 row returns, means we have acquired the X tblId row lock, why do we need to retry again?

But this inspires me we can alter the table during the updating the column statistics, so we don't ask for the row lock explicitly, I will try and test this approach. Thank you!

After some research, I find it's hard to use DataNucleus without the pure "UPDATE" query,

      mTable.setLastAccessTime((int) (System.currentTimeMillis()/1000));
      pm.flush(); // here might flush the old MTable into data store
      pm.refresh(mTable);

In pm.flush() the old state of the table can get overwritten the one in data store, resulting to some columns missing in COLUMN_STATS_ACCURATE, for example:
{"COLUMN_STATS":{"col_0":"true","col_1":"true","col_2":"true","col_3":"true","col_5":"true","col_6":"true","col_7":"true","col_8":"true"}}
col_4 and col_9 are missing.

deniskuzZ · 2025-08-07T12:55:43Z

please give me some time to review and reply

github-actions · 2025-10-07T00:27:42Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.

dengzhhu653 · 2025-10-07T23:38:57Z

@nrg4878 @saihemanth-cloudera @zhangbutao @wecharyu could you check this PR as well please?

...e-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java

saihemanth-cloudera

LGTM +1

dengzhhu653 · 2025-10-25T07:23:43Z

...e-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java


+      Map<String, MPartitionColumnStatistics> oldStats = getPartitionColStats(table, statsDesc
+          .getPartName(), colNames, colStats.getEngine());
+      lockForUpdate("PARTITIONS", "PART_ID", Optional.of("\"PART_ID\" = " + partitionIds.getFirst()));


change it to look up the partition by primary key

deniskuzZ

-1

As far as I know in MySQL this will block others for the same row(the same for update), if the 0 row returns, means we have acquired the X tblId row lock, why do we need to retry again?

Using retries instead of blocking aligns better with optimistic concurrency control, allowing transactions to proceed without waiting and only retry if a conflict actually occurs.

The same approach is applied in several areas, including Iceberg’s conflict resolution (HIVE-26882) and compaction.

dengzhhu653 · 2025-10-26T12:07:09Z

-1

As far as I know in MySQL this will block others for the same row(the same for update), if the 0 row returns, means we have acquired the X tblId row lock, why do we need to retry again?

Using retries instead of blocking aligns better with optimistic concurrency control, allowing transactions to proceed without waiting and only retry if a conflict actually occurs.

The same approach is applied in several areas, including Iceberg’s conflict resolution (HIVE-26882) and compaction.

I don't think retry is the good way, it introduces another issue, the COLUMN_STATS in TABLE_PARAMS would be got overwritten, and lead to some column markers missing, so it might make the stats import useless.
For updating the stats, we have already had the in-process lock, it doesn't solve the parallel imports across the HMS instances, somehow we need the distributed lock to achieve it, the DB lock is one way.

dengzhhu653 · 2025-10-27T03:49:28Z

For optimistic concurrency control, IMO this is more suitable for read skew scenario, otherwise it's a waste of the resource to retry on conflict

deniskuzZ · 2025-10-27T13:07:24Z

I don't think retry is the good way, it introduces another issue, the COLUMN_STATS in TABLE_PARAMS would be got overwritten, and lead to some column markers missing, so it might make the stats import useless. For updating the stats, we have already had the in-process lock, it doesn't solve the parallel imports across the HMS instances, somehow we need the distributed lock to achieve it, the DB lock is one way.

Locking TBLS and PARTITIONS isn’t necessarily a better solution. In large-scale, highly concurrent systems, locking often becomes a scalability bottleneck. Instead, most systems adopt optimistic locking strategies. Each operation assumes no conflict and only validates at commit, reducing contention and avoiding blocking.

The idea is to balance consistency with throughput. Extending this principle to stats import (which btw is not a core functionality) we could consider a mechanism based on versioning rather than traditional locking. Such an approach generally scales far better in multi-instance HMS environments than depending solely on table or partition locks.

dengzhhu653 · 2025-10-27T14:03:13Z

I don't think retry is the good way, it introduces another issue, the COLUMN_STATS in TABLE_PARAMS would be got overwritten, and lead to some column markers missing, so it might make the stats import useless. For updating the stats, we have already had the in-process lock, it doesn't solve the parallel imports across the HMS instances, somehow we need the distributed lock to achieve it, the DB lock is one way.

Locking TBLS and PARTITIONS isn’t necessarily a better solution. In large-scale, highly concurrent systems, locking often becomes a scalability bottleneck. Instead, most systems adopt optimistic locking strategies. Each operation assumes no conflict and only validates at commit, reducing contention and avoiding blocking.

The idea is to balance consistency with throughput. Extending this principle to stats import (which btw is not a core functionality) we could consider a mechanism based on versioning rather than traditional locking. Such an approach generally scales far better in multi-instance HMS environments than depending solely on table or partition locks.

Currently the HMS is lack of versioning, the RDMS nowadays provides MVCC for writes not block any reads, so the write is the main concern here.
The lock provides here is the row-level lock, it only blocks the write operation against the same table, such as the alter/drop the same table/partition, drop the db, or update the stats of this table/partition.

Even without the explicit lock, we still have some exclusive row lock under the scenes, the RDMS in background ensures strong consistency and reliability the HMS benefits from and relies on, and provides a milliseconds to seconds query execution time, that can also have a satisfied throughput.

deniskuzZ · 2025-10-27T14:58:07Z

Currently the HMS is lack of versioning, the RDMS nowadays provides MVCC for writes not block any reads, so the write is the main concern here.

MVCC (Multi-Version Concurrency Control) and SELECT ... FOR UPDATE (S4U) operate at completely different conceptual levels — they’re not comparable mechanisms even though both relate to concurrency.

-- MVCC style (optimistic)
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- snapshot read
-- some processing
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
-- If another transaction modified same row → commit fails (conflict)

-- Pessimistic locking
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
-- locks row immediately
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

The lock provides here is the row-level lock, it only blocks the write operation against the same table, such as the alter/drop the same table/partition, drop the db, or update the stats of this table/partition.

Every iceberg table commits involves alter table operation and it's non-blocking ATM.

Even without the explicit lock, we still have some exclusive row lock under the scenes, the RDMS in background ensures strong consistency and reliability the HMS benefits from and relies on, and provides a milliseconds to seconds query execution time, that can also have a satisfied throughput.

That’s the opposite of what CU experienced on a highly loaded MySQL cluster with S4U on the NEXT_TXN_ID table.

dengzhhu653 · 2025-10-28T01:14:19Z

-- MVCC style (optimistic)
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- snapshot read
-- some processing
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
-- If another transaction modified same row → commit fails (conflict)

Here we still obtained a row-level lock on accounts(id=1) in MySQL/Postgres until the transaction is ended.

As I explained, this would cause lost-update,
After some research, I find it's hard to use DataNucleus without the pure "UPDATE" query,

  mTable.setLastAccessTime((int) (System.currentTimeMillis()/1000));
  pm.flush(); // here might flush the old MTable into data store
  pm.refresh(mTable);

In pm.flush() the old state of the table can get overwritten the one in data store, resulting to some columns missing in COLUMN_STATS_ACCURATE, for example:
{"COLUMN_STATS":{"col_0":"true","col_1":"true","col_2":"true","col_3":"true","col_5":"true","col_6":"true","col_7":"true","col_8":"true"}}
col_4 and col_9 are missing.

Every iceberg table commits involves alter table operation and it's non-blocking ATM.

The Iceberg commit relies on the DB transaction atomicity, it should involve the row lock behind the scenes, though the lock is quite small(TBL_ID, PARAM_KEY), if the table has multiple commits at the same time, only one is allowed to alter the TABLE_PARAMS.

That’s the opposite of what CU experienced on a highly loaded MySQL cluster with S4U on the NEXT_TXN_ID table.

This is due to every HMS request needs to lock the only one NEXT_TXN_ID, compared the same level lock distributed among different tables and affects the alter/drop request of the target table/partition

sonarqubecloud · 2025-10-28T05:25:33Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

dengzhhu653 · 2025-10-28T07:48:36Z

I guess I'm a bit get your optimistic concurrency control, let me try

dengzhhu653 · 2025-10-30T08:05:11Z

Filed another PR: #6159 to catch the nowait for update

asf-ci-hive added the tests pending label Jul 4, 2025

deniskuzZ reviewed Jul 4, 2025

View reviewed changes

...e-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java Outdated Show resolved Hide resolved

deniskuzZ reviewed Jul 4, 2025

View reviewed changes

...e-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java Outdated Show resolved Hide resolved

asf-ci-hive added tests passed and removed tests pending labels Jul 4, 2025

dengzhhu653 requested a review from deniskuzZ August 5, 2025 03:17

github-actions bot added the stale label Oct 7, 2025

github-actions bot removed the stale label Oct 8, 2025

saihemanth-cloudera reviewed Oct 9, 2025

View reviewed changes

...e-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java Outdated Show resolved Hide resolved

dengzhhu653 requested a review from saihemanth-cloudera October 22, 2025 23:27

saihemanth-cloudera approved these changes Oct 23, 2025

View reviewed changes

dengzhhu653 force-pushed the HIVE-28578 branch from 3df3cbf to c77f11f Compare October 25, 2025 07:17

asf-ci-hive added tests pending tests failed and removed tests passed tests pending labels Oct 25, 2025

dengzhhu653 commented Oct 25, 2025

View reviewed changes

HIVE-28578: Concurrency issue in updateTableColumnStatistics

207853c

dengzhhu653 force-pushed the HIVE-28578 branch from c77f11f to 207853c Compare October 26, 2025 04:27

asf-ci-hive added tests pending and removed tests failed tests pending labels Oct 26, 2025

asf-ci-hive added the tests unstable label Oct 26, 2025

deniskuzZ requested changes Oct 26, 2025

View reviewed changes

map PART_ID

2c414ae

asf-ci-hive added tests pending and removed tests unstable labels Oct 27, 2025

asf-ci-hive added tests unstable tests pending and removed tests pending tests unstable labels Oct 27, 2025

fix ordering

3499d20

dengzhhu653 force-pushed the HIVE-28578 branch from 3cbbf2c to 3499d20 Compare October 28, 2025 04:09

asf-ci-hive added tests pending and removed tests unstable labels Oct 28, 2025

asf-ci-hive added tests passed and removed tests pending labels Oct 28, 2025

HIVE-28578: Concurrency issue in updateTableColumnStatistics #5929

Are you sure you want to change the base?

HIVE-28578: Concurrency issue in updateTableColumnStatistics #5929

Conversation

dengzhhu653 commented Jul 4, 2025

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

Uh oh!

Uh oh!

sonarqubecloud bot commented Jul 4, 2025

Quality Gate passed

Uh oh!

deniskuzZ commented Jul 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dengzhhu653 commented Jul 7, 2025 • edited by deniskuzZ Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dengzhhu653 commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dengzhhu653 commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

deniskuzZ commented Aug 7, 2025

Uh oh!

github-actions bot commented Oct 7, 2025

Uh oh!

dengzhhu653 commented Oct 7, 2025

Uh oh!

Uh oh!

saihemanth-cloudera left a comment

Choose a reason for hiding this comment

Uh oh!

dengzhhu653 Oct 25, 2025

Choose a reason for hiding this comment

Uh oh!

deniskuzZ left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dengzhhu653 commented Oct 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dengzhhu653 commented Oct 27, 2025

Uh oh!

deniskuzZ commented Oct 27, 2025

Uh oh!

dengzhhu653 commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

deniskuzZ commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dengzhhu653 commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sonarqubecloud bot commented Oct 28, 2025

Quality Gate passed

Uh oh!

dengzhhu653 commented Oct 28, 2025

Uh oh!

dengzhhu653 commented Oct 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

deniskuzZ commented Jul 4, 2025 •

edited

Loading

dengzhhu653 commented Jul 7, 2025 •

edited by deniskuzZ

Loading

dengzhhu653 commented Jul 11, 2025 •

edited

Loading

dengzhhu653 commented Jul 11, 2025 •

edited

Loading

deniskuzZ left a comment •

edited

Loading

dengzhhu653 commented Oct 26, 2025 •

edited

Loading

dengzhhu653 commented Oct 27, 2025 •

edited

Loading

deniskuzZ commented Oct 27, 2025 •

edited

Loading

dengzhhu653 commented Oct 28, 2025 •

edited

Loading