Skip to content

Conversation

@dengzhhu653
Copy link
Member

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested using the Repro.java against MySQL and Postgres, I didn't see the issue any more.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Jul 4, 2025

@deniskuzZ
Copy link
Member

deniskuzZ commented Jul 4, 2025

There was another PR for the same: #5567
cc @InvisibleProgrammer
I lean toward using retries rather than introducing locking. Non-blocking, better throughput under normal conditions.

SqlRetryHandler sqlRetryHandler = new SqlRetryHandler(conf, dbType);
...
return sqlRetryHandler.executeWithRetry(
    new SqlRetryCallProperties()
        .withCallerId("updateTableColumnStatistics")
        .withRetryOnDuplicateKey(true),
    () -> updateTableColumnStatisticsInternal(
            colStats, validWriteIds, writeId, catName, dbName, tableName));

And since HMS provides RetryingMetaStoreClient, this might be redundant.

-1 on the locking approach

@dengzhhu653
Copy link
Member Author

dengzhhu653 commented Jul 7, 2025

Why locking approach isn't good?
Retries might bring another problem: it might cause the param key COLUMN_STATS_ACCURATE inaccurate or missing the marker for some columns, for example:
step 1: client1 has col1 updated, and set COLUMN_STATS_ACCURATE to col1: true
step 2: client2 for col2 and client3 for col3 both throw the duplicate key exception, then retry...
step 3: client2 and client 3 both read COLUMN_STATS_ACCURATE as col1: true, then update the param key separately.
step 4: After step 3 either col2 : true or col3: true is missing in COLUMN_STATS_ACCURATE.

We can use Optimistic Locking instead - higher throughput and concurrency

UPDATE TBS
SET last_updated = NOW()
WHERE TBL_ID = <tblId> AND last_updated = '<timestamp>'; -- The exact timestamp you originally read

and retry if update affects 0 rows

The same strategy is used in NextCompactionFunction

@dengzhhu653
Copy link
Member Author

dengzhhu653 commented Jul 11, 2025

UPDATE TBS
SET last_updated = NOW()
WHERE TBL_ID = AND last_updated = ''; -- The exact timestamp you originally read

As far as I know in MySQL this will block others for the same row(the same for update), if the 0 row returns, means we have acquired the X tblId row lock, why do we need to retry again?

But this inspires me we can alter the table during the updating the column statistics, so we don't ask for the row lock
explicitly, I will try and test this approach. Thank you!

@dengzhhu653
Copy link
Member Author

dengzhhu653 commented Jul 11, 2025

UPDATE TBS
SET last_updated = NOW()
WHERE TBL_ID = AND last_updated = ''; -- The exact timestamp you originally read

As far as I know in MySQL this will block others for the same row(the same for update), if the 0 row returns, means we have acquired the X tblId row lock, why do we need to retry again?

But this inspires me we can alter the table during the updating the column statistics, so we don't ask for the row lock explicitly, I will try and test this approach. Thank you!

After some research, I find it's hard to use DataNucleus without the pure "UPDATE" query,

      mTable.setLastAccessTime((int) (System.currentTimeMillis()/1000));
      pm.flush(); // here might flush the old MTable into data store
      pm.refresh(mTable);

In pm.flush() the old state of the table can get overwritten the one in data store, resulting to some columns missing in COLUMN_STATS_ACCURATE, for example:
{"COLUMN_STATS":{"col_0":"true","col_1":"true","col_2":"true","col_3":"true","col_5":"true","col_6":"true","col_7":"true","col_8":"true"}}
col_4 and col_9 are missing.

@dengzhhu653 dengzhhu653 requested a review from deniskuzZ August 5, 2025 03:17
@deniskuzZ
Copy link
Member

please give me some time to review and reply

@github-actions
Copy link

github-actions bot commented Oct 7, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.

@github-actions github-actions bot added the stale label Oct 7, 2025
@dengzhhu653
Copy link
Member Author

@nrg4878 @saihemanth-cloudera @zhangbutao @wecharyu could you check this PR as well please?

@github-actions github-actions bot removed the stale label Oct 8, 2025
Copy link
Contributor

@saihemanth-cloudera saihemanth-cloudera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1


Map<String, MPartitionColumnStatistics> oldStats = getPartitionColStats(table, statsDesc
.getPartName(), colNames, colStats.getEngine());
lockForUpdate("PARTITIONS", "PART_ID", Optional.of("\"PART_ID\" = " + partitionIds.getFirst()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change it to look up the partition by primary key

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1

As far as I know in MySQL this will block others for the same row(the same for update), if the 0 row returns, means we have acquired the X tblId row lock, why do we need to retry again?

Using retries instead of blocking aligns better with optimistic concurrency control, allowing transactions to proceed without waiting and only retry if a conflict actually occurs.

The same approach is applied in several areas, including Iceberg’s conflict resolution (HIVE-26882) and compaction.

@dengzhhu653
Copy link
Member Author

dengzhhu653 commented Oct 26, 2025

-1

As far as I know in MySQL this will block others for the same row(the same for update), if the 0 row returns, means we have acquired the X tblId row lock, why do we need to retry again?

Using retries instead of blocking aligns better with optimistic concurrency control, allowing transactions to proceed without waiting and only retry if a conflict actually occurs.

The same approach is applied in several areas, including Iceberg’s conflict resolution (HIVE-26882) and compaction.

I don't think retry is the good way, it introduces another issue, the COLUMN_STATS in TABLE_PARAMS would be got overwritten, and lead to some column markers missing, so it might make the stats import useless.
For updating the stats, we have already had the in-process lock, it doesn't solve the parallel imports across the HMS instances, somehow we need the distributed lock to achieve it, the DB lock is one way.

@dengzhhu653
Copy link
Member Author

For optimistic concurrency control, IMO this is more suitable for read skew scenario, otherwise it's a waste of the resource to retry on conflict

@deniskuzZ
Copy link
Member

I don't think retry is the good way, it introduces another issue, the COLUMN_STATS in TABLE_PARAMS would be got overwritten, and lead to some column markers missing, so it might make the stats import useless. For updating the stats, we have already had the in-process lock, it doesn't solve the parallel imports across the HMS instances, somehow we need the distributed lock to achieve it, the DB lock is one way.

Locking TBLS and PARTITIONS isn’t necessarily a better solution. In large-scale, highly concurrent systems, locking often becomes a scalability bottleneck. Instead, most systems adopt optimistic locking strategies. Each operation assumes no conflict and only validates at commit, reducing contention and avoiding blocking.

The idea is to balance consistency with throughput. Extending this principle to stats import (which btw is not a core functionality) we could consider a mechanism based on versioning rather than traditional locking. Such an approach generally scales far better in multi-instance HMS environments than depending solely on table or partition locks.

@dengzhhu653
Copy link
Member Author

dengzhhu653 commented Oct 27, 2025

I don't think retry is the good way, it introduces another issue, the COLUMN_STATS in TABLE_PARAMS would be got overwritten, and lead to some column markers missing, so it might make the stats import useless. For updating the stats, we have already had the in-process lock, it doesn't solve the parallel imports across the HMS instances, somehow we need the distributed lock to achieve it, the DB lock is one way.

Locking TBLS and PARTITIONS isn’t necessarily a better solution. In large-scale, highly concurrent systems, locking often becomes a scalability bottleneck. Instead, most systems adopt optimistic locking strategies. Each operation assumes no conflict and only validates at commit, reducing contention and avoiding blocking.

The idea is to balance consistency with throughput. Extending this principle to stats import (which btw is not a core functionality) we could consider a mechanism based on versioning rather than traditional locking. Such an approach generally scales far better in multi-instance HMS environments than depending solely on table or partition locks.

Currently the HMS is lack of versioning, the RDMS nowadays provides MVCC for writes not block any reads, so the write is the main concern here.
The lock provides here is the row-level lock, it only blocks the write operation against the same table, such as the alter/drop the same table/partition, drop the db, or update the stats of this table/partition.

Even without the explicit lock, we still have some exclusive row lock under the scenes, the RDMS in background ensures strong consistency and reliability the HMS benefits from and relies on, and provides a milliseconds to seconds query execution time, that can also have a satisfied throughput.

@deniskuzZ
Copy link
Member

deniskuzZ commented Oct 27, 2025

Currently the HMS is lack of versioning, the RDMS nowadays provides MVCC for writes not block any reads, so the write is the main concern here.

MVCC (Multi-Version Concurrency Control) and SELECT ... FOR UPDATE (S4U) operate at completely different conceptual levels — they’re not comparable mechanisms even though both relate to concurrency.

-- MVCC style (optimistic)
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- snapshot read
-- some processing
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
-- If another transaction modified same row → commit fails (conflict)

-- Pessimistic locking
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
-- locks row immediately
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

The lock provides here is the row-level lock, it only blocks the write operation against the same table, such as the alter/drop the same table/partition, drop the db, or update the stats of this table/partition.

Every iceberg table commits involves alter table operation and it's non-blocking ATM.

Even without the explicit lock, we still have some exclusive row lock under the scenes, the RDMS in background ensures strong consistency and reliability the HMS benefits from and relies on, and provides a milliseconds to seconds query execution time, that can also have a satisfied throughput.

That’s the opposite of what CU experienced on a highly loaded MySQL cluster with S4U on the NEXT_TXN_ID table.

@dengzhhu653
Copy link
Member Author

dengzhhu653 commented Oct 28, 2025

-- MVCC style (optimistic)
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- snapshot read
-- some processing
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
-- If another transaction modified same row → commit fails (conflict)

Here we still obtained a row-level lock on accounts(id=1) in MySQL/Postgres until the transaction is ended.

As I explained, this would cause lost-update,
After some research, I find it's hard to use DataNucleus without the pure "UPDATE" query,

  mTable.setLastAccessTime((int) (System.currentTimeMillis()/1000));
  pm.flush(); // here might flush the old MTable into data store
  pm.refresh(mTable);

In pm.flush() the old state of the table can get overwritten the one in data store, resulting to some columns missing in COLUMN_STATS_ACCURATE, for example:
{"COLUMN_STATS":{"col_0":"true","col_1":"true","col_2":"true","col_3":"true","col_5":"true","col_6":"true","col_7":"true","col_8":"true"}}
col_4 and col_9 are missing.

Every iceberg table commits involves alter table operation and it's non-blocking ATM.

The Iceberg commit relies on the DB transaction atomicity, it should involve the row lock behind the scenes, though the lock is quite small(TBL_ID, PARAM_KEY), if the table has multiple commits at the same time, only one is allowed to alter the TABLE_PARAMS.

That’s the opposite of what CU experienced on a highly loaded MySQL cluster with S4U on the NEXT_TXN_ID table.

This is due to every HMS request needs to lock the only one NEXT_TXN_ID, compared the same level lock distributed among different tables and affects the alter/drop request of the target table/partition

@sonarqubecloud
Copy link

@dengzhhu653
Copy link
Member Author

I guess I'm a bit get your optimistic concurrency control, let me try

@dengzhhu653
Copy link
Member Author

Filed another PR: #6159 to catch the nowait for update

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants