juntaozhang
Contributor

Purpose

In Spark, a data evolution table can return inconsistent query results before and after compaction. Example:

CREATE TABLE s (id INT, b INT);
INSERT INTO s VALUES (1, 11), (2, 22);

CREATE TABLE t (id INT, b INT, c INT) TBLPROPERTIES ('row-tracking.enabled' = 'true', 'data-evolution.enabled' = 'true');
INSERT INTO t VALUES (2, 2, 2), (3, 3, 3);
MERGE INTO t
USING s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.b = s.b
WHEN NOT MATCHED THEN INSERT (id, b, c) VALUES (id, b, 0);
SELECT *, _ROW_ID, _SEQUENCE_NUMBER FROM t ORDER BY _ROW_ID ASC;
CALL sys.compact(table => 't');
SELECT *, _ROW_ID, _SEQUENCE_NUMBER FROM t ORDER BY _ROW_ID ASC;

before compaction:

+----+----+---+---------+------------------+
| id |  b | c | _ROW_ID | _SEQUENCE_NUMBER |
+----+----+---+---------+------------------+
|  2 | 22 | 2 |       0 |                2 |
|  3 |  3 | 3 |       1 |                2 |
|  1 | 11 | 0 |       2 |                2 |
+----+----+---+---------+------------------+

after compaction:

+--------+----+--------+---------+------------------+
|     id |  b |      c | _ROW_ID | _SEQUENCE_NUMBER |
+--------+----+--------+---------+------------------+
| <NULL> | 22 | <NULL> |       0 |                2 |
|      2 |  2 |      2 |       0 |                1 |
| <NULL> |  3 | <NULL> |       1 |                2 |
|      3 |  3 |      3 |       1 |                1 |
|      1 | 11 |      0 |       2 |                2 |
+--------+----+--------+---------+------------------+
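
The inconsistency in the two outputs above can be stated as a simple check: before compaction every _ROW_ID appears once, while after compaction _ROW_ID 0 and 1 each appear twice. The following is a minimal sketch of that check, not code from this PR. The rows are copied by hand from the outputs above, the helper name `duplicated_row_ids` is hypothetical, and the assumption that each _ROW_ID should appear exactly once in a query result is inferred from the "before compaction" output.

```python
from collections import Counter

# Rows copied from the outputs above, as
# (id, b, c, _ROW_ID, _SEQUENCE_NUMBER). None stands for <NULL>.
before = [
    (2, 22, 2, 0, 2),
    (3, 3, 3, 1, 2),
    (1, 11, 0, 2, 2),
]
after = [
    (None, 22, None, 0, 2),
    (2, 2, 2, 0, 1),
    (None, 3, None, 1, 2),
    (3, 3, 3, 1, 1),
    (1, 11, 0, 2, 2),
]

ROW_ID_INDEX = 3  # position of _ROW_ID in each tuple


def duplicated_row_ids(rows):
    """Return the sorted _ROW_ID values that occur more than once.

    Hypothetical helper: assumes a consistent result has one row per _ROW_ID.
    """
    counts = Counter(row[ROW_ID_INDEX] for row in rows)
    return sorted(row_id for row_id, n in counts.items() if n > 1)


print(duplicated_row_ids(before))  # []
print(duplicated_row_ids(after))   # [0, 1]
```

A check of this shape could be run against the query result before and after `CALL sys.compact` to confirm the two results agree.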

This PR disables compaction in Spark to align with the Flink behavior (#6112).

Tests

API and Format

Documentation

@JingsongLi
Contributor

+1

@JingsongLi JingsongLi merged commit db56793 into apache:master Oct 10, 2025
25 checks passed