Commit
Improve generating and writing out changes in Merge
This change is part of a larger effort to improve merge performance, see #1827.

## Description
This change rewrites the way modified data is written out in merge to improve performance. `writeAllChanges` now generates a DataFrame containing all the updated and copied rows to write out by building a large expression that selectively applies the right merge action to each row. This replaces the previous method, which relied on applying a function to individual rows.

Changes:
- Move `findTouchedFiles` and `writeAllChanges` to a new dedicated trait `ClassicMergeExecutor` that implements the regular merge path when `InsertOnlyMergeExecutor` is not used.
- Introduce methods in `MergeOutputGeneration` to transform the merge clauses into expressions that can be applied to generate the output of the merge operation (both main data and CDC data).

This change fully preserves the behavior of merge, which is extensively tested in `MergeIntoSuiteBase`, `MergeIntoSQLSuite`, `MergeIntoScalaSuite`, `MergeCDCSuite`, `MergeIntoMetricsBase`, `MergeIntoNotMatchedBySourceSuite`.

Closes #1854

GitOrigin-RevId: d8c8a0e9439c6710978f2ec345cb94b2b9b19e0e
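The idea described above (build the output expressions once from the merge clauses, then evaluate them against every row) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in this commit: `Expr`, `CaseWhen`, `Clause`, `buildOutput`, `applyOutput` and the `matched` column are hypothetical names, and the tiny expression tree merely stands in for Spark expressions.

```scala
object MergeExprSketch {
  // A row is modeled as a plain map from column name to value (assumption).
  type Row = Map[String, Any]

  // Minimal expression tree, a hypothetical stand-in for Spark expressions.
  sealed trait Expr { def eval(row: Row): Any }
  final case class Col(name: String) extends Expr {
    def eval(row: Row): Any = row(name)
  }
  final case class Lit(value: Any) extends Expr {
    def eval(row: Row): Any = value
  }
  final case class Eq(left: Expr, right: Expr) extends Expr {
    def eval(row: Row): Any = left.eval(row) == right.eval(row)
  }
  // The first branch whose condition evaluates to true wins; otherwise use the default.
  final case class CaseWhen(branches: Seq[(Expr, Expr)], default: Expr) extends Expr {
    def eval(row: Row): Any =
      branches
        .find { case (cond, _) => cond.eval(row) == true }
        .map { case (_, value) => value.eval(row) }
        .getOrElse(default.eval(row))
  }

  // Hypothetical merge clause: a condition plus the columns it assigns.
  final case class Clause(condition: Expr, assignments: Map[String, Expr])

  // Build one expression per output column, once, before touching any row.
  def buildOutput(columns: Seq[String], clauses: Seq[Clause]): Map[String, Expr] =
    columns.map { c =>
      // Columns a clause does not assign, and rows matching no clause, are copied.
      val branches = clauses.map(cl => cl.condition -> cl.assignments.getOrElse(c, Col(c)))
      c -> CaseWhen(branches, Col(c))
    }.toMap

  // Evaluate the prebuilt expressions against every row.
  def applyOutput(rows: Seq[Row], output: Map[String, Expr]): Seq[Row] =
    rows.map(row => output.map { case (c, e) => c -> e.eval(row) })

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Map[String, Any]("id" -> 1, "value" -> "old", "matched" -> true),
      Map[String, Any]("id" -> 2, "value" -> "keep", "matched" -> false)
    )
    // One "update" clause: matched rows get a new value, all other rows are copied.
    val update = Clause(Eq(Col("matched"), Lit(true)), Map("value" -> Lit("new")))
    val output = buildOutput(Seq("id", "value"), Seq(update))
    applyOutput(rows, output).foreach(println)
  }
}
```

In this toy model the clause logic is resolved into expressions a single time, and the per-row work is reduced to evaluating them, which is the general shape of the approach the description outlines.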