Skip to content

Conversation

gouhongshen
Copy link
Contributor

@gouhongshen gouhongshen commented Oct 19, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue ##21979

What this PR does / why we need it:

Implement the data branch.


PR Type

Enhancement, Feature, Tests


Description

  • Implements comprehensive data branch functionality for snapshot-based table versioning and management

  • Adds core data branch operations: diff detection between snapshots, merge with conflict resolution (FAIL, SKIP, ACCEPT), and branch hierarchy management

  • Implements adaptive hashmap with disk spillover support for efficient change tracking and storage

  • Adds branch metadata system table (mo_branch_metadata) to track clone operations and branch relationships

  • Introduces DAG (Directed Acyclic Graph) structure with LCA (Lowest Common Ancestor) algorithm for branch hierarchy

  • Extends SQL grammar and parser to support DATA BRANCH statements (CREATE TABLE, CREATE DATABASE, DELETE TABLE, DELETE DATABASE, DIFF, MERGE)

  • Refactors clone operations to track branch metadata and support data branch workflows

  • Adds comprehensive test coverage for diff, merge, and conflict handling scenarios with various data types and dataset sizes

  • Updates privilege logic to support data branch operations with appropriate access controls

  • Standardizes keyword capitalization throughout codebase (Exists vs exists) for consistency

  • Adds support for commit timestamp column tracking in distributed TAE engine

  • Extends type system with generic value comparison function and decimal/timestamp support in result sets


Diagram Walkthrough

flowchart LR
  Parser["SQL Parser<br/>Data Branch Grammar"]
  Stmt["Statement Types<br/>DataBranchDiff<br/>DataBranchMerge<br/>DataBranchCreate/Delete"]
  Handler["Statement Handler<br/>handleDataBranch"]
  DiffMerge["Diff & Merge Engine<br/>Snapshot Comparison<br/>Conflict Resolution"]
  DAG["Branch DAG<br/>LCA Algorithm<br/>Hierarchy Management"]
  Hashmap["Adaptive Hashmap<br/>Memory + Disk Spillover<br/>Change Tracking"]
  Metadata["Branch Metadata Table<br/>mo_branch_metadata<br/>Clone Operations"]
  
  Parser --> Stmt
  Stmt --> Handler
  Handler --> DiffMerge
  DiffMerge --> DAG
  DiffMerge --> Hashmap
  Handler --> Metadata
  Metadata --> Metadata
Loading

File Walkthrough

Relevant files
Feature
7 files
data_branch.go
Implement data branch diff and merge operations                   

pkg/frontend/data_branch.go

  • Implements comprehensive data branch functionality including diff and
    merge operations between table snapshots
  • Provides core functions for handling snapshot differences,
    constructing change handles, and managing branch metadata
  • Implements merge logic with conflict resolution options (FAIL, SKIP,
    ACCEPT)
  • Includes helper functions for schema validation, relation pairing, and
    branch DAG construction
+1968/-0
branch_hashmap.go
Implement adaptive hashmap with disk spillover support     

pkg/frontend/databranchutils/branch_hashmap.go

  • Implements adaptive hashmap that automatically spills to disk when
    memory allocation fails
  • Provides operations for storing, retrieving, and removing vector-based
    records with key-value semantics
  • Includes concurrent encoding/decoding of rows and spillable partition
    management
  • Supports iteration over in-memory and spilled entries with proper
    resource cleanup
+1177/-0
data_branch.go
Add data branch statement type definitions and implementations

pkg/sql/parsers/tree/data_branch.go

  • New file implementing data branch statement types:
    DataBranchCreateTable, DataBranchDeleteTable,
    DataBranchCreateDatabase, DataBranchDeleteDatabase
  • Added DataBranchDiff and DataBranchMerge statement types for branch
    operations
  • Implemented object pool allocation for efficient memory management
    using reuse pattern
  • Defined conflict handling options (FAIL, SKIP, ACCEPT)
+392/-0 
branch_dag.go
Implement data branch DAG with LCA algorithm                         

pkg/frontend/databranchutils/branch_dag.go

  • New file implementing DataBranchDAG for managing branch hierarchy
    relationships
  • Implements LCA (Lowest Common Ancestor) algorithm for finding common
    parent branches
  • Supports depth calculation and parent-child relationship tracking
+195/-0 
branch_change_handle.go
Add branch change handle for data collection and filtering

pkg/frontend/databranchutils/branch_change_handle.go

  • New file implementing BranchChangeHandle wrapper for engine changes
    handling
  • Provides filtering capability for branch change data collection
  • Implements CollectChanges function for gathering changes between
    timestamps
+81/-0   
self_handle.go
Add data branch statement execution handler                           

pkg/frontend/self_handle.go

  • Added handler for data branch statement types (Diff, Merge,
    CreateTable, DeleteTable, DeleteDatabase, CreateDatabase)
  • Implemented handleDataBranch function call with proper frontend print
    tracking
+13/-0   
keywords.go
Add data branch and conflict handling keywords                     

pkg/sql/parsers/dialect/mysql/keywords.go

  • Added new keywords for data branch operations: branch, diff, conflict
  • Added conflict handling keywords: fail, skip, accept
+6/-0     
Formatting
15 files
pitr.go
Standardize capitalization of "Exists" in PITR operations

pkg/frontend/pitr.go

  • Updates error messages and log statements to use capitalized Exists
    instead of lowercase exists
  • Changes affect multiple functions including doCreatePitr, doDropPitr,
    doAlterPitr, doRestorePitr
  • Updates comments and error messages for consistency in PITR
    (Point-In-Time Recovery) operations
+23/-23 
publication_subscription_test.go
Standardize capitalization in publication subscription tests

pkg/frontend/publication_subscription_test.go

  • Updates test comments to use capitalized Exists instead of lowercase
    exists
  • Changes affect test functions Test_doAlterPublication,
    Test_doAlterPublication2, and Test_doDropPublication
+3/-3     
snapshot_restore_with_ts.go
Standardize capitalization of "Exists" in snapshot restore operations

pkg/frontend/snapshot_restore_with_ts.go

  • Updates log messages and SQL statements to use capitalized Exists
    instead of lowercase exists
  • Changes affect functions restoreToAccountFromTS, recreateTableFromTS,
    and restoreViewsFromTS
+3/-3     
snapshot.go
Standardize capitalization in snapshot rebuild logging     

pkg/vm/engine/tae/logtail/snapshot.go

  • Updates log message in RebuildAObjectDel function to use lowercase
    exists instead of capitalized Exists
  • Changes log warning message for consistency with logging conventions
+1/-1     
snapshot.go
Standardize keyword capitalization in snapshot operations

pkg/frontend/snapshot.go

  • Standardized error messages and comments to use capitalized Exists
    keyword
  • Updated SQL statements for drop operations to use Exists instead of
    exists
  • Updated log messages and comments throughout snapshot operations
+23/-23 
authenticate_test.go
Update test descriptions with standardized keyword capitalization

pkg/frontend/authenticate_test.go

  • Updated test descriptions and comments to use capitalized Exists
    keyword
  • Updated test case names for consistency with new keyword convention
+19/-19 
session.go
Standardize keyword capitalization in session authentication

pkg/frontend/session.go

  • Updated comments and debug messages to use capitalized Exists keyword
  • Standardized terminology in authentication and database verification
    steps
+8/-8     
query_result.go
Standardize keyword capitalization in query result handling

pkg/frontend/query_result.go

  • Updated comments to use capitalized Exists keyword for consistency
+2/-2     
system_initialize.go
Standardize keyword capitalization in system initialization

pkg/frontend/system_initialize.go

  • Updated comments to use capitalized Exists keyword for consistency
+2/-2     
mysql_protocol_predefines.go
Standardize keyword capitalization in protocol definitions

pkg/frontend/mysql_protocol_predefines.go

  • Updated comment to use capitalized Exists keyword for consistency
+1/-1     
publication_subscription.go
Standardize keyword capitalization in publication subscription

pkg/frontend/publication_subscription.go

  • Updated error message to use capitalized Exists keyword for
    consistency
+1/-1     
cdc_options.go
Standardize keyword capitalization in CDC options               

pkg/frontend/cdc_options.go

  • Updated comment to use capitalized Exists keyword for consistency
+1/-1     
txn.go
Fix comment capitalization in TxnHandler                                 

pkg/frontend/txn.go

  • Fixed comment capitalization: changed "exists always" to "Exists
    always"
  • Minor documentation improvement for code clarity
+1/-1     
util.go
Fix comment capitalization in utility functions                   

pkg/frontend/util.go

  • Fixed comment capitalization in two places: "path exists" to "path
    Exists" and "true/false - exists" to "true/false - Exists"
  • Improved documentation consistency
+2/-2     
pitr_test.go
Fix test case name capitalization                                               

pkg/frontend/pitr_test.go

  • Fixed test case name capitalization: "SubscriptionOption exists" to
    "SubscriptionOption Exists"
  • Maintains consistency with code style guidelines
+1/-1     
Enhancement
14 files
authenticate.go
Add branch metadata table support and update privilege logic

pkg/frontend/authenticate.go

  • Added catalog.MO_BRANCH_METADATA to system account tables and
    privilege maps
  • Updated drop SQL statements to use capitalized Exists keyword for
    consistency
  • Added MoCatalogBranchMetadataDDL to catalog initialization
  • Updated privilege determination logic for data branch operations
    (CreateTable, DeleteTable, Merge, Diff)
  • Changed privilege requirements for clone operations to use
    PrivilegeTypeTableAll instead of PrivilegeTypeInsert
+76/-64 
clone.go
Implement data branch metadata tracking for clone operations

pkg/frontend/clone.go

  • Added support for data branch operations with new cloneReceipt struct
  • Refactored getOpAndToAccountId to use ToAccountOpt instead of string
    parameter
  • Added resolveSnapshot helper function for timestamp resolution
  • Implemented updateBranchMetaTable to record clone operations in branch
    metadata table
  • Updated clone database handling to preserve and restore default
    database context
+159/-18
encoding.go
Add generic value comparison function for all types           

pkg/container/types/encoding.go

  • Added new CompareValues function to compare values of different types
  • Supports comparison for all major types including numeric, temporal,
    and special types
  • Handles Decimal64, Decimal128, UUID, TS, and Rowid types
+103/-0 
build_alter_table.go
Improve variable naming for fake primary key tracking       

pkg/sql/plan/build_alter_table.go

  • Renamed variable hasFakePK to copyFakePKCol for better clarity
  • Updated logic to properly track fake primary key column handling
    during alter table copy operations
+5/-6     
local_disttae_datasource.go
Add commit timestamp column support to disttae datasource

pkg/vm/engine/disttae/local_disttae_datasource.go

  • Added support for DefaultCommitTS_Attr column in filtered in-memory
    committed inserts
  • Implemented logic to append commit timestamp values to output batch
    when present
+22/-1   
ddl.go
Mark deleted tables in branch metadata during drop operations

pkg/sql/compile/ddl.go

  • Updated drop table logic to mark branch metadata entries as deleted
  • Added SQL statement to update table_deleted flag in mo_branch_metadata
    table
  • Improved error handling and logging for extra SQL operations
+21/-7   
reader.go
Add commit timestamp attribute support to reader utility 

pkg/vm/engine/readutil/reader.go

  • Added handling for DefaultCommitTS_Attr column in column update logic
  • Maps commit timestamp attribute to appropriate sequence number and
    type
+5/-2     
clone.go
Refactor clone statement to use ToAccountOpt structure     

pkg/sql/parsers/tree/clone.go

  • Added new ToAccountOpt struct to encapsulate account option
    information
  • Updated CloneTable and CloneDatabase to use ToAccountOpt instead of
    direct identifier
  • Improved type safety and extensibility for account-related clone
    operations
+10/-6   
stmt_kind.go
Enable data branch statements in uncommitted transactions

pkg/frontend/stmt_kind.go

  • Added data branch statement types to
    statementCanBeExecutedInUncommittedTransaction function
  • Allows DataBranchCreateTable, DataBranchCreateDatabase,
    DataBranchDeleteTable, and DataBranchDeleteDatabase to execute in
    uncommitted transactions
+6/-1     
types.go
Add data branch frontend print type and standardize keywords

pkg/frontend/types.go

  • Added FPDataBranch constant to frontend print types
  • Updated comments to use capitalized Exists keyword
+2/-1     
resultset.go
Add decimal and timestamp type support to result set         

pkg/frontend/resultset.go

  • Added support for Decimal64, Decimal128, Decimal256, and Timestamp
    types in GetString method
  • Fixed error message format to use type assertion instead of numeric
    type code
+9/-1     
func_mo.go
Register MO_BRANCH_METADATA in function catalog                   

pkg/sql/plan/function/func_mo.go

  • Added catalog.MO_BRANCH_METADATA constant to the function registry
    with value 0
  • Registers the new branch metadata system table in the function mapping
+1/-0     
types.go
Define MO_BRANCH_METADATA system table constant                   

pkg/catalog/types.go

  • Added new constant MO_BRANCH_METADATA = "mo_branch_metadata" to system
    table definitions
  • Defines the catalog name for the branch metadata system table
+2/-0     
mysql_sql.y
Implement data branch SQL grammar and parsing                       

pkg/sql/parsers/dialect/mysql/mysql_sql.y

  • Added new union types: toAccountOpt, conflictOpt, and diffAsOpt for
    branch operations
  • Added new tokens: BRANCH, LOG, REVERT, REBASE, DIFF, CONFLICT,
    CONFLICT_FAIL, CONFLICT_SKIP, CONFLICT_ACCEPT
  • Implemented branch_stmt grammar rule supporting DATA BRANCH operations
    (CREATE TABLE, CREATE DATABASE, DELETE TABLE, DELETE DATABASE, DIFF,
    MERGE)
  • Added helper rules: to_account_opt, conflict_opt, diff_as_opt for
    optional parameters
  • Updated create_table_stmt to use to_account_opt instead of hardcoded
    TO ACCOUNT syntax
  • Added DATA and BRANCH to algorithm_type alternatives
+112/-22
Tests
23 files
branch_hashmap_test.go
Add comprehensive tests for branch hashmap implementation

pkg/frontend/databranchutils/branch_hashmap_test.go

  • New test file implementing comprehensive tests for BranchHashmap
    functionality
  • Tests cover basic operations (Put, Get), composite keys, spilling to
    disk, and partial deletion
  • Includes helper functions for building test vectors and a limited
    allocator for testing spill behavior
+465/-0 
branch_dag_test.go
Add tests for data branch DAG and LCA algorithm                   

pkg/frontend/databranchutils/branch_dag_test.go

  • New test file for DAG (Directed Acyclic Graph) functionality
    supporting branch hierarchy
  • Tests node existence checks and Lowest Common Ancestor (LCA) finding
    algorithm
  • Covers complex forest structures with multiple trees and edge cases
+186/-0 
diff_3.result
Add data branch diff test results for mixed types               

test/distributed/cases/snapshot/branch/diff/diff_3.result

  • Added comprehensive test results for data branch diff operations on
    mixed data types
  • Tests three scenarios: composite primary keys with numeric/temporal
    types, string primary keys with decimal/binary columns, and temporal
    primary keys with float/text updates
  • Validates diff output format and correctness across different table
    structures
[link]   
diff_3.sql
Add data branch diff test cases for mixed types                   

test/distributed/cases/snapshot/branch/diff/diff_3.sql

  • Added test cases for data branch diff operations with complex data
    types
  • Tests three scenarios: orders with composite keys, SKU inventory with
    binary data, and sensor events with temporal keys
  • Validates diff detection across multiple branches with snapshots
+127/-0 
merge_1.result
Add data branch merge test results                                             

test/distributed/cases/snapshot/branch/merge/merge_1.result

  • Added comprehensive test results for data branch merge operations
  • Tests multiple merge scenarios including simple merges, mixed merges
    with updates, and merges with cloned tables
  • Validates merge correctness and diff state after merge operations
+164/-0 
diff_1.result
Add data branch diff test results with snapshots                 

test/distributed/cases/snapshot/branch/diff/diff_1.result

  • Added test results for data branch diff operations with snapshots
  • Tests various scenarios: tables with no LCA, branched tables with
    same/different branch times, and LCA relationships
  • Validates diff output for different snapshot and branching
    configurations
+131/-0 
diff_2.result
Add data branch diff test results for large datasets         

test/distributed/cases/snapshot/branch/diff/diff_2.result

  • Added test results for data branch diff operations with large datasets
    and complex primary keys
  • Tests delete/update operations, LCA scenarios, and composite primary
    key handling
  • Validates diff correctness with 8192+ row tables
+108/-0 
diff_1.sql
Add data branch diff test cases with snapshots                     

test/distributed/cases/snapshot/branch/diff/diff_1.sql

  • Added test cases for data branch diff with various LCA (Lowest Common
    Ancestor) scenarios
  • Tests tables with no LCA, branched tables at same/different times, and
    LCA relationships
  • Validates diff operations with snapshots across different branching
    patterns
+128/-0 
merge_1.sql
Add data branch merge test cases                                                 

test/distributed/cases/snapshot/branch/merge/merge_1.sql

  • Added test cases for data branch merge operations
  • Tests simple merges, mixed merges with updates, and merges with cloned
    tables
  • Validates merge behavior and resulting table state
+125/-0 
diff_2.sql
Add data branch diff test cases for large datasets             

test/distributed/cases/snapshot/branch/diff/diff_2.sql

  • Added test cases for data branch diff with large datasets and complex
    scenarios
  • Tests delete/update operations on large tables, LCA scenarios, and
    composite primary keys
  • Validates diff detection with 8192+ row datasets
+105/-0 
merge_2.result
Add data branch merge conflict handling test results         

test/distributed/cases/snapshot/branch/merge/merge_2.result

  • Added test results for data branch merge with conflict handling
  • Tests conflict scenarios with CONFLICT_FAIL, CONFLICT_SKIP, and
    CONFLICT_ACCEPT options
  • Validates merge behavior when conflicts occur during merge operations
+112/-0 
merge_2.sql
Add data branch merge conflict handling test cases             

test/distributed/cases/snapshot/branch/merge/merge_2.sql

  • Added test cases for data branch merge with conflict resolution
  • Tests conflict scenarios with different conflict handling strategies
    (skip, accept)
  • Validates merge behavior when source and destination tables have
    conflicting changes
+83/-0   
tenant.result
Update tenant test results for branch metadata table         

test/distributed/cases/tenant/tenant.result

  • Updated query filter from relname not like '__mo_index_unique__%' to
    relname not like '__mo_index%'
  • Updated expected results to include mo_branch_metadata in system table
    listings
  • Adjusted table ordering to reflect new system table addition
+4/-3     
starlark.result
Update starlark procedure test results with escaped keywords

test/distributed/cases/procedure/starlark.result

  • Updated procedure parameter names to use backticks for reserved
    keywords: mo and mo.foo
  • Reflects proper escaping of reserved words in procedure definitions
+2/-2     
sp_table.result
Update sp_table test results for branch metadata                 

test/distributed/cases/dml/select/sp_table.result

  • Updated query filter from relname not like '__mo_index_unique__%' to
    relname not like '__mo_index%'
  • Added mo_branch_metadata to expected system table results
+2/-1     
sp_table.sql
Update sp_table query filter for index matching                   

test/distributed/cases/dml/select/sp_table.sql

  • Updated query filter from relname not like '__mo_index_unique__%' to
    relname not like '__mo_index%'
  • Broadens index filtering to match all index-related tables
+1/-1     
tenant.test
Update tenant test query filter for index matching             

test/distributed/cases/tenant/tenant.test

  • Updated query filter from relname not like '__mo_index_unique__%' to
    relname not like '__mo_index%'
  • Aligns with broader index table filtering strategy
+1/-1     
starlark.sql
Update starlark procedure definitions with escaped keywords

test/distributed/cases/procedure/starlark.sql

  • Updated procedure parameter definitions to use backticks for reserved
    keywords: mo and mo.foo
  • Properly escapes reserved words in procedure parameter names
+2/-2     
restore_cluster_table.result
Update cluster restore test results for branch metadata   

test/distributed/cases/snapshot/cluster/restore_cluster_table.result

  • Added mo_branch_metadata to expected system table listings in two
    locations
  • Updated show tables output to include the new branch metadata system
    table
+2/-0     
cluster_level_snapshot_restore_cluster.result
Update cluster snapshot restore test results for branch metadata

test/distributed/cases/snapshot/cluster_level_snapshot_restore_cluster.result

  • Added mo_branch_metadata to expected system table listings in two
    locations
  • Updated show tables output to include the new branch metadata system
    table
+2/-0     
clone_sys_db_table_to_new_db_table.result
Update clone system table test results for branch metadata

test/distributed/cases/snapshot/clone/clone_sys_db_table_to_new_db_table.result

  • Added mo_branch_metadata to expected system table listings in two
    locations
  • Updated show tables output for cloned catalog databases
+2/-0     
show.result
Update show tables test results for branch metadata           

test/distributed/cases/dml/show/show.result

  • Added mo_branch_metadata to expected system table listings
  • Updated show tables output to include the new branch metadata system
    table
+1/-0     
create_user_default_role.result
Update privilege test results for branch metadata               

test/distributed/cases/tenant/privilege/create_user_default_role.result

  • Added mo_branch_metadata to expected system table listings
  • Updated show tables output to include the new branch metadata system
    table
+1/-0     
Configuration changes
4 files
cluster_upgrade_list.go
Add branch metadata table to v4.0.0 cluster upgrade           

pkg/bootstrap/versions/v4_0_0/cluster_upgrade_list.go

  • Added upgrade entry for creating mo_branch_metadata table
  • Integrated branch metadata table creation into cluster upgrade process
+13/-0   
cluster_upgrade_list.go
Add branch metadata table to v3.0.0 cluster upgrade           

pkg/bootstrap/versions/v3_0_0/cluster_upgrade_list.go

  • Added upgrade entry for creating mo_branch_metadata table
  • Integrated branch metadata table creation into v3.0.0 cluster upgrade
    process
+12/-0   
predefined.go
Add branch metadata table DDL definition                                 

pkg/frontend/predefined.go

  • Added MoCatalogBranchMetadataDDL constant defining the branch metadata
    table schema
  • Table includes columns for table_id, clone_ts, p_table_id, creator,
    level, and table_deleted flag
  • Includes indexes on p_table_id and creator columns
+12/-0   
lib.go
Add math library linking flag to CGO                                         

cgo/lib.go

  • Added #cgo LDFLAGS: -lm compiler flag for linking the math library
  • Enables proper linking of mathematical functions in C code
+1/-0     
Additional files
1 files
mysql_sql.go [link]   

@qodo-merge-pro
Copy link

qodo-merge-pro bot commented Oct 19, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
SQL injection

Description: SQL statements for MERGE are constructed via string concatenation with unescaped values,
potentially allowing SQL injection if row values contain quotes or special characters.
data_branch.go [383-420]

Referred Code
initSqlBuf := func() {
	cnt = 0
	buf.Reset()
	buf.WriteString(fmt.Sprintf("replace into %s.%s values ",
		dstRel.GetTableDef(execCtx.reqCtx).DbName,
		dstRel.GetTableDef(execCtx.reqCtx).Name),
	)
}

writeOneRow := func(row []any) error {
	if buf.Bytes()[buf.Len()-1] == ')' {
		buf.WriteString(",")
	}
	cnt++
	buf.WriteString("(")
	for j := 2; j < len(row); j++ {
		switch row[j].(type) {
		case string, []byte:
			buf.WriteString(fmt.Sprintf("'%v'", row[j]))
		default:
			buf.WriteString(fmt.Sprintf("%v", row[j]))


 ... (clipped 17 lines)
SQL injection

Description: Dynamic SQL WHERE clauses are built by concatenating decoded primary key values without
parameterization or proper escaping, which risks SQL injection when string/byte values
contain quotes.
data_branch.go [1037-1124]

Referred Code
case int32:
	buf.WriteString(strconv.FormatInt(int64(pk), 10))
case uint64:
	buf.WriteString(strconv.FormatUint(pk, 10))
case int64:
	buf.WriteString(strconv.FormatInt(pk, 10))
case []uint8:
	buf.WriteString("'")
	buf.WriteString(string(pk))
	buf.WriteString("'")
case types.Timestamp:
	buf.WriteString("'")
	buf.WriteString(pk.String2(time.Local, dels.GetType().Scale))
	buf.WriteString("'")
case types.Datetime:
	buf.WriteString("'")
	buf.WriteString(pk.String2(dels.GetType().Scale))
	buf.WriteString("'")
case types.Date:
	buf.WriteString("'")
	buf.WriteString(pk.String())


 ... (clipped 67 lines)
SQL injection

Description: For non-composite, non-fake primary keys, values are inserted directly into IN (...) lists
via fmt without escaping, which can break SQL or allow injection for string-like types.
data_branch.go [1119-1124]

Referred Code
	sql = fmt.Sprintf(
		"select * from %s.%s %s where %s in (%s) ",
		lcaTblDef.DbName, lcaTblDef.Name, mots, lcaTblDef.Pkey.PkeyColName, buf.String(),
	)
}
Build configuration

Description: Adding -lm to LDFLAGS may introduce platform-specific linkage changes; ensure no
unintended dynamic linking or symbol resolution issues in restricted environments.
lib.go [18-20]

Referred Code
#cgo CFLAGS: -O3 -std=c99
#cgo LDFLAGS: -lm
*/
Ticket Compliance
🟡
🎫 #21979
🟢 Add support for new data branching syntax across SQL grammar and parser.
Implement core data branch operations including DIFF between snapshots.
Implement MERGE operation with conflict handling options (FAIL, SKIP, ACCEPT).
Manage branch hierarchy using a DAG with LCA determination between tables/snapshots.
Maintain branch metadata (e.g., mo_branch_metadata) for clone/branch relationships.
Provide execution-layer integration to compute diffs and apply merges efficiently.
Add tests validating diff and merge behavior across scenarios.
Integrate privileges for data branch statements.
🔴
End-to-end validation of new SQL syntax and parser changes across all dialect paths in
production binaries.
Performance characteristics of diff/merge on large datasets and spill-to-disk behavior
under pressure.
Privilege model correctness in multi-tenant and cross-account scenarios.
Backward compatibility and upgrade/migration paths involving mo_branch_metadata table
across versions.
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
No custom compliance provided

Follow the guide to enable custom compliance check.

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-merge-pro
Copy link

qodo-merge-pro bot commented Oct 19, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Security
Fix critical SQL injection vulnerability

Fix a critical SQL injection vulnerability in the writeOneRow function by
properly escaping string values before embedding them into the REPLACE INTO
query.

pkg/frontend/data_branch.go [392-409]

 	writeOneRow := func(row []any) error {
+		// Helper function to escape string values for SQL.
+		// This is a simplified example. A robust implementation should handle various edge cases.
+		escapeString := func(v string) string {
+			v = strings.ReplaceAll(v, `\`, `\\`)
+			v = strings.ReplaceAll(v, `'`, `''`)
+			return v
+		}
+
 		if buf.Bytes()[buf.Len()-1] == ')' {
 			buf.WriteString(",")
 		}
 		cnt++
 		buf.WriteString("(")
 		for j := 2; j < len(row); j++ {
-			switch row[j].(type) {
-			case string, []byte:
-				buf.WriteString(fmt.Sprintf("'%v'", row[j]))
+			switch v := row[j].(type) {
+			case string:
+				buf.WriteString(fmt.Sprintf("'%s'", escapeString(v)))
+			case []byte:
+				buf.WriteString(fmt.Sprintf("'%s'", escapeString(string(v))))
 			default:
-				buf.WriteString(fmt.Sprintf("%v", row[j]))
+				buf.WriteString(fmt.Sprintf("%v", v))
 			}
 			if j != len(row)-1 {
 				buf.WriteString(",")
 			}
 		}
 		buf.WriteString(")")
 ...

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 10

__

Why: The suggestion correctly identifies a critical SQL injection vulnerability in the writeOneRow function and proposes a valid mitigation strategy, which is crucial for application security.

High
Prevent SQL injection in WHERE clauses

Fix a critical SQL injection vulnerability in handleDelsOnLCA by properly
escaping values used to dynamically construct SQL WHERE clauses.

pkg/frontend/data_branch.go [953-1118]

 func handleDelsOnLCA(
 ...
 ) (rows [][]any, err error) {
+
+	// Helper function to escape string values for SQL.
+	// This is a simplified example. A robust implementation should handle various edge cases.
+	escapeString := func(v string) string {
+		v = strings.ReplaceAll(v, `\`, `\\`)
+		v = strings.ReplaceAll(v, `'`, `''`)
+		return v
+	}
 
 	var (
 		sql       string
 		buf       bytes.Buffer
 ...
 	// composite pk
 	if baseTblDef.Pkey.CompPkeyCol != nil {
 ...
 			for j, _ := range tuple {
 ...
 				switch pk := tuple[j].(type) {
 				case string:
 					buf.WriteString("'")
-					buf.WriteString(pk)
+					buf.WriteString(escapeString(pk))
 					buf.WriteString("'")
 ...
 				case []uint8:
 					buf.WriteString("'")
-					buf.WriteString(string(pk))
+					buf.WriteString(escapeString(string(pk)))
 					buf.WriteString("'")
 ...
 	// real pk
 	} else {
 		for i := range dels.Length() {
 			b := dels.GetRawBytesAt(i)
 			val := types.DecodeValue(b, dels.GetType().Oid)
-			switch val.(type) {
+			switch v := val.(type) {
 			case []byte:
 				buf.WriteString("'")
-				buf.WriteString(string(val.([]byte)))
+				buf.WriteString(escapeString(string(v)))
+				buf.WriteString("'")
+			case string:
+				buf.WriteString("'")
+				buf.WriteString(escapeString(v))
 				buf.WriteString("'")
 			default:
 				buf.WriteString(fmt.Sprintf("%v", val))
 			}
 ...

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 10

__

Why: The suggestion correctly identifies a critical SQL injection vulnerability in the dynamic construction of WHERE clauses and proposes a necessary security fix.

High
High-level
Reconsider the custom adaptive hashmap

The custom adaptive hashmap with disk spilling should be reconsidered. Instead,
using a well-established embedded key-value store library could reduce
complexity, improve reliability, and lower maintenance costs.

Examples:

pkg/frontend/databranchutils/branch_hashmap.go [1-1177]
package databranchutils

import (
	"bytes"
	"container/list"
	"encoding/binary"
	"fmt"
	"io"
	"os"
	"runtime"

 ... (clipped 1167 lines)
pkg/frontend/data_branch.go [609-611]
		baseDataHashmap      databranchutils.BranchHashmap
		baseTombstoneHashmap databranchutils.BranchHashmap
	)

Solution Walkthrough:

Before:

// In pkg/frontend/data_branch.go
func diff(...) (err error) {
    var (
        baseDataHashmap      databranchutils.BranchHashmap
        baseTombstoneHashmap databranchutils.BranchHashmap
    )
    // ...
    if baseDataHashmap, baseTombstoneHashmap, err = buildHashmapForBaseTable(
        ctx, mp, dagInfo.lcaTableId, pkColIdxes, baseHandle,
    ); err != nil {
        return
    }
    // ...
    if checkRet, err = baseDataHashmap.PopByVectors(pkVecs, false); err != nil {
        return
    }
    // ...
}

// In pkg/frontend/databranchutils/branch_hashmap.go
// A full custom implementation of a hash map with:
// - In-memory map `inMemory map[uint64]*hashBucket`
// - Disk spilling logic `spill(required uint64)`
// - Concurrency management `pool *ants.Pool`
// - Custom memory allocation and buffer management
type branchHashmap struct { ... }

After:

// In pkg/frontend/data_branch.go
import "some/embedded/kv/store" // e.g., badger, pebble

func diff(...) (err error) {
    // Open an embedded KV store, possibly in-memory
    opts := kv.DefaultOptions("").WithInMemory(true)
    db, err := kv.Open(opts)
    if err != nil { return err }
    defer db.Close()

    // Build the KV store from the base table
    err = db.Update(func(txn *kv.Txn) error {
        // ... logic to iterate through baseHandle ...
        // ... serialize key and value from vectors ...
        return txn.Set(serializedKey, serializedValue)
    })
    // ...
    // Probe the KV store
    err = db.Update(func(txn *kv.Txn) error {
        // ... logic to iterate through tarHandle ...
        // ... serialize key from vectors ...
        item, err := txn.Get(serializedKey)
        // ...
        // If popping, use txn.Delete(serializedKey)
    })
}
// pkg/frontend/databranchutils/branch_hashmap.go would be removed or become a thin wrapper.
Suggestion importance[1-10]: 9

__

Why: The suggestion raises a critical architectural concern about implementing a custom adaptive hashmap instead of using a proven library, which has significant implications for reliability and long-term maintenance.

High
Possible issue
Fix incorrect object pool reset logic

Correct the reset method for DataBranchCreateTable to properly reset its fields
for object pooling instead of incorrectly re-initializing the struct.

pkg/sql/parsers/tree/data_branch.go [142-145]

 func (d *DataBranchCreateTable) reset() {
+	d.ToAccountOpt = nil
+	d.SrcTable.reset()
 	d.CreateTable.reset()
-	*d = DataBranchCreateTable{}
 }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a bug in the object pool reset logic, where resetting a field is made redundant by re-initializing the entire struct, which could lead to incorrect behavior or memory leaks.

Medium
Prevent integer overflow in comparison logic

Refactor the integer comparison logic in CompareValues to use explicit if/else
checks, preventing potential overflow issues that can arise from subtracting
values.

pkg/container/types/encoding.go [382-397]

 	case T_int8:
 		lVal := left.(int8)
 		rVal := right.(int8)
-		return int(lVal - rVal)
+		if lVal > rVal {
+			return 1
+		} else if lVal < rVal {
+			return -1
+		}
+		return 0
 	case T_int16:
 		lVal := left.(int16)
 		rVal := right.(int16)
-		return int(lVal - rVal)
+		if lVal > rVal {
+			return 1
+		} else if lVal < rVal {
+			return -1
+		}
+		return 0
 	case T_int32:
 		lVal := left.(int32)
 		rVal := right.(int32)
-		return int(lVal - rVal)
+		if lVal > rVal {
+			return 1
+		} else if lVal < rVal {
+			return -1
+		}
+		return 0
 	case T_int64:
 		lVal := left.(int64)
 		rVal := right.(int64)
-		return int(lVal - rVal)
+		if lVal > rVal {
+			return 1
+		} else if lVal < rVal {
+			return -1
+		}
+		return 0
 ...

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a potential integer overflow bug in the newly added CompareValues function, which could lead to incorrect comparison results for edge-case values.

Medium
Improve memory spilling logic robustness

Improve the memory spilling logic in allocateBuffer by repeatedly attempting to
spill and re-allocate until the allocation succeeds or no more data can be
spilled.

pkg/frontend/databranchutils/branch_hashmap.go [728-747]

 func (bh *branchHashmap) allocateBuffer(size uint64) ([]byte, malloc.Deallocator, error) {
 	buf, deallocator, err := bh.allocator.Allocate(size, malloc.NoClear)
+	if err == nil && buf != nil {
+		return buf[:size], deallocator, nil
+	}
 	if err != nil {
 		return nil, nil, err
 	}
-	if buf == nil {
-		fmt.Println("spill")
-		if err := bh.spill(size); err != nil {
-			return nil, nil, err
+
+	// Loop until allocation succeeds or we can't spill anymore.
+	for {
+		spilled, spillErr := bh.spill(size)
+		if spillErr != nil {
+			return nil, nil, spillErr
 		}
+
 		buf, deallocator, err = bh.allocator.Allocate(size, malloc.NoClear)
 		if err != nil {
 			return nil, nil, err
 		}
-		if buf == nil {
+		if buf != nil {
+			return buf[:size], deallocator, nil
+		}
+
+		// If spill did not free any memory and allocation still fails, we are out of options.
+		if spilled == 0 {
 			return nil, nil, moerr.NewInternalErrorNoCtx("branchHashmap failed to allocate memory after spilling")
 		}
 	}
-	return buf[:size], deallocator, nil
 }
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential weakness in the memory spilling logic and proposes a more robust, iterative approach to handle memory allocation failures.

Medium
  • Update

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/wip kind/bug Something isn't working kind/feature Review effort 5/5 size/XXL Denotes a PR that changes 2000+ lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants