Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Online DDL: ensure high lock_wait_timeout in Vreplication cut-over #16601

Merged
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -455,6 +455,11 @@ func testScheduler(t *testing.T) {
})
}

t.Run("set low lock_wait_timeout", func(t *testing.T) {
_, err := primaryTablet.VttabletProcess.QueryTablet("set global lock_wait_timeout=1", keyspaceName, false)
require.NoError(t, err)
})

// CREATE
t.Run("CREATE TABLEs t1, t2", func(t *testing.T) {
{ // The table does not exist
Expand Down Expand Up @@ -578,6 +583,16 @@ func testScheduler(t *testing.T) {
assert.NotEmpty(t, rs.Rows)
})

t.Run("low @@lock_wait_timeout", func(t *testing.T) {
defer primaryTablet.VttabletProcess.QueryTablet("set global lock_wait_timeout=100000", keyspaceName, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this 100000 value come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some high value. Let me change that into something more official. One sec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now setting to actual original lock_wait_timeout.


t1uuid = testOnlineDDLStatement(t, createParams(trivialAlterT1Statement, ddlStrategy, "vtgate", "", "", false)) // wait
t.Run("trivial t1 migration", func(t *testing.T) {
onlineddl.CheckMigrationStatus(t, &vtParams, shards, t1uuid, schema.OnlineDDLStatusComplete)
checkTable(t, t1Name, true)
})
})

forceCutoverCapable, err := capableOf(capabilities.PerformanceSchemaDataLocksTableCapability) // 8.0
require.NoError(t, err)
if forceCutoverCapable {
Expand Down
38 changes: 38 additions & 0 deletions go/vt/vttablet/onlineddl/executor.go
Original file line number Diff line number Diff line change
Expand Up @@ -979,7 +979,16 @@ func (e *Executor) cutOverVReplMigration(ctx context.Context, s *VReplStream, sh
if err != nil {
return err
}
// Set large enough `@@lock_wait_timeout` so that it does not interfere with the cut-over operation.
// The code will ensure everything that needs to be terminated by `migrationCutOverThreshold` will be terminated.
// Setting `lock_wait_timeout` here is done to just ensure MySQL does not interrupt prematurely. It is not
// done by means of ensuring cleanup/termination of operations.
lockConnRestoreLockWaitTimeout, err := e.initConnectionLockWaitTimeout(ctx, lockConn.Conn, 5*migrationCutOverThreshold)
if err != nil {
return err
}
defer lockConn.Recycle()
defer lockConnRestoreLockWaitTimeout()
defer lockConn.Conn.Exec(ctx, sqlUnlockTables, 1, false)

renameCompleteChan := make(chan error)
Expand All @@ -988,6 +997,14 @@ func (e *Executor) cutOverVReplMigration(ctx context.Context, s *VReplStream, sh
if err != nil {
return err
}
// Set large enough `@@lock_wait_timeout` so that it does not interfere with the cut-over operation.
// The code will ensure everything that needs to be terminated by `migrationCutOverThreshold` will be terminated.
// Setting `lock_wait_timeout` here is done to just ensure MySQL does not interrupt prematurely. It is not
// done by means of ensuring cleanup/termination of operations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the It is not // done by means of ensuring cleanup/termination of operations. part is about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to explain that lock_wait_timeout has no role in the cut-over algorithm. It's not placed here so as to be part of the algorithm. We don't rely on MySQL to e.g. ensure proper rollback. The only reason we put it here is to ensure it is "higher than we'd run into".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably remove that whole sentence. The comment already says "The code will ensure ...", which is sufficient explanation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

renameConnRestoreLockWaitTimeout, err := e.initConnectionLockWaitTimeout(ctx, renameConn.Conn, 5*migrationCutOverThreshold*4)
if err != nil {
return err
}
defer renameConn.Recycle()
defer func() {
if !renameWasSuccessful {
Expand All @@ -997,6 +1014,8 @@ func (e *Executor) cutOverVReplMigration(ctx context.Context, s *VReplStream, sh
}
}
}()
defer renameConnRestoreLockWaitTimeout()

// See if backend MySQL server supports 'rename_table_preserve_foreign_key' variable
preserveFKSupported, err := e.isPreserveForeignKeySupported(ctx)
if err != nil {
Expand Down Expand Up @@ -1260,6 +1279,25 @@ func (e *Executor) initMigrationSQLMode(ctx context.Context, onlineDDL *schema.O
return deferFunc, nil
}

// initConnectionLockWaitTimeout sets the given lock_wait_timeout for the given connection, with a deferred value restoration function
func (e *Executor) initConnectionLockWaitTimeout(ctx context.Context, conn *connpool.Conn, lockWaitTimeout time.Duration) (deferFunc func(), err error) {
deferFunc = func() {}
mattlord marked this conversation as resolved.
Show resolved Hide resolved

if _, err := conn.Exec(ctx, `set @lock_wait_timeout=@@lock_wait_timeout`, 0, false); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but IMO slightly clearer to always use the return var. i.e. not use := here. Behavior is the same though either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it would add clarity and intention to specify global here and a comment for those not super famiilar:

// Set a connection level SQL variable to the current global value.
set @lock_wait_timeout=@@global.lock_wait_timeout

Similar for the defer:

set @@session.lock_wait_timeout=@lock_wait_timeout

(we are using @@session already when we actually set it on line 1290)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but IMO slightly clearer to always use the return var. i.e. not use := here. Behavior is the same though either way.

My take is the opposite, I always find it more comfortable to be explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now using explicit @@session. at all places. In no place do I intend to use @@global.lock_wait_timeout. I'm reading the current session value, modify it, restoring it. I don't touch the global setting.

return deferFunc, vterrors.Errorf(vtrpcpb.Code_UNKNOWN, "could not read lock_wait_timeout: %v", err)
}
timeoutSeconds := int64(lockWaitTimeout.Seconds())
setQuery := fmt.Sprintf("set @@session.lock_wait_timeout=%d", timeoutSeconds)
_, err = conn.Exec(ctx, setQuery, 0, false)
if err != nil {
return deferFunc, err
}
deferFunc = func() {
conn.Exec(ctx, "SET lock_wait_timeout=@lock_wait_timeout", 0, false)
}
return deferFunc, nil
}

// newConstraintName generates a new, unique name for a constraint. Our problem is that a MySQL
// constraint's name is unique in the schema (!). And so as we duplicate the original table, we must
// create completely new names for all constraints.
Expand Down
Loading