title | summary | category |
---|---|---|
TiDB Data Migration Glossary | Learn the terms used in TiDB Data Migration. | glossary |
This document lists the terms used in the logs, monitoring, configurations, and documentation of TiDB Data Migration (DM).
In TiDB DM, binlogs refer to the binary log files generated in the TiDB database. They have the same meaning as the binlogs in MySQL or MariaDB. Refer to MySQL Binary Log and MariaDB Binary Log for details.
Binlog events are information about data modifications made to a MySQL or MariaDB server instance. These binlog events are stored in the binlog files. Refer to MySQL Binlog Event and MariaDB Binlog Event for details.
Binlog event filter is a filtering feature that is more fine-grained than the black & white table lists filtering rule. Refer to binlog event filter for details.
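For illustration, the following is a minimal sketch of how such a rule might appear in a DM task configuration file; the rule name, patterns, and values below are hypothetical examples, not required names:

```yaml
# Sketch of a binlog event filter rule in a DM task configuration file.
# "filter-rule-1" and the patterns below are example values.
filters:
  filter-rule-1:
    schema-pattern: "test_*"                  # match upstream schemas by wildcard
    table-pattern: "t_*"                      # match upstream tables by wildcard
    events: ["truncate table", "drop table"]  # binlog events to act on
    action: Ignore                            # do not replicate the matched events
```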
The binlog position is the offset information of a binlog event in a binlog file. Refer to MySQL `SHOW BINLOG EVENTS` and MariaDB `SHOW BINLOG EVENTS` for details.
Binlog replication processing unit is the processing unit used in DM-worker to read upstream binlogs or local relay logs, and to replicate these logs to the downstream. Each subtask corresponds to a binlog replication processing unit. In the current documentation, the binlog replication processing unit is also referred to as the sync processing unit.
Black & white table list is the feature that filters or only replicates all operations of some databases or some tables. Refer to black & white table lists for details. This feature is similar to MySQL Replication Filtering and MariaDB Replication Filters.
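As an illustrative sketch, such a rule might be declared in the task configuration file as follows; the rule name and the database and table names are example values:

```yaml
# Sketch of a black & white table list rule in a DM task configuration file.
# "bw-rule-1" and the database/table names are example values.
black-white-list:
  bw-rule-1:
    do-dbs: ["test_db"]           # only replicate operations on test_db
    do-tables:
    - db-name: "test_db"
      tbl-name: "t_order"         # only replicate this table
```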
A checkpoint indicates the position from which a full data import or an incremental replication task is paused and resumed, or is stopped and restarted.
- In a full import task, a checkpoint corresponds to the offset and other information of the successfully imported data in a file that is being imported. A checkpoint is updated synchronously with the data import task.
- In an incremental replication, a checkpoint corresponds to the binlog position and other information of a binlog event that is successfully parsed and replicated to the downstream. A checkpoint is updated after the DDL operation is successfully replicated or 30 seconds after the last update.
In addition, the `relay.meta` information corresponding to a relay processing unit works similarly to a checkpoint. A relay processing unit pulls the binlog event from the upstream, writes this event to the relay log, and writes the binlog position or the GTID information corresponding to this event to `relay.meta`.
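As a sketch, checkpoints are persisted in a meta schema in the downstream; assuming the task configuration exposes a `meta-schema` option for this purpose, it might be set as follows (the schema name is an example):

```yaml
# Sketch: where checkpoint information is stored in the downstream.
# "dm_meta" is an example schema name.
meta-schema: "dm_meta"            # downstream schema that stores checkpoint data
```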
The dump processing unit is the processing unit used in DM-worker to export all data from the upstream. Each subtask corresponds to a dump processing unit.
The GTID is the global transaction ID of MySQL or MariaDB. With this feature enabled, the GTID information is recorded in the binlog files. Multiple GTIDs form a GTID set. Refer to MySQL GTID Format and Storage and MariaDB Global Transaction ID for details.
The heartbeat is a mechanism that calculates the delay from the time data is written in the upstream to the time data is processed by the binlog replication processing unit. Refer to replication delay monitoring for details.
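As an illustrative sketch, assuming the task configuration exposes an `enable-heartbeat` switch, replication delay monitoring might be turned on like this:

```yaml
# Sketch: enabling the heartbeat mechanism in a DM task configuration file.
# The option name is stated here as an assumption, not a verified reference.
enable-heartbeat: true            # measure upstream-to-syncer replication delay
```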
The load processing unit is the processing unit used in DM-worker to import the fully exported data to the downstream. Each subtask corresponds to a load processing unit. In the current documentation, the load processing unit is also referred to as the import processing unit.
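For illustration, the behavior of the dump and load processing units is typically tuned through the `mydumpers` and `loaders` sections of the task configuration; the following sketch uses example names and values:

```yaml
# Sketch: tuning the dump and load processing units in a DM task configuration.
# "global" and all values below are example settings.
mydumpers:
  global:
    threads: 4                    # concurrent threads used to export upstream data
    chunk-filesize: 64            # split exported files into ~64 MB chunks
loaders:
  global:
    pool-size: 16                 # concurrent workers importing data downstream
    dir: "./dumped_data"          # local directory holding the exported data
```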
The relay log refers to the binlog files that DM-worker pulls from the upstream MySQL or MariaDB, and stores in the local disk. The format of the relay log is the standard binlog file, which can be parsed by tools such as mysqlbinlog of a compatible version.
For more details such as the relay log's directory structure, initial replication rules, and data purge in TiDB DM, see TiDB DM relay log.
The relay processing unit is the processing unit used in DM-worker to pull binlog files from the upstream and write data into relay logs. Each DM-worker instance has only one relay processing unit.
Safe mode is the mode in which DML statements can be safely imported more than once when the primary key or unique index exists in the table schema.
In this mode, some statements from the upstream are replicated to the downstream only after they are re-written. The `INSERT` statement is re-written as `REPLACE`; the `UPDATE` statement is re-written as `DELETE` and `REPLACE`. TiDB DM automatically enables the safe mode within 5 minutes after the replication task is started or resumed. You can manually enable the mode by modifying the `safe-mode` parameter in the task configuration file.
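As a sketch, assuming the `syncers` section of the task configuration carries the `safe-mode` parameter mentioned above, it might be enabled like this (the section name "global" and the numeric values are examples):

```yaml
# Sketch: manually enabling safe mode for the binlog replication (sync) unit.
# "global" and the numeric values are example settings.
syncers:
  global:
    worker-count: 16              # concurrent workers applying binlog events
    batch: 100                    # number of DML statements per downstream batch
    safe-mode: true               # rewrite INSERT as REPLACE, UPDATE as DELETE + REPLACE
```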
The shard DDL is the DDL statement that is executed on the upstream sharded tables. It needs to be coordinated and migrated by TiDB DM in the process of merging the sharded tables. In the current documentation, the shard DDL is also referred to as the sharding DDL.
The shard DDL lock is the lock mechanism that coordinates the replication of shard DDL. Refer to the implementation principles of merging and replicating data from sharded tables for details. In the current documentation, the shard DDL lock is also referred to as the sharding DDL lock.
A shard group is all the upstream sharded tables that are to be merged and replicated into the same downstream table. TiDB DM uses two-level shard groups in its implementation. Refer to the implementation principles of merging and replicating data from sharded tables for details. In the current documentation, the shard group is also referred to as the sharding group.
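As an illustrative sketch, assuming the task configuration uses an `is-sharding` switch to enable the coordination of shard DDL across a shard group, it might look like this:

```yaml
# Sketch: enabling sharded-table merge mode so that shard DDL is coordinated.
# The option name and values are stated here as assumptions, not a verified reference.
name: "merge-shards-task"         # example task name
task-mode: all                    # example: full export/import plus incremental replication
is-sharding: true                 # coordinate shard DDL when merging sharded tables
```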
The subtask is a part of a data replication task that is running on each DM-worker instance. In different task configurations, a single data replication task might have one subtask or multiple subtasks.
The subtask status is the status of a data replication subtask. The current status options include `New`, `Running`, `Paused`, `Stopped`, and `Finished`. Refer to subtask status for more details about the status of a data replication task or subtask.
The table routing feature enables DM to replicate a certain table of the upstream MySQL or MariaDB instance to the specified table in the downstream, which can be used to merge and replicate sharded tables. Refer to table routing for details.
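For illustration, a table routing rule might be declared in the task configuration file as follows; the rule name, patterns, and target names are example values:

```yaml
# Sketch of a table routing rule in a DM task configuration file.
# "route-rule-1" and the patterns below are example values.
routes:
  route-rule-1:
    schema-pattern: "sharding_db_*"   # match upstream sharded schemas
    table-pattern: "t_*"              # match upstream sharded tables
    target-schema: "db_target"        # downstream schema to merge into
    target-table: "t_target"          # downstream table to merge into
```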
The data replication task is started after you successfully execute a `start-task` command. In different task configurations, a single replication task can run on a single DM-worker instance or on multiple DM-worker instances at the same time.
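For illustration, a single task fans out into one subtask per upstream instance listed under `mysql-instances`; the following sketch uses example identifiers and assumes rule names defined elsewhere in the same configuration file:

```yaml
# Sketch: one data replication task covering two upstream MySQL instances,
# which yields one subtask on each corresponding DM-worker.
# The source IDs and rule names are example values.
mysql-instances:
- source-id: "mysql-replica-01"
  black-white-list: "bw-rule-1"   # reference to a black & white table list rule
  route-rules: ["route-rule-1"]   # reference to table routing rules
- source-id: "mysql-replica-02"
  black-white-list: "bw-rule-1"
  route-rules: ["route-rule-1"]
```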
The task status refers to the status of a data replication task. The task status depends on the statuses of all its subtasks. Refer to subtask status for details.