Rewrite some TiDB architecture descriptions for clarity #18316

Open
markdonsky opened this issue Jul 21, 2024 · 0 comments
File: /release-8.1/tidb-architecture.md

I recommend the following updates to make this page easier to understand:

TiDB Architecture:
TiDB offers several advantages over traditional standalone databases:

  • It features a distributed architecture that supports flexible and elastic scaling.
  • It is fully compatible with the MySQL protocol, including common features and syntax. This compatibility often allows applications to migrate to TiDB without any code changes.
  • It supports high availability with automatic failover when a replica fails, so the service stays uninterrupted and the failure is transparent to applications.
  • It provides ACID transactions, making it suitable for scenarios that require strong consistency, such as financial transactions.
  • It offers a comprehensive suite of tools for data migration, replication, and backup.

As a distributed database, TiDB comprises multiple components that communicate and collaborate to form a cohesive system. The architecture includes the following components:

TiDB server:
The TiDB server operates as a stateless SQL layer, presenting an endpoint compatible with the MySQL protocol to external systems. Upon receiving a SQL request, the TiDB server parses and optimizes it, and generates a distributed execution plan. Because it is designed for horizontal scalability, it can offer a unified interface to external clients through load balancing components such as TiProxy, Linux Virtual Server (LVS), HAProxy, ProxySQL, or F5.

The TiDB server itself does not store data; it focuses solely on computation and SQL analysis. It forwards actual data read requests to TiKV nodes (or TiFlash nodes), enabling efficient data retrieval within the distributed architecture.
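
Since the TiDB server exposes a MySQL-compatible endpoint, any MySQL driver can connect to it. For illustration, here is a minimal sketch in Go, assuming a local TiDB server on its default port 4000 and the community go-sql-driver/mysql driver:

```go
// Minimal sketch: connect to a TiDB server with a standard MySQL driver.
// The DSN assumes a local TiDB server on its default port 4000; adjust
// the address and credentials for your deployment.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // MySQL-protocol driver; TiDB speaks the same protocol
)

func main() {
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:4000)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var version string
	if err := db.QueryRow("SELECT VERSION()").Scan(&version); err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected to:", version)
}
```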

PD server:
The PD server is the metadata management component for the entire cluster. It stores metadata about the real-time distribution of data across TiKV nodes and the topology of the entire TiDB cluster, hosts the TiDB Dashboard for cluster management, and allocates transaction IDs for distributed transactions. As the "brain" of the TiDB cluster, the PD server not only stores cluster metadata but also sends data scheduling commands to specific TiKV nodes based on the real-time data distribution those nodes report. To ensure high availability, the PD server is typically deployed on at least three nodes, and an odd number of nodes is recommended for optimal resilience.
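
As background for the transaction IDs mentioned above: PD hands out globally ordered timestamps (TSOs). The documented TSO layout packs a physical millisecond clock into the high bits and an 18-bit logical counter into the low bits. The sketch below illustrates that composition; it is a simplification, as the real allocator batches and persists its allocations:

```go
// Illustrative sketch of how PD composes the timestamps (TSOs) it hands out
// for distributed transactions: a physical part (wall-clock milliseconds) in
// the high bits and a logical counter in the low 18 bits, per PD's
// documented TSO format. Simplified relative to the actual allocator.
package main

import (
	"fmt"
	"time"
)

const logicalBits = 18 // low bits reserved for the logical counter

// composeTS packs a physical millisecond timestamp and a logical counter
// into a single 64-bit TSO.
func composeTS(physicalMs int64, logical int64) uint64 {
	return uint64(physicalMs)<<logicalBits | uint64(logical)
}

func main() {
	now := time.Now().UnixMilli()
	ts := composeTS(now, 42)
	fmt.Printf("tso=%d physical=%d logical=%d\n",
		ts, ts>>logicalBits, ts&((1<<logicalBits)-1))
}
```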

TiKV server:
The TiKV server is responsible for storing data. TiKV is a distributed transactional key-value storage engine.

In TiKV, the fundamental unit of data storage is the Region. Each Region stores the data for a specific Key Range, a left-closed, right-open interval [StartKey, EndKey).
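
To make the Region concept concrete, here is a conceptual sketch (not TiKV's actual code) of routing a key to the Region whose [StartKey, EndKey) range contains it:

```go
// Conceptual sketch: Regions partition the key space into left-closed,
// right-open intervals [StartKey, EndKey), so the Region owning a key can
// be found by searching the sorted list of StartKeys.
package main

import (
	"bytes"
	"fmt"
	"sort"
)

type Region struct {
	StartKey, EndKey []byte // [StartKey, EndKey); empty EndKey means +infinity
}

// regionFor returns the Region whose range contains key, assuming regions
// are sorted by StartKey and cover the whole key space.
func regionFor(regions []Region, key []byte) Region {
	i := sort.Search(len(regions), func(i int) bool {
		return bytes.Compare(regions[i].StartKey, key) > 0
	})
	return regions[i-1] // last Region with StartKey <= key
}

func main() {
	regions := []Region{
		{StartKey: []byte(""), EndKey: []byte("g")},
		{StartKey: []byte("g"), EndKey: []byte("n")},
		{StartKey: []byte("n"), EndKey: []byte("")},
	}
	r := regionFor(regions, []byte("hello"))
	fmt.Printf("key %q -> Region [%q, %q)\n", "hello", r.StartKey, r.EndKey)
}
```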

Each TiKV node stores multiple Regions. TiKV APIs provide native support for distributed transactions at the key-value pair level and use the snapshot isolation level by default. This is the core mechanism that enables TiDB to support distributed transactions at the SQL layer.

When executing SQL statements, the TiDB server translates the SQL execution plan into calls to the TiKV API, thereby storing data in TiKV. All data in TiKV is automatically maintained in multiple replicas (three replicas by default), so it has native high availability and supports automatic failover.
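
For a sense of what those key-value API calls look like, here is a hedged sketch using the community client-go library (github.com/tikv/client-go/v2). The PD address is an assumption for a local cluster, and the exact API surface may vary across client-go versions:

```go
// Sketch of key-value-level calls to TiKV via the community client-go
// library. Assumes PD is reachable at 127.0.0.1:2379; the API surface may
// differ between client-go versions.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tikv/client-go/v2/txnkv"
)

func main() {
	// The client discovers TiKV nodes through PD, mirroring how the TiDB
	// server routes requests.
	client, err := txnkv.NewClient([]string{"127.0.0.1:2379"})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	txn, err := client.Begin() // starts a distributed transaction (snapshot isolation)
	if err != nil {
		log.Fatal(err)
	}
	if err := txn.Set([]byte("hello"), []byte("world")); err != nil {
		log.Fatal(err)
	}
	if err := txn.Commit(context.Background()); err != nil { // two-phase commit under the hood
		log.Fatal(err)
	}
	fmt.Println("committed")
}
```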

TiFlash server:
The TiFlash server is a special type of storage server. Unlike typical TiKV nodes, it stores data in a columnar format and is mainly designed to accelerate analytical processing.
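
As an illustration, a client session can steer reads to TiFlash replicas with TiDB's tidb_isolation_read_engines variable. This sketch assumes a table t that already has a TiFlash replica (e.g., created via ALTER TABLE t SET TIFLASH REPLICA 1) and a TiDB server on the default port:

```go
// Sketch: steer an analytical query to columnar TiFlash replicas from a
// client session using TiDB's tidb_isolation_read_engines variable.
// Assumes table t already has a TiFlash replica.
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	ctx := context.Background()
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:4000)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Pin one connection so the session variable applies to the query below.
	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Restrict this session's reads to TiFlash replicas.
	if _, err := conn.ExecContext(ctx, "SET SESSION tidb_isolation_read_engines = 'tiflash'"); err != nil {
		log.Fatal(err)
	}
	var n int64
	if err := conn.QueryRowContext(ctx, "SELECT COUNT(*) FROM t").Scan(&n); err != nil {
		log.Fatal(err)
	}
	fmt.Println("rows:", n)
}
```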
