
MySQL Orchestrator/HAProxy/VIP sandbox

This sandbox is designed for safely experimenting with a High Availability solution based on MySQL Orchestrator, HAProxy and a virtual IP (VIP).

The sandbox uses Docker and Docker Compose to bring the MySQL cluster and all its dependencies up and down.

The sandbox has the following configuration:

  1. MySQL cluster (1 master, N replicas), MySQL version 5.7.
  2. MySQL orchestrator cluster using raft consensus (3 nodes).
  3. An HAProxy instance used to spread SQL read queries across the replicas.
  4. A simple Bash script to emulate read/write queries.

To bring network interfaces down and up, an SSH key is added to every MySQL node.
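
For example, to drop the network on a node you could SSH into it and take the interface down (the key path, user and interface name below are assumptions, adjust them to your setup):

# hypothetical example: take the main interface of a node down...
ssh -i ssh/id_rsa root@172.20.0.11 'ip link set eth0 down'
# ...and bring it back up
ssh -i ssh/id_rsa root@172.20.0.11 'ip link set eth0 up'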

Access to orchestrator is restricted via the multi authentication mode (see the curl sketch after the list):

  • a user with read/write privileges: dba_team:time_for_dinner,
  • a user with read-only privileges: readonly:any (any password is accepted)
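
Both users use HTTP basic auth, so the orchestrator API can be queried directly. For example (the API URL below is the one used later in this README, /api/clusters is a standard orchestrator endpoint):

# list clusters known to orchestrator
curl -s -u dba_team:time_for_dinner http://127.0.0.1:80/api/clusters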

MySQL Orchestrator

I recommend cloning the hacked (patched) orchestrator from my own repository:

# Clone Git repository
cd <work-dir>
git clone https://github.com/pparshin/orchestrator.git orchestrator/source

# Build Docker image
cd orchestrator/source
docker build . -f docker/Dockerfile -t orchestrator:latest
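
Once the build finishes, you can verify that the image is available locally:

# the freshly built image should be listed
docker images orchestrator:latest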

VIP (virtual IP)

  • 172.20.0.200 - master
  • 172.20.0.2xx - replica xx (1..20)
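
To check which container currently holds the master VIP, you can inspect the addresses inside a node (the container name follows the pattern used later in this README; this assumes the ip utility is available in the image):

# the current master should carry 172.20.0.200
docker exec orchestrator-sandbox_node1_1 ip addr | grep 172.20.0.200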

HAProxy

The administrative stats page is available at http://localhost:8080/stats
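
The same statistics can be fetched in CSV form, which is handy for scripting (standard HAProxy stats behaviour, assuming the sandbox keeps the default stats URI):

# dump HAProxy backend/server state as CSV
curl -s 'http://localhost:8080/stats;csv'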

How to discover the cluster topology with orchestrator

Use the web UI or the command line:

make discover

How to print the current cluster topology:

make orchestrator-client c="-c topology -i db_test cli"
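
The output looks roughly like this (illustrative hosts, versions and flags; yours will differ):

172.20.0.11:3306   [0s,ok,5.7.30-log,rw,ROW,>>,GTID]
+ 172.20.0.12:3306 [0s,ok,5.7.30-log,ro,ROW,>>,GTID]
+ 172.20.0.13:3306 [0s,ok,5.7.30-log,ro,ROW,>>,GTID]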

Configure and run the sandbox

Build and start all containers:

make clean
make up
# where n is the total number of replicas, max 20
make scale n=10

Load SQL schema and discover the topology:

make load_schema
make discover

Wait a few seconds, then you can play with the cluster as you like, e.g.:

# Emulate read/write queries in the background or in a separate terminal
make try_sql n=200
# Block the master using iptables (mimic a network partitioning)
make node_drop n=orchestrator-sandbox_node1_1
# Or completely stop the master
docker-compose stop node1

Expect orchestrator to detect the master downtime and start the failover process. A new master will be elected and the topology rebuilt after 20-30 seconds.
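
After the recovery you can re-check the topology with the same command as before to see which node was promoted:

make orchestrator-client c="-c topology -i db_test cli"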

FAQ

How to bring an old master back into the cluster after a failover?

Run the following commands on the old master:

SET @@GLOBAL.READ_ONLY=1;
SET @@GLOBAL.SLAVE_NET_TIMEOUT=4; 
CHANGE MASTER TO MASTER_HOST='172.20.0.200', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='repl', MASTER_AUTO_POSITION=1;
CHANGE MASTER TO MASTER_CONNECT_RETRY=1, MASTER_RETRY_COUNT=86400, MASTER_HEARTBEAT_PERIOD=2;
START SLAVE;

Update the orchestrator:

# -i - old master, -d - new master
ORCHESTRATOR_API="http://127.0.0.1:80/api" scripts/orchestrator-client -b "dba_team:time_for_dinner" -c relocate -i 172.20.0.11 -d 172.20.0.12

Which options might affect the time to detect a problem and recover the master?

Read (really read) the orchestrator documentation.

Read the MySQL documentation carefully and pay attention to:

Note that a change to the value or default setting of slave_net_timeout does not automatically change the heartbeat interval, whether that has been set explicitly or is using a previously calculated default. ... If slave_net_timeout is changed, you must also issue CHANGE MASTER TO to adjust the heartbeat interval to an appropriate value so that the heartbeat signal occurs before the connection timeout.

How to get the current value of MASTER_HEARTBEAT_PERIOD:

SELECT HEARTBEAT_INTERVAL FROM performance_schema.replication_connection_configuration;

How to update slave_net_timeout:

STOP SLAVE; 
SET @@GLOBAL.SLAVE_NET_TIMEOUT=4; 
START SLAVE;

How to check the current connection status between replica and master:

SELECT * FROM performance_schema.replication_connection_status\G

Play with the following orchestrator options:

  • DelayMasterPromotionIfSQLThreadNotUpToDate,
  • InstancePollSeconds,
  • RecoveryPollSeconds. Basically, it is a hardcoded constant which defines how often the topology analysis runs.

Each node of the cluster is probed once every InstancePollSeconds seconds. If a problem is detected, orchestrator forcefully refreshes the cluster topology and decides whether to execute a recovery.
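
For illustration only, the first two options are ordinary keys in the orchestrator configuration file (the values below are arbitrary examples, not the sandbox defaults), while RecoveryPollSeconds has to be changed in the orchestrator source:

{
  "InstancePollSeconds": 2,
  "DelayMasterPromotionIfSQLThreadNotUpToDate": true
}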

What is the meaning of cluster_domain from the meta table?

It has no direct influence on orchestrator or how it performs the recovery.

It might be useful in hooks to implement your own logic:
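
A minimal sketch of such a hook, assuming orchestrator exposes the cluster domain and the promoted master to recovery hooks via the ORC_FAILURE_CLUSTER_DOMAIN and ORC_SUCCESSOR_HOST environment variables (the domain value, interface and netmask below are made up for illustration):

#!/usr/bin/env bash
# Hypothetical PostMasterFailoverProcesses hook: move the master VIP
# to the promoted node, but only for a specific cluster domain.
if [ "${ORC_FAILURE_CLUSTER_DOMAIN}" = "db_test.example.com" ]; then
    ssh "root@${ORC_SUCCESSOR_HOST}" "ip addr add 172.20.0.200/16 dev eth0"
fi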
