
@lazzyfu lazzyfu commented Jun 19, 2023

Hi,
My English is limited, so I am using Google Translate. Sorry for any mistakes, and thank you.

Issue

Triggering conditions

  • PreventCrossRegionMasterFailover = true
  • PreventCrossDataCenterMasterFailover = true

The issue can occur when either setting is enabled, or when both are.

Trigger timing

The issue does not occur every time, but when it does, the impact is severe.

How to reproduce

  1. When ORC scans the instance status while the Master node is still healthy, it marks the current Master node with instanceFound=true.
  2. The Master node is then suddenly shut down, or becomes unreachable for some other reason.
  3. ORC nevertheless continues with DetectRegionQuery, DetectDataCenterQuery and similar operations (the query-based detection, as opposed to the regular-expression matching from the configuration file).
  4. Because the Master node is already down, these queries return no results.
  5. The resulting empty values are written to the table orchestrator.database_instance, where the region of the Master node can be seen to be empty.
    20230619-112525
  6. If a failover is performed at this point, analysisEntry.AnalyzedInstanceRegion is empty, so the region or datacenter check fails and the failover fails with it.

Steps to reproduce

Topology

20230619-113005

Debug code

go/inst/instance_dao.go

Add a for loop directly under instanceFound = true to introduce a delay. The delay can be any number of seconds, but keep it small: a large delay slows down detection, and the effect takes longer to appear.

image

Restart Orchestrator

go run go/cmd/orchestrator/main.go -config conf/orchestrator.conf.json  -debug http

Shut down the Master node

Timing matters here: the shutdown command must be executed during the 5-second window of the debug loop added to the code.

image

Observe the topology

At this point the topology recovery fails, and the failed recovery leaves the replicas in a cascading topology.

image

Observe the Recovery log

image
image
image

Observe the records in the Orchestrator table

image
