gh-849: To check the cluster replication type based on cluster spec instead of masterdb spec #850

Status: Open (1 commit, base: master)


@viggy28 viggy28 commented Oct 6, 2021

Addresses #849

  1. The primary failed, and the sentinel elected the synchronous replica (SR) as the new primary even though the SR was itself failing.
2021-10-06T14:44:31.522-0700	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "f9aca2fc", "keeper": "postgres1"}
2021-10-06T14:44:31.522-0700	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "1fe806fe", "keeper": "postgres3"}
2021-10-06T14:44:36.740-0700	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "f9aca2fc", "keeper": "postgres1"}
2021-10-06T14:44:36.740-0700	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "1fe806fe", "keeper": "postgres3"}
2021-10-06T14:44:36.750-0700	INFO	cmd/sentinel.go:995	master db is failed	{"db": "f9aca2fc", "keeper": "postgres1"}
2021-10-06T14:44:36.750-0700	INFO	cmd/sentinel.go:1006	trying to find a new master to replace failed master
2021-10-06T14:44:36.750-0700	INFO	cmd/sentinel.go:1032	electing db as the new master	{"db": "1fe806fe", "keeper": "postgres3"}
2021-10-06T14:44:42.018-0700	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "f9aca2fc", "keeper": "postgres1"}
  1. However, the SR also failed, and the sentinel could not find an eligible master.
2021-10-06T14:44:47.334-0700	INFO	cmd/sentinel.go:1006	trying to find a new master to replace failed master
2021-10-06T14:44:47.334-0700	WARN	cmd/sentinel.go:1016	cannot choose synchronous standby since there are no common elements between the latest master reported synchronous standbys and the db spec ones	{"reported": [], "spec": ["f9aca2fc"]}
2021-10-06T14:44:47.334-0700	ERROR	cmd/sentinel.go:1035	no eligible masters
2021-10-06T14:44:52.581-0700	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "f9aca2fc", "keeper": "postgres1"}
2021-10-06T14:44:52.581-0700	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "1fe806fe", "keeper": "postgres3"}
  1. I disabled synchronous replication, and the sentinel picked the asynchronous replica (ASR) as the new primary.
2021-10-06T14:45:39.779-0700	INFO	cmd/sentinel.go:995	master db is failed	{"db": "1fe806fe", "keeper": "postgres3"}
2021-10-06T14:45:39.779-0700	INFO	cmd/sentinel.go:1001	db not converged	{"db": "1fe806fe", "keeper": "postgres3"}
2021-10-06T14:45:39.779-0700	INFO	cmd/sentinel.go:1006	trying to find a new master to replace failed master
2021-10-06T14:45:39.779-0700	INFO	cmd/sentinel.go:1032	electing db as the new master	{"db": "9665a7da", "keeper": "postgres2"}
2021-10-06T14:45:45.068-0700	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "f9aca2fc", "keeper": "postgres1"}
2021-10-06T14:45:45.068-0700	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "1fe806fe", "keeper": "postgres3"}
2021-10-06T14:45:50.335-0700	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "1fe806fe", "keeper": "postgres3"}
2021-10-06T14:45:50.335-0700	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "f9aca2fc", "keeper": "postgres1"}
2021-10-06T14:45:50.345-0700	INFO	cmd/sentinel.go:1151	removing old master db	{"db": "1fe806fe", "keeper": "postgres3"}
2021-10-06T14:45:50.345-0700	INFO	cmd/sentinel.go:1151	removing old master db	{"db": "f9aca2fc", "keeper": "postgres1"}
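
The idea in the PR title can be illustrated with a minimal sketch. It is not the actual stolon code: `ClusterSpec`, `DBSpec`, `commonElements`, and `eligibleMasters` are hypothetical, simplified names chosen for this example. The assumption is that the failed master's db spec may still list synchronous standbys after the operator has disabled synchronous replication in the cluster spec, so the election decision reads the cluster spec instead.

```go
package main

import "fmt"

// ClusterSpec is a hypothetical, simplified stand-in for the cluster-wide spec.
type ClusterSpec struct {
	SynchronousReplication bool
}

// DBSpec is a hypothetical, simplified stand-in for a single db's spec.
// Assumption: it can lag behind the cluster spec when the db is down.
type DBSpec struct {
	SynchronousReplication bool
	SynchronousStandbys    []string
}

// commonElements returns the values of b that also appear in a.
func commonElements(a, b []string) []string {
	seen := make(map[string]struct{}, len(a))
	for _, v := range a {
		seen[v] = struct{}{}
	}
	var out []string
	for _, v := range b {
		if _, ok := seen[v]; ok {
			out = append(out, v)
		}
	}
	return out
}

// eligibleMasters decides whether to restrict candidates to synchronous
// standbys by looking at the cluster spec, not at the failed master's db spec.
func eligibleMasters(cs ClusterSpec, masterSpec DBSpec, reported, healthyStandbys []string) []string {
	if !cs.SynchronousReplication {
		// Sync replication is off cluster-wide: any healthy standby may be elected.
		return healthyStandbys
	}
	// Sync replication is on: only standbys that are both reported by the
	// master and listed in its spec are safe candidates.
	inSync := commonElements(reported, masterSpec.SynchronousStandbys)
	return commonElements(inSync, healthyStandbys)
}

func main() {
	// Values taken from the log excerpt: spec lists f9aca2fc, nothing reported.
	masterSpec := DBSpec{SynchronousReplication: true, SynchronousStandbys: []string{"f9aca2fc"}}
	healthy := []string{"9665a7da"}

	fmt.Println(eligibleMasters(ClusterSpec{SynchronousReplication: true}, masterSpec, nil, healthy))
	fmt.Println(eligibleMasters(ClusterSpec{SynchronousReplication: false}, masterSpec, nil, healthy))
}
```

With synchronous replication enabled in the cluster spec, the sketch finds no candidate, which mirrors the "no eligible masters" error above. With it disabled, the healthy standby `9665a7da` becomes eligible even though the stale db spec still has `SynchronousReplication` set.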

@viggy28 viggy28 force-pushed the bug-unhealthy-cluster/gh-849 branch 2 times, most recently from be9619d to 85a5327 Compare October 7, 2021 06:23
@viggy28 viggy28 force-pushed the bug-unhealthy-cluster/gh-849 branch from 85a5327 to 5322008 Compare October 7, 2021 16:14

viggy28 commented Oct 7, 2021

The integration tests are failing, and I'm not sure why.

Also, I am unable to restart them (the only way I can trigger a run right now is by pushing a commit).

@sgotti
Member

sgotti commented Oct 7, 2021

@viggy28 They're all failing in the same 4 test cases, so your changes are affecting them in some way. You should run those specific tests locally to better understand the reason.
