-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additional service checks for ingestion server health endpoint #3590
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, it would be nice to add unit tests, though.
I know I opened the issue... but I was initially a little sceptical about whether this is actually needed. We never ended up using the API healthcheck features. But I've got the answer wrong, really. After adding these, we should go back and actually start using the API and these ones somewhere, maybe in the load balancer healthcheck for the API, and we can update the ingestion server healthcheck playbook to use these new parameters (not sure where else we actually use the ingestion server healthcheck).
Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR: @obulat Excluding weekend1 days, this PR was ready for review 13 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2. @AetherUnbound, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally LGTM. In terms of the schema of the response, status
as a collection of message strings feels a bit unstructured. This is my suggestion for a bit more structured (albeit verbose) way of conveying the info.
interface Response {
status: "200 OK" | "503 Service Unavailable"
dependencies: { // only if `check_deps=true`
es: {
isOk: boolean
messages: string[]
},
// ...
}
}
327dbbf
to
72263b3
Compare
Thanks for the suggestion @dhruvkb! I updated the response to the following, since the presence of the dependency in the response should indicate whether or not it's OK: interface Response {
status: "200 OK" | "503 Service Unavailable"
dependencies: { // only if `check_deps=true`
es: string[], // only if es problems
db: string[] // only if db problems
}
} |
Fixes
Fixes #2019 by @sarayourfriend
Description
This PR adds checks for Elasticsearch and both databases to the ingestion server health check endpoint behind the
check_deps
query parameter. This also refactors the database connection into adb_helpers
module similar to thees_helpers
one that already exists. This helped simplify some of the code and centralize where the database connection logic was being defined.The failed response has a
status
field with a list of possible failures due to the number of upstream services to be checked. If there's a better way to structure that, let me know!Testing Instructions
just i
to start the stackcurl -L http://localhost:50281/
and verify the output is{"status": "200 OK"}
curl -L http://localhost:50281/?check_deps=true
and verify the output is{"status": "200 OK"}
docker stop openverse-db-1 openverse-es-1 openverse-upstream_db-1
to simulate the resources becoming unavailablecurl -L http://localhost:50281/
and verify the output is{"status": "200 OK"}
(unaffected)curl -L http://localhost:50281/?check_deps=true
and verify the output is{"status": "503 Service Unavailable", "dependencies": {"es": ["Elasticsearch could not be reached"], "db": ["Database connection for 'upstream' could not be established", "Database connection for 'api' could not be established"]}}
(error messages shown)Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin