Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add heartbeat endpoint & tests #881

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

brian-lou
Copy link
Member

This PR adds an cron job endpoint that, when triggered, sends a heartbeat message to a Slack feed channel.

request: FastifyRequest,
reply: FastifyReply
) {
const verified = await handleCronAuth(request, reply);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this framework not provide a native way to authenticate requests rather than hoping every developer remembers to add authentication in their application code in every endpoint ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The auth for jobs are handled via cloud scheduler (managed by terraform), so developers don't need to worry about it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The webhooks do currently require auth for each endpoint, but I'm not sure if there is a way around that from a security perspective, since traffic is open to the internet

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant was whether the web framework we use supports that rather than implementing auth inside the application logic.

Comment on lines 8 to 11
await bolt.client.chat.postMessage({
channel: INFRA_HUB_HEARTBEAT_CHANNEL,
text: 'Infra Hub is up',
});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the goal of sending this message?
Is it about testing the Slack integration or are people supposed to notice these messages are not coming through ? I would push back against the second.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for the second - Alex mentioned that we should have some sort of heartbeat to notice when it's down (since other important parts use eng-pipes, such as gocd deployment notifs in slack)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nobody will catch that. There has to be a datadog alert, so you should probably record a datadog metric or an event and build an alert on that rather than relying on slack and hoping people pay attention to the channel.

Copy link

@fpacifici fpacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comment on the slack message. That is not going to be a robust way to alert people that the system is down. You want a proper datadog alert with pagerduty attached instead.

@brian-lou brian-lou changed the base branch from brian/uptime-heartbeat to main October 17, 2024 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants