This repository has been archived by the owner on May 28, 2022. It is now read-only.

refactor: isolate agent channel failures #161

Open
luketchang opened this issue Feb 4, 2022 · 0 comments

Comments

@luketchang
Collaborator

Hub agents push messages from the home chain to the replicas on every other chain. This makes each hub agent only as reliable as its faultiest channel (e.g. the one with the worst RPC): if one channel fails, the whole agent fails (e.g. a moonbase RPC failure also stops rinkeby --> kovan). We want to isolate each channel's tasks so that the other channels keep running when one fails.

Stop canceling all agent tasks when a single channel task fails. Instead, emit an error message and retry the failed task (possibly with exponential backoff).
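A minimal sketch of the isolation-plus-retry idea, assuming one OS thread per channel (the real agents are async, so this would map onto spawned tasks instead). `supervise_channel`, `backoff_delay`, the channel names, and the retry limits are all hypothetical. The sketch caps the number of attempts so it terminates; an agent might instead retry indefinitely.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: delay before retrying after failed attempt `attempt`
/// (0-based), computed as `base * 2^attempt` and capped at `max`.
/// Saturates instead of overflowing for large attempt counts.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Hypothetical supervisor: runs one channel's task, logs each failure and
/// retries with exponential backoff. A failure here never cancels other
/// channels. Returns the attempt count on success, or an error after
/// `max_attempts` consecutive failures.
fn supervise_channel<F>(
    name: &str,
    mut task: F,
    max_attempts: u32,
    base: Duration,
    max: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> Result<(), String>,
{
    for attempt in 0..max_attempts {
        match task() {
            Ok(()) => return Ok(attempt + 1),
            Err(e) => {
                eprintln!("channel {name}: attempt {} failed: {e}", attempt + 1);
                // Only sleep if another attempt will follow.
                if attempt + 1 < max_attempts {
                    thread::sleep(backoff_delay(attempt, base, max));
                }
            }
        }
    }
    Err(format!("channel {name}: gave up after {max_attempts} attempts"))
}

fn main() {
    // Hypothetical channels: one healthy, one whose RPC always fails.
    let channels = vec![("rinkeby->kovan", true), ("rinkeby->moonbase", false)];

    // One thread per channel, so a failing channel cannot stop the others.
    let handles: Vec<_> = channels
        .into_iter()
        .map(|(name, healthy)| {
            thread::spawn(move || {
                let task = move || {
                    if healthy { Ok(()) } else { Err("RPC unreachable".to_string()) }
                };
                let result = supervise_channel(
                    name,
                    task,
                    3,
                    Duration::from_millis(10),
                    Duration::from_millis(100),
                );
                (name, result)
            })
        })
        .collect();

    for handle in handles {
        let (name, result) = handle.join().expect("channel thread panicked");
        println!("{name}: {result:?}");
    }
}
```

The healthy channel finishes on its first attempt regardless of how long the failing one keeps retrying, which is the isolation property the issue asks for.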

Make sure Grafana shows when a channel has stopped, and that each channel has its own alert to notify us.
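One possible way to make per-channel status visible, sketched under the assumption that the agents expose Prometheus metrics that Grafana reads (the issue does not say so). The gauge name `channel_task_up`, its `channel` label, and the `ChannelHealth` type are hypothetical. Each channel gets its own gauge sample, so an alert rule such as `channel_task_up == 0` would fire separately for each stopped channel.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Escapes a label value for the Prometheus text exposition format.
fn escape_label(value: &str) -> String {
    value.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}

/// Hypothetical per-channel health registry shared between channel tasks and
/// a metrics endpoint. `true` means the channel's task is running.
/// Clones share the same underlying map.
#[derive(Clone, Default)]
struct ChannelHealth {
    up: Arc<Mutex<BTreeMap<String, bool>>>,
}

impl ChannelHealth {
    fn set(&self, channel: &str, up: bool) {
        self.up.lock().unwrap().insert(channel.to_string(), up);
    }

    /// Renders one gauge sample per channel (sorted by channel name).
    fn render(&self) -> String {
        let mut out = String::from(
            "# HELP channel_task_up 1 if the channel task is running, 0 if it stopped.\n\
             # TYPE channel_task_up gauge\n",
        );
        for (channel, up) in self.up.lock().unwrap().iter() {
            out.push_str(&format!(
                "channel_task_up{{channel=\"{}\"}} {}\n",
                escape_label(channel),
                u8::from(*up)
            ));
        }
        out
    }
}

fn main() {
    let health = ChannelHealth::default();
    // A supervisor would call `set` when a channel task starts, fails, or gives up.
    health.set("rinkeby->kovan", true);
    health.set("rinkeby->moonbase", false);
    print!("{}", health.render());
}
```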

kekonen self-assigned this Feb 4, 2022
luketchang changed the title from "refactor: isolate processor channel failures" to "refactor: isolate agent channel failures" on Feb 4, 2022