fix Replication.Supervisor's strategy and minimal durable-slot/back-pressure documentation in README #70

Merged
merged 2 commits into from
Jul 9, 2024

Conversation

DaemonSnake
Contributor

@DaemonSnake DaemonSnake commented Jul 9, 2024

This PR fixes the supervision strategy of Replication.Supervisor.
We had an issue in production where we lost all events for multiple hours because the
Replication.Publisher crashed, restarted, and then discarded all events until we forced a full restart.
This is actually expected, as Postgres only sends the Relations/Types/etc. messages when the replication connection
is started or when a specific table is altered.
To fix this, we changed the supervision strategy from one_for_one to one_for_all.

This PR also updates the README with a minimal explanation of the durable_slot and message_middleware configuration options.

Currently the supervision strategy for Replication.Supervisor is one_for_one.
This is an issue for the following reasons:

if Publisher crashes:
  We lose the current state.
  This means that until Postgres sends us all the needed Relations and Types messages again,
  we won't be able to decode any events from the Server.
  In the meantime everything looks OK, but all events are discarded.
  The only way to guarantee getting those messages back is to restart the Server.

if Server crashes:
  Replication restarts at restart_lsn.
  All events from there up to the LSN at which the Server crashed get replayed.
  This means the message inbox of the Publisher may become inconsistent
  and will likely contain duplicate messages.
  If this is undesirable, one_for_all is required; otherwise rest_for_one is fine.
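
For illustration, here is a minimal, self-contained sketch of what the strategy change means. It is not the actual WalEx code: the Sketch.* modules are hypothetical stand-ins for Replication.Publisher, Replication.Server and Replication.Supervisor, and their state is empty. It only shows that under one_for_all, killing the Publisher also restarts the Server, which is what forces a fresh replication connection.

```elixir
# Hypothetical stand-in for Replication.Publisher (holds the decoding state).
defmodule Sketch.Publisher do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  @impl true
  def init(state), do: {:ok, state}
end

# Hypothetical stand-in for Replication.Server (owns the replication connection).
defmodule Sketch.Server do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  @impl true
  def init(state), do: {:ok, state}
end

# Hypothetical stand-in for Replication.Supervisor.
defmodule Sketch.Supervisor do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    children = [Sketch.Publisher, Sketch.Server]

    # :one_for_all restarts every child when any one of them terminates,
    # so a Publisher crash also restarts the Server.
    Supervisor.init(children, strategy: :one_for_all)
  end
end

{:ok, _sup} = Sketch.Supervisor.start_link([])
server_before = Process.whereis(Sketch.Server)

# Simulate a Publisher crash.
Process.exit(Process.whereis(Sketch.Publisher), :kill)

# Give the supervisor a moment to restart its children.
Process.sleep(100)
server_after = Process.whereis(Sketch.Server)

IO.inspect(server_before != server_after, label: "server restarted")
```

With the same child order, switching the sketch to strategy: :rest_for_one would still restart the Server when the Publisher crashes (the Server is started after it), but a Server crash would leave the Publisher running with its existing state and inbox.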
@cpursley cpursley merged commit 459f7af into cpursley:master Jul 9, 2024
1 check passed
@cpursley
Owner

cpursley commented Jul 9, 2024

Thank you! Just merged and pushed out a new release.

Again, thank you for all the help on this.

@DaemonSnake DaemonSnake deleted the back-pressure-doc branch July 9, 2024 14:47
@DaemonSnake
Contributor Author

woohoo ^^
thanks @cpursley
