Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call disconnect on protocol when reconnecting in Replication connection #726

Merged

Conversation

DohanKim
Copy link
Contributor

@DohanKim DohanKim commented Jan 22, 2025

Problems

Worst Case Scenario

If Postgres terminates a replication connection process due to an OOM issue or a network error:

  • Postgrex creates a new ReplicationConnection without closing the previous one.
  • This repeats up to 10 times. (with auto_reconnect option)
  • The max_wal_senders limit (default: 10) is reached, preventing new ReplicationConnection from being created.
  • The system enters a silent infinite loop of reconnecting without any logs.

This issue has occurred in my production app during periods of network instability.

Screenshot 2025-01-22 at 6 39 53 PM

Commit Changes

  • Invoke disconnect on the protocol when reconnecting to a ReplicationConnection.
  • Log an error message when a ReplicationConnection fails to establish a connection.

@DohanKim DohanKim changed the title Fix/replication connection reconnect Call disconnect on protocol when reconnecting in Replication connection Jan 22, 2025
test/.DS_Store Outdated Show resolved Hide resolved
def handle_event(:internal, {:connect, _info}, @state, %{state: {mod, mod_state}} = s) do
case Protocol.connect(opts()) do
{:ok, protocol} ->
maybe_handle(mod, :handle_connect, [mod_state], %{s | protocol: protocol})

{:error, reason} ->
Logger.error(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also add this exception to SImpleConnection for consistency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@DohanKim DohanKim force-pushed the fix/replication-connection-reconnect branch from 19ab39b to c162d73 Compare January 22, 2025 11:14
@DohanKim DohanKim force-pushed the fix/replication-connection-reconnect branch from c162d73 to 5afa8c9 Compare January 22, 2025 11:18
@josevalim josevalim merged commit a6f2020 into elixir-ecto:master Jan 22, 2025
9 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants