Update HA page: unresponsive endpoint detection and node failure fixes #2878
base: main
If a compute endpoint is in a degraded state (repeatedly crashing and restarting rather than failing outright), we will detect and reattach it automatically, typically within 5 minutes. During this time, your application may experience intermittent connectivity.
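While a degraded endpoint is being detected and reattached, connection attempts and queries may fail intermittently. The following is a minimal client-side sketch in Python. It assumes your driver surfaces dropped connections as an exception you can classify as transient. Transient failures are retried with capped exponential backoff and full jitter, and the helper gives up after a deadline. The 300-second default is an assumption chosen to match the "typically within 5 minutes" window. `call_with_retry`, `is_transient`, and the delay parameters are hypothetical names, not a Neon API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    op: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    deadline_s: float = 300.0,  # assumption: sized to the ~5 minute degraded-endpoint window
    base_delay_s: float = 0.5,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call op(), retrying transient failures with capped exponential backoff and full jitter."""
    start = clock()
    attempt = 0
    while True:
        try:
            return op()
        except Exception as exc:
            if not is_transient(exc):
                raise  # e.g. authentication or SQL errors: retrying will not help
            attempt += 1
            # Exponential ceiling (exponent capped to keep numbers small), then full jitter.
            ceiling = min(max_delay_s, base_delay_s * 2 ** min(attempt - 1, 16))
            delay = random.uniform(0.0, ceiling)
            if clock() - start + delay > deadline_s:
                raise  # the next wait would overrun the deadline: surface the last error
            sleep(delay)
```

Injecting `sleep` and `clock` keeps the helper testable without real waiting. Only wrap operations that are safe to repeat, such as opening a connection or running a read-only query.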
#### Node failures
same comment as above.
| Failure type | Impact | Recovery mechanism | Recovery time |
|---|---|---|---|
| VM failure | Brief interruption | VM recreation and endpoint reattachment | Seconds |
| Degraded endpoint | Possible intermittent connectivity | Automatic detection and reattachment | Up to 5 minutes |
| Node failure | Compute unavailable | Rescheduling to healthy nodes | ~2 minutes |
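The recovery times in the table can help size a client's retry budget. As a rough sketch, assume a hypothetical backoff schedule that starts at 0.5 s, doubles on each retry, is capped at 30 s, and ignores jitter. The code counts how many retries are needed before the cumulative wait reaches the ~2 minute node-failure window and the 5 minute degraded-endpoint window. The VM-failure row ("Seconds") is omitted because it gives no specific figure. `retries_to_cover` and its parameters are illustrative names only.

```python
# Recovery windows from the table, in seconds (the "Seconds" VM-failure row is omitted).
RECOVERY_WINDOWS_S = {
    "node failure": 120.0,       # "~2 minutes"
    "degraded endpoint": 300.0,  # "Up to 5 minutes"
}


def retries_to_cover(window_s: float, base_delay_s: float = 0.5, max_delay_s: float = 30.0) -> int:
    """Smallest retry count whose cumulative (un-jittered) backoff reaches window_s."""
    waited = 0.0
    retries = 0
    while waited < window_s:
        retries += 1
        # Delays: 0.5, 1, 2, 4, 8, 16, then capped at 30 s.
        waited += min(max_delay_s, base_delay_s * 2 ** min(retries - 1, 16))
    return retries


for failure, window in RECOVERY_WINDOWS_S.items():
    print(f"{failure}: {retries_to_cover(window)} retries to cover {window:.0f} s")
```

With these parameters, 9 retries (121.5 s of cumulative waiting) cover 120 s, and 15 retries (301.5 s) cover 300 s. Full jitter roughly halves the expected wait per retry, so in practice a time-based deadline is more robust than a fixed retry count.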
### Impact on session data after a failure
Even if your application reconnects automatically, session-specific state such as temporary tables and prepared statements does not persist across a failover. The Local File Cache ([LFC](/docs/reference/glossary#local-file-cache)), which stores frequently accessed data, does not persist either. As a result, queries may run more slowly at first, until the Postgres memory buffers and the LFC are repopulated.
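Temporary tables and prepared statements are lost with the old session, so a client that reconnects has to recreate them. The following is a minimal sketch under stated assumptions. It assumes a driver connection object with an `execute(sql)` method that raises `ConnectionError` when the session is dropped. Real drivers raise their own exception types, for example psycopg's `OperationalError`. `SessionRestoringClient` and `setup_sql` are hypothetical names. The wrapper replays a list of session-setup statements on every new connection.

```python
from typing import Callable, List, Optional, Protocol


class Connection(Protocol):
    """Assumed minimal driver interface: run one SQL statement."""

    def execute(self, sql: str) -> object: ...


class SessionRestoringClient:
    """Re-applies session-scoped setup (temp tables, prepared statements) on each new connection."""

    def __init__(self, connect: Callable[[], Connection], setup_sql: List[str]) -> None:
        self._connect = connect
        self._setup_sql = list(setup_sql)
        self._conn: Optional[Connection] = None

    def _fresh_connection(self) -> Connection:
        conn = self._connect()
        for stmt in self._setup_sql:
            conn.execute(stmt)  # this state did not survive the failover, so rebuild it
        return conn

    def execute(self, sql: str) -> object:
        if self._conn is None:
            self._conn = self._fresh_connection()
        try:
            return self._conn.execute(sql)
        except ConnectionError:
            # The old session is gone: open a new one, restore its state, and retry once.
            self._conn = self._fresh_connection()
            return self._conn.execute(sql)
```

The wrapper retries the failed statement once on the new connection, which is only safe for idempotent statements. Writes need their own deduplication or transaction-level handling. The LFC and Postgres buffers repopulate on their own as queries run, so they are not something the client restores.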
We use "failover" above. Should we use "recovery"? For the sake of argument, why can't we call this section "Compute failover", as some have suggested, and explain that failover in Neon's serverless architecture is a little different from traditional failover?
let's try that!
need confirmation of regions that support multi-AZ
…ebsite into bgrenon-ha-update
Co-authored-by: Daniel <10074684+danieltprice@users.noreply.github.com>
Let's get approval from someone in Development before posting. Maybe Vadim or Alexey.
This page was updated mostly to add "compute failover" sections for:
Preview: https://neon-next-git-bgrenon-ha-update-neondatabase.vercel.app/docs/introduction/high-availability#compute-failover