Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump keep-alive above default LBs #9749

Merged
merged 1 commit into from
Jan 4, 2025
Merged

Bump keep-alive above default LBs #9749

merged 1 commit into from
Jan 4, 2025

Conversation

spolu
Copy link
Contributor

@spolu spolu commented Jan 3, 2025

Description

Bump keep-alive to very high value as inspired by https://github.com/vercel/next.js/pull/35827/files + above known ALB values.

Risk

Will deploy on front-edge first to test and give it a try. Many potential risks of resource exhaustion. Will monitor and revert accordingly.

Deploy Plan

  • deploy front and monitor

@spolu spolu requested a review from flvndvd January 3, 2025 16:15
@spolu
Copy link
Contributor Author

spolu commented Jan 4, 2025

@devloai review this PR

Copy link
Contributor

@devloai devloai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keep-alive timeout change looks reasonable but would benefit from better documentation and configurability. I've suggested adding code comments explaining the timeout value choice, documenting monitoring metrics, and making the value configurable via environment variable for easier production adjustments.

@@ -3,7 +3,7 @@
"dev:all": "concurrently --kill-others \"cd ../types/ && npm run start\" \"sleep 20 && cd ../sdks/js/ && npm run start\" \"sleep 22 && next dev\" \"sleep 22 && tsx ./start_worker.ts\"",
"dev": "next dev",
"build": "next build",
"start": "next start --keepAliveTimeout 6000",
"start": "next start --keepAliveTimeout 610000",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keep-alive timeout is being increased by ~100x (from 6s to 610s). While this matches the strategy from vercel/next.js#35827, we should:

  1. Add a comment in the code explaining why this specific value (610000) was chosen and how it relates to ALB timeouts
  2. Document the metrics we'll be monitoring to detect any resource exhaustion issues (e.g. connection pool metrics, memory usage)
  3. Consider adding an environment variable for this value to make it easier to adjust in production if needed:
Suggested change
"start": "next start --keepAliveTimeout 610000",
"start": "next start --keepAliveTimeout ${NEXT_KEEP_ALIVE_TIMEOUT:-610000}",

@spolu
Copy link
Contributor Author

spolu commented Jan 4, 2025

Giving it a spin after testing on front-edge

@spolu spolu merged commit 4d7a0fc into main Jan 4, 2025
17 of 19 checks passed
@spolu spolu deleted the spolu-massive_keep_alive branch January 4, 2025 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant