
feat: add uvicorn workers and connection backpressure for scalability#358

Draft
Matovidlo wants to merge 2 commits into main from devin/1768823690-scalability-workers-backpressure

Conversation


@Matovidlo commented Jan 19, 2026

Description

Link to Devin run: https://app.devin.ai/sessions/58bff70f3347474283d6dc04fa2d4783
Requested by: Martin Vasko (@Matovidlo)

Change Type

  • Major (breaking changes, significant new features)
  • Minor (new features, enhancements, backward compatible)
  • Patch (bug fixes, small improvements, no new features)

Summary

This PR addresses the primary scalability bottleneck: Python's asyncio event loop runs on a single thread. With many concurrent SSE connections (e.g., 1000+), every new request competes for the same event loop, so simple operations still succeed while more complex ones (like tools/list) time out.

Changes:

  1. Uvicorn Workers (--workers CLI argument, default: 1)

    • Each worker runs its own asyncio event loop, distributing load across CPU cores
    • Note: SSE connections are stateful, so sticky sessions at the load balancer are required when using multiple workers
  2. Connection Limits with Backpressure (--max-connections CLI argument, default: 1000)

    • New ConnectionMetrics class tracks active connections thread-safely
    • ConnectionLimitMiddleware returns HTTP 503 when at capacity
    • Prevents degradation for existing connections by rejecting new ones early
    • Health check (/health-check) and info (/) endpoints are excluded from tracking

Key files:

  • src/keboola_mcp_server/connections.py - New module with connection tracking and middleware
  • src/keboola_mcp_server/cli.py - CLI arguments and middleware integration

Human Review Checklist

  • IMPORTANT: Verify workers parameter works correctly with uvicorn.Server.serve() - typically multiple workers are spawned via uvicorn CLI, not the programmatic API. This may require using uvicorn.run() instead or a different approach.
  • Review thread-safety of ConnectionMetrics using threading.Lock in asyncio context
  • Confirm 503 response format is appropriate for MCP clients
  • Consider if default max_connections=1000 is appropriate for production
  • Note: No unit tests added for the new module - consider if tests should be required

Testing

  • Tested with Cursor AI desktop (Streamable-HTTP transport)

Optional testing

  • Tested with Cursor AI desktop (all transports)
  • Tested with claude.ai web and canary-orion MCP (SSE and Streamable-HTTP)
  • Tested with In Platform Agent on canary-orion
  • Tested with RO chat on canary-orion

Checklist

  • Self-review completed
  • Unit tests added/updated (if applicable)
  • Integration tests added/updated (if applicable)
  • Project version bumped according to the change type (if applicable)
  • Documentation updated (if applicable)

Release Notes

Justification, description

Adds scalability improvements for high-concurrency scenarios with new --workers and --max-connections CLI options for HTTP-based transports.

Plans for Customer Communication

N/A

Impact Analysis

Low risk - new optional CLI arguments with sensible defaults (workers=1, max-connections=1000). Existing behavior unchanged unless explicitly configured.

Deployment Plan

N/A

Rollback Plan

N/A

Post-Release Support Plan

N/A

This addresses the primary bottleneck where Python's asyncio event loop
runs on a single thread. With many concurrent SSE connections (e.g., 1000+),
every new request competes for the same event loop, causing simple operations
to work but complex ones (like tools/list) to time out.

Solution 2: Uvicorn Workers
- Add --workers CLI argument (default: 1) to run multiple event loops in parallel
- Each worker process handles its own set of connections
- Note: SSE connections are stateful, so sticky sessions at the load balancer
  are required when using multiple workers

Solution 5: Connection Limits with Backpressure
- Add --max-connections CLI argument (default: 1000) per worker
- New ConnectionMetrics class tracks active connections thread-safely
- ConnectionLimitMiddleware returns HTTP 503 when at capacity
- This prevents degradation for existing connections by rejecting new ones
  rather than allowing the event loop to become overloaded
- Health check and info endpoints are excluded from connection tracking

Co-Authored-By: Martin Vasko <Matovidlo2@gmail.com>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

