Problem
The SDK currently creates a new HTTP connection for each API request. Each new connection requires a TCP handshake (~50-100ms) and TLS negotiation (~100-200ms), adding roughly 150-300ms of overhead per request.
For applications that make several sequential API calls (e.g., chat followed by embed, or multiple embed calls), this overhead is paid again on every call and accumulates.
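A rough way to see how the overhead accumulates: without pooling, each of N sequential calls pays the connection setup cost; with a reused connection, only the first call does, assuming later calls arrive before the idle connection expires. The function below is a hypothetical back-of-the-envelope model, not a measurement. `per_connection_ms` stands in for the ~150-300ms figure above.

```python
def handshake_overhead_ms(n_calls: int, per_connection_ms: float, pooled: bool) -> float:
    """Hypothetical model of total connection-setup cost for n sequential calls."""
    if n_calls <= 0:
        return 0.0
    # Without pooling, every call opens a new connection.
    # With pooling, only the first call does (assumes each later call
    # arrives within keepalive_expiry of the previous one).
    connections_opened = 1 if pooled else n_calls
    return connections_opened * per_connection_ms
```

For five sequential calls at the low end of the quoted range (150ms), this model gives 750ms of setup without pooling and 150ms with it.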
Proposed Solution
Configure httpx.Limits on the default httpx.Client and httpx.AsyncClient instances to enable connection reuse:
import httpx

limits = httpx.Limits(
    max_keepalive_connections=20,
    max_connections=100,
    keepalive_expiry=30.0,
)

This is a ~16-line change across the sync and async clients.
Expected Impact
- 15-30% reduction in latency for subsequent API calls
- Reduced server load from fewer connection establishments
- More predictable response times
Context
We use the Cohere SDK at Oracle for workloads involving multiple sequential API calls. Connection pooling is a standard optimization in HTTP client libraries and httpx supports it natively.
Implementation available in PR #697.
References
- httpx connection pooling docs: https://www.python-httpx.org/advanced/#pool-limit-configuration