
Feature request: HTTP connection pooling for improved performance #734

@fede-kamel

Problem

The SDK currently opens a new HTTP connection for each API request. Every new connection requires a TCP handshake (~50-100 ms) and a TLS handshake (~100-200 ms), which adds roughly 150-300 ms of overhead to each request.

For applications that make several sequential API calls (e.g., a chat call followed by an embed call, or multiple embed calls), this overhead is paid on every call and accumulates.
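The per-connection cost can be observed without any network access. The following minimal sketch is a stand-in, not SDK code. It starts a local HTTP/1.1 server that counts accepted TCP connections, then sends sequential requests either over one kept-alive connection or over a fresh connection each time. It uses the standard library's `http.client` and `http.server` instead of httpx so it runs with no extra dependencies. `CountingHandler` and `count_connections` are hypothetical names. Because it runs locally with no TLS and almost no round-trip time, it illustrates connection counts only, not the latency figures above.

```python
import http.client
import http.server
import threading


class CountingHandler(http.server.BaseHTTPRequestHandler):
    """Tiny test-server handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # keep connections open between requests
    connections = 0
    _lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request
        super().setup()
        with CountingHandler._lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def count_connections(n_requests, reuse):
    """Send n_requests sequential GETs; return how many TCP connections the server saw."""
    CountingHandler.connections = 0
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port)
    try:
        for _ in range(n_requests):
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the socket can be reused
            if not reuse:
                # Closing the socket makes the next request() open a new connection,
                # i.e. pay the TCP (and, for HTTPS, TLS) handshake again.
                conn.close()
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


print(count_connections(3, reuse=True))   # prints 1: one connection, reused
print(count_connections(3, reuse=False))  # prints 3: a new connection per request
```

Each connection avoided is one fewer TCP handshake, and against a real HTTPS endpoint it also saves one TLS handshake.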

Proposed Solution

Configure httpx.Limits on the default httpx.Client and httpx.AsyncClient instances to enable connection reuse:

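import httpx

limits = httpx.Limits(
    max_keepalive_connections=20,  # idle keep-alive connections kept for reuse
    max_connections=100,           # cap on concurrent connections
    keepalive_expiry=30.0,         # seconds an idle connection stays in the pool
)
client = httpx.Client(limits=limits)
async_client = httpx.AsyncClient(limits=limits)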

This is a ~16-line change across the sync and async clients.
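For comparison, httpx's own default limits are `max_connections=100`, `max_keepalive_connections=20` and `keepalive_expiry=5.0`. Among the values proposed above, `keepalive_expiry` is therefore the only one that differs from what an `httpx.Client` created without a `limits` argument would use. The hypothetical helper `connections_opened` below is a deliberately simplified model, not httpx internals. It assumes a single connection, strictly sequential calls, and ignores request duration. Under those assumptions, it shows how that setting decides whether a later call can reuse an idle connection or must open a new one.

```python
def connections_opened(request_times, keepalive_expiry):
    """Count new connections needed for strictly sequential requests.

    Simplified model (an assumption, not httpx internals): a single connection
    goes back to the pool after each request and is discarded once it has been
    idle for longer than keepalive_expiry seconds. Request duration is ignored.
    """
    opened = 0
    last_used = None
    for t in sorted(request_times):
        if last_used is None or t - last_used > keepalive_expiry:
            opened += 1  # no live idle connection: new TCP + TLS handshake
        last_used = t
    return opened


# Sequential calls at t = 0, 10, 20 and 55 seconds
times = [0, 10, 20, 55]
print(connections_opened(times, keepalive_expiry=5.0))   # prints 4
print(connections_opened(times, keepalive_expiry=30.0))  # prints 2
```

Using the issue's estimate of ~150-300 ms per new connection, the two connections avoided in this example would save roughly 300-600 ms of handshake time. Calls spaced 5 seconds or less apart reuse the connection under either setting.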

Expected Impact

  • 15-30% reduction in latency for subsequent API calls
  • Lower server load, since fewer new connections need to be established
  • More predictable response times

Context

We use the Cohere SDK at Oracle for workloads that involve multiple sequential API calls. Connection pooling is a standard optimization in HTTP client libraries, and httpx supports it natively.

Implementation available in PR #697.
