60s cooldown on concurrent calls #11

@BP602

Description

When making two streaming requests at the same time with subscription auth, the second one waits about 60 seconds after the first one finishes before it even starts.

Reproduction

Concurrent requests (shows the problem):

# Fire off two streaming requests simultaneously
time curl -N -X POST http://localhost:3456/messages \
  -H 'Content-Type: application/json' \
  -d '{"model":"claude-sonnet-4","messages":[{"role":"user","content":"test"}],"stream":true}' &

time curl -N -X POST http://localhost:3456/messages \
  -H 'Content-Type: application/json' \
  -d '{"model":"claude-haiku-4","messages":[{"role":"user","content":"test"}],"stream":true}'

wait  # let the backgrounded first request finish and print its timing

# Result:
# First request:  ~4 seconds
# Second request: ~63 seconds (waits 60s after first completes)

Sequential requests (works fine):

# Run them one after another
time curl -N -X POST http://localhost:3456/messages \
  -H 'Content-Type: application/json' \
  -d '{"model":"claude-sonnet-4","messages":[{"role":"user","content":"test"}],"stream":true}'

time curl -N -X POST http://localhost:3456/messages \
  -H 'Content-Type: application/json' \
  -d '{"model":"claude-haiku-4","messages":[{"role":"user","content":"test"}],"stream":true}'

# Result:
# First request:  ~4 seconds
# Second request: ~5 seconds
# Total: ~9 seconds

Why this matters

OpenCode makes two concurrent requests on every message: one for the actual response, and one that generates a session title using Haiku. With this project, that means every new message incurs the 60-second wait.
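For anyone digging into this, here is a toy model of the symptom (not the project's actual code, and `CooldownGate` is a made-up name): sequential callers pass straight through, but a caller that arrives while another request is in flight waits for it to finish and then pays a fixed cooldown, which matches the ~4s / ~63s timings above.

```python
import threading
import time


class CooldownGate:
    """Hypothetical model of the observed behavior: requests are
    serialized, and a request that arrives while another is in flight
    pays a fixed cooldown after the first one completes."""

    def __init__(self, cooldown: float):
        self.cooldown = cooldown
        self._lock = threading.Lock()

    def run(self, fn):
        # Try to take the lock without blocking; failure means contention.
        contended = not self._lock.acquire(blocking=False)
        if contended:
            self._lock.acquire()        # wait for the in-flight request
            time.sleep(self.cooldown)   # then serve the full cooldown
        try:
            return fn()
        finally:
            self._lock.release()


gate = CooldownGate(cooldown=0.2)  # 60s in the real report

def request() -> float:
    """Run a stand-in 0.1s 'request' through the gate, return wall time."""
    start = time.monotonic()
    gate.run(lambda: time.sleep(0.1))
    return time.monotonic() - start

# Concurrent: the second caller waits for the first, then the cooldown.
results: list[float] = []
t1 = threading.Thread(target=lambda: results.append(request()))
t2 = threading.Thread(target=lambda: results.append(request()))
t1.start()
time.sleep(0.02)  # make sure the second request overlaps the first
t2.start()
t1.join()
t2.join()
print(f"concurrent worst case: {max(results):.2f}s")  # ~ request + cooldown + request

# Sequential: no contention, so no cooldown is ever applied.
print(f"sequential: {request():.2f}s")  # ~ just the request itself
```

The point of the sketch is that only overlapping requests trigger the penalty, which is exactly why the sequential repro above stays fast while the concurrent one stalls.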

Solution for OpenCode users

Set a different provider's model for title generation in your config:

{
  "small_model": "zai-coding-plan/glm-4.7-flash"
}

See: https://docs.opencode.sh/configuration/small-model
