Admin Routstr provider balance check returns 500 on upstream ConnectTimeout #399

@shroominic

Summary

GET /admin/api/upstream-providers/{provider_id}/balance can raise an unhandled httpx.ConnectTimeout when the upstream provider is another Routstr node and that node is slow or unreachable.

Instead of returning a controlled upstream error, the request falls through to the general exception handler and becomes a 500.

Affected versions

  • Reproduced in v0.3.0
  • Still present on branch v0.4.0 as of 2026-03-13

Reproduction

  1. Configure an upstream provider with provider_type == "routstr"
  2. Point base_url at a Routstr node that is slow, unreachable, or timing out on TCP connect
  3. Call GET /admin/api/upstream-providers/{provider_id}/balance
  4. Observe httpx.ConnectTimeout and a 500 response

Current behavior

The admin route special-cases Routstr providers and calls the upstream node directly:

  • routstr/core/admin.py lines 1025-1046 on current v0.4.0
  • raw httpx.AsyncClient().get(f"{clean_url}/v1/balance/info", ...)
  • no try/except for httpx.TimeoutException / httpx.RequestError

This produces logs like:

  • error_type: ConnectTimeout
  • path: /admin/api/upstream-providers/4/balance

Expected behavior

The admin balance endpoint should not crash on upstream timeout.

It should either:

  • return a controlled 502/504 with a useful error message, or
  • return { "ok": false, "balance_data": null } consistently for unreachable upstreams

but it should not become an unhandled 500.

Additional context

There is already a RoutstrUpstreamProvider.get_balance() implementation in routstr/upstream/routstr.py that wraps the request and returns None on failure. The admin route currently bypasses that abstraction and duplicates the HTTP call.

There was a related v0.4.0 change in commit 4643aef ("no auth for routstr node to check and topup balance"), but that only makes the auth header optional. It does not address timeout handling.

There are also related timeout issues/fixes elsewhere in the repo, but those do not fix this admin balance path.

Possible fix direction

  1. Remove the Routstr-specific raw HTTP branch from get_provider_balance
  2. Instantiate the provider via _instantiate_provider(provider) for all provider types
  3. Let RoutstrUpstreamProvider.get_balance() own the Routstr balance fetch behavior
  4. Normalize failures into a controlled API response or explicit HTTPException
  5. Add regression tests for timeout and non-200 upstream responses
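
Steps 2 to 5 could look roughly like the sketch below. It assumes get_balance() is async and returns None on failure, as described under "Additional context". The helper provider_balance_response and the two stub classes are hypothetical stand-ins for the real route and for _instantiate_provider(provider):

```python
import asyncio
from typing import Any, Optional, Protocol


class BalanceProvider(Protocol):
    """Anything exposing get_balance(); assumed async, returning None on failure."""

    async def get_balance(self) -> Optional[Any]: ...


async def provider_balance_response(provider: BalanceProvider) -> dict:
    """Normalize a provider balance lookup into one consistent response shape."""
    balance = await provider.get_balance()
    if balance is None:
        # Unreachable, slow, or non-200 upstream: controlled failure, not a 500.
        return {"ok": False, "balance_data": None}
    return {"ok": True, "balance_data": balance}


class UnreachableProvider:
    """Stub for a provider whose upstream timed out."""

    async def get_balance(self) -> Optional[Any]:
        return None


class HealthyProvider:
    """Stub for a provider whose upstream answered normally."""

    async def get_balance(self) -> Optional[Any]:
        return {"balance": 1000}


def check_regressions() -> None:
    # Regression-style checks for the timeout and the happy path.
    failed = asyncio.run(provider_balance_response(UnreachableProvider()))
    assert failed == {"ok": False, "balance_data": None}
    ok = asyncio.run(provider_balance_response(HealthyProvider()))
    assert ok == {"ok": True, "balance_data": {"balance": 1000}}


if __name__ == "__main__":
    check_regressions()
```

With this shape the admin route no longer needs to know how a given provider type fetches its balance. Whether a None result should map to a 200 with "ok": false or to an HTTPException is still an open choice.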

Secondary observation

The error log for this path may show request_id: no-request-id even though general_exception_handler() passes request_id in extra. RequestIdFilter in routstr/core/logging.py appears to overwrite record.request_id with the context-var fallback value. That makes debugging harder, but it is a separate problem from the timeout 500.
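
One possible shape for a filter that keeps an explicitly passed request_id, shown as a standalone sketch. The class name PreservingRequestIdFilter and the request_id_var context variable are assumptions for illustration; the actual RequestIdFilter in the repo may be structured differently:

```python
import contextvars
import logging

# Hypothetical context variable; the default mirrors the value seen in the logs.
request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar(
    "request_id", default="no-request-id"
)


class PreservingRequestIdFilter(logging.Filter):
    """Fill in request_id from the context var only if the record lacks one."""

    def filter(self, record: logging.LogRecord) -> bool:
        # A value passed via extra={"request_id": ...} is already set on the record.
        if not getattr(record, "request_id", None):
            record.request_id = request_id_var.get()
        return True


def make_record(extra=None) -> logging.LogRecord:
    # Build a record the same way Logger does when extra= is supplied.
    logger = logging.getLogger("demo")
    return logger.makeRecord(
        "demo", logging.ERROR, "demo.py", 0, "boom", (), None, extra=extra
    )


if __name__ == "__main__":
    flt = PreservingRequestIdFilter()
    with_extra = make_record({"request_id": "req-123"})
    without_extra = make_record()
    flt.filter(with_extra)
    flt.filter(without_extra)
    print(with_extra.request_id, without_extra.request_id)
```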
