@mfittko mfittko commented Sep 29, 2025

Title

Fix: Resolve Postgres Deadlocks in Daily Spend Updates

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement)
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🐛 Bug Fix
🧹 Refactoring

Changes

This PR addresses recurring Postgres deadlocks (40P01) during daily spend updates.

Root Cause:

  1. Lock-order Inversion: Multiple workers concurrently batch-update the same daily spend rows, but acquire locks in different key orders, leading to deadlocks.
  2. Ineffective Pod Lock: The PodLockManager was not utilizing Redis, allowing multiple pods to concurrently commit to the DB even when the Redis transaction buffer was enabled, bypassing intended serialization.
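
To make the first root cause concrete, here is a minimal sketch that checks whether two workers would lock a pair of shared keys in opposite order. The function name `has_lock_order_inversion` and the example key strings are hypothetical and not part of LiteLLM; the sketch assumes each worker locks rows in the order it iterates its keys and that keys are unique within a batch.

```python
from itertools import combinations
from typing import Hashable, Sequence


def has_lock_order_inversion(
    order_a: Sequence[Hashable], order_b: Sequence[Hashable]
) -> bool:
    """Return True if two workers lock some pair of shared keys in opposite order.

    Hypothetical helper for illustration only. Assumes keys are unique
    within each sequence.
    """
    # Position of every key in worker B's acquisition order.
    pos_b = {key: i for i, key in enumerate(order_b)}
    # Keys both workers touch, listed in worker A's acquisition order.
    shared = [key for key in order_a if key in pos_b]
    # combinations() yields (x, y) with x before y in A's order; an
    # inversion exists if B acquires y before x.
    return any(pos_b[x] > pos_b[y] for x, y in combinations(shared, 2))


# Illustrative keys only; real daily spend keys may look different.
worker_a = ["user-1|2025-09-29", "user-2|2025-09-29"]
worker_b = ["user-2|2025-09-29", "user-1|2025-09-29"]

inverted = has_lock_order_inversion(worker_a, worker_b)
inverted_after_sort = has_lock_order_inversion(sorted(worker_a), sorted(worker_b))
```

With the unsorted orders the check reports an inversion; once both workers sort their keys, every shared pair is acquired in the same relative order and the check reports none.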

Fix Implemented:

  1. Enable Cross-Pod Locking: DBSpendUpdateWriter now passes redis_cache to PodLockManager, ensuring the pod-level leader lock is effective and only one leader pod applies DB commits.
  2. Deterministic Ordering: Daily transaction keys are now sorted before batching in _update_daily_spend. This ensures all concurrent updaters acquire row locks in the same order, preventing lock-order inversion.
  3. Deadlock-Aware Retry: Implemented retry logic with exponential backoff and jitter for Postgres 40P01 deadlock errors within _update_daily_spend, so a transient deadlock is retried instead of failing the update outright.
  4. Deadlock Detection Helper: Added is_deadlock_error to exception_handler.py for robust detection of Postgres deadlock errors.
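
Items 2 to 4 can be sketched together as follows. This is an illustration under stated assumptions, not the code in this PR: `looks_like_deadlock`, `run_with_deadlock_retry`, `sorted_transactions`, and all default parameter values are hypothetical names and numbers, and the detection heuristic simply looks for the SQLSTATE `40P01` or Postgres's "deadlock detected" message in the exception text.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")

POSTGRES_DEADLOCK_SQLSTATE = "40P01"


def looks_like_deadlock(exc: BaseException) -> bool:
    """Heuristic check (assumption): match SQLSTATE 40P01 or the
    'deadlock detected' message in the exception text."""
    text = str(exc).lower()
    return POSTGRES_DEADLOCK_SQLSTATE.lower() in text or "deadlock detected" in text


def sorted_transactions(transactions: Dict[str, Any]) -> List[Tuple[str, Any]]:
    """Return (key, payload) pairs sorted by key so that every writer
    touches rows in the same order."""
    return sorted(transactions.items(), key=lambda item: item[0])


async def run_with_deadlock_retry(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 0.05,
    max_delay: float = 1.0,
) -> T:
    """Run `operation`, retrying only on deadlock-like errors.

    Uses exponential backoff capped at `max_delay`, with full jitter
    (a random sleep between 0 and the current backoff).
    """
    attempt = 0
    while True:
        try:
            return await operation()
        except Exception as exc:
            # Re-raise anything that is not a deadlock, or once retries run out.
            if not looks_like_deadlock(exc) or attempt >= max_retries:
                raise
            backoff = min(max_delay, base_delay * (2 ** attempt))
            await asyncio.sleep(random.uniform(0, backoff))
            attempt += 1
```

A caller would wrap one batch write in `run_with_deadlock_retry` and iterate `sorted_transactions(...)` inside it. Jitter spreads out the retries of workers that deadlocked against each other, so they are less likely to collide again on the next attempt.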

Co-authored-by: github <github@mfittko.com>

@mfittko mfittko self-assigned this Sep 29, 2025