10 changes: 10 additions & 0 deletions docs/my-website/docs/proxy/prometheus.md
@@ -158,11 +158,21 @@ Use this for LLM API Error monitoring and tracking remaining rate limits and tokens
| `litellm_remaining_tokens_metric` | Track `x-ratelimit-remaining-tokens` return from LLM API Deployment. Labels: `"model_group", "api_provider", "api_base", "litellm_model_name", "hashed_api_key", "api_key_alias"` |

### Deployment State

| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_deployment_state` | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider"` |
| `litellm_deployment_latency_per_output_token` | Latency per output token for deployment. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias"` |
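
The `litellm_deployment_state` gauge is exposed in the standard Prometheus text exposition format on the proxy's `/metrics` endpoint. As a minimal sketch (the sample scrape text and the `deployment_states` helper below are illustrative, not part of LiteLLM), the numeric values map back to state names like so:

```python
import re

# Hypothetical sample of a /metrics scrape; the label set matches the
# documented litellm_deployment_state labels.
SAMPLE = """\
litellm_deployment_state{litellm_model_name="gpt-4",model_id="abc123",api_base="https://api.openai.com",api_provider="openai"} 0.0
litellm_deployment_state{litellm_model_name="claude-3",model_id="def456",api_base="https://api.anthropic.com",api_provider="anthropic"} 2.0
"""

STATE_NAMES = {0: "healthy", 1: "partial outage", 2: "complete outage"}

def deployment_states(metrics_text: str) -> dict:
    """Map model_id -> human-readable state from a metrics scrape."""
    states = {}
    pattern = re.compile(r'litellm_deployment_state\{([^}]*)\}\s+([0-9.]+)')
    for labels_raw, value in pattern.findall(metrics_text):
        labels = dict(re.findall(r'(\w+)="([^"]*)"', labels_raw))
        states[labels["model_id"]] = STATE_NAMES[int(float(value))]
    return states

print(deployment_states(SAMPLE))
# {'abc123': 'healthy', 'def456': 'complete outage'}
```

In production you would typically query this through Prometheus itself (e.g. alerting on `litellm_deployment_state == 2`) rather than parsing the raw scrape.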

#### State Transitions

| From State | To State | Trigger Conditions |
|------------|----------|-------------------|
| **Healthy (0)** | **Partial Outage (1)** | • Any single API call fails<br/>• Network timeout<br/>• Authentication error (401)<br/>• Rate limit hit (429)<br/>• Server error (5xx)<br/>• Any other exception during API call |
| **Partial Outage (1)** | **Complete Outage (2)** | • Cooldown logic triggers (multiple failures)<br/>• Rate limiting detected<br/>• High failure rate (>50%)<br/>• Non-retryable errors accumulate |
| **Partial Outage (1)** | **Healthy (0)** | • Next successful API call<br/>• Deployment recovers from cooldown<br/>• Manual intervention |
| **Complete Outage (2)** | **Healthy (0)** | • Cooldown TTL expires (default: 5 seconds)<br/>• Successful request after cooldown period<br/>• Manual intervention |
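
The transition table above can be sketched as a small state machine. This is an illustrative model of the documented transitions only, not LiteLLM's actual implementation:

```python
HEALTHY, PARTIAL_OUTAGE, COMPLETE_OUTAGE = 0, 1, 2

class DeploymentState:
    """Toy model of the documented state transitions."""

    def __init__(self):
        self.state = HEALTHY

    def record_failure(self):
        # Any single failed call (timeout, 401, 429, 5xx, other exception)
        # degrades a healthy deployment to partial outage.
        if self.state == HEALTHY:
            self.state = PARTIAL_OUTAGE

    def record_cooldown(self):
        # Cooldown logic (repeated failures, rate limiting, >50% failure
        # rate) escalates a partial outage to a complete outage.
        if self.state == PARTIAL_OUTAGE:
            self.state = COMPLETE_OUTAGE

    def record_success(self):
        # A successful call (including one after the cooldown TTL expires)
        # returns the deployment to healthy from either outage state.
        self.state = HEALTHY

d = DeploymentState()
d.record_failure()    # 0 -> 1
d.record_cooldown()   # 1 -> 2
d.record_success()    # 2 -> 0
print(d.state)  # 0
```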

#### Fallback (Failover) Metrics

| Metric Name | Description |
13 changes: 13 additions & 0 deletions docs/my-website/docs/routing.md
@@ -1051,6 +1051,19 @@ The router automatically cools down deployments based on the following conditions

During cooldown, the specific deployment is temporarily removed from the available pool, while other healthy deployments continue serving requests.

#### Deployment State Lifecycle

```
🟢 Healthy (0) → 🟡 Partial Outage (1) → 🔴 Complete Outage (2) → 🟢 Healthy (0)
```

| From State | To State | Concrete Triggers |
|------------|----------|-------------------|
| **Healthy (0)** | **Partial Outage (1)** | • Any single API call fails<br/>• Network timeout<br/>• Authentication error (401)<br/>• Rate limit hit (429)<br/>• Server error (5xx) |
| **Partial Outage (1)** | **Complete Outage (2)** | • >50% failure rate in current minute<br/>• 429 rate limit errors<br/>• Non-retryable errors (401, 404, 408)<br/>• Exceeds allowed fails limit (default: 3) |
| **Partial Outage (1)** | **Healthy (0)** | • Next successful API call |
| **Complete Outage (2)** | **Healthy (0)** | • Cooldown TTL expires (default: 5 seconds)<br/>• Successful request after cooldown period |

#### Cooldown Recovery

Deployments automatically recover from cooldown after the cooldown period expires. The router will: