diff --git a/docs/my-website/docs/proxy/prometheus.md b/docs/my-website/docs/proxy/prometheus.md
index f3c2f2e37d62..8bb861613816 100644
--- a/docs/my-website/docs/proxy/prometheus.md
+++ b/docs/my-website/docs/proxy/prometheus.md
@@ -158,11 +158,21 @@ Use this for LLM API Error monitoring and tracking remaining rate limits and tok
| `litellm_remaining_tokens_metric` | Track `x-ratelimit-remaining-tokens` return from LLM API Deployment. Labels: `"model_group", "api_provider", "api_base", "litellm_model_name", "hashed_api_key", "api_key_alias"` |
### Deployment State
+
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_deployment_state` | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider"` |
| `litellm_deployment_latency_per_output_token` | Latency per output token for deployment. Labels: `"litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias"` |
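+For example, to chart the p95 latency per output token for each model, you could use a query like the sketch below. This assumes the metric is exported as a Prometheus histogram (check your LiteLLM version's exporter if the `_bucket` series is missing):
+
+```promql
+# p95 latency per output token, per model, over the last 5 minutes
+histogram_quantile(
+  0.95,
+  sum by (le, litellm_model_name) (
+    rate(litellm_deployment_latency_per_output_token_bucket[5m])
+  )
+)
+```
+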
+#### State Transitions
+
+| From State | To State | Trigger Conditions |
+|------------|----------|-------------------|
+| **Healthy (0)** | **Partial Outage (1)** | • Any single API call fails<br/>• Network timeout<br/>• Authentication error (401)<br/>• Rate limit hit (429)<br/>• Server error (5xx)<br/>• Any other exception during API call |
+| **Partial Outage (1)** | **Complete Outage (2)** | • Cooldown logic triggers (multiple failures)<br/>• Rate limiting detected<br/>• High failure rate (>50%)<br/>• Non-retryable errors accumulate |
+| **Partial Outage (1)** | **Healthy (0)** | • Next successful API call<br/>• Deployment recovers from cooldown<br/>• Manual intervention |
+| **Complete Outage (2)** | **Healthy (0)** | • Cooldown TTL expires (default: 5 seconds)<br/>• Successful request after cooldown period<br/>• Manual intervention |
+
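+Because `litellm_deployment_state` is a gauge with a fixed meaning per value, it is straightforward to alert on. A minimal sketch of a standard Prometheus alerting rule (the rule name, `for` duration, and severity label are illustrative, not part of LiteLLM):
+
+```yaml
+groups:
+  - name: litellm-deployment-state
+    rules:
+      - alert: LiteLLMDeploymentCompleteOutage
+        # litellm_deployment_state: 0 = healthy, 1 = partial outage, 2 = complete outage
+        expr: litellm_deployment_state == 2
+        for: 1m  # require the outage to persist before firing
+        labels:
+          severity: critical
+        annotations:
+          summary: "Deployment {{ $labels.litellm_model_name }} ({{ $labels.api_base }}) is in complete outage"
+```
+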
#### Fallback (Failover) Metrics
| Metric Name | Description |
diff --git a/docs/my-website/docs/routing.md b/docs/my-website/docs/routing.md
index 971427806ed0..9e0e2abc928e 100644
--- a/docs/my-website/docs/routing.md
+++ b/docs/my-website/docs/routing.md
@@ -1051,6 +1051,19 @@ The router automatically cools down deployments based on the following condition
During cooldown, the specific deployment is temporarily removed from the available pool, while other healthy deployments continue serving requests.
+#### Deployment State Lifecycle
+
+```
+🟢 Healthy (0) → 🟡 Partial Outage (1) → 🔴 Complete Outage (2) → 🟢 Healthy (0)
+```
+
+| From State | To State | Concrete Triggers |
+|------------|----------|-------------------|
+| **Healthy (0)** | **Partial Outage (1)** | • Any single API call fails<br/>• Network timeout<br/>• Authentication error (401)<br/>• Rate limit hit (429)<br/>• Server error (5xx) |
+| **Partial Outage (1)** | **Complete Outage (2)** | • >50% failure rate in current minute<br/>• 429 rate limit errors<br/>• Non-retryable errors (401, 404, 408)<br/>• Exceeds allowed fails limit (default: 3) |
+| **Partial Outage (1)** | **Healthy (0)** | • Next successful API call |
+| **Complete Outage (2)** | **Healthy (0)** | • Cooldown TTL expires (default: 5 seconds)<br/>• Successful request after cooldown period |
+
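+The thresholds that drive these transitions are configurable on the router. A minimal sketch, assuming the `allowed_fails` and `cooldown_time` Router parameters (model name, deployment, and keys below are placeholders):
+
+```python
+import os
+from litellm import Router
+
+router = Router(
+    model_list=[
+        {
+            "model_name": "gpt-3.5-turbo",
+            "litellm_params": {
+                "model": "azure/chatgpt-v-2",  # placeholder deployment
+                "api_key": os.getenv("AZURE_API_KEY"),
+                "api_base": os.getenv("AZURE_API_BASE"),
+            },
+        },
+    ],
+    allowed_fails=3,   # failures per minute before a deployment is cooled down
+    cooldown_time=5,   # seconds a cooled-down deployment stays out of the pool
+)
+```
+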
#### Cooldown Recovery
Deployments automatically recover from cooldown after the cooldown period expires. The router will: