33 commits
All commits by Mirrowel on Nov 19, 2025:

- 9779303 docs(hivemind): add comprehensive HiveMind orchestration plan and tas…
- 20e0cb1 feat(ensemble): add HiveMind ensemble manager, config loader, and def…
- e80c3e0 feat(ensemble): delegate HiveMind ensemble requests to ensemble manager
- 2c99326 feat(ensemble): ✨ add _prepare_drones to prepare drone configs for pa…
- d13eb95 feat(ensemble): add parallel drone execution and response formatter
- eccbea4 feat(ensemble): add arbiter prompt builder, arbiter caller, and swarm…
- 0ab51aa fix(ensemble): use litellm.acompletion for drone API calls
- eb5d7a1 feat(ensemble): add streaming arbiter and swarm handlers
- 8af4919 fix(client): prefetch model mapping to avoid repeated lookups during …
- 4d83427 feat(ensemble): route swarm requests to streaming handler when stream…
- b343bd4 feat(ensemble): ✨ add temperature jitter and adversarial drone mode
- aa8a609 feat(ensemble): ✨ add blind-mode response anonymization and hoist imp…
- bc2672a feat(ensemble): prepare specialist model configurations for fusion
- e86457a feat(ensemble): add fusion phase 5 with specialist roles, arbiter rou…
- 08edb05 docs(hivemind): 📚 update HiveMind task checklist progress
- d03d34d feat(ensemble): ✨ add streaming fusion handler and consolidate fusion…
- e41cfd2 feat(ensemble): add recursive arbiter mode and filter internal reasoning
- 0856dc0 fix(ensemble): 🐛 use deepcopy, load provider models, and robustly han…
- 5da1db4 feat(ensemble): dynamically aggregate usage and add cost/latency trac…
- 865f7cf feat(ensemble): add specialist weight descriptions and embed expertis…
- 60243f5 feat(ensemble): extract specialist metadata for arbiter and return al…
- 55a94f8 fix(rotator): 🐛 include HiveMind fusion models in available models li…
- 4b0a0bf docs(hivemind): 📚 add HiveMind API and user guide, update task checklist
- 6c9f278 feat(ensemble): add preset-based hivemind swarm model discovery and h…
- d093b26 docs(hivemind): 📚 mark fusion features and documentation items comple…
- 2323dbc feat(ensemble): support multi-fusion config format and fusion id suffix
- d8c90b2 docs(hivemind): 📚 standardize "HiveMind Ensemble" naming across docum…
- 105d10a feat(ensemble): switch swarm loader to preset-based format and add sa…
- d8ed4a2 feat(ensemble): add role template support and sample role configs
- 6794096 fix(config): 🐛 report correct swarm preset count in loader log
- f8de42b refactor(ensemble): 🔨 standardize HiveMind ensemble initialization logs
- e03d42c feat(ensemble): ✨ enable implicit preset lookup for compact swarm IDs…
- 9e6cbc0 docs(ensemble): 📚 add HiveMind Ensemble documentation, presets, roles…
147 changes: 146 additions & 1 deletion DOCUMENTATION.md
@@ -10,7 +10,10 @@ The project is a monorepo containing two primary components:
* **Batch Manager**: Optimizes high-volume embedding requests.
* **Detailed Logger**: Provides per-request file logging for debugging.
* **OpenAI-Compatible Endpoints**: `/v1/chat/completions`, `/v1/embeddings`, etc.
2. **The Resilience Library (`rotator_library`)**: This is the core engine that provides high availability. It is consumed by the proxy app to manage a pool of API keys, handle errors gracefully, and ensure requests are completed successfully even when individual keys or provider endpoints face issues. It also includes:
* **HiveMind Ensemble Manager**: Orchestrates parallel model execution (Swarm and Fusion modes) with intelligent arbitration.
* **Key Management**: Advanced concurrency control and intelligent key selection.
* **Error Handling**: Escalating cooldowns and automatic recovery.

This architecture cleanly separates the API interface from the resilience logic, making the library a portable and powerful tool for any application needing robust API key management.

@@ -315,6 +318,148 @@ The `CooldownManager` handles IP or account-level rate limiting that affects all

---

## 2.10. HiveMind Ensemble (`ensemble/`)

The **HiveMind Ensemble** system enables parallel model execution with intelligent arbitration, supporting two distinct modes:

### 2.10.1. Swarm Mode

**Purpose**: Execute the same model multiple times in parallel to generate diverse responses, then synthesize them into a single high-quality output.

**Key Features**:
- **Temperature Jitter**: Randomly varies temperature across drones (±delta) to increase response diversity
- **Adversarial Mode**: Dedicates N drones as critical reviewers with adversarial prompts to stress-test solutions
- **Blind Switch**: Optionally hides model names from the arbiter to reduce synthesis bias
- **Self-Arbitration**: Can use the same model as arbiter to save costs

**Configuration** (`ensemble_configs/swarms/*.json`):
- Folder-based preset system with model-specific overrides
- Default configuration applies to all swarms unless overridden
- Preset-based discovery: `{base_model}-{preset_id}[swarm]` format
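A minimal swarm preset might look like the following. This is an illustrative sketch only: the field names are hypothetical and the shipped sample configs in `ensemble_configs/swarms/` are the authoritative schema.

```json
{
  "id": "default",
  "drone_count": 3,
  "temperature_jitter": 0.2,
  "adversarial_drones": 1,
  "arbiter": {
    "model": "self",
    "strategy": "synthesis",
    "blind": true
  }
}
```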

**Example Usage**:
```python
response = await client.acompletion(
model="gpt-4o-mini-default[swarm]",
messages=[{"role": "user", "content": "Explain AI"}]
)
# → 3 parallel calls to gpt-4o-mini with temperature jitter
# → Arbiter synthesizes responses into final answer
```

### 2.10.2. Fusion Mode

**Purpose**: Combine responses from multiple specialized models with role-based routing and weighted synthesis.

**Key Features**:
- **Role Assignment**: Each specialist model receives a custom system prompt defining its expertise
- **Weight Descriptions**: Guide arbiter on which specialist to trust for specific domains
- **Role Templates**: Reusable role definitions stored in `ensemble_configs/roles/`
- **Blind Mode**: Hides model names while preserving role labels
- **Multi-Provider Support**: Can mix models from different providers in a single fusion

**Configuration** (`ensemble_configs/fusions/*.json`):
- Each fusion defined in its own JSON file or as an array in a single file
- Specialists can reference role templates via `role_template` field
- Supports `weight_description` for arbiter context

**Example Configuration**:
```json
{
"id": "dev-team",
"specialists": [
{
"model": "gpt-4o",
"role": "Architect",
"system_prompt": "Focus on scalability and system design.",
"weight_description": "Expert in architecture. Trust for design decisions."
},
{
"model": "claude-3-opus",
"role": "Security",
"role_template": "security-expert"
}
],
"arbiter": {
"model": "gpt-4o",
"strategy": "synthesis",
"blind": true
}
}
```

### 2.10.3. Arbitration Strategies

Strategies define how the arbiter synthesizes responses. Stored as plain text files in `ensemble_configs/strategies/*.txt` with `{responses}` placeholder.

**Built-in Strategies**:
- **synthesis**: Combine best elements from all responses
- **best_of_n**: Select and refine the strongest response
- **code_review**: Code-specific evaluation criteria

**Custom Strategies**: Users can add their own `.txt` files with custom synthesis prompts.
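A custom strategy is just the arbiter prompt with a `{responses}` placeholder where the collected responses are substituted. For example, a hypothetical `ensemble_configs/strategies/consensus_vote.txt`:

```text
You are an arbiter. Below are several candidate responses to the same prompt:

{responses}

Identify the points on which the candidates agree, resolve any disagreements,
and produce a single final answer that reflects the strongest position.
```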

### 2.10.4. Recursive Mode

**Purpose**: Enable autonomous arbiter decision-making for low-consensus scenarios.

**Mechanism**:
- Arbiter assesses consensus (1-10 scale)
- If consensus < threshold: arbiter performs internal critique reasoning
- If consensus >= threshold: proceeds directly to synthesis
- All internal reasoning wrapped in `[INTERNAL]` tags (filtered from user output)

**Markers**:
- `[CONSENSUS: X/10]`: Logged at WARN level if below threshold
- `[CONFLICTS: ...]`: Identified disagreement points
- `[CRITIQUE: ...]`: Internal reasoning about conflicts
- `[FINAL SYNTHESIS:]`: Start of user-facing output
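The marker flow can be sketched with a small parser. This is a hypothetical helper for illustration only, not the library's actual filtering code, and it assumes the marker formats listed above appear verbatim in the arbiter output.

```python
import re

def parse_arbiter_output(raw: str):
    """Illustrative parser for recursive-mode markers (not the library's code)."""
    # Extract the consensus score from the [CONSENSUS: X/10] marker.
    match = re.search(r"\[CONSENSUS: (\d+)/10\]", raw)
    consensus = int(match.group(1)) if match else None
    # Everything before [FINAL SYNTHESIS:] is internal reasoning and is hidden.
    _, sep, visible = raw.partition("[FINAL SYNTHESIS:]")
    answer = visible.strip() if sep else raw
    return consensus, answer

raw = (
    "[CONSENSUS: 4/10]\n"
    "[CONFLICTS: responses 1 and 3 disagree on the algorithm]\n"
    "[CRITIQUE: response 1 ignores the empty-input case]\n"
    "[FINAL SYNTHESIS:]\nUse the approach from response 3."
)
score, answer = parse_arbiter_output(raw)
# score == 4; answer == "Use the approach from response 3."
```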

### 2.10.5. Usage Tracking

HiveMind responses include standard OpenAI-compatible usage fields **plus** supplementary `hivemind_details`:

**Standard Fields** (aggregated totals from all models):
- `prompt_tokens`: Total prompt tokens (drones/specialists + arbiter)
- `completion_tokens`: Total completion tokens
- `total_tokens`: Grand total

**Supplementary Breakdown** (`hivemind_details`):
```json
{
"mode": "swarm" | "fusion",
"drone_count" | "specialist_count": 3,
"drone_tokens" | "specialist_tokens": 450,
"arbiter_tokens": 200,
"total_cost_usd": 0.00123,
"latency_ms": 1523.45
}
```

**Important**: Consumers should use the standard `usage` fields for billing and analytics; `hivemind_details` is a supplementary breakdown intended for debugging.
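In practice a consumer can read the two blocks like this. The sketch assumes the response is a plain dict with `hivemind_details` alongside `usage`, mirroring the field names documented above; the numbers are invented for illustration.

```python
# Hypothetical swarm response, shaped after the fields documented above.
response = {
    "usage": {"prompt_tokens": 450, "completion_tokens": 400, "total_tokens": 850},
    "hivemind_details": {
        "mode": "swarm",
        "drone_count": 3,
        "drone_tokens": 450,
        "arbiter_tokens": 200,
        "total_cost_usd": 0.00123,
        "latency_ms": 1523.45,
    },
}

# Bill and report from the standard usage block only.
billable_tokens = response["usage"]["total_tokens"]

# Use the supplementary breakdown for debugging, e.g. arbiter overhead.
details = response.get("hivemind_details", {})
arbiter_share = details["arbiter_tokens"] / (
    details["drone_tokens"] + details["arbiter_tokens"]
)
```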

### 2.10.6. Architecture

**Components**:
- **EnsembleManager** (`manager.py`): Orchestration engine
- Detects ensemble requests (`is_ensemble()`)
- Prepares drones/specialists (`_prepare_drones()`, `_prepare_fusion_models()`)
- Executes parallel calls (`_execute_parallel()`)
- Builds arbiter prompts (`_build_arbiter_prompt()`)
- Handles streaming (`_call_arbiter_streaming()`)

- **ConfigLoader** (`config_loader.py`): Configuration management
- Loads swarm presets, fusions, strategies, and role templates
- Supports both single-item and array-based file formats
- Validates and merges configurations

**Integration**:
- Initialized in `RotatingClient.__init__()`
- Intercepts requests in `acompletion()` before normal routing
- Inherits all retry/resilience logic from RotatingClient
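The detection step can be approximated as a suffix check on the requested model name. This is an illustrative sketch based on the documented naming scheme, not the library's actual `is_ensemble()` implementation:

```python
def is_ensemble(model: str) -> bool:
    # HiveMind models are addressed via "[swarm]" or "[fusion]" suffixes,
    # e.g. "gpt-4o-mini-default[swarm]" or "dev-team[fusion]".
    return model.endswith("[swarm]") or model.endswith("[fusion]")

assert is_ensemble("gpt-4o-mini-default[swarm]")
assert is_ensemble("dev-team[fusion]")
assert not is_ensemble("gpt-4o")
```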

---

## 3. Provider Specific Implementations

The library handles provider idiosyncrasies through specialized "Provider" classes in `src/rotator_library/providers/`.
52 changes: 51 additions & 1 deletion README.md
@@ -12,6 +12,11 @@ This project provides a powerful solution for developers building complex applic
## Features

- **Universal API Endpoint**: Simplifies development by providing a single, OpenAI-compatible interface for diverse LLM providers.
- **HiveMind Ensemble**: Parallel model execution with intelligent arbitration in two modes:
- **Swarm Mode**: Run multiple copies of the same model with temperature jitter, adversarial critique, and consensus-based synthesis
- **Fusion Mode**: Combine responses from different specialized models with role-based routing and weighted synthesis
- **Recursive Refinement**: Autonomous arbiter decision-making for low-consensus scenarios with internal critique reasoning
- **Streaming Support**: Full streaming support with real-time arbiter synthesis
- **High Availability**: The underlying library ensures your application remains operational by gracefully handling transient provider errors and API key-specific issues.
- **Resilient Performance**: A global timeout on all requests prevents your application from hanging on unresponsive provider APIs.
- **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests. By default, it supports concurrent requests to *different* models. With configuration (`MAX_CONCURRENT_REQUESTS_PER_KEY_<PROVIDER>`), it can also support multiple concurrent requests to the *same* model using the same key.
@@ -340,11 +345,56 @@ curl -X POST http://127.0.0.1:8000/v1/chat/completions \
}'
```

### HiveMind Ensemble - Parallel Model Execution

HiveMind enables you to run multiple models in parallel with intelligent arbitration. Address a swarm with the `[swarm]` suffix, or a pre-configured fusion with its fusion ID and the `[fusion]` suffix.

**Swarm Mode** (same model, multiple executions):
```bash
# Explicit preset format
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer a-very-secret-and-unique-key" \
-d '{
"model": "gpt-4o-mini-aggressive[swarm]",
"messages": [{"role": "user", "content": "Explain quantum computing"}]
}'

# Short format (requires omit_id: true in preset)
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer a-very-secret-and-unique-key" \
-d '{
"model": "gpt-4o-mini[swarm]",
"messages": [{"role": "user", "content": "Explain quantum computing"}]
}'
```

**Fusion Mode** (multiple specialist models):
```bash
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer a-very-secret-and-unique-key" \
-d '{
"model": "dev-team[fusion]",
"messages": [{"role": "user", "content": "Review this API design"}]
}'
```

HiveMind automatically:
- Executes models in parallel
- Applies temperature jitter for diversity (Swarm mode)
- Routes to specialized models with role prompts (Fusion mode)
- Synthesizes responses using an arbiter model
- Aggregates usage and cost across all calls

For detailed configuration and advanced features, see the [HiveMind User Guide](docs/HiveMind_User_Guide.md).

### Available API Endpoints

- `POST /v1/chat/completions`: The main endpoint for making chat requests.
- `POST /v1/embeddings`: The endpoint for creating embeddings.
- `GET /v1/models`: Returns a list of all available models from your configured providers (includes HiveMind fusions and swarms).
- `GET /v1/providers`: Returns a list of all configured providers.
- `POST /v1/token-count`: Calculates the token count for a given message payload.
