Complete API documentation for the Cerebrum distributed AI code generation system.
Cerebrum exposes two APIs:
| Component | Port | Purpose | Public |
|---|---|---|---|
| CM4 Orchestrator | 7000 | User-facing completion & management | Yes (local network) |
| VPS Backend | 9000 | Internal inference engine | No (Tailscale only) |
Typical flow:
- Client → CM4:7000 (code completion request)
- CM4 → VPS:9000 (prepared prompt)
- VPS → CM4 (streamed tokens)
- CM4 → Client (proxied stream)
CM4 Orchestrator (port 7000): Currently no authentication required (designed for single-user local access).

Future: Optional API key via header:

```
X-API-Key: your-key-here
```

VPS Backend (port 9000): Required for all endpoints except /health:

```
X-API-Key: your-cerebrum-api-key
```

Generating keys:

```bash
cd ~/Cerebrum/cerebrum-backend/scripts
./generate_api_key.sh
```

Keys are stored in .env files and must match between CM4 and VPS.
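For clients that already hold a key, a minimal sketch of attaching it to backend calls. This assumes the key is exported as `CEREBRUM_API_KEY` in the environment; that variable name is illustrative, not something the project defines.

```python
import os

import httpx  # any HTTP client works; httpx is used elsewhere in these docs

# Hypothetical env var name; load whatever your .env exports.
API_KEY = os.environ["CEREBRUM_API_KEY"]

def authed_headers() -> dict:
    """Headers for VPS backend calls (the CM4 currently needs no key)."""
    return {"X-API-Key": API_KEY}

resp = httpx.get("http://127.0.0.1:9000/v1/models",
                 headers=authed_headers(), timeout=5.0)
print(resp.json())
```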
CM4 Orchestrator API

Base URL: `http://<cm4-ip>:7000`
`GET /health` - Health check endpoint.
Request:
```bash
curl http://localhost:7000/health
```

Response:

```json
{
  "status": "healthy",
  "timestamp": "2025-12-25T12:00:00.000000",
  "vps_available": true,
  "uptime": 3600.5
}
```

Status Codes:

- `200 OK` - Service healthy
- `503 Service Unavailable` - VPS backend unreachable
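As a sketch, a client can gate work on this endpoint before sending completions. This is plain polling against the fields documented above, not a project-provided helper.

```python
import httpx

def cm4_ready(base_url: str = "http://localhost:7000") -> bool:
    """Return True only if the orchestrator is up AND it can reach the VPS."""
    try:
        resp = httpx.get(f"{base_url}/health", timeout=5.0)
    except httpx.TransportError:
        # Connection refused, DNS failure, timeout, etc.
        return False
    # 503 means the VPS backend is unreachable; the body still carries details.
    return resp.status_code == 200 and resp.json().get("vps_available", False)

if cm4_ready():
    print("Cerebrum is ready for completions")
```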
`POST /v1/complete` - Non-streaming code completion.
Request:
```bash
curl -X POST http://localhost:7000/v1/complete \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "def fibonacci(n):",
    "language": "python",
    "max_tokens": 256,
    "temperature": 0.4
  }'
```

Request Body:

```json
{
  "prompt": "string (required)",
  "language": "string (required)",
  "max_tokens": "integer (optional, default: 512)",
  "temperature": "float (optional, default: 0.4, range: 0.0-1.0)"
}
```

Supported Languages:

- `python`, `javascript`, `typescript` → Uses Qwen-7B
- `rust`, `c`, `cpp`, `go` → Uses CodeLLaMA-7B
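The `language` field is what selects the backend model. A client-side sketch of the same routing table, reconstructed from the list above as an assumption (it mirrors the documented mapping, not the server's actual code):

```python
# Assumed mapping, reconstructed from the supported-languages list above.
LANGUAGE_TO_MODEL = {
    "python": "qwen_7b",
    "javascript": "qwen_7b",
    "typescript": "qwen_7b",
    "rust": "codellama_7b",
    "c": "codellama_7b",
    "cpp": "codellama_7b",
    "go": "codellama_7b",
}

def model_for(language: str) -> str:
    """Predict which model the orchestrator will route a language to."""
    try:
        return LANGUAGE_TO_MODEL[language]
    except KeyError:
        raise ValueError(f"unsupported language: {language}") from None
```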
Response:
```json
{
  "completion": "def fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)",
  "language": "python",
  "model": "qwen_7b",
  "total_tokens": 42,
  "inference_time": 18.234,
  "timestamp": "2025-12-25T12:00:00.000000"
}
```

Status Codes:

- `200 OK` - Success
- `400 Bad Request` - Invalid parameters
- `503 Service Unavailable` - VPS unreachable or overloaded
- `429 Too Many Requests` - Load shedding active (>2 concurrent)
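A minimal non-streaming client for this endpoint, as a sketch. It uses httpx (matching the Python example later in this document); the generous timeout is an assumption for slow CPU inference, not a documented value.

```python
import httpx

def complete(prompt: str, language: str = "python", max_tokens: int = 256) -> str:
    resp = httpx.post(
        "http://localhost:7000/v1/complete",
        json={"prompt": prompt, "language": language, "max_tokens": max_tokens},
        timeout=300.0,  # CPU inference takes tens of seconds; allow minutes
    )
    resp.raise_for_status()  # surfaces 400/429/503 as exceptions
    body = resp.json()
    print(f"[{body['model']}] {body['total_tokens']} tokens "
          f"in {body['inference_time']:.1f}s")
    return body["completion"]

print(complete("def fibonacci(n):"))
```

`raise_for_status()` turns the 4xx/5xx codes above into exceptions, which pairs naturally with the retry helper shown in the error-handling section below.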
`POST /v1/complete/stream` - Streaming code completion (Server-Sent Events).
Request:
```bash
curl -N -X POST http://localhost:7000/v1/complete/stream \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "async def fetch():",
    "language": "python",
    "max_tokens": 128,
    "temperature": 0.4
  }'
```

Request Body: Same as /v1/complete.
Response: Server-Sent Events (SSE)
data: {"token": "import", "total_tokens": 1}
data: {"token": " aiohttp", "total_tokens": 2}
data: {"token": "\n", "total_tokens": 3}
data: {"token": "async", "total_tokens": 4}
data: {"token": " def", "total_tokens": 5}
...
data: {"done": true, "language": "python", "model": "qwen_7b", "total_tokens": 128, "inference_time": 182.14, "timestamp": "2025-12-25T12:00:00.000000"}
Event Types:
Token event:

```json
{
  "token": "string",
  "total_tokens": "integer"
}
```

Done event:

```json
{
  "done": true,
  "language": "string",
  "model": "string",
  "total_tokens": "integer",
  "inference_time": "float (seconds)",
  "timestamp": "string (ISO 8601)"
}
```

Error event:

```json
{
  "error": true,
  "message": "string",
  "code": "string"
}
```

Client Implementation:
```javascript
const response = await fetch('http://localhost:7000/v1/complete/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'def hello():',
    language: 'python',
    max_tokens: 128
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value);
  const lines = chunk.split('\n');

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.token) {
        process.stdout.write(data.token);
      } else if (data.done) {
        console.log(`\n[${data.total_tokens} tokens in ${data.inference_time}s]`);
      }
    }
  }
}
```

`GET /v1/models` - List available models.
Request:
```bash
curl http://localhost:7000/v1/models
```

Response:

```json
{
  "models": [
    {
      "id": "qwen_7b",
      "name": "Qwen 7B",
      "languages": ["python", "javascript", "typescript"],
      "parameters": "7B",
      "quantization": "Q4"
    },
    {
      "id": "codellama_7b",
      "name": "CodeLLaMA 7B",
      "languages": ["rust", "c", "cpp", "go"],
      "parameters": "7B",
      "quantization": "Q4"
    }
  ]
}
```

`GET /v1/stats` - System statistics.
Request:
```bash
curl http://localhost:7000/v1/stats
```

Response:

```json
{
  "uptime": 3600.5,
  "requests_total": 142,
  "requests_active": 1,
  "vps_available": true,
  "vps_response_time_ms": 12.3,
  "memory_mb": 487.2,
  "load_avg": [0.5, 0.6, 0.7]
}
```

VPS Backend API

Base URL: `http://127.0.0.1:9000` (localhost only, via Tailscale)
`GET /health` - Health check (no authentication required).
Request:
```bash
curl http://127.0.0.1:9000/health
```

Response:

```json
{
  "status": "healthy",
  "timestamp": "2025-12-25T12:00:00.000000",
  "cpu_percent": 45.2,
  "memory_available_mb": 2048.5,
  "models_loaded": ["qwen_7b"]
}
```

`POST /v1/inference` - Internal inference endpoint (non-streaming).
Request:
```bash
curl -X POST http://127.0.0.1:9000/v1/inference \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-key-here" \
  -d '{
    "prompt": "def hello():",
    "model": "qwen_7b",
    "max_tokens": 128,
    "temperature": 0.4
  }'
```

Request Body:

```json
{
  "prompt": "string (required)",
  "model": "string (required)",
  "max_tokens": "integer (optional, default: 512)",
  "temperature": "float (optional, default: 0.4)"
}
```

Response:
```json
{
  "completion": "def hello():\n    print(\"Hello, world!\")",
  "model": "qwen_7b",
  "total_tokens": 12,
  "inference_time": 8.234
}
```

Internal streaming inference endpoint.
Request: Same as /v1/inference but returns SSE stream
Response: Server-Sent Events (same format as CM4 streaming)
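When debugging the backend directly over the Tailscale link, a minimal authenticated sketch against the documented non-streaming /v1/inference endpoint. `CEREBRUM_API_KEY` is an illustrative env var name, not something the project defines.

```python
import os

import httpx

resp = httpx.post(
    "http://127.0.0.1:9000/v1/inference",
    headers={"X-API-Key": os.environ["CEREBRUM_API_KEY"]},  # hypothetical env var
    json={"prompt": "def hello():", "model": "qwen_7b", "max_tokens": 64},
    timeout=300.0,  # CPU inference is slow; allow minutes
)
resp.raise_for_status()
print(resp.json()["completion"])
```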
`GET /v1/models` - List loaded models.
Request:
curl -H "X-API-Key: your-key-here" \
http://127.0.0.1:9000/v1/modelsResponse:
{
"models": [
{
"id": "qwen_7b",
"loaded": true,
"path": "/home/user/Cerebrum/cerebrum-backend/models/qwen-7b-q4.gguf",
"memory_mb": 3840.2,
"last_used": "2025-12-25T11:55:00.000000"
},
{
"id": "codellama_7b",
"loaded": false,
"path": "/home/user/Cerebrum/cerebrum-backend/models/codellama-7b-q4.gguf"
}
]
}System resource statistics.
Request:
curl -H "X-API-Key: your-key-here" \
http://127.0.0.1:9000/v1/statsResponse:
{
"cpu_percent": 45.2,
"memory_total_mb": 7260.0,
"memory_available_mb": 2048.5,
"memory_used_mb": 5211.5,
"models_loaded": 1,
"requests_total": 87,
"requests_active": 0,
"uptime": 7200.3
}Manually unload a model from memory.
Request:
```bash
curl -X POST \
  -H "X-API-Key: your-key-here" \
  http://127.0.0.1:9000/v1/unload/qwen_7b
```

Response:

```json
{
  "model": "qwen_7b",
  "unloaded": true,
  "memory_freed_mb": 3840.2
}
```

`POST /v1/cleanup` - Unload all idle models (not used in the last 60 minutes).
Request:
```bash
curl -X POST \
  -H "X-API-Key: your-key-here" \
  http://127.0.0.1:9000/v1/cleanup
```

Response:

```json
{
  "models_unloaded": ["qwen_7b"],
  "memory_freed_mb": 3840.2
}
```

Error Handling

All errors return JSON with this structure:
```json
{
  "error": true,
  "message": "Human-readable error description",
  "code": "ERROR_CODE",
  "timestamp": "2025-12-25T12:00:00.000000"
}
```

| Code | HTTP Status | Meaning |
|---|---|---|
| INVALID_REQUEST | 400 | Missing or invalid parameters |
| UNAUTHORIZED | 401 | Missing or invalid API key |
| RATE_LIMIT | 429 | Load shedding active |
| VPS_UNAVAILABLE | 503 | Backend unreachable |
| MODEL_NOT_FOUND | 404 | Requested model doesn't exist |
| RESOURCE_EXHAUSTED | 503 | CPU/RAM limits exceeded |
Example:

```json
{
  "error": true,
  "message": "VPS backend is unavailable (circuit breaker open)",
  "code": "VPS_UNAVAILABLE",
  "timestamp": "2025-12-25T12:00:00.000000"
}
```

Load Shedding:

- Max concurrent requests: 2
- Exceeded behavior: Returns `429 Too Many Requests`
- No Retry-After header (clients should implement exponential backoff)
Recommended client retry:
```javascript
async function retryRequest(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (err) {
      // assumes client surfaces HTTP status on error
      if (err.status === 429 && i < maxRetries - 1) {
        await new Promise(r => setTimeout(r, Math.pow(2, i) * 1000));
        continue;
      }
      throw err;
    }
  }
}
```

Resource Protection:
- Rejects when CPU > 70%
- Rejects when RAM < 1GB available
- Returns `503 Service Unavailable`
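Clients can also mirror these thresholds before dispatching heavy work. A sketch against the documented VPS /v1/stats fields; the `CEREBRUM_API_KEY` env var is illustrative, and a client-side check is an optimization, not a guarantee the server won't still shed the request.

```python
import os

import httpx

def vps_has_headroom(base_url: str = "http://127.0.0.1:9000") -> bool:
    """Mirror the documented rejection thresholds (CPU > 70%, RAM < 1 GB)."""
    stats = httpx.get(
        f"{base_url}/v1/stats",
        headers={"X-API-Key": os.environ["CEREBRUM_API_KEY"]},  # hypothetical env var
        timeout=5.0,
    ).json()
    return stats["cpu_percent"] <= 70.0 and stats["memory_available_mb"] >= 1024.0
```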
Every request through the CM4 generates a correlation ID for debugging:
Response Header:
```
X-Request-ID: 65652caa-c647-4685-be81-5e51bc97f453
```

Logging:

```
2025-12-25 12:00:00 - INFO - [65652caa-c647-4685-be81-5e51bc97f453] POST /v1/complete/stream 200 182.14s
```
Use this ID when reporting issues or debugging.
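A sketch of capturing the ID client-side so it can be pasted straight into a bug report:

```python
import httpx

resp = httpx.post(
    "http://localhost:7000/v1/complete",
    json={"prompt": "def hello():", "language": "python"},
    timeout=300.0,
)
# httpx header lookups are case-insensitive.
request_id = resp.headers.get("X-Request-ID", "<none>")
print(f"request id: {request_id}  status: {resp.status_code}")
```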
- Bash: `scripts/cerebrum_repl.sh` (streaming REPL)

Dedicated client libraries: none yet - PRs welcome!
Python (streaming):

```python
import json

import httpx

async def stream_completion(prompt: str, language: str = "python"):
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST",
            "http://localhost:7000/v1/complete/stream",
            json={
                "prompt": prompt,
                "language": language,
                "max_tokens": 256
            },
            timeout=300.0
        ) as response:
            async for line in response.aiter_lines():
                if line.startswith("data: "):
                    data = json.loads(line[6:])
                    if "token" in data:
                        print(data["token"], end="", flush=True)
                    elif data.get("done"):
                        print(f"\n[{data['total_tokens']} tokens]")
```

JavaScript (fetch API):
See streaming example in /v1/complete/stream section above.
Current Version: v1
All endpoints are prefixed with /v1/. Breaking changes will increment the version (/v2/), with /v1/ maintained for compatibility.
Issues: https://github.com/artcore-c/Cerebrum/issues
Discussions: https://github.com/artcore-c/Cerebrum/discussions
For VPS backend issues, check:
```bash
ssh user@vps
sudo journalctl -u cerebrum-backend -f
```

For CM4 orchestrator issues:

```bash
tail -f /opt/cerebrum-pi/logs/cerebrum.log
```