Binary file added .DS_Store
Binary file not shown.
9 changes: 9 additions & 0 deletions .env.example
@@ -0,0 +1,9 @@
# HuggingFace token — required to download model weights
# Get yours at https://huggingface.co/settings/tokens
# Make sure you've accepted the model license at https://huggingface.co/nvidia/personaplex-7b-v1
HF_TOKEN=hf_your_token_here

# API key for authenticating backend API requests.
# If not set, the server auto-generates one and prints it to the console.
# You can also pass --api-key on the command line.
PERSONAPLEX_API_KEY=
214 changes: 214 additions & 0 deletions API_DOCUMENTATION.md
@@ -0,0 +1,214 @@
# PersonaPlex API Documentation

## Overview

PersonaPlex exposes a lightweight HTTP + WebSocket API for real-time, full-duplex speech conversations with persona control.

---

## Base URL

| Environment | URL |
|---|---|
| Local (HTTPS) | `https://localhost:8998` |
| LAN | `https://<LAN_IP>:8998` |
| Public | `https://<PUBLIC_IP>:8998` |

> **Note:** The server uses self-signed certificates by default. Pass `-k` to curl, or accept the certificate warning in your browser.

---

## Authentication

All `/api/*` endpoints require an API key. Two methods are supported:

### 1. HTTP Header (recommended for REST)

```
X-API-Key: <your_api_key>
```

### 2. Query Parameter (required for WebSocket)

```
?api_key=<your_api_key>
```
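
As a rough illustration of how the two methods and the documented `401`/`403` bodies could fit together, here is a minimal sketch. The function name `check_api_key`, the plain-dict inputs, and the header-before-query precedence are assumptions made for illustration; they are not taken from the server code.

```python
import hmac
from typing import Mapping, Optional, Tuple


def check_api_key(
    headers: Mapping[str, str],
    query: Mapping[str, str],
    expected_key: str,
) -> Tuple[int, Optional[dict]]:
    """Return (status, error_body); error_body is None when the key is valid."""
    # Assumption: the header is consulted first, then the query parameter.
    # Note: a plain dict lookup is case-sensitive, unlike real HTTP headers.
    provided = headers.get("X-API-Key") or query.get("api_key")
    if not provided:
        return 401, {
            "error": "Unauthorized",
            "message": "Missing API key. Provide via X-API-Key header or api_key query parameter.",
        }
    # Constant-time comparison avoids leaking key contents through timing.
    if not hmac.compare_digest(provided.encode(), expected_key.encode()):
        return 403, {"error": "Forbidden", "message": "Invalid API key."}
    return 200, None
```

A missing key maps to `401` and a wrong key to `403`, matching the error tables listed for each endpoint.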

### Configuration

Set the API key via any of these methods (in priority order):

1. **CLI argument:** `--api-key <key>`
2. **Environment variable:** `PERSONAPLEX_API_KEY=<key>`
3. **Auto-generated:** If neither is set, the server generates a key and prints it to the console.
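
The priority order above can be sketched as follows. `resolve_api_key` is a hypothetical helper, not the server's actual function; the key length and the printed message are also assumptions. An empty `PERSONAPLEX_API_KEY=` (as in `.env.example`) is treated as unset here.

```python
import os
import secrets
from typing import Mapping, Optional


def resolve_api_key(
    cli_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the API key: CLI argument, then environment, then a generated one."""
    env = os.environ if env is None else env
    if cli_key:
        return cli_key
    env_key = env.get("PERSONAPLEX_API_KEY")
    if env_key:  # an empty string counts as "not set" in this sketch
        return env_key
    generated = secrets.token_urlsafe(32)
    # The documented behavior is to print the generated key to the console.
    print(f"Generated API key: {generated}")
    return generated
```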

---

## Endpoints

### `GET /api/health`

Health check endpoint. Returns server status and version.

**Headers:**

| Header | Required | Description |
|---|---|---|
| `X-API-Key` | Yes | Your API key |

**Sample Request:**

```bash
curl -k -H "X-API-Key: YOUR_KEY" https://localhost:8998/api/health
```

**Success Response — `200 OK`:**

```json
{
  "status": "ok",
  "version": "0.1.0"
}
```

**Error Responses:**

| Status | Body |
|---|---|
| `401 Unauthorized` | `{"error": "Unauthorized", "message": "Missing API key. Provide via X-API-Key header or api_key query parameter."}` |
| `403 Forbidden` | `{"error": "Forbidden", "message": "Invalid API key."}` |

---

### `GET /api/voices`

Lists all available voice prompt files on the server.

**Headers:**

| Header | Required | Description |
|---|---|---|
| `X-API-Key` | Yes | Your API key |

**Sample Request:**

```bash
curl -k -H "X-API-Key: YOUR_KEY" https://localhost:8998/api/voices
```

**Success Response — `200 OK`:**

```json
{
  "voices": [
    { "filename": "NATF0.pt", "name": "NATF0" },
    { "filename": "NATF1.pt", "name": "NATF1" },
    { "filename": "NATF2.pt", "name": "NATF2" },
    { "filename": "NATF3.pt", "name": "NATF3" },
    { "filename": "NATM0.pt", "name": "NATM0" },
    { "filename": "NATM1.pt", "name": "NATM1" },
    { "filename": "NATM2.pt", "name": "NATM2" },
    { "filename": "NATM3.pt", "name": "NATM3" },
    { "filename": "VARF0.pt", "name": "VARF0" },
    { "filename": "VARF1.pt", "name": "VARF1" }
  ],
  "count": 10
}
```

**Error Responses:**

| Status | Body |
|---|---|
| `401 Unauthorized` | `{"error": "Unauthorized", "message": "Missing API key. Provide via X-API-Key header or api_key query parameter."}` |
| `403 Forbidden` | `{"error": "Forbidden", "message": "Invalid API key."}` |

---

### `GET /api/chat` (WebSocket Upgrade)

Opens a full-duplex WebSocket connection for real-time speech conversation.

> **Important:** This endpoint upgrades the HTTP connection to WebSocket, so connect with the `wss://` scheme. Pass the API key as a query parameter: the browser WebSocket API cannot set custom headers on the handshake request.

**Query Parameters:**

| Parameter | Required | Type | Description |
|---|---|---|---|
| `api_key` | Yes | string | Your API key |
| `voice_prompt` | Yes | string | Voice prompt filename (e.g., `NATF2.pt`) |
| `text_prompt` | Yes | string | System text prompt (e.g., `You enjoy having a good conversation.`) |
| `text_temperature` | No | float | Text generation temperature (defaults to the model's built-in value) |
| `text_topk` | No | int | Text top-k sampling (defaults to the model's built-in value) |
| `audio_temperature` | No | float | Audio generation temperature |
| `audio_topk` | No | int | Audio top-k sampling |
| `pad_mult` | No | float | Padding multiplier |
| `text_seed` | No | int | Random seed for text generation |
| `audio_seed` | No | int | Random seed for audio generation |
| `repetition_penalty` | No | float | Repetition penalty factor |
| `repetition_penalty_context` | No | int | Repetition penalty context window |
| `seed` | No | int | Global random seed |

**Sample WebSocket URL:**

```
wss://localhost:8998/api/chat?api_key=YOUR_KEY&voice_prompt=NATF2.pt&text_prompt=You%20enjoy%20having%20a%20good%20conversation.
```
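
Since `text_prompt` usually contains spaces and punctuation, it has to be percent-encoded. The sketch below builds such a URL with the standard library; `build_chat_url` and its signature are hypothetical, while the parameter names come from the table above.

```python
from urllib.parse import quote, urlencode


def build_chat_url(host, api_key, voice_prompt, text_prompt, port=8998, **options):
    """Build a wss:// URL for /api/chat with percent-encoded query parameters."""
    params = {
        "api_key": api_key,
        "voice_prompt": voice_prompt,
        "text_prompt": text_prompt,
    }
    # Optional tuning parameters (e.g. text_temperature) are added only if given.
    params.update({k: v for k, v in options.items() if v is not None})
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode(params, quote_via=quote)
    return f"wss://{host}:{port}/api/chat?{query}"
```

Calling `build_chat_url("localhost", "YOUR_KEY", "NATF2.pt", "You enjoy having a good conversation.")` reproduces the sample URL shown above.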

**Binary Protocol:**

Messages are exchanged as binary frames with a 1-byte type prefix:

| Byte | Direction | Description |
|---|---|---|
| `0x00` | Server → Client | Handshake acknowledgement |
| `0x01` | Server → Client | Opus-encoded audio chunk |
| `0x01` | Client → Server | Opus-encoded audio chunk |
| `0x02` | Server → Client | UTF-8 text token |
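
The framing in this table can be illustrated with a small encoder and decoder. The names `FrameType`, `encode_frame`, and `decode_frame` are made up for this sketch, and the Opus payload is treated as opaque bytes.

```python
from enum import IntEnum
from typing import Tuple


class FrameType(IntEnum):
    HANDSHAKE = 0x00  # server -> client
    AUDIO = 0x01      # both directions, Opus-encoded
    TEXT = 0x02       # server -> client, UTF-8


def encode_frame(kind: FrameType, payload: bytes = b"") -> bytes:
    """Prefix the payload with its 1-byte type."""
    return bytes([kind]) + payload


def decode_frame(frame: bytes) -> Tuple[FrameType, bytes]:
    """Split a binary frame into (type, payload); unknown types raise ValueError."""
    if not frame:
        raise ValueError("empty frame")
    return FrameType(frame[0]), frame[1:]
```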

**Connection Lifecycle:**

1. Client connects via WebSocket URL with query params
2. Server processes voice/text prompts (may take a few seconds)
3. Server sends handshake byte `0x00`
4. Client begins streaming Opus audio (`0x01` frames)
5. Server responds with interleaved audio (`0x01`) and text (`0x02`) frames
6. Either side closes the connection to end the session
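
The lifecycle above can be sketched as a small client-side state holder that stays transport-agnostic (no WebSocket library is assumed). `ChatSession` and its methods are hypothetical; rejecting data frames that arrive before the handshake, and treating each text frame as a complete UTF-8 string, are assumptions rather than documented behavior.

```python
from typing import List, Optional


class ChatSession:
    """Tracks handshake state and collects frames received from the server."""

    def __init__(self) -> None:
        self.ready = False
        self.transcript: List[str] = []
        self.audio_chunks: List[bytes] = []

    def on_server_frame(self, frame: bytes) -> None:
        if not frame:
            return
        kind, payload = frame[0], frame[1:]
        if kind == 0x00:
            self.ready = True  # step 3: handshake acknowledgement
        elif not self.ready:
            raise RuntimeError("data frame received before handshake")
        elif kind == 0x01:
            self.audio_chunks.append(payload)  # Opus audio from the server
        elif kind == 0x02:
            self.transcript.append(payload.decode("utf-8"))  # text token
        # Other type bytes are ignored in this sketch.

    def wrap_audio(self, opus_chunk: bytes) -> Optional[bytes]:
        """Return a 0x01 frame to send, or None while the handshake is pending."""
        if not self.ready:
            return None
        return b"\x01" + opus_chunk
```

Holding back outgoing audio until `ready` is set mirrors steps 3 and 4: the client starts streaming only after the `0x00` byte arrives.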

**Error Responses (before WebSocket upgrade):**

| Status | Body |
|---|---|
| `401 Unauthorized` | `{"error": "Unauthorized", "message": "Missing API key. Provide via X-API-Key header or api_key query parameter."}` |
| `403 Forbidden` | `{"error": "Forbidden", "message": "Invalid API key."}` |

---

## Status Codes Summary

| Code | Meaning |
|---|---|
| `200` | Success |
| `101` | Switching Protocols (WebSocket upgrade) |
| `401` | Unauthorized — missing API key |
| `403` | Forbidden — invalid API key |
| `404` | Not Found — unknown endpoint |
| `500` | Internal Server Error |

---

## Quick Start

```bash
# 1. Set your API key
export PERSONAPLEX_API_KEY="your_key_here"

# 2. Test health
curl -k -H "X-API-Key: $PERSONAPLEX_API_KEY" https://localhost:8998/api/health

# 3. List voices
curl -k -H "X-API-Key: $PERSONAPLEX_API_KEY" https://localhost:8998/api/voices

# 4. Connect via WebSocket (using wscat)
wscat -n -c "wss://localhost:8998/api/chat?api_key=$PERSONAPLEX_API_KEY&voice_prompt=NATF2.pt&text_prompt=Hello"
```
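
For scripts, the two curl calls can be approximated with the Python standard library. This is a sketch only: `build_request`, `insecure_context`, and `get_json` are hypothetical helpers, and disabling certificate verification (the equivalent of `curl -k`) is only appropriate against the self-signed local server.

```python
import json
import ssl
import urllib.request

BASE_URL = "https://localhost:8998"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Create a GET request carrying the X-API-Key header."""
    return urllib.request.Request(f"{BASE_URL}{path}", headers={"X-API-Key": api_key})


def insecure_context() -> ssl.SSLContext:
    """TLS context that skips verification, like curl -k (local testing only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def get_json(path: str, api_key: str) -> dict:
    """Perform the request and parse the JSON body."""
    request = build_request(path, api_key)
    with urllib.request.urlopen(request, context=insecure_context()) as response:
        return json.load(response)
```

With the server running, `get_json("/api/health", key)` and `get_json("/api/voices", key)` correspond to steps 2 and 3.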
17 changes: 15 additions & 2 deletions Dockerfile
@@ -1,3 +1,13 @@
# ── Stage 1: Build the React/Vite frontend ──────────────────────────
FROM node:20-slim AS frontend

WORKDIR /app/client
COPY client/package.json client/package-lock.json ./
RUN npm ci
COPY client/ ./
RUN npm run build

# ── Stage 2: Python backend with CUDA runtime ───────────────────────
ARG BASE_IMAGE="nvcr.io/nvidia/cuda"
ARG BASE_IMAGE_TAG="12.4.1-runtime-ubuntu22.04"

@@ -9,17 +19,20 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
pkg-config \
libopus-dev \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /app/moshi/

COPY moshi/ /app/moshi/
RUN uv venv /app/moshi/.venv --python 3.12
RUN uv sync

# Copy the pre-built frontend
COPY --from=frontend /app/client/dist /app/client-dist

RUN mkdir -p /app/ssl

EXPOSE 8998

ENTRYPOINT []
CMD ["/app/moshi/.venv/bin/python", "-m", "moshi.server", "--ssl", "/app/ssl"]
CMD ["/app/moshi/.venv/bin/python", "-m", "moshi.server", "--ssl", "/app/ssl", "--static", "/app/client-dist"]
3 changes: 2 additions & 1 deletion client/.env.local
@@ -1,2 +1,3 @@
VITE_QUEUE_API_PATH=/api
VITE_QUEUE_API_URL=https://moshi.chat
VITE_PERSONAPLEX_API_KEY=k_ZbPcGpqduyQXH_nQeuASpdjxMqnlnOrD7dMfVgfG5XNVXTVp9LnrLfZst90cT4
29 changes: 16 additions & 13 deletions client/index.html
@@ -1,14 +1,17 @@
<!doctype html>
<html lang="en" class="bg-neutral-50" data-theme="light">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="icon" type="image/png" sizes="32x32" href="/assets/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="/assets/favicon-16x16.png">
<title>PersonaPlex</title>
</head>
<body class="bg-neutral-50 text-zinc-700">
<div id="root" />
<script type="module" src="/src/app.tsx"></script>
</body>
</html>
<html lang="en" class="dark">

<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="icon" type="image/png" sizes="32x32" href="/assets/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="/assets/favicon-16x16.png">
<title>PersonaPlex</title>
</head>

<body class="min-h-screen antialiased">
<div id="root" />
<script type="module" src="/src/app.tsx"></script>
</body>

</html>