Deterministic multi-provider mock LLM server for testing. Streams SSE responses in real OpenAI, Claude, and Gemini API formats, driven entirely by fixtures. Zero runtime dependencies — built on Node.js builtins only.
Supports both streaming (SSE) and non-streaming JSON responses across OpenAI (Chat Completions + Responses), Anthropic Claude (Messages), and Google Gemini (GenerateContent) APIs. Text completions, tool calls, and error injection. Point any process at it via OPENAI_BASE_URL, ANTHROPIC_BASE_URL, or Gemini base URL and get reproducible, instant responses.
```sh
npm install @copilotkit/llmock
```

MSW (Mock Service Worker) is a popular API mocking library, but it solves a different problem.
The key difference is architecture. llmock runs a real HTTP server on a port. MSW patches http/https/fetch modules inside a single Node.js process. MSW can only intercept requests from the process that calls server.listen() — child processes, separate services, and workers are unaffected.
This matters for E2E tests where multiple processes make LLM API calls:
```
Playwright test runner (Node)
└─ controls browser → Next.js app (separate process)
     └─ OPENAI_BASE_URL → llmock :5555
          ├─ Mastra agent workers
          ├─ LangGraph workers
          └─ CopilotKit runtime
```
MSW can't intercept any of those calls. llmock can — it's a real server on a real port.
Use llmock when:
- Multiple processes need to hit the same mock (E2E tests, agent frameworks, microservices)
- You want multi-provider SSE format out of the box (OpenAI, Claude, Gemini)
- You prefer defining fixtures as JSON files rather than code
- You need a standalone CLI server
Use MSW when:
- All API calls originate from a single Node.js process (unit tests, SDK client tests)
- You're mocking many different APIs, not just OpenAI
- You want in-process interception without running a server
| Capability | llmock | MSW |
|---|---|---|
| Cross-process interception | Yes (real server) | No (in-process only) |
| OpenAI Chat Completions SSE | Built-in | Manual — build `data: {json}\n\n` + `[DONE]` yourself |
| OpenAI Responses API SSE | Built-in | Manual — MSW's `sse()` sends `data:` events, not OpenAI's `event:` format |
| Claude Messages API SSE | Built-in | Manual — build `event:`/`data:` SSE yourself |
| Gemini streaming | Built-in | Manual — build `data:` SSE yourself |
| Fixture file loading (JSON) | Yes | No — handlers are code-only |
| Request journal / inspection | Yes | No — track requests manually |
| Non-streaming responses | Yes | Yes |
| Error injection (one-shot) | Yes | Yes (via server.use()) |
| CLI for standalone use | Yes | No |
| Zero dependencies | Yes | No (~300KB) |
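For a sense of what "Manual" means in the SSE rows above, here is a minimal sketch (not llmock's source) of the Chat Completions wire framing MSW users have to assemble themselves: `data:` lines carrying delta JSON, terminated by `data: [DONE]`.

```typescript
// Sketch: hand-assemble OpenAI Chat Completions SSE frames.
// Illustrative only — llmock builds these for you from fixtures.
function toChatCompletionSSE(content: string, chunkSize = 20): string {
  const frames: string[] = [];
  for (let i = 0; i < content.length; i += chunkSize) {
    const chunk = {
      object: "chat.completion.chunk",
      choices: [
        { index: 0, delta: { content: content.slice(i, i + chunkSize) }, finish_reason: null },
      ],
    };
    frames.push(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  frames.push("data: [DONE]\n\n");
  return frames.join("");
}

// "Hello world!" at chunkSize 5 → three data frames, then the [DONE] terminator
const sse = toChatCompletionSSE("Hello world!", 5);
```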
```ts
import { LLMock } from "@copilotkit/llmock";

const mock = new LLMock({ port: 5555 });
mock.onMessage("hello", { content: "Hi there!" });

const url = await mock.start();
// Point your OpenAI client at `url` instead of https://api.openai.com

// ... run your tests ...

await mock.stop();
```

Real-world patterns from using llmock in Playwright E2E tests with CopilotKit, Mastra, LangGraph, and Agno agent frameworks.
Start the mock server once for the entire test suite. All child processes (Next.js, agent workers) inherit the URL via environment variable.
```ts
// e2e/llmock-setup.ts
import { LLMock } from "@copilotkit/llmock";
import * as path from "node:path";

let mockServer: LLMock | null = null;

export async function setupLLMock(): Promise<void> {
  mockServer = new LLMock({ port: 5555 });

  // Load JSON fixtures from a directory
  mockServer.loadFixtureDir(path.join(__dirname, "fixtures", "openai"));

  const url = await mockServer.start();

  // Child processes use this to find the mock
  process.env.LLMOCK_URL = `${url}/v1`;
}

export async function teardownLLMock(): Promise<void> {
  if (mockServer) {
    await mockServer.stop();
    mockServer = null;
  }
}
```

The Next.js app (or any other service) just needs:
```sh
OPENAI_BASE_URL=http://localhost:5555/v1
OPENAI_API_KEY=mock-key

# Or for Anthropic Claude:
ANTHROPIC_BASE_URL=http://localhost:5555/v1

# Or for Google Gemini — point at the base URL:
# http://localhost:5555/v1beta
```

Define fixtures as JSON — one file per feature, loaded with `loadFixtureFile` or `loadFixtureDir`.
Text responses — match on a substring of the last user message:
```json
{
  "fixtures": [
    {
      "match": { "userMessage": "stock price of AAPL" },
      "response": { "content": "The current stock price of Apple Inc. (AAPL) is $150.25." }
    },
    {
      "match": { "userMessage": "capital of France" },
      "response": { "content": "The capital of France is Paris." }
    }
  ]
}
```

Tool call responses — the agent framework receives these as tool calls and executes them:
```json
{
  "fixtures": [
    {
      "match": { "userMessage": "one step with eggs" },
      "response": {
        "toolCalls": [
          {
            "name": "generate_task_steps",
            "arguments": "{\"steps\":[{\"description\":\"Crack eggs into bowl\",\"status\":\"enabled\"},{\"description\":\"Preheat oven to 350F\",\"status\":\"enabled\"}]}"
          }
        ]
      }
    },
    {
      "match": { "userMessage": "background color to blue" },
      "response": {
        "toolCalls": [
          {
            "name": "change_background",
            "arguments": "{\"background\":\"blue\"}"
          }
        ]
      }
    }
  ]
}
```

Fixtures are evaluated first-match-wins. When two fixtures could match the same message, load the more specific one first:
```ts
// Load HITL fixtures first — "one step with eggs" is more specific than
// "plan to make brownies" which also appears in the HITL user message
mockServer.loadFixtureFile(path.join(FIXTURES_DIR, "human-in-the-loop.json"));

// Then load everything else — earlier matches take priority
mockServer.loadFixtureDir(FIXTURES_DIR);
```

When substring matching isn't enough — for example, when the last user message is the same across multiple requests but the system prompt differs — use predicates:
```ts
// Supervisor agent: same user message every time, but system prompt
// contains state flags like "Flights found: false"
mockServer.addFixture({
  match: {
    predicate: (req) => {
      const sysMsg = req.messages.find((m) => m.role === "system");
      return sysMsg?.content?.includes("Flights found: false") ?? false;
    },
  },
  response: {
    toolCalls: [
      {
        name: "supervisor_response",
        arguments: '{"answer":"Let me find flights for you!","next_agent":"flights_agent"}',
      },
    ],
  },
});

mockServer.addFixture({
  match: {
    predicate: (req) => {
      const sys = req.messages.find((m) => m.role === "system")?.content ?? "";
      return sys.includes("Flights found: true") && sys.includes("Hotels found: false");
    },
  },
  response: {
    toolCalls: [
      {
        name: "supervisor_response",
        arguments: '{"answer":"Now let me find hotels.","next_agent":"hotels_agent"}',
      },
    ],
  },
});
```

After a tool executes, the next request contains a `role: "tool"` message with the result. Add a catch-all for these so the conversation can continue:
```ts
const toolResultFixture = {
  match: {
    predicate: (req) => {
      const last = req.messages[req.messages.length - 1];
      return last?.role === "tool";
    },
  },
  response: { content: "Done! I've completed that for you." },
};
mockServer.addFixture(toolResultFixture);

// Move it to the front so it matches before substring-based fixtures
// (the last user message hasn't changed, so substring fixtures would
// match the same fixture again otherwise)
const fixtures = (mockServer as any).fixtures;
const idx = fixtures.indexOf(toolResultFixture);
if (idx > 0) {
  fixtures.splice(idx, 1);
  fixtures.unshift(toolResultFixture);
}
```

Append a catch-all last to handle any request that doesn't match a specific fixture, preventing 404s from crashing the test:
```ts
mockServer.addFixture({
  match: { predicate: () => true },
  response: { content: "I understand. How can I help you with that?" },
});
```

Create a new mock server instance.
| Option | Type | Default | Description |
|---|---|---|---|
| `port` | `number` | `0` (random) | Port to listen on |
| `host` | `string` | `"127.0.0.1"` | Host to bind to |
| `latency` | `number` | `0` | Default ms delay between SSE chunks |
| `chunkSize` | `number` | `20` | Default characters per SSE chunk |
Static factory — creates an instance and starts it in one call. Returns `Promise<LLMock>`.
| Method | Returns | Description |
|---|---|---|
| `start()` | `Promise<string>` | Start the server, returns the base URL |
| `stop()` | `Promise<void>` | Stop the server |
| `url` | `string` | Base URL (throws if not started) |
| `baseUrl` | `string` | Alias for `url` |
| `port` | `number` | Listening port (throws if not started) |
All registration methods return `this` for chaining.
Register a fixture with full control over match criteria.
```ts
mock.on({ userMessage: /weather/i, model: "gpt-4" }, { content: "It's sunny!" }, { latency: 50 });
```

Shorthand — matches on the last user message.
```ts
mock.onMessage("hello", { content: "Hi!" });
mock.onMessage(/greet/i, { content: "Hey there!" });
```

Shorthand — matches when the request contains a tool with the given name.
```ts
mock.onToolCall("get_weather", {
  toolCalls: [{ name: "get_weather", arguments: '{"location":"SF"}' }],
});
```

Shorthand — matches when a tool result message has the given `tool_call_id`.
```ts
mock.onToolResult("call_abc123", { content: "Temperature is 72F" });
```

Add raw `Fixture` objects directly.
Load fixtures from JSON files on disk. See Fixture Files above.
Remove all registered fixtures.
Queue a one-shot error for the very next request. The error fires once, then auto-removes itself.
```ts
mock.nextRequestError(429, {
  message: "Rate limited",
  type: "rate_limit_error",
});
// Next request → 429 error
// Subsequent requests → normal fixture matching
```

Every request to all API endpoints (`/v1/chat/completions`, `/v1/responses`, `/v1/messages`, and Gemini endpoints) is recorded in a journal.
| Method | Returns | Description |
|---|---|---|
| `getRequests()` | `JournalEntry[]` | All recorded requests |
| `getLastRequest()` | `JournalEntry \| null` | Most recent request |
| `clearRequests()` | `void` | Clear the journal |
| `journal` | `Journal` | Direct access to the journal instance |
```ts
await fetch(mock.url + "/v1/chat/completions", { ... });

const last = mock.getLastRequest();
expect(last?.body.messages).toContainEqual({
  role: "user",
  content: "hello",
});
```

The server also exposes journal data over HTTP (useful in CLI mode):
- `GET /v1/_requests` — returns all journal entries as JSON. Supports `?limit=N`.
- `DELETE /v1/_requests` — clears the journal. Returns 204.
Clear all fixtures and the journal in one call. Works before or after the server is started.
```ts
afterEach(() => {
  mock.reset();
});
```

Fixtures are evaluated in registration order (first match wins). A fixture matches when all specified fields match the incoming request (AND logic).
| Field | Type | Matches on |
|---|---|---|
| `userMessage` | `string \| RegExp` | Content of the last `role: "user"` message |
| `toolName` | `string` | Name of a tool in the request's `tools` array |
| `toolCallId` | `string` | `tool_call_id` on a `role: "tool"` message |
| `model` | `string \| RegExp` | The `model` field in the request |
| `predicate` | `(req) => boolean` | Arbitrary matching function |
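As a mental model, the table above describes AND logic over whichever fields a fixture specifies, evaluated in registration order. A rough sketch of those semantics (illustrative only — not llmock's implementation; the `Req` and `Match` shapes here are simplified assumptions):

```typescript
// Simplified request/fixture shapes, for illustration only.
type Msg = { role: string; content?: string; tool_call_id?: string };
type Req = { model?: string; messages: Msg[]; tools?: { name: string }[] };
type Match = {
  userMessage?: string | RegExp;
  toolName?: string;
  toolCallId?: string;
  model?: string | RegExp;
  predicate?: (req: Req) => boolean;
};

// string → substring match; RegExp → test
const text = (p: string | RegExp, v: string) =>
  typeof p === "string" ? v.includes(p) : p.test(v);

// All specified fields must match (AND logic); unspecified fields are ignored.
function matches(m: Match, req: Req): boolean {
  const lastUser = [...req.messages].reverse().find((x) => x.role === "user");
  if (m.userMessage !== undefined && !text(m.userMessage, lastUser?.content ?? "")) return false;
  if (m.model !== undefined && !text(m.model, req.model ?? "")) return false;
  if (m.toolName !== undefined && !(req.tools ?? []).some((t) => t.name === m.toolName)) return false;
  if (m.toolCallId !== undefined &&
      !req.messages.some((x) => x.role === "tool" && x.tool_call_id === m.toolCallId)) return false;
  if (m.predicate !== undefined && !m.predicate(req)) return false;
  return true;
}

// First match in registration order wins.
const pick = (fixtures: { match: Match }[], req: Req) =>
  fixtures.find((f) => matches(f.match, req));
```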
```ts
{ content: "Hello world" }
```

Streams as SSE chunks, splitting content by `chunkSize`. With `stream: false`, returns a standard `chat.completion` JSON object.
```ts
{ toolCalls: [{ name: "get_weather", arguments: '{"location":"SF"}' }] }
```

```ts
{
  error: { message: "Rate limited", type: "rate_limit_error" },
  status: 429
}
```

The server handles:
- `POST /v1/chat/completions` — OpenAI Chat Completions API (streaming and non-streaming)
- `POST /v1/responses` — OpenAI Responses API (streaming and non-streaming)
- `POST /v1/messages` — Anthropic Claude Messages API (streaming and non-streaming)
- `POST /v1beta/models/{model}:generateContent` — Google Gemini (non-streaming)
- `POST /v1beta/models/{model}:streamGenerateContent` — Google Gemini (streaming)
All endpoints share the same fixture pool — the same fixtures work across all providers. Requests are translated to a common format internally for fixture matching.
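The common format itself isn't documented here, but the idea can be sketched: each provider's request body is normalized into one message list before fixtures are checked. For Anthropic's Messages API, for instance, the top-level `system` field would become a system message (field names below are assumptions, not llmock's internals):

```typescript
// Sketch: normalize an Anthropic Messages request into a common shape.
// The CommonReq shape is hypothetical — llmock's internal format isn't shown here.
type CommonReq = { model?: string; messages: { role: string; content: string }[] };

function fromClaudeMessages(body: {
  model: string;
  system?: string;
  messages: { role: string; content: string }[];
}): CommonReq {
  // Anthropic carries the system prompt outside the messages array;
  // prepend it so fixture matching sees one uniform conversation.
  const messages = body.system
    ? [{ role: "system", content: body.system }, ...body.messages]
    : [...body.messages];
  return { model: body.model, messages };
}
```

With a normalization step like this, a single `userMessage` fixture matches the same conversation whether it arrived as an OpenAI or a Claude request.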
The package includes a standalone server binary:
```sh
llmock [options]
```

| Option | Short | Default | Description |
|---|---|---|---|
| `--port` | `-p` | `4010` | Port to listen on |
| `--host` | `-h` | `127.0.0.1` | Host to bind to |
| `--fixtures` | `-f` | `./fixtures` | Path to fixtures directory or file |
| `--latency` | `-l` | `0` | Latency between SSE chunks (ms) |
| `--chunk-size` | `-c` | `20` | Characters per SSE chunk |
| `--help` | | | Show help |
```sh
# Start with bundled example fixtures
llmock

# Custom fixtures on a specific port
llmock -p 8080 -f ./my-fixtures

# Simulate slow responses
llmock --latency 100 --chunk-size 5
```

If you need the raw HTTP server without the LLMock wrapper:
```ts
import { createServer } from "@copilotkit/llmock";

const fixtures = [{ match: { userMessage: "hi" }, response: { content: "Hello!" } }];
const { server, journal, url } = await createServer(fixtures, { port: 0 });

// ... use it ...

server.close();
```

Per-fixture options override the server-wide defaults:

```ts
mock.on({ userMessage: "slow" }, { content: "Finally..." }, { latency: 200, chunkSize: 5 });
```

MIT