
Local Inference

BRYAN DAVID WHITE edited this page Feb 23, 2026 · 5 revisions


Run LLM-based knowledge extraction on any OpenAI-compatible local server — llama.cpp, Ollama, vLLM, LocalAI, text-gen-webui, and more. No cloud API keys required.

Source: src/adapters/local_llm/connector.py, src/adapters/local_llm/exhaust.py


Why Local?

  • Airgapped / sovereign — no data leaves your network
  • Cost control — zero per-token API costs
  • Low latency — GPU on the same machine or LAN
  • Dev iteration — iterate on extraction prompts without burning API credits

Setup

pip install -e ".[local]"

# Start any OpenAI-compatible server, e.g.:
./llama-server -m models/llama-3-8b.Q4_K_M.gguf --port 8080

# Configure
export DEEPSIGMA_LLM_BACKEND=local
export DEEPSIGMA_LOCAL_BASE_URL=http://localhost:8080
export EXHAUST_USE_LLM=1

Environment Variables

Variable                    Default                  Description
DEEPSIGMA_LLM_BACKEND       anthropic                Set to local for local inference
DEEPSIGMA_LOCAL_BASE_URL    http://localhost:8080    Server URL
DEEPSIGMA_LOCAL_API_KEY     (empty)                  Bearer token if the server requires auth
DEEPSIGMA_LOCAL_MODEL       (empty)                  Model name; empty = server default
DEEPSIGMA_LOCAL_TIMEOUT     120                      HTTP timeout in seconds
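As an illustrative sketch (not the project's actual config loader), the variables above can be read with their documented defaults using only the standard library; the variable names and defaults come from the table, the function name is made up:

```python
import os

def load_local_llm_config() -> dict:
    """Read the DEEPSIGMA_* settings with the documented defaults.

    Illustrative only -- the real connector may parse these differently.
    """
    return {
        "backend": os.environ.get("DEEPSIGMA_LLM_BACKEND", "anthropic"),
        "base_url": os.environ.get("DEEPSIGMA_LOCAL_BASE_URL", "http://localhost:8080"),
        "api_key": os.environ.get("DEEPSIGMA_LOCAL_API_KEY", ""),
        "model": os.environ.get("DEEPSIGMA_LOCAL_MODEL", ""),
        "timeout": float(os.environ.get("DEEPSIGMA_LOCAL_TIMEOUT", "120")),
    }
```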

Usage

Direct

from adapters.local_llm import LlamaCppConnector

connector = LlamaCppConnector()  # configured via DEEPSIGMA_LOCAL_* environment variables
print(connector.health())       # confirm the server is reachable before chatting

result = connector.chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the decision."},
])
print(result["text"])

Exhaust Pipeline (automatic)

When DEEPSIGMA_LLM_BACKEND=local and EXHAUST_USE_LLM=1, the exhaust refiner routes LLM extraction through the local server automatically — no code changes needed.
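The routing decision can be pictured with a small sketch. This is a guess at the shape of the logic, not the refiner's actual code; the function name is hypothetical, the environment variables and defaults are the documented ones:

```python
import os

def pick_exhaust_backend():
    """Return which backend the exhaust refiner would route LLM extraction to.

    Hypothetical illustration of the behavior described above:
    EXHAUST_USE_LLM gates extraction entirely, and
    DEEPSIGMA_LLM_BACKEND selects local vs. anthropic.
    """
    if os.environ.get("EXHAUST_USE_LLM") != "1":
        return None  # LLM extraction disabled
    backend = os.environ.get("DEEPSIGMA_LLM_BACKEND", "anthropic")
    return "local" if backend == "local" else "anthropic"
```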

Exhaust Adapter (manual)

from adapters.local_llm import LlamaCppConnector
from adapters.local_llm.exhaust import LocalLLMExhaustAdapter

connector = LlamaCppConnector()
adapter = LocalLLMExhaustAdapter(connector, project="my-project")
result = adapter.chat_with_exhaust([{"role": "user", "content": "Key risks?"}])
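The adapter pattern itself is easy to picture: delegate the chat call to the connector and capture each prompt/response pair as "exhaust" for later extraction. A toy sketch of that pattern, assuming only the delegate-and-record behavior (this is not the real LocalLLMExhaustAdapter):

```python
class ExhaustCapture:
    """Toy version of the exhaust-adapter pattern: forward chat to a
    connector and record every exchange for downstream extraction."""

    def __init__(self, connector, project):
        self.connector = connector
        self.project = project
        self.exhaust = []  # captured (messages, result) records

    def chat_with_exhaust(self, messages):
        result = self.connector.chat(messages)
        self.exhaust.append({
            "project": self.project,
            "messages": messages,
            "result": result,
        })
        return result
```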

Tested Servers

Server                      Notes
llama.cpp (llama-server)    Reference implementation
Ollama                      Set DEEPSIGMA_LOCAL_BASE_URL=http://localhost:11434
vLLM                        Run in OpenAI-compatible mode
LocalAI                     Drop-in OpenAI replacement
text-generation-webui       Enable the --api flag
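All of the servers above expose the same OpenAI-style POST /v1/chat/completions endpoint, so a request built once works against any of them. A minimal stdlib-only sketch of building such a request (the payload fields are the standard OpenAI ones; the helper name is made up):

```python
import json
import urllib.request

def build_chat_request(base_url, messages, model="", api_key=""):
    """Build an OpenAI-compatible chat-completions request.

    The same request shape works against llama.cpp, Ollama, vLLM,
    LocalAI, and text-generation-webui in API mode.
    """
    payload = {"messages": messages}
    if model:
        payload["model"] = model  # omit to use the server default
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers=headers,
        method="POST",
    )
```

Sending it with urllib.request.urlopen(...) returns a JSON body whose reply text sits at choices[0].message.content, per the OpenAI response format.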

Backward Compatibility

  • Default backend remains anthropic — zero changes to existing deployments
  • EXHAUST_USE_LLM=1 remains the master on/off switch
  • ANTHROPIC_API_KEY only required when backend is anthropic

Related Pages

  • Exhaust Inbox — Full extraction pipeline docs
  • Snowflake — Cortex AI connector (similar pattern)
  • AskSage — AskSage connector + exhaust adapter

Full documentation: docs/30-local-inference.md
