Query Language

contextdb's query language is built on three standards:

PostgreSQL-compatible SQL — DDL, DML, expressions, operators, JOINs, CTEs, ON CONFLICT DO UPDATE, $param binding
pgvector conventions — <=> operator for cosine similarity in ORDER BY
SQL/PGQ-style graph queries — GRAPH_TABLE(... MATCH ...) following SQL/PGQ conventions for bounded graph traversal (not a full standard implementation)

On top of these, contextdb adds declarative constraints for agentic memory workloads: IMMUTABLE, STATE MACHINE, DAG, RETAIN, and PROPAGATE. These are contextdb-specific extensions — everything else should feel familiar if you've used PostgreSQL.

All examples work in the Rust API via db.execute(sql, &params) where parameters are passed as HashMap<String, Value>. The CLI REPL does not support parameter binding ($param) — use literal values directly. Vector search works in the CLI using vector literals: ORDER BY embedding <=> [0.1, 0.2, 0.3] LIMIT 5.

Statements

CREATE TABLE

CREATE TABLE observations (
  id UUID PRIMARY KEY,
  data JSON,
  embedding VECTOR(384),
  recorded_at TIMESTAMP DEFAULT NOW()
) IMMUTABLE

See Table Options for IMMUTABLE, STATE MACHINE, DAG, RETAIN, and PROPAGATE.

ALTER TABLE

ALTER TABLE t ADD [COLUMN] col TYPE
ALTER TABLE t DROP [COLUMN] col
ALTER TABLE t RENAME COLUMN old TO new
ALTER TABLE t SET RETAIN 7 DAYS [SYNC SAFE]
ALTER TABLE t DROP RETAIN
ALTER TABLE t SET SYNC_CONFLICT_POLICY 'latest_wins'
ALTER TABLE t DROP SYNC_CONFLICT_POLICY

DROP TABLE

DROP TABLE t

CREATE INDEX

CREATE INDEX idx_name ON t (col)

INSERT

INSERT INTO observations (id, data, embedding)
VALUES ($id, $data, $embedding)

-- Multiple rows
INSERT INTO entities (id, name) VALUES ($id1, $name1), ($id2, $name2)

-- Upsert
INSERT INTO entities (id, name) VALUES ($id, $name)
ON CONFLICT (id) DO UPDATE SET name = $name

UPDATE / DELETE

UPDATE decisions SET status = 'superseded' WHERE id = $id
DELETE FROM scratch WHERE created_at < $cutoff

SELECT

SELECT [DISTINCT] columns FROM table
  [INNER JOIN | LEFT JOIN other ON condition]
  [WHERE condition]
  [ORDER BY col [ASC|DESC]]
  [LIMIT n]

CTEs

WITH active AS (
  SELECT id, name FROM entities WHERE status = 'active'
)
SELECT * FROM active WHERE name LIKE 'sensor%'

Multiple CTEs via comma separation. Non-recursive only.

Transactions

BEGIN
-- statements
COMMIT

-- or
ROLLBACK

Configuration

SET SYNC_CONFLICT_POLICY 'latest_wins'
SHOW SYNC_CONFLICT_POLICY
SET MEMORY_LIMIT '512M'
SHOW MEMORY_LIMIT
SET DISK_LIMIT '1G'
SET DISK_LIMIT 'none'
SHOW DISK_LIMIT

SHOW MEMORY_LIMIT returns limit, used, available, and startup_ceiling.

SHOW DISK_LIMIT returns the same columns for file-backed storage. On :memory: databases, disk limit commands are accepted but ignored.

Column Types

Type	Description	Example
`INTEGER` / `INT`	64-bit signed integer	`42`
`REAL` / `FLOAT`	64-bit floating point	`3.14`
`TEXT`	UTF-8 string	`'hello'`
`BOOLEAN` / `BOOL`	Boolean	`TRUE`, `FALSE`
`UUID`	128-bit UUID	`'550e8400-e29b-41d4-a716-446655440000'`
`TIMESTAMP`	Unix timestamp	`NOW()`
`JSON`	JSON value	`'{"key": "value"}'`
`VECTOR(n)`	Fixed-dimension float vector	`[0.1, 0.2, 0.3]`

NULL values display as NULL. Vectors display as [0.1, 0.2, ...].

Column Constraints

CREATE TABLE decisions (
  id UUID PRIMARY KEY,
  description TEXT NOT NULL,
  status TEXT NOT NULL,
  confidence REAL DEFAULT 0.0,
  email TEXT UNIQUE,
  intention_id UUID REFERENCES intentions(id)
)

Constraint	Description
`PRIMARY KEY`	Unique row identifier
`NOT NULL`	Value required
`UNIQUE`	No duplicate values
`DEFAULT expr`	Default value for inserts
`REFERENCES table(col)`	Foreign key

Foreign Key State Propagation

Trigger a state change on this row when the referenced row transitions:

CREATE TABLE decisions (
  id UUID PRIMARY KEY,
  status TEXT NOT NULL,
  intention_id UUID REFERENCES intentions(id)
    ON STATE archived PROPAGATE SET invalidated
) STATE MACHINE (status: active -> [invalidated, superseded])

When an intentions row transitions to archived, any decisions row referencing it transitions to invalidated.

Table Options

Table options appear after the closing ) of the column list. Multiple options can be combined.

IMMUTABLE

Rows cannot be updated or deleted after insertion. Useful for append-only data like observations and audit logs:

CREATE TABLE observations (
  id UUID PRIMARY KEY,
  data JSON,
  embedding VECTOR(384)
) IMMUTABLE

STATE MACHINE

Restrict a column's value transitions to declared edges:

CREATE TABLE decisions (
  id UUID PRIMARY KEY,
  status TEXT NOT NULL
) STATE MACHINE (status: draft -> [active, rejected], active -> [superseded])

Inserting a row sets the initial state. Updates that violate the transition graph are rejected.

DAG

Enforce directed acyclic graph constraint on specified edge types, preventing cycles:

CREATE TABLE edges (
  id UUID PRIMARY KEY,
  source_id UUID NOT NULL,
  target_id UUID NOT NULL,
  edge_type TEXT NOT NULL
) DAG('DEPENDS_ON', 'BASED_ON')

Inserting an edge that would create a cycle returns CycleDetected. Duplicate (source_id, target_id, edge_type) inserts are silently deduplicated.

RETAIN

Automatic row expiry. Units: SECONDS, MINUTES, HOURS, DAYS. Optional SYNC SAFE delays purging until synced:

CREATE TABLE scratch (
  id UUID PRIMARY KEY,
  data TEXT
) RETAIN 24 HOURS SYNC SAFE

Can also be set via ALTER TABLE:

ALTER TABLE scratch SET RETAIN 7 DAYS
ALTER TABLE scratch DROP RETAIN

PROPAGATE ON EDGE

Cascade state changes along graph edges when a row transitions:

CREATE TABLE decisions (
  id UUID PRIMARY KEY,
  status TEXT NOT NULL
) STATE MACHINE (status: active -> [invalidated, superseded])
  PROPAGATE ON EDGE CITES INCOMING STATE invalidated SET invalidated

When a decisions row transitions to invalidated, rows connected via incoming CITES edges also transition to invalidated. Options: INCOMING, OUTGOING, BOTH for edge direction. MAX DEPTH n limits traversal. ABORT ON FAILURE rolls back if any propagation fails.

PROPAGATE ON STATE ... EXCLUDE VECTOR

Remove a row's vector from similarity search results when it enters a given state, without deleting the row:

CREATE TABLE decisions (...)
  PROPAGATE ON STATE invalidated EXCLUDE VECTOR
  PROPAGATE ON STATE superseded EXCLUDE VECTOR

Combining Options

Options compose — a real-world table might use several:

CREATE TABLE decisions (
  id UUID PRIMARY KEY,
  description TEXT NOT NULL,
  status TEXT NOT NULL,
  confidence REAL,
  intention_id UUID REFERENCES intentions(id)
    ON STATE archived PROPAGATE SET invalidated,
  embedding VECTOR(384)
) STATE MACHINE (status: active -> [invalidated, superseded])
  PROPAGATE ON EDGE CITES INCOMING STATE invalidated SET invalidated
  PROPAGATE ON STATE invalidated EXCLUDE VECTOR
  PROPAGATE ON STATE superseded EXCLUDE VECTOR

Expressions and Operators

Comparison

=, !=, <>, <, <=, >, >=

Logical

AND, OR, NOT

Arithmetic

+, -, *, /

Pattern Matching

WHERE name LIKE 'sensor%'       -- % matches any substring
WHERE name LIKE 'item_3'        -- _ matches single character
WHERE name NOT LIKE '%draft%'

Range

WHERE confidence BETWEEN 0.5 AND 1.0
WHERE confidence NOT BETWEEN 0 AND 0.1

Set Membership

WHERE status IN ('active', 'draft')
WHERE id IN (SELECT id FROM other_table WHERE ...)
WHERE status NOT IN ('deleted', 'archived')

Subqueries in IN must select exactly one column.

NULL Checks

WHERE superseded_at IS NULL
WHERE embedding IS NOT NULL

Functions

Function	Returns	Description
`COUNT(*)`	INTEGER	Count all rows
`COUNT(col)`	INTEGER	Count non-NULL values in column
`COALESCE(a, b, ...)`	varies	First non-NULL argument
`NOW()`	TIMESTAMP	Current Unix timestamp

COUNT operates over the entire result set. No GROUP BY or HAVING — use CTEs or application-level grouping for aggregation.

Parameter Binding

In the Rust API, parameters are passed as HashMap<String, Value>:

let mut params = HashMap::new();
params.insert("entity_id".into(), Value::Uuid(id));
params.insert("type".into(), Value::Text("sensor".into()));

let result = db.execute(
    "SELECT * FROM entities WHERE id = $entity_id AND type = $type",
    &params,
)?;

The CLI does not support parameter binding — use literal values directly.

Graph Traversal

Graph queries use GRAPH_TABLE in the FROM clause with openCypher-subset MATCH patterns. The graph executor uses dedicated adjacency indexes and bounded BFS — graph traversal is a native operator, not recursive SQL.

Syntax

SELECT columns FROM GRAPH_TABLE(
  edge_table
  MATCH pattern
  [WHERE condition]
  COLUMNS (expr AS alias, ...)
)

The edge_table is a table with source_id, target_id, and edge_type columns.

Patterns

-- Outgoing edges
MATCH (a)-[:DEPENDS_ON]->(b)

-- Incoming edges
MATCH (a)<-[:BASED_ON]-(b)

-- Bidirectional
MATCH (a)-[:RELATES_TO]-(b)

-- Any edge type
MATCH (a)-[]->(b)

Variable-Length Paths

-- Between 1 and 3 hops
MATCH (a)-[:DEPENDS_ON]->{1,3}(b)

-- 1 to 10 hops (explicit upper bound required)
MATCH (a)-[:EDGE]->{1,10}(b)

An explicit upper bound is always required. Maximum traversal depth enforced by the engine is 10.

Filtering and Projection

Use WHERE to filter after traversal, COLUMNS to project results:

SELECT target_id FROM GRAPH_TABLE(
  edges
  MATCH (a)-[:DEPENDS_ON]->{1,3}(b)
  WHERE a.id = '550e8400-...'
  COLUMNS (b.id AS target_id)
)

Composing with SQL via CTEs

Graph results become a relational CTE for joins, filters, or vector search:

WITH deps AS (
  SELECT b_id FROM GRAPH_TABLE(
    edges
    MATCH (a)-[:DEPENDS_ON]->{1,3}(b)
    WHERE a.id = $start
    COLUMNS (b.id AS b_id)
  )
)
SELECT d.id, d.status FROM decisions d
INNER JOIN deps ON d.id = deps.b_id
WHERE d.status = 'active'

Graph + Vector: Neighborhood Similarity Search

Find semantically similar entities within a graph neighborhood:

WITH neighborhood AS (
  SELECT b_id FROM GRAPH_TABLE(
    edges
    MATCH (a)-[:RELATES_TO]->{1,2}(b)
    COLUMNS (b.id AS b_id)
  )
),
candidates AS (
  SELECT id, name, embedding
  FROM entities e
  INNER JOIN neighborhood n ON e.id = n.b_id
  WHERE e.is_deprecated = FALSE
)
SELECT id, name FROM candidates
ORDER BY embedding <=> $query
LIMIT 5

Vector Similarity Search

The `<=>` Operator

Cosine distance between two vectors. Used in ORDER BY for nearest-neighbor search:

-- Rust API with parameter binding
SELECT id, data FROM observations
ORDER BY embedding <=> $query_vector
LIMIT 10

-- CLI with vector literal
SELECT id, data FROM observations
ORDER BY embedding <=> [0.1, 0.2, 0.3]
LIMIT 10

Lower distance = more similar. A LIMIT clause is required — unbounded vector searches are rejected.

Pre-Filtered Search

Combine WHERE filters with vector ranking. The engine filters first, then scores only matching rows:

SELECT id, description FROM decisions
WHERE status = 'active'
ORDER BY embedding <=> $query
LIMIT 5

Indexing

The engine automatically selects the search strategy based on vector count:

Below ~1000 vectors: brute-force linear scan (exact)
At/above ~1000 vectors: HNSW approximate nearest neighbors (recall target >= 95%)

No manual index creation needed. Use .explain in the CLI to see which strategy is active:

contextdb> .explain SELECT id FROM observations ORDER BY embedding <=> $q LIMIT 5
HNSWSearch { table: "observations", limit: 5 }

SQL Comments

Both styles are stripped before parsing:

-- Line comment
SELECT * FROM entities; /* Block comment */

Unsupported Features

These are explicitly rejected with descriptive error messages:

Feature	Error
`WITH RECURSIVE`	`RecursiveCteNotSupported`
Window functions (`OVER`)	`WindowFunctionNotSupported`
`CREATE PROCEDURE` / `CREATE FUNCTION`	`StoredProcNotSupported`
Full-text search (`WHERE col MATCH pattern`)	`FullTextSearchNotSupported`
`GROUP BY` / `HAVING`	Not supported
`UNION` / `INTERSECT` / `EXCEPT`	Not supported
`INSERT ... SELECT`	Not supported
Subqueries outside `IN`	`SubqueryNotSupported`
SUM, AVG, MIN, MAX	Not supported (COUNT only)

FilesExpand file tree

query-language.md

Latest commit

History