- Graph query optimization: cost-based algorithm selection (BFS, DFS, Dijkstra, A*, Bidirectional)
- Adaptive plan caching with EMA-based cost model learning and TTL/size-based eviction
- Parallel multi-source traversal for large fan-out queries (`fan_out_threshold`)
- Subgraph isomorphism via VF2-style backtracking pattern matching
- Distributed graph query execution across shards with intra-shard parallelism
- Incremental graph query execution on live edge/node mutations (BFS-only)
- GPU-accelerated BFS/DFS for massive graphs (≥1M nodes, CUDA/HIP backends)
- EXPLAIN output integration with AQL for graph query plan inspection
- Algorithm selection must complete in < 1 ms for graphs with up to 10M nodes
- Plan cache must enforce configurable max size (default 1,000 plans) and TTL (default 300 s)
- Adaptive cost model must converge to < 10% mean absolute error after 100 executions per algorithm
- Parallel traversal must not exceed configured `fan_out_threshold` per frontier expansion
- Incremental query handles must be explicitly released; no implicit memory growth
- GPU traversal kernel must produce bit-identical results to CPU baseline (deterministic)
- Distributed query execution is intra-shard parallel; cross-shard edge following is caller-coordinated
- All public APIs are thread-safe via read-write locking; incremental execution is explicitly single-threaded
| Interface | Consumer | Notes |
|---|---|---|
| `GraphQueryOptimizer::optimize(query)` | AQL execution engine | Returns a `QueryPlan` with cost estimate and selected algorithm |
| `PlanCache::get/put/evict` | Query optimizer | Keyed by structural query fingerprint; bounded by `max_plans` and TTL |
| `ParallelTraversal::execute(sources, query)` | Query optimizer | Spawns intra-frontier worker threads up to `max_parallel_workers` |
| `SubgraphIsomorphism::executeVF2(pattern, target)` | AQL graph pattern match | Returns all matching subgraph mappings |
| `DistributedGraphQuery::executeAcrossShards(plan)` | Distributed query coordinator | Returns globally cheapest path; shard results merged by caller |
| `IncrementalQueryHandle::applyMutation(edge)` | Graph API handler (`POST /graph/edge`) | BFS-only; notifies registered listeners |
| `GPUTraversal::executeBFS(graph, source)` | Query optimizer (CUDA path) | Requires `THEMIS_ENABLE_CUDA`; CPU fallback active otherwise |
| `GraphQueryOptimizer::explain(plan)` | AQL EXPLAIN | Returns human-readable plan tree with cost breakdown |
Priority: High
Target Version: v1.7.0
Reuse optimization plans across graph queries that share the same query pattern and constraints but have different start/target vertex IDs, eliminating redundant re-planning work.
Implemented Features:
- ✅ `generateStructuralCacheKey` — builds a cache key from pattern + constraints, omitting vertex IDs, so all structurally identical queries map to the same entry
- ✅ Two-level cache lookup in all four `optimize*` methods:
  - Exact key (`pattern:start:target[:depth][:type]…`) — fastest path for repeated identical queries
  - Structural key (`struct:pattern[:depth][:type][:uv][:ue][:par]…`) — fallback for same-shape queries
- ✅ Structural-to-exact key promotion on hit (avoids repeated structural lookups)
- ✅ Structural key covers: `max_depth`/depth-hint, `edge_type`, `unique_vertices`, `unique_edges`, `enable_parallel`, count of `forbidden_vertices`, count of `required_vertices`
- ✅ Cache hit/miss counters exposed via `getQueryMetrics().plan_cache_hits/misses`
- ✅ Disabled cleanly when `setPlanCachingEnabled(false)` is called
- ✅ Cleared by `clearPlanCache()` (removes both exact and structural entries)
API:
```cpp
optimizer.setPlanCachingEnabled(true); // default

// Cold start: populates exact key "0:A:D" + structural key "struct:0"
auto plan1 = optimizer.optimizeShortestPath("A", "D", constraints);

// Structural hit: finds "struct:0", promotes plan to exact key "0:B:C"
auto plan2 = optimizer.optimizeShortestPath("B", "C", constraints);

// plan1 and plan2 are identical (same algorithm, cost, estimates)
assert(plan1->algorithm == plan2->algorithm);
assert(plan1->estimated_cost == plan2->estimated_cost);
assert(plan1->estimated_nodes_explored == plan2->estimated_nodes_explored);

// Queries with different constraints get different structural keys — no reuse
QueryConstraints c1; c1.max_depth = 2;
QueryConstraints c2; c2.max_depth = 5;
optimizer.optimizeShortestPath("A", "D", c1); // stores "struct:0:depth=2"
optimizer.optimizeShortestPath("B", "C", c2); // miss — "struct:0:depth=5" not present

// Monitor cache efficiency
const auto& m = optimizer.getQueryMetrics();
std::cout << "Cache hits: " << m.plan_cache_hits.load() << "\n";
std::cout << "Cache misses: " << m.plan_cache_misses.load() << "\n";
```

Key Design Decisions:
- Structural key intentionally excludes `timeout_ms`, `max_results`, `num_threads`, and `graph_id` because these fields affect only execution behaviour, not which algorithm is selected or how costs are estimated.
- For K_HOP queries, the hop count `k` is included as a depth hint in the structural key, so `k=2` and `k=3` never share a plan.
- For PATTERN_MATCH queries, both vertex count and edge count are encoded in the key to distinguish patterns of different shapes.
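The two-level keying scheme can be illustrated with a minimal sketch. The struct and function names below (`CacheKeyParts`, `makeExactKey`, `makeStructuralKey`) are hypothetical stand-ins for the internal key builder, not ThemisDB's actual API; the point is that the structural key omits vertex IDs while the exact key includes them.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical subset of the constraint fields that enter the cache key.
struct CacheKeyParts {
    int pattern = 0;            // numeric query-pattern ID
    int max_depth = -1;         // -1 = unset
    std::string edge_type;      // empty = unset
    bool unique_vertices = false;
    bool unique_edges = false;
    bool enable_parallel = false;
    size_t forbidden_count = 0; // only the COUNT enters the key, not the IDs
    size_t required_count = 0;
};

// Exact key: includes start/target — fastest path for repeated identical queries.
std::string makeExactKey(const CacheKeyParts& p,
                         const std::string& start, const std::string& target) {
    std::ostringstream os;
    os << p.pattern << ':' << start << ':' << target;
    if (p.max_depth >= 0) os << ":depth=" << p.max_depth;
    if (!p.edge_type.empty()) os << ":type=" << p.edge_type;
    return os.str();
}

// Structural key: same shape information, but no vertex IDs, so "A->D"
// and "B->C" queries of identical shape collide on purpose.
std::string makeStructuralKey(const CacheKeyParts& p) {
    std::ostringstream os;
    os << "struct:" << p.pattern;
    if (p.max_depth >= 0) os << ":depth=" << p.max_depth;
    if (!p.edge_type.empty()) os << ":type=" << p.edge_type;
    os << ":uv=" << p.unique_vertices << ":ue=" << p.unique_edges
       << ":par=" << p.enable_parallel
       << ":fc=" << p.forbidden_count << ":rc=" << p.required_count;
    return os.str();
}
```

On a structural hit, the plan can then be re-stored under the exact key (the promotion step described above) so the next identical query skips the structural probe entirely.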
Priority: High
Target Version: v1.7.0
Enable parallel execution of graph traversals for improved performance on large graphs.
Implemented Features:
- ✅ Multi-threaded BFS (level-parallel frontier expansion via `std::async`)
- ✅ Parallel Δ-Stepping Dijkstra (bucket-based parallelism, no global locks)
- ✅ Configurable thread pool size (`num_threads`, 0 = auto-detect)
- ✅ Thread-safe adjacency access via `GraphIndexManager::outAdjacency`
- ✅ Intra-frontier fan-out parallelism for BFS (`fan_out_threshold`: when frontier ≥ threshold, neighbor lookups are dispatched to multiple threads; 0 = disabled)
Planned (not yet implemented):
- Work-stealing queue for load balancing
- Lock-free visited sets for BFS
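The level-parallel expansion pattern can be sketched in a self-contained form. This is an illustrative reduction, not ThemisDB's implementation: a toy integer graph stands in for `GraphIndexManager`, and the chunking policy is an assumption. The key property it demonstrates is that worker tasks only produce task-local neighbor vectors; the shared visited set is updated serially on the main thread, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using Graph = std::unordered_map<int, std::vector<int>>;

// Level-parallel BFS sketch: each level's neighbor lookups are split into
// chunks dispatched as std::async tasks; results are merged single-threaded.
std::unordered_set<int> parallelBFS(const Graph& g, int source, int max_depth,
                                    size_t num_tasks = 4) {
    std::unordered_set<int> visited{source};
    std::vector<int> frontier{source};
    for (int depth = 0; depth < max_depth && !frontier.empty(); ++depth) {
        size_t chunk = (frontier.size() + num_tasks - 1) / num_tasks;
        std::vector<std::future<std::vector<int>>> tasks;
        for (size_t i = 0; i < frontier.size(); i += chunk) {
            size_t end = std::min(i + chunk, frontier.size());
            tasks.push_back(std::async(std::launch::async, [&, i, end] {
                std::vector<int> local; // task-local buffer, no shared writes
                for (size_t j = i; j < end; ++j) {
                    auto it = g.find(frontier[j]);
                    if (it == g.end()) continue;
                    for (int nb : it->second) local.push_back(nb);
                }
                return local;
            }));
        }
        std::vector<int> next;
        for (auto& t : tasks)        // merge serially on the main thread
            for (int nb : t.get())
                if (visited.insert(nb).second) next.push_back(nb);
        frontier = std::move(next);
    }
    return visited;
}
```

Because de-duplication happens only in the serial merge, the result set matches a sequential BFS exactly, which is the correctness guarantee the feature above claims.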
Priority: High
Target Version: v1.7.0 / v1.8.0
Automatically improve cost estimates based on actual execution statistics.
Implemented: See GraphQueryOptimizer::AlgorithmCostModel below.
Features:
- ✅ Learning from execution history (EMA per algorithm)
- ✅ Per-algorithm cost model with confidence level
- ✅ Automatic model re-calibration via `recordExecution`
- ✅ Adaptive plan selection: `selectAlgorithm` compares estimated costs for all feasible algorithms using learned EMA data when confidence > 0
- ✅ `exportCostModel()` / `importCostModel()` for persistence
- ✅ Disabled when `enableAdaptiveLearning(false)` is called
- ✅ Batch calibration from full execution history via `calibrateFromHistory()` (Issue: #2386)
- ✅ Cost accuracy tracking: `ExecutionStats::estimated_cost_ms` populated by all `execute*` methods; `AlgorithmCalibrationStats` reports `mean_estimated_ms`, `mean_absolute_error_ms`, and `cost_ratio` after calibration (Issue: #2386)

Planned (not yet implemented):
- Persistence to disk (use `exportCostModel()` + file I/O)
- Decay of old statistics (future enhancement)
- Separate models per graph type/size (future)
API:
```cpp
GraphQueryOptimizer optimizer(graph_mgr);
// Adaptive learning is enabled by default

// After many executions, cost estimates improve automatically
optimizer.executeBFS("A", 5, c); // observed ~8ms → EMA updates
optimizer.executeBFS("A", 5, c); // EMA converges towards actual timing

// Export learned model to persist across restarts
std::string model_json = optimizer.exportCostModel();
// e.g. {"BFS":{"ema_cost_ms":8.1,"exec_count":2,"confidence":0.02}}

// Import model in another instance (seeds with pre-learned data)
GraphQueryOptimizer optimizer2(graph_mgr2);
optimizer2.importCostModel(model_json);

// Disable adaptive learning if deterministic plans are required
optimizer.enableAdaptiveLearning(false);
bool is_on = optimizer.isAdaptiveLearningEnabled(); // false
```

Learning Algorithm:
```
ema_cost_ms  = alpha * observed_ms + (1 - alpha) * ema_cost_ms   (alpha = 0.1)
confidence   = min(1.0, exec_count / 100)
blended_cost = (1 - confidence) * base_cost + confidence * (ema_cost_ms * 10)
```
Implementation:
- `AlgorithmCostModel` struct: EMA cost, execution count, confidence
- `recordExecution()` calls `AlgorithmCostModel::update(ms)` for the executed algorithm
- `estimateCost()` blends the learned EMA cost (scaled from ms to cost units) proportional to confidence
- `exportCostModel()` serialises all entries to a JSON string
- `importCostModel()` deserialises with unknown-algorithm and malformed-JSON safety
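The learning formulas above translate directly into a small struct. This is a condensed sketch mirroring the documented names, not the production class; seeding the EMA with the first observed sample (rather than blending against an initial zero) is an assumption made here so the model does not start badly biased.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Per-algorithm EMA cost model sketch, following the documented formulas:
//   ema = alpha * observed + (1 - alpha) * ema        (alpha = 0.1)
//   confidence = min(1.0, exec_count / 100)
//   blended = (1 - conf) * base + conf * (ema_ms * 10)
struct AlgorithmCostModel {
    double ema_cost_ms = 0.0;
    long   exec_count  = 0;

    void update(double observed_ms, double alpha = 0.1) {
        ema_cost_ms = (exec_count == 0)
            ? observed_ms  // assumption: seed with the first sample
            : alpha * observed_ms + (1.0 - alpha) * ema_cost_ms;
        ++exec_count;
    }

    double confidence() const {
        return std::min(1.0, exec_count / 100.0);
    }

    // Blend theoretical base cost with learned cost (ms scaled x10 to cost units).
    double blendedCost(double base_cost) const {
        double c = confidence();
        return (1.0 - c) * base_cost + c * (ema_cost_ms * 10.0);
    }
};
```

With alpha = 0.1 a single outlier moves the estimate by only 10% of its deviation, which is why the document describes the EMA as smoothing out outliers.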
Priority: Medium
Target Version: v1.8.0
Enable graph queries across distributed ThemisDB instances.
Implemented Features:
- ✅ `DistributedGraphManager` — coordinator that fans out graph traversals to registered `ShardGraphExecutor` instances in parallel, merges results
- ✅ `LocalShardGraphExecutor` — thin wrapper around `GraphQueryOptimizer` for in-process / single-node shards (tests + embedded mode)
- ✅ `ShardGraphExecutor` — pluggable interface for per-shard execution (supports remote implementation via RPC transport)
- ✅ `DistributedGraphConfig` — configurable partition strategy (HASH, RANGE, GEO, CUSTOM), consistency level (EVENTUAL, STRONG), replication factor, timeout, parallelism cap
- ✅ Partition-aware vertex routing via `resolveShardForVertex` (FNV-1a hash → uniform bucket assignment)
- ✅ Shard qualifier syntax: `"<vertex_id>@<shard_id>"` for explicit routing
- ✅ Distributed shortest path (`shortestPath`): Dijkstra on each healthy shard in parallel; globally cheapest path returned
- ✅ Distributed k-hop neighbors (`kHopNeighbors`): BFS on all healthy shards in parallel; de-duplicated merged result
- ✅ Shard-aware plan generation (`optimizePlan`): returns `OptimizationPlan` with `is_distributed=true`, `shard_ids`, `recommended_parallelism` fields
- ✅ Fault tolerance: unhealthy shards (`isHealthy() == false`) skipped automatically
- ✅ `OptimizationPlan` extended with shard-aware fields (`is_distributed`, `shard_ids`, `recommended_parallelism`) — backward-compatible with single-node (defaults to `false`/empty/`1`)
- ✅ `explainPlan()` updated to print distributed shard info when `is_distributed=true`
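The hash-based routing and the `@shard` qualifier can be sketched together. The FNV-1a constants below are the standard 64-bit parameters; the free-function shape is illustrative and does not match ThemisDB's actual member-function signature.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Standard 64-bit FNV-1a over the vertex ID bytes.
uint64_t fnv1a64(const std::string& s) {
    uint64_t h = 14695981039346656037ull; // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;            // FNV prime
    }
    return h;
}

// Sketch of partition-aware routing: an explicit "<vertex_id>@<shard_id>"
// qualifier overrides hashing; otherwise the hash picks a uniform bucket.
std::string resolveShardForVertex(const std::string& vertex_id,
                                  const std::vector<std::string>& shards) {
    auto at = vertex_id.rfind('@');
    if (at != std::string::npos) return vertex_id.substr(at + 1);
    return shards[fnv1a64(vertex_id) % shards.size()];
}
```

Because the hash depends only on the vertex ID, routing is deterministic across coordinators with the same shard list, which is what makes caller-coordinated cross-shard edge following feasible.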
API:
```cpp
// Define graph partitioning strategy
DistributedGraphConfig config;
config.partitioning = PartitionStrategy::HASH; // or RANGE, GEO, CUSTOM
config.replication_factor = 3;
config.consistency = ConsistencyLevel::EVENTUAL;

DistributedGraphManager dist_graph(config);
dist_graph.addShard("shard1", std::make_shared<LocalShardGraphExecutor>("shard1", db1));
dist_graph.addShard("shard2", std::make_shared<LocalShardGraphExecutor>("shard2", db2));

// Vertex IDs may carry explicit shard qualifiers:
auto result = dist_graph.shortestPath("node_A@shard1", "node_B@shard2", constraints);

// K-hop neighbors across all shards:
auto neighbors = dist_graph.kHopNeighbors("node_A", 3, constraints);

// Shard-aware plan:
auto plan = dist_graph.optimizePlan("A", "D",
    GraphQueryOptimizer::QueryPattern::SHORTEST_PATH);
// plan->is_distributed == true, plan->shard_ids == {"shard1","shard2"}
```

Priority: Medium
Target Version: v1.9.0
Offload graph computations to GPU for massive parallelism.
Implemented Features (Issue #1829):
- ✅ `GPUGraphTraversal` class — CSR-based BFS/DFS with CPU fallback (`include/graph/gpu_traversal.h`, `src/graph/gpu_traversal.cpp`)
- ✅ Level-synchronous BFS (mirrors GPU parallel frontier expansion)
- ✅ Iterative DFS with depth tracking
- ✅ CSR (Compressed Sparse Row) graph representation for cache-efficient traversal
- ✅ Integer node ID mapping (string ↔ `uint32_t`) for performance
- ✅ `use_gpu`/`gpu_device` fields added to `GraphQueryOptimizer::QueryConstraints`
- ✅ GPU dispatch in `executeBFS()`/`executeDFS()` of `GraphQueryOptimizer`
- ✅ Automatic CPU fallback when no GPU hardware is present
- ✅ `GraphIndexManager::allVertices()` — enumerate all vertices (with RocksDB fallback)
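The CSR build step with integer ID mapping can be sketched as follows. This is a minimal illustration of the data layout, assuming an edge-list input; the `CSRGraph`/`buildCSR` names are hypothetical and not ThemisDB's actual types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of the CSR layout: neighbors of vertex v occupy
// neighbors[offsets[v] .. offsets[v+1]) — one contiguous, cache-friendly run.
struct CSRGraph {
    std::vector<uint32_t> offsets;
    std::vector<uint32_t> neighbors;
    std::unordered_map<std::string, uint32_t> id_of; // string -> dense uint32_t
};

CSRGraph buildCSR(const std::vector<std::pair<std::string, std::string>>& edges) {
    CSRGraph g;
    // Intern string IDs into dense integer indices.
    auto intern = [&](const std::string& s) -> uint32_t {
        auto it = g.id_of.find(s);
        if (it == g.id_of.end())
            it = g.id_of.emplace(s, (uint32_t)g.id_of.size()).first;
        return it->second;
    };
    std::vector<std::vector<uint32_t>> adj;
    for (const auto& [from, to] : edges) {
        uint32_t u = intern(from), v = intern(to);
        if (adj.size() < g.id_of.size()) adj.resize(g.id_of.size());
        adj[u].push_back(v);
    }
    adj.resize(g.id_of.size());
    // Flatten adjacency rows; offsets become running prefix sums.
    g.offsets.push_back(0);
    for (const auto& row : adj) {
        for (uint32_t v : row) g.neighbors.push_back(v);
        g.offsets.push_back((uint32_t)g.neighbors.size());
    }
    return g;
}
```

The flat `offsets`/`neighbors` pair is also exactly the representation that transfers to GPU memory in one copy, which is why CSR is the standard input format for GPU traversal kernels.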
API (Implemented):
```cpp
// Via GPUGraphTraversal directly:
GPUGraphTraversal gpu_trav(graph_manager);
gpu_trav.load();
auto result = gpu_trav.bfs("start_vertex", cfg);
// result->visited_vertices, result->distances, result->used_cpu_fallback

// Via GraphQueryOptimizer (recommended):
GraphQueryOptimizer::QueryConstraints constraints;
constraints.use_gpu = true;
constraints.gpu_device = 0; // GPU index; ignored on CPU fallback
auto result2 = optimizer.executeBFS("start", 10, constraints);
// Automatically uses GPUGraphTraversal; falls back to CPU when no GPU present
```

Features:
- CUDA/OpenCL graph kernels
- GPU-accelerated BFS/DFS
- GPU PageRank and centrality
- Hybrid CPU-GPU execution
- Automatic GPU memory management
Benefits:
- 10-100x speedup for large dense graphs
- Handle graphs with millions of edges
- Real-time analytics on massive graphs
- Reduced cloud compute costs
API:
```cpp
GraphQueryOptimizer::QueryConstraints constraints;
constraints.use_gpu = true;
constraints.gpu_device = 0; // GPU index

auto result = optimizer.executeBFS("start", 10, constraints);
// Automatically uses GPU if graph size > threshold
```

Suitable Workloads:
- Dense graphs (high edge-to-node ratio)
- Large traversal depth (k > 10)
- Analytics algorithms (PageRank, betweenness)
- Pattern matching with many patterns
Implementation:
- Use NVIDIA cuGraph library
- Implement custom CUDA kernels for ThemisDB-specific operations
- Transfer graph data to GPU memory
- Execute kernels with optimal block/grid sizes
- Transfer results back to CPU
Challenges:
- GPU memory limitations
- PCIe transfer overhead
- Algorithm suitability for GPU
- Complexity of CUDA programming
Priority: Medium
Target Version: v1.8.0
distributed_graph.cpp uses std::lock_guard<std::mutex> for all shard operations including read-only lookups (getShard at line 110, listShards at line 120, execute at line 126). All reader threads serialize unnecessarily.
Implementation Notes:
- [ ] Replace `std::mutex shards_mutex_` with `std::shared_mutex` in `DistributedGraphManager`; upgrade `getShard`, `listShards`, and `execute` (read path) to `std::shared_lock`.
- [ ] Keep `addShard` and `removeShard` (write path) on `std::unique_lock`.
- [ ] Add a TSAN-enabled stress test: 8 concurrent `execute()` threads + 1 `addShard()` thread.
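The proposed locking change follows the standard reader-writer pattern, sketched below. The class shape is illustrative (a `shared_ptr<int>` stands in for the executor type); only the lock discipline mirrors the checklist above.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>

// Sketch of the target locking scheme: readers take std::shared_lock and can
// proceed concurrently; writers take std::unique_lock and are exclusive.
class ShardRegistry {
public:
    void addShard(const std::string& id, std::shared_ptr<int> exec) {
        std::unique_lock lock(mutex_); // exclusive: write path
        shards_[id] = std::move(exec);
    }

    std::shared_ptr<int> getShard(const std::string& id) const {
        std::shared_lock lock(mutex_); // shared: read path, readers don't serialize
        auto it = shards_.find(id);
        return it == shards_.end() ? nullptr : it->second;
    }

    size_t shardCount() const {
        std::shared_lock lock(mutex_);
        return shards_.size();
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, std::shared_ptr<int>> shards_;
};
```

Note that `mutex_` must be `mutable` so that const read methods can lock it; this is the usual idiom when converting a `std::mutex` member to `std::shared_mutex`.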
Priority: Medium
Target Version: v1.7.0
Extend PathConstraints with more sophisticated constraint types.
Features:
- Node Property Constraints ✅ DONE – `addNodePropertyConstraint(key, value)` prunes BFS traversal
- Weight Constraints ✅ DONE – `addMaxWeight(threshold)` prunes BFS; `addMinWeight(threshold)` rejects at acceptance
- Schema-Aware Node Label Hints ✅ DONE – `QueryConstraints::node_labels` filters BFS/DFS by `_labels` field
- Excluded Edge Type Hints ✅ DONE – `QueryConstraints::excluded_edge_types` reduces cost-model fanout estimate
QueryConstraints::excluded_edge_typesreduces cost-model fanout estimate - Temporal Constraints: Path valid at specific time ⏳ Planned
- Probability Constraints: Min probability for uncertain graphs ⏳ Planned
- Resource Constraints: Capacity limits on paths ⏳ Planned
- Semantic Constraints: Ontology-based path rules ⏳ Planned
- Geo-Fence Constraints: Spatial boundaries for paths ⏳ Planned
Implemented API:
```cpp
PathConstraints constraints(&graph_mgr);

// Node property constraint (v1.7.0)
constraints.addNodePropertyConstraint("country", "USA");
// → Only traverse nodes where node.country == "USA"

// Weight constraints (v1.7.0)
constraints.addMaxWeight(100.0); // Total path weight <= 100 (BFS pruning)
constraints.addMinWeight(10.0);  // Total path weight >= 10 (acceptance check)

auto paths = constraints.findConstrainedPaths("start", "end", 10);
```

Planned API (not yet implemented):
```cpp
// Temporal constraint
constraints.addTemporalConstraint(
    start_time_ms,
    end_time_ms,
    TemporalMode::VALID_DURING
);

// Resource constraint
constraints.addResourceCapacity("bandwidth", 1000);

// Geo-fence constraint
constraints.addGeoFence(
    center_lat, center_lon, radius_km,
    GeoFenceMode::MUST_STAY_INSIDE
);
```

Implementation Notes:
- `getNodeField(vertexId, fieldName)` added to `GraphIndexManager` (uses `node:<pk>` key format)
- `ConstraintType::MAX_WEIGHT`/`MIN_WEIGHT` added to `PathConstraints::ConstraintType` enum
- `Constraint::double_value` field stores threshold for weight constraints
- BFS pruner checks `MAX_WEIGHT` after each edge weight accumulation
- `validatePath` enforces `NODE_PROPERTY` for all nodes; weight constraints handled by `findConstrainedPaths`
Priority: Medium
Target Version: v1.8.0
Automatically rewrite graph queries for better performance.
Features:
- Common subexpression elimination
- Predicate pushdown to graph layer
- Join reordering for graph patterns
- Materialized view utilization
- Query decomposition for parallelism
Benefits:
- Improved query performance without user intervention
- Better integration with relational/document queries
- Optimal execution plans for complex queries
- Reduced redundant computation
Example Rewrite:
```
-- Original query
FOR v1 IN vertices
  FILTER v1.type == "Person"
  FOR v2 IN 1..3 OUTBOUND v1 GRAPH "social"
    FILTER v2.country == "USA"
    RETURN {person: v1, friend: v2}

-- Rewritten query
FOR v1 IN vertices
  FILTER v1.type == "Person"
  FOR v2 IN 1..3 OUTBOUND v1 GRAPH "social"
    PRUNE v2.country != "USA"   // Early pruning
    FILTER v2.country == "USA"
    RETURN {person: v1, friend: v2}
```
Rewrite Rules:
- Push predicates into graph traversal (prune early)
- Decompose multi-pattern queries into independent subqueries
- Materialize frequently accessed subgraphs
- Convert repeated traversals to single traversal with caching
- Reorder multi-hop traversals based on selectivity
Priority: Low
Target Version: v2.0.0
Trade accuracy for speed with approximate algorithms.
Features:
- Approximate shortest paths (A* with relaxed heuristic)
- Approximate PageRank (power iteration with early stop)
- Approximate reachability (sketching techniques)
- Approximate community detection (sampling-based)
- Confidence bounds on approximations
Benefits:
- Sub-second response on billion-edge graphs
- Handle interactive queries on massive graphs
- Reduce cloud compute costs
- Enable exploratory graph analytics
API:
```cpp
GraphQueryOptimizer::QueryConstraints constraints;
constraints.approximation_mode = ApproximationMode::FAST;
constraints.approximation_error = 0.05; // 5% error tolerance

auto result = optimizer.optimizeShortestPath("A", "B", constraints);
// May return path within 5% of optimal length

// Access approximation metadata
std::cout << "Approximation error bound: "
          << result.metadata.error_bound << std::endl;
std::cout << "Confidence: "
          << result.metadata.confidence << std::endl;
```

Algorithms:
- Bidirectional A* with beam search: Prune low-probability paths
- Landmark-based shortest paths: Precompute distances to landmarks
- Sketching for reachability: Probabilistic data structures
- Sampling for PageRank: Monte Carlo estimation
- Local clustering: Expand only relevant subgraph
Priority: Low
Target Version: v2.0.0
Support graphs with multiple edge types and layers.
Features:
- Layer-specific traversals
- Cross-layer path finding
- Layer aggregation queries
- Layer-aware analytics
- Heterogeneous graph queries
Benefits:
- Model complex multi-relational data
- Support social networks with multiple edge types
- Enable knowledge graph reasoning
- Better domain modeling
Example:
```cpp
// Define multi-layer graph
MultiLayerGraph mlg;
mlg.addLayer("friendship", EdgeType::UNDIRECTED);
mlg.addLayer("follows", EdgeType::DIRECTED);
mlg.addLayer("colleague", EdgeType::UNDIRECTED);

// Query across layers
auto result = mlg.shortestPath(
    "user_A", "user_B",
    /*layers=*/{"friendship", "colleague"}, // Use these layers
    /*layer_weights=*/{1.0, 2.0}            // Friendship preferred
);

// Aggregate across layers
auto centrality = mlg.pageRank(
    /*layers=*/{"friendship", "follows"},
    /*aggregation=*/AggregationMode::SUM // or AVG, MAX
);
```

Priority: Low
Target Version: v2.0.0
Integrate graph neural networks and embeddings.
Features:
- Graph embedding generation (Node2Vec, DeepWalk)
- Graph neural network inference
- Link prediction
- Node classification
- Graph similarity search
Benefits:
- Enable AI/ML on graph data
- Predict missing edges
- Classify unlabeled nodes
- Find similar subgraphs
- Integrate with LLM module
API:
```cpp
// Train graph embeddings
GraphEmbedding embedding(graph_mgr);
embedding.train(
    /*algorithm=*/EmbeddingAlgorithm::NODE2VEC,
    /*dimensions=*/128,
    /*walk_length=*/80,
    /*num_walks=*/10
);

// Get node embeddings
auto vec = embedding.getNodeEmbedding("user_A");

// Link prediction
auto predictions = embedding.predictLinks("user_A", /*k=*/10);
// Returns top-k most likely edges

// Node classification
auto label = embedding.classifyNode("user_B", model);
```

Priority: Low
Target Version: v2.0.0
Built-in graph visualization and exploration.
Features:
- Graph layout algorithms (force-directed, hierarchical)
- Interactive exploration UI
- Subgraph extraction for visualization
- Real-time updates on graph changes
- Export to common formats (GraphML, GEXF)
Benefits:
- Explore graphs visually
- Debug graph queries interactively
- Present graph analytics results
- Integrate with BI tools
API:
```cpp
// Extract subgraph for visualization
GraphVisualizer viz(graph_mgr);
auto subgraph = viz.extractSubgraph(
    /*center_node=*/"user_A",
    /*max_depth=*/2,
    /*max_nodes=*/100,
    /*layout=*/LayoutAlgorithm::FORCE_DIRECTED
);

// Export to GraphML
viz.exportGraphML(subgraph, "output.graphml");

// Generate interactive HTML
viz.exportInteractiveHTML(subgraph, "output.html");
```

Priority: Research
Target Version: TBD
Explore quantum algorithms for graph problems.
Potential Applications:
- Quantum walk for graph search
- Grover's algorithm for pattern matching
- Quantum annealing for optimization problems
- Exponential speedup for specific problems
Challenges:
- Quantum hardware availability
- Algorithm design complexity
- Limited problem applicability
- Noise and error correction
Priority: Research
Target Version: TBD
Process graphs as streams of edge insertions/deletions.
Potential Applications:
- Real-time social network analysis
- Continuous PageRank updates
- Incremental community detection
- Dynamic shortest paths
Challenges:
- Maintaining accuracy with limited memory
- Handling high-velocity streams
- Dealing with concept drift
- Balancing latency and accuracy
v1.7.0 (Q3 2026):
- ✅ Parallel Graph Execution (BFS + Dijkstra Δ-Stepping)
- ✅ Adaptive Cost Model
- ✅ Advanced Constraint Types
- ✅ Latency Histogram & Prometheus Scrape Endpoint
- ✅ Query Rate Limiter (per-second budget, ERR_GRAPH_RATE_LIMIT_EXCEEDED)
- ✅ Property Graph Schema-Aware Optimizer Hints (Issue: #1819)
v1.8.0 (Q1 2027):
- ✅ Distributed Graph Queries
- ✅ ANN-accelerated candidate edge discovery (`setANNIndex(IAnnIndex*)` + `rebuildANNIndex()`)
- ✅ CEP event emission for edge mutations (`setCEPEventCallback(std::function<void(themisdb::analytics::Event)>)`)
- Query Rewriting
v1.9.0 (Q3 2027):
- GPU-Accelerated Graph Processing
- Approximate Algorithms
v2.0.0 (Q1 2028):
- Multi-Layer Graph Support
- Graph ML Integration
- Graph Visualization
QueryConstraints::node_labels and QueryConstraints::excluded_edge_types allow
callers to embed property-graph schema information directly in a query so that the
optimizer can choose a more accurate cost estimate and the traversal runtime can
prune the search space accordingly.
node_labels (OR semantics): only visit nodes that carry at least one of the
listed labels (matched against the comma-separated _labels field stored on each
node entity by PropertyGraphManager).
excluded_edge_types: cost-model hint that reduces the estimated edge fanout by
the fraction represented by each excluded type (uses edge_type_selectivity from
GraphStatistics).
setNodeLabelStats(label_counts) lets callers supply per-label node counts so that
the optimizer can derive label selectivity automatically and apply it during
estimateCost.
```cpp
// Register schema statistics (e.g. loaded from PropertyGraphManager)
optimizer.setNodeLabelStats({{"Person", 400}, {"Company", 100}});
// → Person selectivity = 400/total_nodes, Company = 100/total_nodes

// Restrict BFS to Person nodes only
GraphQueryOptimizer::QueryConstraints c;
c.node_labels = {"Person"};             // only traverse Person-labeled nodes
c.excluded_edge_types = {"DEPRECATED"}; // cost model reduces fanout

auto plan = optimizer.optimizeShortestPath("alice", "bob", c);
// plan.active_schema_hints describes the active hints
// plan.explanation includes "Schema Hints Active:" section

auto result = optimizer.executeBFS("alice", 3, c);
// BFS skips neighbors without a "Person" label in their _labels field
```

Key design decisions:
- `node_labels` filtering is applied only to outgoing neighbors, never to the explicitly provided start vertex (which is controlled by the caller).
- `nodeMatchesLabels` performs whole-token matching so "Person" never accidentally matches "SuperPerson".
- When a label is not present in `node_label_selectivity`, a conservative default selectivity of 0.5 is applied to cost estimation.
- Schema hints are included in both exact and structural plan-cache keys so that queries with different hints never share a cached plan.
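The whole-token matching rule can be sketched directly against the comma-separated `_labels` format. This is an illustrative free function under that assumption, not ThemisDB's member implementation; the key point is exact token comparison, so "Person" never matches inside "SuperPerson".

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Whole-token, OR-semantics label match against a comma-separated
// _labels field (e.g. "Person,Employee").
bool nodeMatchesLabels(const std::string& labels_field,
                       const std::vector<std::string>& wanted) {
    if (wanted.empty()) return true; // no hint given -> accept every node
    std::stringstream ss(labels_field);
    std::string token;
    while (std::getline(ss, token, ',')) {   // split on commas
        for (const auto& w : wanted)
            if (token == w) return true;     // exact token, OR semantics
    }
    return false;
}
```

Substring matching (`labels_field.find(w)`) would be the tempting shortcut here, and it is exactly the bug the whole-token rule prevents.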
QueryConstraints::timeout_ms – when set to a non-zero value BFS and DFS
traversals abort after the given number of milliseconds and return
ERR_QUERY_TIMEOUT. This provides a first line of defence for SLO budgets.
```cpp
GraphQueryOptimizer::QueryConstraints constraints;
constraints.timeout_ms = 500; // abort after 500 ms

auto result = optimizer.executeBFS("start", 5, constraints);
if (!result) {
    // result.error().code == ERR_QUERY_TIMEOUT
}
```

GraphQueryOptimizer::getQueryMetrics() returns a GraphQueryMetrics snapshot
with cumulative counters that can be scraped by a Prometheus exporter or
forwarded to an OpenTelemetry collector:
| Metric | Description |
|---|---|
| `total_queries` | Total traversal executions since startup |
| `failed_queries` | Executions that returned no paths |
| `timed_out_queries` | Executions aborted by `timeout_ms` |
| `total_execution_time_ms` | Sum of all execution durations (ms) |
| `max_execution_time_ms` | Peak single-query duration (ms) |
| `total_nodes_explored` | Cumulative nodes visited |
| `total_edges_traversed` | Cumulative edges traversed |
| `plan_cache_hits` / `misses` | Plan-cache efficiency counters |
See docs/graph_roadmap.md for the full observability checklist.
Six dedicated error codes added to errors::ErrorCode in range 6400-6499,
each registered with full metadata (category, severity, solution, keywords):
| Code | Constant | Meaning |
|---|---|---|
| 6400 | `ERR_GRAPH_NO_SUCH_VERTEX` | Vertex not found in graph |
| 6401 | `ERR_GRAPH_NO_SUCH_EDGE` | Edge not found in graph |
| 6402 | `ERR_GRAPH_CONSTRAINT_CONFLICT` | Contradictory path constraints |
| 6403 | `ERR_GRAPH_PATH_NOT_FOUND` | No path satisfies constraints |
| 6404 | `ERR_GRAPH_CYCLE_DETECTED` | Cycle in acyclic-required traversal |
| 6405 | `ERR_GRAPH_DEPTH_EXCEEDED` | Traversal depth limit exceeded |
executeBFS/executeDFS now return ERR_GRAPH_NO_SUCH_VERTEX instead of the
generic ERR_QUERY_EXECUTION_FAILED for unknown vertex lookups.
GraphQueryOptimizer::explainConstrainedPath() – returns an OptimizationPlan
without executing any traversal, enabling callers to inspect the chosen algorithm,
cost estimate, and constraint summary before committing to actual graph traversal.
Does not increment getQueryMetrics().total_queries.
```cpp
themis::graph::PathConstraints constraints(&graph_mgr);
constraints.addRequiredNode("checkpoint");
constraints.addMaxLength(6);

// Inspect the plan without touching the graph
auto plan = optimizer.explainConstrainedPath("start", "end", constraints);
std::cout << optimizer.explainPlan(plan.value()); // algorithm, cost, constraints
```

executeBFS now supports level-parallel frontier expansion. The BFS is
rewritten as a level-by-level loop; each level's neighbor lookups are
dispatched as independent std::async tasks when enable_parallel=true.
```cpp
GraphQueryOptimizer::QueryConstraints c;
c.enable_parallel = true; // opt-in; default is false (backward-compatible)
c.num_threads = 4;        // 0 = hardware_concurrency/2, max 16

auto result = optimizer.executeBFS("start", 5, c);
```

Produces the same set of reachable nodes as the sequential path (no correctness
regression). The num_threads field defaults to 0 (auto-detect).
GraphQueryOptimizer::executeDijkstra now uses the Δ-Stepping algorithm when
constraints.enable_parallel = true, giving bucket-based parallelism without
global locks.
Algorithm:
- Δ is sampled from the start vertex's first-hop average edge weight (default 1.0).
- Vertices are partitioned into buckets of width Δ.
- Within each bucket, light-edge (weight ≤ Δ) relaxations are dispatched as `std::async` tasks (one per thread chunk); each task returns a local `vector<RelaxResult>` with no shared writes.
- The main thread applies updates serially – no data races on `dist[]`/`parent[]`.
- Heavy edges (weight > Δ) are relaxed serially after the bucket is stable.
```cpp
GraphQueryOptimizer::QueryConstraints c;
c.enable_parallel = true; // opt-in; default is false (backward-compatible)
c.num_threads = 4;        // 0 = hardware_concurrency/2, max 16

auto result = optimizer.executeDijkstra("A", "D", c);
// result->totalCost == optimal weighted shortest-path cost
// result->path == reconstructed path [A, ..., D]
```

Produces the same totalCost as sequential Dijkstra (verified by
Dijkstra_Parallel_ProducesSameResultAsSequential).
PathConstraints::addEdgePropertyConstraint(field_name, expected_value) –
prunes edges during findConstrainedPaths BFS traversal by checking each
candidate edge's field value against the required value.
```cpp
PathConstraints c(&graph_mgr);
c.addEdgePropertyConstraint("type", "follows"); // only traverse "follows" edges

auto paths = c.findConstrainedPaths("user1", "user5", 10);
```

validatePath also enforces EDGE_PROPERTY on complete paths.
describeConstraints() lists each edge property constraint as:
"Edge property: <key> = <value>".
New backing API: GraphIndexManager::getEdgeField(edgeId, fieldName) returns
an std::optional<std::string> without needing the graph ID.
GraphQueryOptimizer::AlgorithmCostModel – per-algorithm EMA cost tracking with
confidence-weighted blending into estimateCost().
```cpp
// Enabled by default; runs automatically with each execute* call
optimizer.executeBFS("start", 5, c); // records 8.1ms → EMA updates

// Export / import for warm-start across restarts
std::string json = optimizer.exportCostModel();
optimizer2.importCostModel(json);

// Opt out for deterministic plans
optimizer.enableAdaptiveLearning(false);
```

Key properties:
- EMA alpha = 0.1 (smoothes out outliers)
- Confidence = `min(1.0, exec_count / 100)` (0 → purely theoretical, 1 → fully learned)
- `estimateCost()` blends: `(1 - conf) * base + conf * (ema_ms * 10)`
- `ExecutionStats::algorithm` field enables `recordExecution` to route to the correct model
- `exportCostModel()`/`importCostModel()` use JSON; unknown algo keys are silently ignored
PathConstraints::addNodePropertyConstraint(field, value) – prunes BFS traversal
and validates complete paths by looking up each node's field in the graph store.
```cpp
PathConstraints c(&graph_mgr);
c.addNodePropertyConstraint("country", "USA");
// BFS skips any next_node whose country field ≠ "USA"

auto paths = c.findConstrainedPaths("user1", "user5", 10);
```

Backed by new GraphIndexManager::getNodeField(vertexId, fieldName) which reads
from node:<pk> key format (same as KeySchema::makeGraphNodeKey).
PathConstraints::addMaxWeight(threshold) and addMinWeight(threshold) implement
total-path-weight constraints backed by ConstraintType::MAX_WEIGHT / MIN_WEIGHT
and a new Constraint::double_value field.
```cpp
PathConstraints c(&graph_mgr);
c.addMaxWeight(10.0); // BFS prunes states where accumulated cost > 10.0
c.addMinWeight(2.0);  // Final acceptance rejects paths with cost < 2.0

auto paths = c.findConstrainedPaths("A", "D", 5);
```

Edge weights are read from each edge's _weight field (default 1.0 when absent).
GraphQueryMetrics::LatencyHistogram – 10-bucket fixed-width histogram with
upper bounds (ms): 1, 5, 10, 25, 50, 100, 250, 500, 1000, +Inf.
```cpp
const auto& hist = optimizer.getQueryMetrics().latency_histogram;
double p99 = hist.percentileMs(0.99);
double p50 = hist.percentileMs(0.50);
```

GET /api/v1/graph/metrics/prometheus exports all counters and the full latency
histogram in Prometheus text exposition format. p50, p95, and p99 are also
exported as computed gauges.
Per-second token-window rate limiting for graph queries.
```cpp
optimizer.setMaxQueriesPerSecond(200); // 200 QPS max
// Excess queries return ERR_GRAPH_RATE_LIMIT_EXCEEDED (6406)

optimizer.setMaxQueriesPerSecond(0);   // disable
```

Applies to all five execute methods. Uses atomic CAS sliding-window for thread-safe operation without mutexes.
GraphQueryOptimizer::executeSubgraphIsomorphism – finds all injective mappings
from a pattern graph onto subgraphs of the data graph (VF2-style backtracking).
```cpp
// Pattern: u -> v -> w (chain of three vertices)
std::vector<std::string> pattern_verts = {"u", "v", "w"};
std::vector<std::pair<std::string, std::string>> pattern_edges = {{"u","v"}, {"v","w"}};
auto result = optimizer.executeSubgraphIsomorphism(pattern_verts, pattern_edges);
// result.value().matches[i] is an unordered_map<string,string>
// mapping pattern vertex labels to data vertex IDs

// With constraints
GraphQueryOptimizer::QueryConstraints c;
c.max_results = 10;            // stop after first 10 matches
c.timeout_ms = 500;            // abort after 500 ms
c.forbidden_vertices = {"X"};  // X must not appear in any match
auto limited = optimizer.executeSubgraphIsomorphism(pattern_verts, pattern_edges, c);
```

Key properties:
- Injective: each data vertex appears at most once per match
- Directed edge consistency: every pattern edge (u,v) must be present as a directed edge in the data graph for the matched vertices
- Pattern vertices are user-defined labels (not data vertex IDs)
- Supports `max_results`, `timeout_ms`, and `forbidden_vertices` constraints
- Execution statistics available via optional `ExecutionStats*` output parameter
- Rate-limited by `setMaxQueriesPerSecond()` like all other execute methods
- Integrates with `optimizePatternMatch()` for cost estimation and plan caching
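The injective, directed-edge-consistent semantics above can be illustrated with a minimal backtracking matcher. This is a brute-force sketch over a standalone edge set, not the optimized VF2 implementation; `findSubgraphMatches` and its signature are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Mapping = std::map<std::string, std::string>;

// Check that every pattern edge whose endpoints are both mapped exists as a
// directed edge in the data graph (the edge-consistency property above).
static bool edgesConsistent(const Mapping& m,
        const std::vector<std::pair<std::string, std::string>>& pat_edges,
        const std::set<std::pair<std::string, std::string>>& data_edges) {
    for (const auto& [u, v] : pat_edges) {
        auto iu = m.find(u), iv = m.find(v);
        if (iu != m.end() && iv != m.end() &&
            !data_edges.count({iu->second, iv->second}))
            return false;
    }
    return true;
}

static void matchFrom(size_t idx,
        const std::vector<std::string>& pat_verts,
        const std::vector<std::pair<std::string, std::string>>& pat_edges,
        const std::vector<std::string>& data_verts,
        const std::set<std::pair<std::string, std::string>>& data_edges,
        Mapping& m, std::vector<Mapping>& out) {
    if (idx == pat_verts.size()) { out.push_back(m); return; }
    for (const auto& dv : data_verts) {
        bool used = false;                       // enforce injectivity
        for (const auto& [k, v] : m) if (v == dv) { used = true; break; }
        if (used) continue;
        m[pat_verts[idx]] = dv;
        if (edgesConsistent(m, pat_edges, data_edges))
            matchFrom(idx + 1, pat_verts, pat_edges, data_verts, data_edges, m, out);
        m.erase(pat_verts[idx]);
    }
}

std::vector<Mapping> findSubgraphMatches(
        const std::vector<std::string>& pat_verts,
        const std::vector<std::pair<std::string, std::string>>& pat_edges,
        const std::vector<std::string>& data_verts,
        const std::set<std::pair<std::string, std::string>>& data_edges) {
    Mapping m;
    std::vector<Mapping> out;
    matchFrom(0, pat_verts, pat_edges, data_verts, data_edges, m, out);
    return out;
}
```

A real VF2 implementation prunes far more aggressively (candidate-pair feasibility rules instead of trying every unused vertex), but the injectivity and edge-consistency checks are exactly the properties listed above.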
Status: ✅ Production Ready — Core engine, safety gates, audit trail, anomaly detection, ChangeFeed integration, ANN-accelerated candidate discovery, and CEP event emission are complete.
Issue: #FEATURE/ScheduledGraphEdgeRefresh
Files: include/graph/scheduled_edge_refresh.h, src/graph/scheduled_edge_refresh.cpp
Tests: tests/graph/test_scheduled_edge_refresh.cpp (60+ tests)
Docs: docs/scheduled_edge_refresh.md, docs/de/scheduled_edge_refresh.md
Keeping a graph semantically current as the underlying data evolves is a well-studied problem. The key research areas that inform this feature are:
| Research Area | Approach | Relevance to ThemisDB |
|---|---|---|
| Dynamic Graph Maintenance (Brandes, 2008) | Incremental edge updates based on betweenness centrality | Analytics module already exposes centrality; centrality_weight in EdgeScore |
| STGCN (Yu et al., 2017) | Spatio-temporal GNN embeddings for evolving graphs | GNN embeddings available via graph/gnn_embeddings.cpp; used as NodeEmbeddingProvider |
| Incremental Materialized Views (Maccioni et al., 2015) | Maintain derived graph state incrementally | analytics/incremental_view.cpp follows this pattern |
| Adaptive Graph Sampling (Leskovec & Faloutsos, 2006) | Smart edge sampling during refresh to reduce cost | Informs top_k_candidates and brute-force-vs-ANN threshold |
| Temporal Graph Evolution (Leskovec et al., 2008) | Exponential decay of edge relevance over time | temporal_factor = 2^(−age / half_life) in computeTemporalDecay() |
| Link Prediction via Embeddings (Hamilton et al., 2017) | Cosine/dot-product similarity in embedding space for candidate edges | SimilarityMetric::COSINE / DOT_PRODUCT / EUCLIDEAN in computeSimilarity() |
- Inputs: Existing property graph with node embedding vectors (GNN index or user-supplied `NodeEmbeddingProvider` callback); `RefreshPolicy` configuration.
- Outputs:
  - Low-relevance edges removed (relevance = `similarity × temporal_factor × centrality_weight` < `relevance_threshold`)
  - New high-similarity edges added (top-k ANN candidates per node above `add_threshold`)
  - `RefreshStats` with per-cycle metrics (edges evaluated / removed / added, cycle duration, removal rate, anomaly flag)
  - Bounded in-memory audit trail (`RefreshAuditEntry`, max 10,000 entries)
  - Optional `Changefeed` events (`EVENT_PUT`/`EVENT_DELETE`, keyed `graph_edge_refresh:<edge_id>`)
- Frequency: Configurable: time-based (`refresh_interval`), event-triggered (`triggerRefresh()`), or adaptive (caller-driven).
- Graph size targets: Brute-force similarity for ≤ `policy.ann_min_vertices` nodes (default 10,000); ANN-accelerated path via `setANNIndex(IAnnIndex*)` for larger graphs.
- `RefreshPolicy` fields validated at construction; out-of-range values throw `std::invalid_argument` (no silent defaults after construction)
- `max_removal_fraction` safety gate must abort the entire batch before any storage writes if violated; no partial mutations
- All edge mutations (removals + additions) for a single cycle are committed as one `WriteBatchWrapper`; atomic at the RocksDB level
- `anomaly_threshold_removal_rate = 0.0` disables anomaly detection (zero = off); enabling requires explicit opt-in
- Background scheduler must not start concurrent cycles; `cycle_mutex_` serialises all `runRefreshCycle()` invocations
- `setPolicy()` takes effect on the next cycle; a currently running cycle completes with the old policy
- `NodeEmbeddingProvider` returning an empty vector is treated as "embedding unavailable" → similarity skipped → `similarity = 1.0`
- Audit trail bounded at `kMaxAuditEntries = 10,000`; oldest entries evicted FIFO (no unbounded growth)
- `Changefeed` attachment is optional; `recordEvent` failures are logged as warnings, never as errors that abort the cycle
- ANN integration: `setANNIndex(IAnnIndex*)` + `rebuildANNIndex()` + ANN path in `discoverCandidateEdges()` when vertex count > `policy.ann_min_vertices`; brute-force fallback when below threshold or no index attached
| Interface | Consumer | Notes |
|---|---|---|
| `GraphIndexManager::getAllEdges(graph_id)` | `collectEdges()` | Returns all edges scoped to graph_id (empty = all) |
| `GraphIndexManager::getAllVertices(graph_id)` | `discoverCandidates()` | Needed for per-vertex top-k candidate enumeration |
| `GraphIndexManager::getOutDegree(vertex_id)` | `scoreEdge()` | Used to compute centrality_weight |
| `GraphIndexManager::edgeExists(from, to)` | `discoverCandidates()` | Prevents adding duplicate edges |
| `GraphIndexManager::createWriteBatch()` | `applyBatch()` | Returns a WriteBatchWrapper for atomic multi-edge writes |
| `GraphIndexManager::addEdge(entity, batch)` | `applyBatch()` | Batched insertion |
| `GraphIndexManager::deleteEdge(id, batch)` | `applyBatch()` | Batched deletion |
| `NodeEmbeddingProvider` (`std::function<std::vector<float>(std::string)>`) | `computeSimilarity()` | User-supplied; may be backed by GNN index or HNSW vector store |
| `Changefeed::recordEvent(ChangeEvent)` | `appendAudit()` | Optional; CDC downstream consumers |
| `index::IAnnIndex::search(query, dim, k)` ✅ | `discoverCandidates()` | Attached via `setANNIndex()`; active when vertex count > `policy.ann_min_vertices` |
| `setCEPEventCallback(std::function<void(themisdb::analytics::Event)>)` ✅ | `applyBatch()` | Real-time EDGE_CREATE/EDGE_DELETE events after successful batch commit |
Scoring model (already implemented):
relevance = similarity × temporal_factor × centrality_weight
- `similarity`: cosine / dot-product / Euclidean between embedding vectors of edge endpoints.
- `temporal_factor = 2^(−age_s / half_life_s)`, where `age_s` is read from the edge's `_created_at` field. If absent or `half_life_s = 0`, factor = 1.0.
- `centrality_weight = 1 / (1 + log(1 + out_degree))`: dampens hub nodes.
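The three factors combine directly as in the formula above. A minimal sketch, with illustrative free functions standing in for the real member functions:

```cpp
#include <cmath>

// relevance = similarity × temporal_factor × centrality_weight (formulas above).
// Free-function names are illustrative sketches of the documented math.

// 2^(−age_s / half_life_s); factor 1.0 when no half-life is configured.
double computeTemporalDecay(double age_s, double half_life_s) {
    if (half_life_s <= 0.0) return 1.0;
    return std::exp2(-age_s / half_life_s);
}

// 1 / (1 + log(1 + out_degree)): dampens hub nodes.
double computeCentralityWeight(double out_degree) {
    return 1.0 / (1.0 + std::log(1.0 + out_degree));
}

double relevance(double similarity, double age_s, double half_life_s,
                 double out_degree) {
    return similarity * computeTemporalDecay(age_s, half_life_s)
                      * computeCentralityWeight(out_degree);
}
```

An edge exactly one half-life old retains half its similarity-based relevance; a node with zero out-degree gets centrality weight 1.0, so only hubs are damped.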
Candidate discovery (brute-force, partially implemented):
- For each vertex `v`, compute `similarity(embedding(v), embedding(u))` for all `u ≠ v`.
- Keep the top-k pairs above `add_threshold` that do not already have an edge `v → u`.
- Planned upgrade: replace with `acceleration::ANNIndex::knnSearch(embedding(v), top_k)` for graphs > 10,000 nodes. This reduces O(V²) to O(V · log V) per cycle.
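The brute-force loop described above can be sketched as follows, using the defensive cosine similarity mentioned in the security notes (0.0 for empty or mismatched vectors). The function name `discoverCandidates` and the in-memory containers are illustrative; the real implementation reads vertices and edges through `GraphIndexManager`.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Defensive cosine similarity: 0.0 for empty or mismatched-length vectors.
double cosineSim(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.empty() || a.size() != b.size()) return 0.0;
    double dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
    }
    return (na == 0 || nb == 0) ? 0.0 : dot / (std::sqrt(na) * std::sqrt(nb));
}

// Brute-force O(V^2) discovery: for each vertex v, score every other vertex,
// skip existing edges v -> u, keep the top-k pairs above add_threshold.
std::vector<std::pair<std::string, std::string>> discoverCandidates(
        const std::map<std::string, std::vector<float>>& embeddings,
        const std::set<std::pair<std::string, std::string>>& existing_edges,
        double add_threshold, size_t top_k) {
    std::vector<std::pair<std::string, std::string>> out;
    for (const auto& [v, ev] : embeddings) {
        std::vector<std::pair<double, std::string>> scored;
        for (const auto& [u, eu] : embeddings) {
            if (u == v || existing_edges.count({v, u})) continue;
            double s = cosineSim(ev, eu);
            if (s > add_threshold) scored.emplace_back(s, u);
        }
        std::sort(scored.rbegin(), scored.rend());      // best first
        for (size_t i = 0; i < std::min(top_k, scored.size()); ++i)
            out.emplace_back(v, scored[i].second);
    }
    return out;
}
```

The ANN upgrade replaces only the inner loop: instead of scoring all `u ≠ v`, query the index for the k nearest neighbours of `embedding(v)` and filter those.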
CEP integration (planned, target: Q1 2027):
- After `applyBatch()` succeeds, emit one `CEPEngine::ChangeEvent` per mutation into the configured CEP stream.
- Pattern: `EDGE_REMOVED`/`EDGE_ADDED` with payload `{ edge_id, from, to, relevance_score, cycle_number }`.
- Downstream CEP rules can react (e.g., alert on a burst of removals, trigger reindexing on cluster-like additions).
- This is additive to the existing `Changefeed` integration; both may be active simultaneously.
Scheduling strategies (configurable via RefreshPolicy):
| Strategy | How to configure | Best for |
|---|---|---|
| Time-based | `refresh_interval = std::chrono::seconds(N)` | Predictable, low-overhead background maintenance |
| Manual only | `refresh_interval = std::chrono::seconds(0)` | Caller controls timing via `triggerRefresh()` |
| Event-triggered | Caller calls `triggerRefresh()` after N mutations | Responsive; combine with change-log threshold |
| Adaptive | External orchestrator observes `getStats().removal_rate` and adjusts interval | Self-optimising; suitable for variable data drift |
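The adaptive strategy lives entirely in the external orchestrator. One possible controller rule, as a sketch: shorten the interval when the last cycle removed many edges (data is drifting), lengthen it when the graph was stable. The 2× step, the rate thresholds, and the bounds are all assumptions, not ThemisDB policy.

```cpp
#include <algorithm>
#include <chrono>

// Illustrative interval controller for the adaptive strategy. All constants
// (thresholds, 2x step, min/max bounds) are assumptions for the sketch.
std::chrono::seconds nextInterval(std::chrono::seconds current,
                                  double removal_rate,
                                  std::chrono::seconds min_ivl = std::chrono::seconds(30),
                                  std::chrono::seconds max_ivl = std::chrono::seconds(3600)) {
    std::chrono::seconds next = current;
    if (removal_rate > 0.05)       next = current / 2;  // drift: refresh sooner
    else if (removal_rate < 0.01)  next = current * 2;  // stable: back off
    return std::clamp(next, min_ivl, max_ivl);
}
```

The orchestrator would call this after each cycle with `getStats().removal_rate` and feed the result back via `setPolicy()`.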
- Unit tests (45+ in `tests/graph/test_scheduled_edge_refresh.cpp`):
  - `RefreshPolicy` validation (invalid thresholds throw `std::invalid_argument`)
  - `computeSimilarity()` for COSINE, DOT_PRODUCT, EUCLIDEAN (exact float comparisons)
  - `computeTemporalDecay()`: half-life formula, absent `_created_at`, zero half-life
  - `scoreEdge()`: combined relevance with all three factors
  - Safety gate abort: `aborted_safety_gate = true` when removal fraction exceeded
  - Full refresh cycle with removal and with addition
  - Audit trail population after removal and addition
  - `getStats()` after a completed cycle
  - `start()`/`stop()` lifecycle with zero interval (manual-only mode)
  - `setPolicy()` runtime update takes effect on next cycle
  - Empty graph: graceful handling (no edges, no crash)
  - Multiple cycles: `cycle_number` increments correctly
  - Anomaly detection: `anomaly_high_removal_rate` flag set and logged
  - ChangeFeed: `recordEvent()` called for each mutation with correct event type and metadata
  - Large-graph integration: 100-node graph, cluster embeddings → correct add/remove behaviour
  - Regression: stable graph (all edges above threshold) → zero mutations over multiple cycles
- ANN/CEP tests (implemented):
  - ANN path active when vertex count > `policy.ann_min_vertices`; brute-force fallback below threshold
  - ANN-accelerated discovery produces same candidate count as brute-force (BruteForceANN fixture)
  - CEP callback invoked with `EDGE_REMOVED` events after successful removal (event_name + fields validated)
  - CEP callback invoked with `EDGE_ADDED` events after successful addition (all required fields present)
  - No CEP events emitted when safety gate aborts cycle
  - Detaching CEP callback (empty function) prevents further event emission
- Single refresh cycle on a 10,000-node graph (brute-force): ≤ 5 s wall time on a single core, ≤ 200 ms with 8 parallel scoring workers.
- Single refresh cycle on a 1,000,000-node graph (ANN via `setANNIndex()`): ≤ 30 s wall time; top-k candidate discovery O(V · log V · D) where D = embedding dimension.
- Audit trail `appendAudit()` overhead: < 1 µs per mutation (bounded ring buffer, no heap allocation in steady state).
- `ChangeFeed::recordEvent()` path: < 5 µs per event (RocksDB single put).
- Safety gate check: O(1); computed as a fraction before any storage access.
- Background scheduler wake-up jitter: < 50 ms from configured `refresh_interval` under normal system load.
- Memory overhead per engine instance: < 10 MB for audit trail (10,000 entries × ~200 bytes/entry) + policy + stats.
- `max_removal_fraction` safety gate (default 0.10) prevents runaway deletions from a misconfigured or adversarially crafted `NodeEmbeddingProvider`; the gate fires before any write is issued.
- `NodeEmbeddingProvider` is user-supplied code; it must not be trusted to return well-formed vectors. `computeSimilarity()` defensively returns 0.0 for empty, mismatched-length, or NaN-containing vectors.
- Audit trail eviction is FIFO and bounded; no unbounded memory growth regardless of cycle count.
- `Changefeed::recordEvent()` and CEP callback exceptions are caught and logged as warnings; they never abort a refresh cycle or roll back committed mutations.
- Batch commit failure is logged as an error; the cycle reports failure via `RefreshStats::last_error` without crashing the engine or stopping the background thread.
- `setPolicy()` is thread-safe (protected by `policy_mutex_`); a concurrent `triggerRefresh()` sees either the old or the new policy consistently, never a partial mix.
- ANN index integration only queries the vertex set present in the current cycle; no cross-graph data access is possible through the `IAnnIndex` interface.
Track user-requested features:
- Cypher Query Support: Neo4j-compatible query language (requested by 15 users)
- Graph Backup/Restore: Snapshot and restore graph state (requested by 12 users)
- Graph Diff: Compare two graph versions (requested by 8 users)
- Graph Validation: Schema validation for property graphs (requested by 10 users)
Last Updated: April 2026
Next Review: Q3 2026
- Unit test coverage ≥ 80% across `graph/query_optimizer.cpp`, `graph/plan_cache.cpp`, `graph/parallel_traversal.cpp` (tracked by Issue #1830)
- Integration tests: AQL graph query round-trip, constrained path finding with required/forbidden nodes, plan cache hit/miss under concurrent load
- Property-based tests: VF2 subgraph isomorphism correctness verified against brute-force matcher for graphs up to 50 nodes
- GPU/CPU parity tests: BFS/DFS results must be bit-identical across CUDA and CPU backends for all test graphs
- Adaptive cost model convergence test: after 100 synthetic executions per algorithm, `mean_absolute_error_ms` must be < 10% of mean measured latency
- Benchmark suite: traversal latency vs. graph size (1K, 100K, 1M, 10M nodes) with pass/fail regression gate (< 5% throughput degradation)
- Algorithm selection (`selectAlgorithm`) latency: < 1 ms p99 for graphs up to 10M nodes
- Plan cache lookup: < 100 µs p99 including structural fingerprint comparison
- Parallel BFS on 1M-node graph: ≥ 4× speedup over single-threaded BFS on an 8-core machine
- GPU BFS on RTX-class GPU (≥1M nodes): ≥ 8× speedup over CPU baseline
- Subgraph isomorphism on 100-node pattern against 1M-node graph: < 500 ms p95
- Distributed query (4 shards): < 20% latency overhead vs. equivalent single-shard query
- Plan cache eviction: O(1) amortized using LRU + TTL heap; no stop-the-world pauses
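The bounded-cache behaviour (max size with LRU eviction, TTL treated as a lookup-time miss) can be sketched with the classic list-plus-map structure for O(1) amortized get/put. This sketch expires entries lazily on lookup rather than via the TTL heap mentioned above, and `Plan` plus all member names are illustrative, not the real ThemisDB types.

```cpp
#include <chrono>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>

// Sketch of a bounded plan cache: LRU list + hash map gives O(1) get/put;
// entries past their TTL are treated as misses and dropped lazily on lookup.
struct Plan { std::string algorithm; double est_cost_ms; };

class PlanCache {
    using Clock = std::chrono::steady_clock;
    struct Entry { std::string key; Plan plan; Clock::time_point inserted; };
public:
    PlanCache(size_t max_plans, std::chrono::seconds ttl)
        : max_plans_(max_plans), ttl_(ttl) {}

    std::optional<Plan> get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        if (Clock::now() - it->second->inserted > ttl_) {   // expired: miss
            lru_.erase(it->second);
            index_.erase(it);
            return std::nullopt;
        }
        lru_.splice(lru_.begin(), lru_, it->second);        // move to front
        return it->second->plan;
    }

    void put(const std::string& key, Plan plan) {
        if (auto it = index_.find(key); it != index_.end()) {
            lru_.erase(it->second);
            index_.erase(it);
        }
        lru_.push_front({key, std::move(plan), Clock::now()});
        index_[key] = lru_.begin();
        if (index_.size() > max_plans_) {                   // evict LRU tail
            index_.erase(lru_.back().key);
            lru_.pop_back();
        }
    }

    size_t size() const { return index_.size(); }

private:
    size_t max_plans_;
    std::chrono::seconds ttl_;
    std::list<Entry> lru_;
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

With the spec defaults this would be constructed as `PlanCache cache(1000, std::chrono::seconds(300));`; keys are the structural query fingerprints, never raw user strings.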
- Path constraint inputs (required/forbidden node/edge IDs) must be validated against the graph schema before execution; invalid IDs return a structured error, not a crash (Issue #1832)
- AQL graph traversal depth must be bounded by a configurable `max_depth` limit (default 100) to prevent unbounded-traversal DoS
- Plan cache keys must not embed raw user-supplied strings; use structural fingerprints to prevent cache poisoning
- Incremental query handles expire after a configurable TTL (default 60 s) to prevent resource exhaustion from abandoned handles
- GPU memory allocation failures must fall back to CPU execution without data loss
- Distributed shard queries must be protected by per-request timeouts (default 5 s) with automatic cancellation
- All traversal results must be deterministic for a given graph snapshot (no non-deterministic ordering unless explicitly randomized)