-
Notifications
You must be signed in to change notification settings - Fork 1
features_temporal_queries
Point-in-Time Queries und Temporal Joins für Zeitbereichs-Abfragen.
- 📋 Übersicht
- ✨ Features
- 🚀 Schnellstart
- 📖 Detaillierte Dokumentation
- 💡 Best Practices
- 🔧 Troubleshooting
- 📚 Siehe auch
- 📝 Changelog
Status: ✅ Implemented & Tested (8/8 tests passing)
Feature: Extended temporal graph queries with time-window filtering
Date: 2025-01-15
This feature extends Themis's temporal graph capabilities from single-timestamp queries to time-range queries. You can now find all edges that overlap with or are fully contained within a specified time window.
- Audit Queries: "Show all relationships valid during Q4 2024"
- Compliance: "Find edges fully contained within investigation period"
- Historical Analysis: "What connections existed between 2020-2022?"
- Temporal Analytics: "Relationships overlapping with event timeframe"
struct TimeRangeFilter {
int64_t start_ms; // Range start (milliseconds since epoch)
int64_t end_ms; // Range end (milliseconds since epoch)
// Factory methods
static TimeRangeFilter between(int64_t start, int64_t end);
static TimeRangeFilter since(int64_t start);
static TimeRangeFilter until(int64_t end);
static TimeRangeFilter all();
// Filtering methods
bool hasOverlap(std::optional<int64_t> edge_valid_from,
std::optional<int64_t> edge_valid_to) const;
bool fullyContains(std::optional<int64_t> edge_valid_from,
std::optional<int64_t> edge_valid_to) const;
};struct EdgeInfo {
std::string edgeId; // Edge identifier
std::string fromPk; // Source node primary key
std::string toPk; // Target node primary key
std::optional<int64_t> valid_from; // Edge valid from (ms)
std::optional<int64_t> valid_to; // Edge valid to (ms)
};std::pair<Status, std::vector<EdgeInfo>>
getEdgesInTimeRange(int64_t range_start_ms,
int64_t range_end_ms,
bool require_full_containment = false) const;Parameters:
-
range_start_ms: Query time window start (milliseconds since epoch) -
range_end_ms: Query time window end (milliseconds since epoch) -
require_full_containment:-
false(default): Returns edges with any overlap with query window -
true: Returns edges fully contained within query window
-
Returns:
-
Status: Operation success/failure -
vector<EdgeInfo>: All matching edges with temporal metadata
Time Complexity: O(E) where E = total edges in database
std::pair<Status, std::vector<EdgeInfo>>
getOutEdgesInTimeRange(std::string_view fromPk,
int64_t range_start_ms,
int64_t range_end_ms,
bool require_full_containment = false) const;Parameters:
-
fromPk: Source node primary key -
range_start_ms: Query time window start (milliseconds since epoch) -
range_end_ms: Query time window end (milliseconds since epoch) -
require_full_containment: Same as global query
Returns:
-
Status: Operation success/failure -
vector<EdgeInfo>: All matching outgoing edges fromfromPk
Time Complexity: O(d) where d = out-degree of node
Find all edges with any overlap with time window [1000, 2000]:
GraphIndexManager graph(db);
// Add edges with different temporal periods
BaseEntity e1("edge1");
e1.setField("_from", "A");
e1.setField("_to", "B");
e1.setField("valid_from", 500); // Partially overlaps
e1.setField("valid_to", 1500);
graph.addEdge(e1);
BaseEntity e2("edge2");
e2.setField("_from", "A");
e2.setField("_to", "C");
e2.setField("valid_from", 1200); // Fully inside
e2.setField("valid_to", 1800);
graph.addEdge(e2);
BaseEntity e3("edge3");
e3.setField("_from", "B");
e3.setField("_to", "C");
e3.setField("valid_from", 2500); // No overlap
e3.setField("valid_to", 3000);
graph.addEdge(e3);
// Query: Find edges overlapping [1000, 2000]
auto [status, edges] = graph.getEdgesInTimeRange(1000, 2000);
// Result: edges = [edge1, edge2]
// edge1: overlaps (500-1500 overlaps with 1000-2000)
// edge2: fully inside (1200-1800 inside 1000-2000)
// edge3: no overlap (2500-3000 is after 2000)Find edges fully contained within time window [1000, 3000]:
// Same edges as Example 1
// Query: Find edges FULLY INSIDE [1000, 3000]
auto [status, edges] = graph.getEdgesInTimeRange(1000, 3000, true);
// Result: edges = [edge2, edge3]
// edge1: NOT included (500-1500 starts before 1000)
// edge2: included (1200-1800 fully inside 1000-3000)
// edge3: included (2500-3000 fully inside 1000-3000)Find outgoing edges from specific node in time window:
// Add edges from node "user1"
BaseEntity e1("follow1");
e1.setField("_from", "user1");
e1.setField("_to", "user2");
e1.setField("valid_from", 1000000);
e1.setField("valid_to", 2000000);
graph.addEdge(e1);
BaseEntity e2("follow2");
e2.setField("_from", "user1");
e2.setField("_to", "user3");
e2.setField("valid_from", 1500000);
e2.setField("valid_to", 2500000);
graph.addEdge(e2);
BaseEntity e3("follow3");
e3.setField("_from", "user2"); // Different source!
e3.setField("_to", "user3");
e3.setField("valid_from", 1200000);
e3.setField("valid_to", 1800000);
graph.addEdge(e3);
// Query: Find user1's outgoing edges in [1100000, 1900000]
auto [status, edges] = graph.getOutEdgesInTimeRange("user1", 1100000, 1900000);
// Result: edges = [follow1, follow2]
// follow1: from user1, overlaps query window
// follow2: from user1, overlaps query window
// follow3: NOT included (from user2, not user1)Edges without valid_from/valid_to match all time queries:
BaseEntity unbounded("always_active");
unbounded.setField("_from", "A");
unbounded.setField("_to", "B");
// NO valid_from/valid_to fields = unbounded temporal range
graph.addEdge(unbounded);
BaseEntity bounded("temporary");
bounded.setField("_from", "A");
bounded.setField("_to", "C");
bounded.setField("valid_from", 1000);
bounded.setField("valid_to", 2000);
graph.addEdge(bounded);
// Query: Find edges in [500, 1500]
auto [status, edges] = graph.getEdgesInTimeRange(500, 1500);
// Result: edges = [always_active, temporary]
// always_active: unbounded edges always included
// temporary: 1000-2000 overlaps 500-1500You can now aggregate a numeric property (SUM, AVG, MIN, MAX, COUNT) across all edges that match a time window. The aggregation supports an optional edge type filter (_type).
std::pair<Status, TemporalAggregationResult>
aggregateEdgePropertyInTimeRange(
std::string_view property, // field name on edge entity, e.g. "cost" or "_weight"
Aggregation agg, // COUNT, SUM, AVG, MIN, MAX
int64_t range_start_ms,
int64_t range_end_ms,
bool require_full_containment = false,
std::optional<std::string_view> edge_type = std::nullopt
) const;Behavior notes:
-
COUNTreturns the number of matching edges (regardless of whether the property exists). -
SUM/AVG/MIN/MAXconsider only edges that have a numeric value for the requested property. -
edge_type(optional) allows server-side filtering by_type.
Example:
// Sum cost of edges of type "A" overlapping [1000,2000]
auto [st, res] = graph.aggregateEdgePropertyInTimeRange("cost", GraphIndexManager::Aggregation::SUM, 1000, 2000, false, std::string_view("A"));
if (st.ok) {
std::cout << "count=" << res.count << " sum=" << res.value << "\n";
}Implementation detail: edge entities are stored under keys like edge:<edgeId>. To be robust across historical formats, the implementation also attempts edge:<graphId>:<edgeId> when reading entities and falls back to edge:<edgeId> if necessary. Helper functions getEdgeType_() and getEdgeWeight_() were hardened to try both formats.
Overlap (require_full_containment = false):
- Default behavior
- Returns edges with any temporal overlap with query window
- Includes partially overlapping edges
- Formula:
edge_start <= query_end AND edge_end >= query_start
Full Containment (require_full_containment = true):
- Strict containment
- Returns edges fully inside query window
- Excludes partially overlapping edges
- Formula:
edge_start >= query_start AND edge_end <= query_end
| Edge Period | Query Window | hasOverlap() | fullyContains() |
|---|---|---|---|
| [500, 1500] | [1000, 2000] | ✅ true | ❌ false |
| [1200, 1800] | [1000, 2000] | ✅ true | ✅ true |
| [2500, 3000] | [1000, 2000] | ❌ false | ❌ false |
| [null, null] | [1000, 2000] | ✅ true | ✅ true |
| [500, null] | [1000, 2000] | ✅ true | ❌ false |
| [null, 3000] | [1000, 2000] | ✅ true | ❌ false |
Unbounded Edges:
- Edges without
valid_from/valid_toare treated as unbounded (always valid) -
hasOverlap()always returnstruefor unbounded edges -
fullyContains()always returnstruefor unbounded edges
- Time Complexity: O(E) where E = total edges in database
- Space Complexity: O(R) where R = number of matching edges
-
Database Scans: Full scan of
graph:out:*prefix -
Entity Loads: One
db.get("edge:*")per edge
Optimization Opportunities:
- Add temporal index for bounded time ranges
- Sorted temporal B-tree for range scans
- Materialized views for common time windows
- Time Complexity: O(d) where d = out-degree of source node
- Space Complexity: O(R) where R = number of matching edges
-
Database Scans: Prefix scan of
graph:out:<fromPk>:* -
Entity Loads: One
db.get("edge:*")per outgoing edge
Much Faster Than Global Query:
- Only scans edges from specific node
- Leverages existing
graph:out:adjacency index - Suitable for high-frequency queries on specific nodes
# Edge entity storage
edge:<edge_id> -> BaseEntity(id, _from, _to, valid_from, valid_to, ...)
# Graph adjacency indices (temporal data stored in entity, not index)
graph:out:<from_pk>:<edge_id> -> <to_pk>
graph:in:<to_pk>:<edge_id> -> <from_pk>
Design Choice:
- Temporal fields (
valid_from,valid_to) stored in edge entity, not in index keys - Requires entity load to check temporal bounds
- Simplifies index structure (no temporal key encoding)
- Trade-off: Extra
db.get()per edge vs. complex temporal index
1. Create TimeRangeFilter from query parameters
2. Scan all edges with prefix "graph:out:"
3. For each edge key "graph:out:<from>:<edgeId>":
a. Parse edgeId from key
b. Load edge entity from "edge:<edgeId>"
c. Extract valid_from, valid_to fields
d. Check temporal match (overlap or containment)
e. If match, add EdgeInfo to results
4. Return filtered results
1. Create TimeRangeFilter from query parameters
2. Scan edges with prefix "graph:out:<fromPk>:"
3. For each edge (same as global query):
a-e. (identical to global query)
4. Return filtered results
- TimeRangeFilter_Overlap - Filter logic: overlap detection
- TimeRangeFilter_FullContainment - Filter logic: containment check
- GetEdgesInTimeRange_Overlap - Global query: overlap mode
- GetEdgesInTimeRange_FullContainment - Global query: containment mode
- GetOutEdgesInTimeRange - Node-specific query: basic functionality
- GetOutEdgesInTimeRange_NoMatch - Node-specific query: no results
- UnboundedEdges_AlwaysIncluded - Unbounded edges match all queries
- EdgeInfo_ContainsTemporalData - Result structure validation
# Run all time-range tests
./themis_tests --gtest_filter="TimeRangeQueryTest.*"
# Expected output:
# [ PASSED ] 8 tests.// Step 1: Find temporal path
RecursivePathQuery rpq;
rpq.start_node = "user1";
rpq.end_node = "user5";
rpq.max_depth = 3;
rpq.valid_from = 1500000; // Single timestamp
rpq.valid_to = 1500000;
auto [status, path] = queryEngine.executeRecursivePathQuery(rpq);
// Step 2: Verify all edges in path valid during time window
auto [st, edges] = graph.getEdgesInTimeRange(1400000, 1600000);
for (const auto& edgeInfo : edges) {
// Check if edge in path is valid throughout window
}| Feature | Single Timestamp | Time Range | Status |
|---|---|---|---|
| BFS/Dijkstra at time T | ✅ bfsAtTime()
|
❌ | Implemented |
| Shortest path at time T | ✅ dijkstraAtTime()
|
❌ | Implemented |
| Find edges in window | ❌ | ✅ getEdgesInTimeRange()
|
Implemented ✨ |
| Find node edges in window | ❌ | ✅ getOutEdgesInTimeRange()
|
Implemented ✨ |
| Temporal aggregation | ❌ | ✅ getTemporalStats()
|
Implemented ✨ |
Problem: O(E) scan for global queries
Solution: B-tree index on (valid_from, valid_to) pairs
// Hypothetical API
auto edges = graph.getEdgesInTimeRange_Indexed(1000, 2000);
// Time complexity: O(log E + R) vs. current O(E)Problem: Current path queries use single timestamp
Solution: Extend RecursivePathQuery with time windows
RecursivePathQuery rpq;
rpq.window_start = 1000000;
rpq.window_end = 2000000;
// Find paths where ALL edges valid during [1000000, 2000000]✅ Implemented! Temporal statistics now available:
auto [status, stats] = graph.getTemporalStats(1000, 2000);
// Returns: TemporalStats{
// edge_count, fully_contained_count, bounded_edge_count,
// avg_duration_ms, total_duration_ms, min/max_duration_ms,
// earliest_start, latest_end
// }
std::cout << stats.toString();Features:
- Count edges with overlap or full containment
- Duration statistics (AVG, SUM, MIN, MAX) for bounded edges
- Temporal range coverage (earliest start, latest end)
- 6/6 tests passing ✅
Problem: Large result sets exhaust memory
Solution: Iterator-based API
auto iter = graph.streamEdgesInTimeRange(1000, 2000);
while (iter.hasNext()) {
EdgeInfo edge = iter.next();
// Process one edge at a time
}- No Temporal Index: Global queries scan all edges (O(E))
-
Entity Load Overhead: One
db.get()per edge (network/disk I/O) - No Streaming API: Large result sets loaded into memory
- No Temporal Joins: Cannot join time-range results with other queries
- No Unbounded Query Optimization: Unbounded edges checked even when range is bounded
-
2025-01-15: Initial implementation
- Added
TimeRangeFilterstructure totemporal_graph.h - Added
EdgeInfostructure tograph_index.h - Implemented
getEdgesInTimeRange()ingraph_index.cpp - Implemented
getOutEdgesInTimeRange()ingraph_index.cpp - Created 8 comprehensive tests (all passing)
- Documentation created
- Added
-
2025-11-11: Temporal aggregation property support
- Added
aggregateEdgePropertyInTimeRange()supporting COUNT, SUM, AVG, MIN, MAX over a query time window - Added unit tests
test_temporal_aggregation_property.cpp(4 tests) and updated CMake - Hardened entity read paths:
getEdgeType_()andgetEdgeWeight_()try bothedge:<graphId>:<edgeId>andedge:<edgeId>formats to be backward-compatible - All new tests passing
- Added
- Recursive Path Queries - Multi-hop temporal reasoning
- Temporal Graph Design - Overall temporal architecture
- Graph Index - Adjacency index design
- MVCC Design - Transaction temporal semantics
ThemisDB v1.3.4 | GitHub | Documentation | Discussions | License
Last synced: January 02, 2026 | Commit: 6add659
Version: 1.3.0 | Stand: Dezember 2025
- Übersicht
- Home
- Dokumentations-Index
- Quick Reference
- Sachstandsbericht 2025
- Features
- Roadmap
- Ecosystem Overview
- Strategische Übersicht
- Geo/Relational Storage
- RocksDB Storage
- MVCC Design
- Transaktionen
- Time-Series
- Memory Tuning
- Chain of Thought Storage
- Query Engine & AQL
- AQL Syntax
- Explain & Profile
- Rekursive Pfadabfragen
- Temporale Graphen
- Zeitbereichs-Abfragen
- Semantischer Cache
- Hybrid Queries (Phase 1.5)
- AQL Hybrid Queries
- Hybrid Queries README
- Hybrid Query Benchmarks
- Subquery Quick Reference
- Subquery Implementation
- Content Pipeline
- Architektur-Details
- Ingestion
- JSON Ingestion Spec
- Enterprise Ingestion Interface
- Geo-Processor Design
- Image-Processor Design
- Hybrid Search Design
- Fulltext API
- Hybrid Fusion API
- Stemming
- Performance Tuning
- Migration Guide
- Future Work
- Pagination Benchmarks
- Enterprise README
- Scalability Features
- HTTP Client Pool
- Build Guide
- Implementation Status
- Final Report
- Integration Analysis
- Enterprise Strategy
- Verschlüsselungsstrategie
- Verschlüsselungsdeployment
- Spaltenverschlüsselung
- Encryption Next Steps
- Multi-Party Encryption
- Key Rotation Strategy
- Security Encryption Gap Analysis
- Audit Logging
- Audit & Retention
- Compliance Audit
- Compliance
- Extended Compliance Features
- Governance-Strategie
- Compliance-Integration
- Governance Usage
- Security/Compliance Review
- Threat Model
- Security Hardening Guide
- Security Audit Checklist
- Security Audit Report
- Security Implementation
- Development README
- Code Quality Pipeline
- Developers Guide
- Cost Models
- Todo Liste
- Tool Todo
- Core Feature Todo
- Priorities
- Implementation Status
- Roadmap
- Future Work
- Next Steps Analysis
- AQL LET Implementation
- Development Audit
- Sprint Summary (2025-11-17)
- WAL Archiving
- Search Gap Analysis
- Source Documentation Plan
- Changefeed README
- Changefeed CMake Patch
- Changefeed OpenAPI
- Changefeed OpenAPI Auth
- Changefeed SSE Examples
- Changefeed Test Harness
- Changefeed Tests
- Dokumentations-Inventar
- Documentation Summary
- Documentation TODO
- Documentation Gap Analysis
- Documentation Consolidation
- Documentation Final Status
- Documentation Phase 3
- Documentation Cleanup Validation
- API
- Authentication
- Cache
- CDC
- Content
- Geo
- Governance
- Index
- LLM
- Query
- Security
- Server
- Storage
- Time Series
- Transaction
- Utils
Vollständige Dokumentation: https://makr-code.github.io/ThemisDB/