SimpleRA is a high-performance relational algebra engine optimized for large datasets that exceed available main memory. It implements SQL-like operations using advanced algorithms and data structures, featuring block-based storage, B+ tree indexing, and memory-conscious design.
- Block-based Storage: Matrices partitioned into 15×15 blocks with row-major storage
- Matrix Operations:
- LOAD MATRIX: Loads a matrix from CSV into the block structure
- ROTATE: Efficient clockwise rotation via in-place diagonal transposition and off-diagonal swapping
- CROSSTRANSPOSE: Atomic in-place transposition of two matrices with name swapping
- CHECKANTISYM: Validates anti-symmetry (A = -Bᵀ) with transposition reversion
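The rotation idea above can be sketched on a plain in-memory square matrix (SimpleRA applies it block-wise to its 15×15 blocks): a 90° clockwise rotation is a transpose across the main diagonal followed by reversing each row. `rotateClockwise` is a hypothetical name for illustration, not the engine's actual function.

```cpp
#include <algorithm>
#include <vector>

// Rotate an N x N matrix 90 degrees clockwise in place:
// first transpose across the main diagonal, then reverse each row.
void rotateClockwise(std::vector<std::vector<int>>& m) {
    const size_t n = m.size();
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            std::swap(m[i][j], m[j][i]);       // diagonal transposition
    for (auto& row : m)
        std::reverse(row.begin(), row.end());  // off-diagonal (column) swap
}
```

Both passes touch each element a constant number of times, so the rotation needs no auxiliary matrix.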
- Cache Management:
- The `clearPool` function ensures cache consistency by invalidating stale pages
- External Sorting:
- Sorting Phase: (Nb-1) blocks sorted in-memory using std::sort
- Merging Phase: Priority queue-based k-way merge with O(log Nr) complexity
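The merging phase can be sketched with an in-memory min-heap over sorted runs (on disk, each run entry would be the front of a block rather than a whole vector); each pop/push costs O(log k) for k open runs. `kWayMerge` is an illustrative name, not the engine's API.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// K-way merge of sorted runs using a min-heap keyed on the next
// unconsumed value of each run.
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& runs) {
    using Entry = std::pair<int, std::pair<size_t, size_t>>;  // value, (run, offset)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty()) heap.push({runs[r][0], {r, 0}});
    std::vector<int> out;
    while (!heap.empty()) {
        auto [val, pos] = heap.top();
        heap.pop();
        out.push_back(val);
        auto [r, i] = pos;
        if (i + 1 < runs[r].size())           // refill from the same run
            heap.push({runs[r][i + 1], {r, i + 1}});
    }
    return out;
}
```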
- ORDER BY: Creates sorted tables with ASC/DESC support
- GROUP BY:
- Temporary table sorting by grouping attribute
- Aggregation functions (SUM, AVG, MIN, MAX, COUNT)
- HAVING clause for grouped result filtering
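The sort-based GROUP BY pipeline above can be sketched for a single SUM aggregate: sort by the grouping attribute, scan each run of equal keys, and filter aggregated groups with the HAVING predicate. The names `Rec` and `groupBySum` are illustrative, not SimpleRA's own.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Rec { int group; int value; };

// GROUP BY via sorting: sort on the grouping attribute, aggregate each
// run of equal keys (SUM here), and keep groups passing the HAVING test.
std::vector<std::pair<int, int>> groupBySum(std::vector<Rec> rows, int havingMin) {
    std::sort(rows.begin(), rows.end(),
              [](const Rec& a, const Rec& b) { return a.group < b.group; });
    std::vector<std::pair<int, int>> out;  // (group, SUM(value))
    for (size_t i = 0; i < rows.size(); ) {
        int g = rows[i].group, sum = 0;
        while (i < rows.size() && rows[i].group == g) sum += rows[i++].value;
        if (sum >= havingMin)              // HAVING SUM(value) >= havingMin
            out.push_back({g, sum});
    }
    return out;
}
```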
- Partitioned Hash JOIN:
- Hashes smaller table for efficient probing
- Preserves column order (table1 cols + table2 cols)
- B+ Tree Indexing:
- On-demand creation at first SEARCH
- Disk-based nodes (1 node = 1 block)
- Composite keys (value, count) for duplicate handling
- Leaf node linking for efficient range queries
- Operator-Specific Search:
- `<`/`<=`: Leftmost leaf traversal
- `>`: Smallest key search + rightward scan
- `==`/`>=`: Lower bound search
- `!=`: Dual search combination
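The operator dispatch can be sketched over an ordered index, with `std::map` standing in for the B+ tree (`lower_bound`/`upper_bound` mirror descending to the leaf level, and in-order iteration mirrors the linked leaf scan). `indexSearch` is an illustrative name.

```cpp
#include <map>
#include <string>
#include <vector>

// Operator-specific search over a sorted index: key -> row id.
std::vector<int> indexSearch(const std::map<int, int>& idx,
                             const std::string& op, int v) {
    std::vector<int> out;
    auto collect = [&](auto lo, auto hi) {
        for (auto it = lo; it != hi; ++it) out.push_back(it->second);
    };
    if (op == "<")       collect(idx.begin(), idx.lower_bound(v));
    else if (op == "<=") collect(idx.begin(), idx.upper_bound(v));
    else if (op == "==") collect(idx.lower_bound(v), idx.upper_bound(v));
    else if (op == ">=") collect(idx.lower_bound(v), idx.end());
    else if (op == ">")  collect(idx.upper_bound(v), idx.end());
    else if (op == "!=") {                        // dual search: below + above v
        collect(idx.begin(), idx.lower_bound(v));
        collect(idx.upper_bound(v), idx.end());
    }
    return out;
}
```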
- Atomic Operations:
- INSERT: Appends to last page + index updates
- UPDATE: DELETE old record + INSERT new value
- DELETE: Lazy deletion with `del_marker` and a 25% threshold rebuild
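The lazy-deletion policy can be sketched in a few lines: flag rows instead of rewriting pages, and rebuild physically only once more than 25% of rows are flagged. `LazyTable` and `rebuild` are illustrative stand-ins (the source names the rebuild routine `loadagain()`).

```cpp
#include <algorithm>
#include <vector>

struct TRow { int key; bool del_marker = false; };

// Lazy deletion: logical flagging, physical rebuild past a 25% threshold.
struct LazyTable {
    std::vector<TRow> rows;
    size_t deleted = 0;

    void lazyDelete(int key) {
        for (TRow& r : rows)
            if (!r.del_marker && r.key == key) { r.del_marker = true; ++deleted; }
        if (deleted * 4 > rows.size())   // strictly more than 25% flagged
            rebuild();
    }
    void rebuild() {                     // drop flagged rows in one pass
        rows.erase(std::remove_if(rows.begin(), rows.end(),
                   [](const TRow& r) { return r.del_marker; }), rows.end());
        deleted = 0;
    }
};
```

The flag write is a single cheap update per deleted row; the expensive full rewrite is amortized across many deletions.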
- 10-Block Limit: Strict adherence to memory constraints
- B+ Tree Optimization: Only 3 blocks loaded simultaneously (current node, parent, child)
- File-Based Intermediate Storage:
- `.srch` files for SEARCH results (prevents memory overflow)
- `.dlt` files for batched DELETE operations
- `.upd` files for batched UPDATE operations
- Efficient Result Handling: SEARCH results written directly to disk files
- Batched Block Operations: Read/write full blocks instead of row-wise access
- Lazy Deletion Strategy:
- Logical deletion via `del_marker` (minimizes disk writes)
- Physical rebuild only when >25% of rows are deleted (reduces costly operations)
- Sequential Access Patterns: Leaf node linking enables efficient range scans
- O(log n) Complexity: All index operations (search/insert/delete/update)
- External Sorting: Minimizes disk accesses through:
- Run generation with (Nb-1) blocks
- Priority queue-based merging
- Composite Key Handling:
- `cnt_search()` dynamically resolves duplicates
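The composite-key idea can be sketched with `std::map` standing in for the B+ tree: pairing each value with a running per-value counter makes duplicate values distinct keys with a stable order. The `nextCount` map below is a hypothetical stand-in for the counter lookup that the source attributes to `cnt_search()`.

```cpp
#include <map>
#include <utility>

// Composite keys (value, count): duplicates of the same value get
// successive counts, so the ordered index can store all of them.
struct CompositeIndex {
    std::map<std::pair<int, int>, int> tree;  // (value, count) -> row id
    std::map<int, int> nextCount;             // next free count per value

    void insert(int value, int rowId) {
        tree[{value, nextCount[value]++}] = rowId;
    }
};
```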
| Operation | Without Index | With B+ Tree Index | Improvement Factor |
|---|---|---|---|
| Search | O(n) scan | O(log n) | 1000x (1M rows) |
| Range Query | O(n) | O(log n + k) | 500x (1M rows) |
| Insert | O(1) | O(log n) | Negligible overhead |
| Delete | O(n) | O(log n) | 800x (1M rows) |
| Update | O(n) | O(log n) | 800x (1M rows) |
```shell
cd SimpleRA/src
make clean
make
./server
```
- Node Structure: Internal/leaf nodes sized to match disk blocks
- Storage: `.idx` files organized in `/data/temp/<table>/<column>/`
- Duplicate Handling: Composite keys `(value, count)` maintain stable ordering
- Range Queries: Leaf node linking enables efficient sequential access
- Logical Deletion:
- Set `del_marker = 1`
- Update indexes by removing references
- Physical Rebuild (when >25% rows deleted):
- `loadagain()` reconstructs the table without deleted rows
- All indexes rebuilt for consistency
- Batched Processing:
- `.dlt` files group deletions by page
- Search: Locate target rows using B+ Tree
- Modify: Update values in memory
- Delete: Remove original rows via `executeDELETE()`
- Insert: Add modified rows as new records via `executeINSERT()`
- File Handling: `.upd` files batch modifications between steps
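The delete-then-insert decomposition above can be sketched on a single-column table: matching values are removed, then the modified copies are appended as new records, mirroring the `executeDELETE()` / `executeINSERT()` steps. `updateValue` is an illustrative name.

```cpp
#include <algorithm>
#include <vector>

// UPDATE as DELETE + INSERT: remove rows matching oldVal, then append
// the same number of rows holding newVal.
void updateValue(std::vector<int>& table, int oldVal, int newVal) {
    auto n = std::count(table.begin(), table.end(), oldVal);
    table.erase(std::remove(table.begin(), table.end(), oldVal),
                table.end());              // DELETE step
    table.insert(table.end(), n, newVal);  // INSERT step (appends to the end)
}
```

Appending instead of rewriting in place keeps the write pattern identical to a plain INSERT, so the same index-maintenance path handles both.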