
BlockDB: A high-performance relational database engine with block-based storage, B+ tree indexing, and memory-optimized query processing.

YashK2003/BlockDB


SimpleRA: High-Performance Relational Algebra Engine

SimpleRA is a high-performance relational algebra engine for datasets that exceed available main memory. It implements SQL-like operations over block-based storage using external sorting, partitioned hash joins, and B+ tree indexing, all within a fixed in-memory block budget.

✨ Key Features

🔹 Phase 1: Matrix & Table Operations

  • Block-based Storage: Matrices partitioned into 15×15 blocks with row-major storage
  • Matrix Operations:
    • LOAD MATRIX: Loads matrix from CSV into block structure
    • ROTATE: Efficient clockwise rotation via in-place diagonal transposition and off-diagonal swapping
    • CROSSTRANSPOSE: Transposes two matrices in place as a single atomic operation and swaps their names
    • CHECKANTISYM: Checks whether A = -Bᵀ, transposing in place for the comparison and reverting the transposition afterward
  • Cache Management: clearPool function ensures cache consistency by invalidating stale pages
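
The matrix operations above can be illustrated on a small in-memory matrix. This is a minimal sketch, not the engine's code: it works element by element on a `std::vector` matrix rather than on 15×15 blocks on disk, and the names `transposeInPlace`, `rotateClockwise` and `isAntiSymmetricPair` are hypothetical. It relies on the identity that a clockwise rotation equals a transpose followed by reversing each row; at block granularity the same transpose is obtained by transposing each diagonal block in place and swapping (and transposing) each off-diagonal block with its mirror block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a square matrix (the engine stores 15x15 blocks on disk).
using Matrix = std::vector<std::vector<int>>;

// In-place transpose: diagonal entries stay put, each off-diagonal
// pair (i, j) / (j, i) is swapped exactly once.
void transposeInPlace(Matrix& m) {
    const std::size_t n = m.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            std::swap(m[i][j], m[j][i]);
}

// Clockwise rotation = transpose, then reverse every row.
void rotateClockwise(Matrix& m) {
    transposeInPlace(m);
    for (auto& row : m) std::reverse(row.begin(), row.end());
}

// CHECKANTISYM-style test: true iff a == -transpose(b).
// b is transposed in place for the comparison, then transposed back,
// so the caller sees it unchanged.
bool isAntiSymmetricPair(const Matrix& a, Matrix& b) {
    assert(a.size() == b.size());
    transposeInPlace(b);
    bool ok = true;
    for (std::size_t i = 0; ok && i < a.size(); ++i)
        for (std::size_t j = 0; ok && j < a.size(); ++j)
            ok = (a[i][j] == -b[i][j]);
    transposeInPlace(b);  // revert the temporary transposition
    return ok;
}
```

For example, rotating {{1, 2}, {3, 4}} yields {{3, 1}, {4, 2}}, and `isAntiSymmetricPair` leaves its second argument exactly as it found it.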

🔹 Phase 2: Query Processing

  • External Sorting:
    • Sorting Phase: The table is split into runs of (Nb-1) blocks (Nb = number of buffer blocks), each sorted in memory using std::sort
    • Merging Phase: Priority queue-based k-way merge with O(log Nr) complexity
  • ORDER BY: Writes the sorted result to a new table, in ascending (ASC) or descending (DESC) order
  • GROUP BY:
    • Sorts the table on the grouping attribute into a temporary table
    • Aggregation functions (SUM, AVG, MIN, MAX, COUNT)
    • HAVING clause to filter the grouped results
  • Partitioned Hash JOIN:
    • Builds a hash table on the smaller table and probes it with the larger one
    • Preserves column order: output rows list table1's columns, then table2's
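
A partitioned hash join can be sketched in memory as follows. This is an illustration under simplifying assumptions, not the engine's implementation: partitions are `std::vector`s instead of on-disk blocks, rows are vectors of `int`, the join is a single-column equi-join, and `hashJoin` / `partitionedHashJoin` are hypothetical names. Both inputs are split by a hash of the join key so matching rows land in the same partition; each partition pair is then joined by building a hash table on the smaller side and probing it with the other, while the output always lists table1's columns before table2's.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

using Row = std::vector<int>;
using Table = std::vector<Row>;

// Build/probe step for one partition pair: hash the smaller input, probe with the other.
// Output rows are always t1's columns followed by t2's columns.
Table hashJoin(const Table& t1, std::size_t c1, const Table& t2, std::size_t c2) {
    const bool buildOnT1 = t1.size() <= t2.size();
    const Table& build = buildOnT1 ? t1 : t2;
    const Table& probe = buildOnT1 ? t2 : t1;
    const std::size_t buildCol = buildOnT1 ? c1 : c2;
    const std::size_t probeCol = buildOnT1 ? c2 : c1;

    // Build phase: join key -> positions of matching rows in the build input.
    std::unordered_multimap<int, std::size_t> index;
    for (std::size_t i = 0; i < build.size(); ++i)
        index.emplace(build[i][buildCol], i);

    // Probe phase: emit one joined row per match.
    Table out;
    for (const Row& p : probe) {
        auto range = index.equal_range(p[probeCol]);
        for (auto it = range.first; it != range.second; ++it) {
            const Row& b = build[it->second];
            const Row& left = buildOnT1 ? b : p;   // the row from t1
            const Row& right = buildOnT1 ? p : b;  // the row from t2
            Row joined(left);
            joined.insert(joined.end(), right.begin(), right.end());
            out.push_back(joined);
        }
    }
    return out;
}

// Partition both inputs on hash(join key) % numPartitions, then join partition pairs.
Table partitionedHashJoin(const Table& t1, std::size_t c1, const Table& t2, std::size_t c2,
                          std::size_t numPartitions) {
    assert(numPartitions > 0);
    std::hash<int> h;
    std::vector<Table> p1(numPartitions), p2(numPartitions);
    for (const Row& r : t1) p1[h(r[c1]) % numPartitions].push_back(r);
    for (const Row& r : t2) p2[h(r[c2]) % numPartitions].push_back(r);

    Table out;
    for (std::size_t i = 0; i < numPartitions; ++i) {
        Table part = hashJoin(p1[i], c1, p2[i], c2);
        out.insert(out.end(), part.begin(), part.end());
    }
    return out;
}
```

Row order among equal keys depends on `std::unordered_multimap`, so callers that need a deterministic order should sort the result.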

🔹 Phase 3: Indexed Operations

  • B+ Tree Indexing:
    • Created on demand by the first SEARCH on a column
    • Disk-based nodes (1 node = 1 block)
    • Composite keys (value, count) for duplicate handling
    • Leaf node linking for efficient range queries
  • Operator-Specific Search:
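    • < / <=: Start at the leftmost leaf and scan right while keys satisfy the bound
    • >: Search for the smallest key greater than the value, then scan right
    • == / >=: Lower-bound search for the first key ≥ the value
    • !=: Combines the results of two searches (< and >)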
  • Atomic Operations:
    • INSERT: Appends the row to the table's last page and updates the indexes
    • UPDATE: Deletes the old record and inserts the new value
    • DELETE: Lazy deletion via del_marker, with a full rebuild once more than 25% of rows are deleted
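
The operator-specific strategies can be illustrated on the leaf level alone. This minimal sketch assumes the linked leaves are flattened into one sorted `std::vector<int>` of plain values (no composite (value, count) keys, no internal nodes), uses `std::lower_bound` / `std::upper_bound` in place of a root-to-leaf descent, and introduces a hypothetical `searchLeaves` function that returns the matching keys in leaf order.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// keys: the leaf level read left to right (a flat stand-in for linked leaf nodes).
std::vector<int> searchLeaves(const std::vector<int>& keys, const std::string& op, int v) {
    std::vector<int> out;
    if (op == "<" || op == "<=") {
        // Leftmost-leaf traversal: scan from the first key until the bound fails.
        for (int k : keys) {
            if (op == "<" ? k >= v : k > v) break;
            out.push_back(k);
        }
    } else if (op == ">") {
        // Find the smallest key strictly greater than v, then scan right to the end.
        out.assign(std::upper_bound(keys.begin(), keys.end(), v), keys.end());
    } else if (op == ">=") {
        // Lower-bound search, then scan right to the end.
        out.assign(std::lower_bound(keys.begin(), keys.end(), v), keys.end());
    } else if (op == "==") {
        // Lower-bound search, then scan right while keys still equal v.
        for (auto it = std::lower_bound(keys.begin(), keys.end(), v);
             it != keys.end() && *it == v; ++it)
            out.push_back(*it);
    } else if (op == "!=") {
        // Two searches combined: every key < v followed by every key > v.
        out = searchLeaves(keys, "<", v);
        std::vector<int> gt = searchLeaves(keys, ">", v);
        out.insert(out.end(), gt.begin(), gt.end());
    } else {
        throw std::invalid_argument("unsupported operator: " + op);
    }
    return out;
}
```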

⚡ Performance Optimization

🔹 Memory-Constrained Design

  • 10-Block Limit: No more than 10 blocks are held in main memory at any time
  • B+ Tree Optimization: Only 3 blocks loaded simultaneously (current node, parent, child)
  • File-Based Intermediate Storage:
    • .srch files for SEARCH results (prevents memory overflow)
    • .dlt files for batched DELETE operations
    • .upd files for batched UPDATE operations
  • Efficient Result Handling: SEARCH results written directly to disk files
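
One way to enforce a fixed block budget together with clearPool-style invalidation is a small page pool. This is a hypothetical sketch, not the engine's buffer manager: the `BufferPool` class, its FIFO eviction policy, the `Loader` callback, and the table-scoped `clearPool(table)` signature are all assumptions. Constructing it with a capacity of 10 would match the 10-block limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// A cached page: (table name, page index) plus its rows.
struct Page {
    std::string table;
    std::size_t index;
    std::vector<std::vector<int>> rows;
};

// Hypothetical fixed-capacity pool: never holds more than `capacity` pages.
class BufferPool {
public:
    explicit BufferPool(std::size_t capacity) : capacity_(capacity) {}

    // Return the cached page, or load it with `load(table, index)` on a miss,
    // evicting the oldest page (FIFO) when the pool is full.
    template <typename Loader>
    const Page& get(const std::string& table, std::size_t index, Loader load) {
        for (const Page& p : pages_)
            if (p.table == table && p.index == index) return p;
        if (pages_.size() == capacity_) pages_.pop_front();
        pages_.push_back(Page{table, index, load(table, index)});
        return pages_.back();
    }

    // clearPool-style invalidation: drop every cached page of `table`
    // so the next access rereads it instead of using a stale copy.
    void clearPool(const std::string& table) {
        pages_.erase(std::remove_if(pages_.begin(), pages_.end(),
                                    [&](const Page& p) { return p.table == table; }),
                     pages_.end());
    }

    std::size_t size() const { return pages_.size(); }

private:
    std::size_t capacity_;
    std::deque<Page> pages_;
};
```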

🔹 I/O Optimization

  • Batched Block Operations: Read/write full blocks instead of row-wise access
  • Lazy Deletion Strategy:
    • Logical deletion via del_marker (minimizes disk writes)
    • Physical rebuild only when >25% rows deleted (reduces costly operations)
  • Sequential Access Patterns: Leaf node linking enables efficient range scans

🔹 Algorithmic Efficiency

  • O(log n) Complexity: All index operations (search/insert/delete/update)
  • External Sorting: Minimizes disk accesses through:
    • Run generation with (Nb-1) blocks
    • Priority queue-based merging
  • Composite Key Handling: cnt_search() dynamically resolves duplicates
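
The two external-sorting phases can be modeled in memory. This is a minimal sketch under stated assumptions: runs are `std::vector`s rather than files, `blockSize` (values per block) and `bufferBlocks` (standing in for Nb) are hypothetical parameters, and all runs are merged in a single pass; with more than (Nb-1) runs, a real external sort would need additional merge passes. Each run holds (Nb-1) blocks and is sorted with std::sort, and a min-heap keeps one entry per run, so every pop or push costs O(log k) for k runs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

std::vector<int> externalSort(const std::vector<int>& input,
                              std::size_t blockSize, std::size_t bufferBlocks) {
    assert(blockSize > 0 && bufferBlocks >= 2);

    // Phase 1: run generation. Each run covers (bufferBlocks - 1) blocks and is sorted in memory.
    const std::size_t runLen = blockSize * (bufferBlocks - 1);
    std::vector<std::vector<int>> runs;
    for (std::size_t start = 0; start < input.size(); start += runLen) {
        const std::size_t end = std::min(input.size(), start + runLen);
        std::vector<int> run(input.begin() + static_cast<std::ptrdiff_t>(start),
                             input.begin() + static_cast<std::ptrdiff_t>(end));
        std::sort(run.begin(), run.end());
        runs.push_back(std::move(run));
    }

    // Phase 2: k-way merge. The min-heap holds (value, run, position), one entry per run.
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t r = 0; r < runs.size(); ++r)
        heap.emplace(runs[r][0], r, std::size_t{0});  // runs are never empty here

    std::vector<int> out;
    out.reserve(input.size());
    while (!heap.empty()) {
        const Entry top = heap.top();
        heap.pop();
        const std::size_t r = std::get<1>(top);
        const std::size_t pos = std::get<2>(top);
        out.push_back(std::get<0>(top));
        // Refill the heap from the run the smallest value came from.
        if (pos + 1 < runs[r].size()) heap.emplace(runs[r][pos + 1], r, pos + 1);
    }
    return out;
}
```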

🚀 Performance Benchmarks

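| Operation | Without Index | With B+ Tree Index | Improvement Factor |
|---|---|---|---|
| Search | O(n) scan | O(log n) | 1000x (1M rows) |
| Range Query | O(n) | O(log n + k) | 500x (1M rows) |
| Insert | O(1) | O(log n) | Negligible overhead |
| Delete | O(n) | O(log n) | 800x (1M rows) |
| Update | O(n) | O(log n) | 800x (1M rows) |

n = rows in the table; k = rows returned by the range query.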
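
The O(log n + k) entry can be made concrete by counting node reads. As a rough model, assume a fanout of f children per internal node and up to L keys per leaf (both hypothetical parameters; the actual values depend on the block size), with n indexed rows and k rows returned. Descending from root to leaf reads about one node per level, and the k matches then span about k/L consecutive leaves reached through the leaf links:

```latex
\text{node reads} \;\approx\; \underbrace{\log_f n}_{\text{root-to-leaf descent}} \;+\; \underbrace{\left\lceil k / L \right\rceil}_{\text{linked leaves scanned}} \;=\; O(\log n + k)
```

Since f and L are constants for a given block size, a point search (small k) reduces to O(log n), while a search without an index must read every block of the table, which is O(n).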

🛠️ Compilation & Execution

```bash
cd SimpleRA/src
make clean
make
./server
```

📚 Detailed Documentation

B+ Tree Architecture

  • Node Structure: Internal/leaf nodes sized to match disk blocks
  • Storage: .idx files organized in /data/temp/<table>/<column>/
  • Duplicate Handling: Composite keys (value, count) maintain stable ordering
  • Range Queries: Leaf node linking enables efficient sequential access
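
The (value, count) scheme can be illustrated with an ordered set standing in for the B+ tree's linked leaf level. This is a hypothetical sketch: `CompositeIndex` is not part of the engine, and numbering each occurrence of a value 0, 1, 2, ... is an assumption about how counts are assigned (the README only says that cnt_search() resolves duplicates). Because every composite key is unique, duplicates keep a stable order, and a range query seeks to the first (lo, 0) key and scans forward.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Composite key: (column value, occurrence number of that value).
using CompositeKey = std::pair<int, int>;

class CompositeIndex {
public:
    // The k-th copy of a value gets count k (starting at 0), so equal values
    // become distinct keys ordered by insertion.
    CompositeKey insert(int value) {
        const CompositeKey key{value, nextCount_[value]++};
        keys_.insert(key);
        return key;
    }

    // Keys whose value lies in [lo, hi]: seek to (lo, 0), then scan forward
    // in key order, as a leaf-link range scan would.
    std::vector<CompositeKey> range(int lo, int hi) const {
        std::vector<CompositeKey> out;
        for (auto it = keys_.lower_bound(CompositeKey{lo, 0});
             it != keys_.end() && it->first <= hi; ++it)
            out.push_back(*it);
        return out;
    }

private:
    std::map<int, int> nextCount_;  // next count to hand out, per value
    std::set<CompositeKey> keys_;   // sorted like the leaf level of the tree
};
```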

Hybrid Deletion Strategy

  1. Logical Deletion:
    • Set del_marker = 1
    • Remove the row's entries from the indexes
  2. Physical Rebuild (when >25% of rows are deleted):
    • loadagain() rebuilds the table without the deleted rows
    • All indexes are rebuilt for consistency
  3. Batched Processing: .dlt files group deletions by page
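
The logical-delete-then-rebuild policy can be sketched on an in-memory table. This is a minimal illustration, not the engine's code: `LazyTable` and `deleteWhere` are hypothetical, rows live in a `std::vector` instead of pages, index maintenance and .dlt batching are omitted, and `rebuild()` merely plays the role described for loadagain(). A rebuild fires once marked rows exceed 25% of the stored rows.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A stored row plus its logical-deletion flag (the del_marker).
struct StoredRow {
    std::vector<int> values;
    bool delMarker;
};

class LazyTable {
public:
    explicit LazyTable(const std::vector<std::vector<int>>& rows) {
        for (const auto& r : rows) rows_.push_back(StoredRow{r, false});
    }

    // Logically delete live rows whose column `col` equals `v`, then rebuild
    // if more than 25% of the stored rows are marked. Returns rows deleted.
    std::size_t deleteWhere(std::size_t col, int v) {
        std::size_t n = 0;
        for (StoredRow& r : rows_) {
            if (!r.delMarker && r.values[col] == v) {
                r.delMarker = true;  // logical deletion: row is kept until a rebuild
                ++n;
            }
        }
        deleted_ += n;
        if (deleted_ * 4 > rows_.size()) rebuild();
        return n;
    }

    std::size_t storedRows() const { return rows_.size(); }
    std::size_t liveRows() const { return rows_.size() - deleted_; }

private:
    // Physical rebuild: keep only unmarked rows and reset the counter.
    void rebuild() {
        std::vector<StoredRow> kept;
        for (StoredRow& r : rows_)
            if (!r.delMarker) kept.push_back(std::move(r));
        rows_ = std::move(kept);
        deleted_ = 0;
    }

    std::vector<StoredRow> rows_;
    std::size_t deleted_ = 0;  // rows currently marked but not yet removed
};
```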

Update Workflow

  1. Search: Locate target rows using B+ Tree
  2. Modify: Apply the new values to the located rows in memory
  3. Delete: Remove the original rows via executeDELETE()
  4. Insert: Add the modified rows as new records via executeINSERT()
  5. File Handling: .upd files batch the modifications between steps
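
The update workflow reduces to a delete followed by an insert. This hypothetical sketch (`updateViaDeleteInsert` is not the engine's API) works on an in-memory `std::vector` table with a single equality predicate; the `staged` vector stands in for the .upd file, and index updates are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<int>;

// UPDATE ... SET setCol = newVal WHERE whereCol == key, done as DELETE + INSERT.
// Returns the number of rows updated.
std::size_t updateViaDeleteInsert(std::vector<Row>& table, std::size_t whereCol, int key,
                                  std::size_t setCol, int newVal) {
    // 1. Search: stage copies of the matching rows.
    std::vector<Row> staged;
    for (const Row& r : table)
        if (r[whereCol] == key) staged.push_back(r);

    // 2. Modify the staged copies in memory.
    for (Row& r : staged) r[setCol] = newVal;

    // 3. Delete the original rows.
    table.erase(std::remove_if(table.begin(), table.end(),
                               [&](const Row& r) { return r[whereCol] == key; }),
                table.end());

    // 4. Insert the modified rows as new records at the end of the table.
    table.insert(table.end(), staged.begin(), staged.end());
    return staged.size();
}
```

Because step 4 appends, updated rows end up at the end of the table, mirroring how INSERT appends to the last page.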

Contributors 3
