Skip to content

Performance#9

Open
christopherkarani wants to merge 9 commits intomainfrom
performance
Open

Performance#9
christopherkarani wants to merge 9 commits intomainfrom
performance

Conversation

@christopherkarani
Copy link
Copy Markdown
Owner

No description provided.

christopherkarani and others added 7 commits January 7, 2026 03:31
## New Features

### ParentChildChunker
- Hierarchical chunking strategy for 4K context windows
- Creates large parent chunks (2000 chars) with small children (400 chars)
- Maintains bidirectional parent-child relationships in metadata
- Configurable overlap, separators, and output options

### ParentChildRetriever
- Searches child embeddings but returns parent chunks for context
- Score aggregation strategies: max, average, sum
- Includes ParentLookup protocol and VectorStoreParentLookup actor
- LRU caching for efficient parent lookups

### HuggingFaceEmbedding
- HuggingFace Inference API integration
- Supports MiniLM (384d), BGE (768d/1024d), E5 (1024d), Jina (768d)
- Configurable wait_for_model for cold starts
- Rate limit handling with retry-after support

### MistralEmbedding
- Mistral AI embedding API (mistral-embed, 1024d)
- Index-sorted response handling for order consistency
- Rate limiter factory method added

## Tests
- Comprehensive ParentChildChunkerTests covering:
  - Basic chunking, relationship integrity, overlap validation
  - Metadata completeness, edge cases, configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
P1 Features (Document Preprocessing):
- VisionOCRProcessor: Actor-based OCR using Apple Vision framework
  - extractText(from: URL/CGImage) for images
  - extractTextFromPDF(at:) for multi-page PDFs
  - Configurable languages and recognition levels

- TableExtractor: Extracts tables from document images
  - ExtractedTable struct with markdown and CSV output
  - Uses VNDetectRectanglesRequest + VNRecognizeTextRequest
  - Groups text by Y-coordinate into rows

P2 Features (Advanced Retrieval & Evaluation):
- ChunkGraph: Graph data structure for chunk relationships
  - EdgeType: sequential, semantic, reference
  - Builds bidirectional edges for adjacent chunks
  - Creates semantic edges for high-similarity chunks (threshold: 0.8)

- GraphRetriever: BFS-based graph traversal retrieval
  - Combines vector search with graph expansion
  - Configurable hops (default: 2) and edge threshold (0.7)
  - Score propagation with decay factor

- RAGEvaluator: Evaluation framework for RAG pipelines
  - RetrievalMetrics: precision, recall, F1, MRR, NDCG
  - GenerationMetrics: faithfulness, relevance, coherence
  - Optional LLM judge for faithfulness scoring

All implementations use Swift 6.0 actor-based concurrency with
Sendable conformance. Build verified with zero errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Code Review Fixes (6 warnings resolved):

1. Task Cancellation in ParentChildRetriever
   - Added Task.checkCancellation() before embedding, after embedding,
     and during parent fetching loop
   - Prevents wasted resources when caller cancels mid-operation

2. O(n²) Complexity Limit in ChunkGraph
   - Added 100,000 comparison limit to buildSemanticEdges()
   - Documented O(n²) complexity in method documentation
   - Debug logging when limit is hit, suggesting k-NN alternatives

3. BFS Cancellation in GraphRetriever
   - Added Task.checkCancellation() at start of each BFS hop
   - Prevents long graph traversals from blocking cancellation

4. Error Logging in ParentChildRetriever
   - Added DEBUG-only logging when parent fetch fails
   - Improves production debugging without performance impact

5. Overflow Protection in ParentChildChunker
   - Used addingReportingOverflow() for offset calculation
   - Prevents integer overflow with very large (>2GB) documents

6. Documentation for nonisolated(unsafe) in TableExtractor
   - Added SAFETY comment explaining why VNRectangleObservation
     is safe to transfer despite missing Sendable conformance

All fixes maintain Swift 6.0 strict concurrency compliance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Code Review Fixes (3 Critical + 4 Warnings):

**Critical Issues:**
1. TableExtractor Vision callback - Replaced `nonisolated(unsafe)` with
   Sendable `DetectedRegion` struct to safely cross actor boundaries
2. SQLiteVectorStore schema init - Added NSLock protection to prevent
   potential race condition in `createSchemaIfNeeded()`
3. ChunkGraph O(n²) truncation - Added `truncationWarning` property with
   detailed progress tracking (% processed, chunks affected)

**Warnings:**
1. OllamaEmbedding batch - Added Task.checkCancellation() between iterations
2. ParentChildChunker overflow - Now throws ZoniError.chunkingFailed instead
   of using Int.max fallback
3. RAGEvaluator regex - Added proper error handling and DEBUG logging for
   regex failures
4. SQLiteVectorStore dimensions - Fixed validation to use actualDimensions
   when available

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The truncationWarning property was private(set), preventing callers from
checking if semantic edge building was truncated. Changed to
public private(set) with usage example in documentation.

This ensures callers can detect and respond to partial graph construction.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements Phase 1 of GPU optimization plan with 10-432x speedups on large datasets (>10k vectors). Adds adaptive CPU/GPU backend selection, 6 Metal compute kernels (cosine similarity, pairwise matrix, MMR scoring), comprehensive test coverage, and automatic fallback for filtered searches. Fixes lambda validation and candidateMultiplier bounds in MMR retriever for consistency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@christopherkarani
Copy link
Copy Markdown
Owner Author

@claude do a comprehensive code review

@claude
Copy link
Copy Markdown

claude bot commented Jan 8, 2026

Claude encountered an error —— View job


Comprehensive Code Review

Todo List

  • Read repository CLAUDE.md files for guidelines
  • Review Metal GPU acceleration implementation
  • Review chunking strategies (ParentChildChunker)
  • Review new embedding providers (HuggingFace, Mistral)
  • Review RAG evaluator
  • Review graph-based retrieval
  • Review OCR and table extraction processors
  • Review test coverage and quality
  • Review security and error handling
  • Compile comprehensive feedback

christopherkarani and others added 2 commits January 9, 2026 03:00
- Added GPUVectorStore for Metal-accelerated retrieval in ZoniApple
- Refactored SemanticChunker to use injectable SimilarityCalculator
- Implemented MetalSimilarityCalculator for 7x chunking speedup
- Added SemanticChunkerBenchmarks
- Finalized cleanup of ZoniServer integrations (removed legacy ZoniVapor/ZoniHummingbird folders)
Both server framework integrations were behind #if compilation flags that
were never activated — no SPM dependencies declared, no tests running,
and no feature parity between them. Removes 20 integration files (~2.8k LOC),
6 stale audit docs, 25 auto-generated CLAUDE.md files, and an empty
JobProtocol.swift duplicate. Updates README and doc comments to reflect
the removal. Adds ZoniConduit integration layer and various test/source
improvements from the performance branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Feb 19, 2026

PR Review: Performance (#9)

This is a substantial PR adding significant new features to the Zoni RAG framework. The overall code quality is high — excellent Swift 6 actor usage, comprehensive documentation, and solid architectural patterns. Below is a detailed review grouped by category.


Summary

Category Assessment
Swift 6 Concurrency Excellent — actors, Sendable, proper isolation throughout
Documentation Outstanding — comprehensive DocC comments on all public APIs
Architecture Well-designed protocols with clean separation of concerns
Security Strong authentication; one critical release-build gap
Performance A few significant issues to address
Test Coverage Good breadth, but notable gaps in integration and edge cases

Bugs / Correctness Issues

Critical: precondition Disabled in Release Builds (SQL Injection Risk)

Sources/ZoniServer/VectorStore/Stores/PgVectorStore.swift uses precondition for table name validation. precondition is compiled out in release builds (-O). This means invalid table names can reach SQL construction in production. Replace with explicit runtime validation that throws:

guard !tableName.isEmpty,
      tableName.first?.isLetter == true,
      tableName.allSatisfy({ $0.isLetter || $0.isNumber || $0 == "_" }) else {
    throw ZoniError.configurationError("Invalid table name: \(tableName)")
}

Critical: PineconeStore Silent Metadata Loss

Sources/Zoni/VectorStore/Stores/PineconeStore.swift documents that nested dicts in custom metadata are converted to null — but there is no warning or error when this happens. A user storing nested metadata will silently lose data. At minimum, log a warning or throw an error when nested dicts are detected before conversion.

MMRRetriever: Mutable Actor Properties

Sources/Zoni/Retrieval/Retrievers/MMRRetriever.swift exposes public mutable var properties directly on the actor (lambda, candidateMultiplier). These should be configurable at init time or exposed through explicit setter methods. Making them freely mutable var on an actor makes the concurrency contract confusing for callers.

RAGEvaluator: Silent Production Failures

Score parsing failures in Sources/Zoni/Evaluation/RAGEvaluator.swift silently produce 0.0 default values in production (the log is debug-only, ~line 556). This should log a warning in all build configurations so silent mis-evaluation is surfaced.

VectorStoreParentLookup: O(n) LRU Cache Operations

Sources/Zoni/Retrieval/VectorStoreParentLookup.swift maintains access order with an array, resulting in O(n) firstIndex(of:) on every cache hit. For a lookup called frequently during retrieval, this accumulates. Consider OrderedDictionary from swift-collections or a dictionary + linked list.

SQLiteVectorStore: O(n·d) Search Unscalable

Sources/Zoni/VectorStore/Stores/SQLiteVectorStore.swift loads all rows into memory and computes cosine similarity serially. At 100K vectors × 1536 dims this is ~600MB heap + heavy CPU. The implementation is fine for development, but the public API should prominently document this scaling limit so users don't use SQLite in production with large corpora.

PineconeStore: Retry Logic Inconsistently Applied

Sources/Zoni/VectorStore/Stores/PineconeStore.swift wraps search in executeWithRetry but add(), delete(), and count() call the API directly without retry. Transient network failures during bulk ingest surface as hard errors.

SemanticChunker: Fragile Offset Calculation

Sources/Zoni/Chunking/Strategies/SemanticChunker.swift reconstructs character offsets by splitting on "." and doing prefix-based string matching (~lines 358-365). This breaks with abbreviations, decimal numbers, or ellipses. Track offsets directly during the chunking pass instead.

GraphRetriever: Hardcoded Decay Factor

Sources/Zoni/Retrieval/Retrievers/GraphRetriever.swift uses 1.0 / (hop + 2) as the score decay formula with no rationale. Either document why this formula was chosen or expose it as a configurable parameter (decayFunction: (Int) -> Float).


Performance Issues

GPUVectorStore: Full Buffer Rebuild on Every Mutation

Sources/ZoniApple/VectorStore/GPUVectorStore.swift calls rebuildFlatStorage() after every add/delete, iterating the entire store each time. For bulk ingest this is O(n²). A dirty flag with lazy rebuild (already partially set up) or incremental buffer appends for add-only operations would avoid this.

OCRProcessor: Sequential PDF Page Processing

Sources/ZoniApple/Preprocessing/OCRProcessor.swift processes pages in a serial for loop (~line 256). Vision requests are independent and can be parallelized with TaskGroup for 5-10x improvement on multi-page documents.

InMemoryVectorStore+Metal: Full Sort Instead of Partial Selection

Sources/ZoniApple/VectorStore/InMemoryVectorStore+Metal.swift (~line 332) acknowledges that a heap would be more efficient for top-K selection, but still does a full sort. For small K relative to N, use partialSort or a heap — O(n log k) vs O(n log n).


Security

  • Authentication: TenantManager is excellent — PBKDF2-SHA256 with 120K iterations, random salt, pepper support, and constant-time comparison. Well done.
  • Parameterized queries: PgVectorStore consistently uses PostgresBindings. LIKE patterns are properly escaped.
  • Connection string parsing: Passwords containing @ or : in PgVectorStore (~line 1309) may fail URL parsing silently. Add explicit validation.
  • API keys: Not leaked in any description or log calls — good.

Test Coverage Gaps

PineconeStore has no tests. Given the silent metadata loss issue and incomplete retry logic, this is a blocker before merging.

SQLiteVectorStore is missing tests for:

  • Dimension mismatch (second add with different vector dims)
  • Concurrent access correctness
  • Transaction rollback recovery

PerformanceTests print metrics but do not assert baselines. Add #expect(elapsed < threshold) assertions so CI catches regressions.

MetalComputeTests hardcode the 5000/10000 vector thresholds from ComputeBackend. These should reference the same constants to avoid drift.


Non-Code Observations

IMPLEMENTATION_TASK_BREAKDOWN.md contains absolute local paths (/Users/chriskarani/CodingProjects/zoni/...). These should be scrubbed before merging. Consider whether a 1,771-line planning document belongs in the repository at all vs. a GitHub Project/issue.

Agent model downgrade (opus → sonnet): Valid cost tradeoff, but security-auditor and penetration-tester agents doing less thorough work on complex tasks is worth documenting explicitly in the PR description.


Actionable Items

Priority Item
Must Fix Replace precondition with runtime-throwing validation in PgVectorStore
Must Fix Add warning/error for PineconeStore nested metadata loss
Must Fix Add PineconeStore tests before merging
Should Fix Fix GPUVectorStore O(n²) rebuild on bulk insert
Should Fix Fix O(n) LRU operations in VectorStoreParentLookup
Should Fix Add retry logic to PineconeStore add()/delete()
Should Fix Make RAGEvaluator score parsing failures visible in production
Should Fix Remove local machine paths from IMPLEMENTATION_TASK_BREAKDOWN.md
Consider Parallelize OCRProcessor PDF page extraction with TaskGroup
Consider Add baseline assertions to PerformanceTests
Consider Make GraphRetriever decay factor configurable
Consider Document SQLiteVectorStore scaling limits prominently in the README

Overall this is a well-engineered PR with excellent Swift 6 concurrency patterns and documentation. The precondition/release-build SQL validation issue and missing PineconeStore tests are the two items I'd treat as blockers; the rest are good follow-up candidates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant