Skip to content

feat: add obsidian ingestion support#222

Closed
0xrushi wants to merge 28 commits intoSimpleOpenSoftware:devfrom
0xrushi:feat/add-obsidian
Closed

feat: add obsidian ingestion support#222
0xrushi wants to merge 28 commits intoSimpleOpenSoftware:devfrom
0xrushi:feat/add-obsidian

Conversation

@0xrushi
Copy link
Contributor

@0xrushi 0xrushi commented Dec 20, 2025

Needs neo4j to be enabled, thats disabled by default

image

new note

image

client connected to chronicle neo4j

image

enabled obsidian memory, asked a query, then disabled and asked same query

sarah1

new chat with obsidian memory disabled

sarah2

Summary by CodeRabbit

  • New Features

    • Obsidian vault import: upload ZIPs, start ingestion, and view real‑time progress/status in UI
    • Chat option "Include Obsidian Memory" to augment responses with vault content
    • New Obsidian import UI/component for authenticated/admin users
  • Enhancements

    • Admin‑only gating and clearer error messages for memory/ingest workflows
    • Backend integration for Neo4j‑based storage and improved embedding/search
  • Tests

    • Added unit tests covering Obsidian ingestion and search flows

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 20, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Adds full Obsidian ingestion: new frontend components and UI, backend API routes for upload/start/status, Redis/RQ enqueueing and worker job, Obsidian parsing/chunking/embedding service, Neo4j client/storage, chat integration flag, utilities, tests, and deployment/profile adjustments.

Changes

Cohort / File(s) Summary
Frontend: lightweight ingest component
app/app/components/ObsidianIngest.tsx, app/app/index.tsx
New React Native component and mount; sends POST to /api/obsidian/ingest with vault path and Authorization; UI input + loading state.
Frontend: Upload & Chat UI
backends/advanced/webui/src/pages/Upload.tsx, backends/advanced/webui/src/pages/Chat.tsx
Admin-only ZIP upload/start/poll/status UI for Obsidian vaults; Chat adds "Include Obsidian Memory" checkbox and forwards flag.
Frontend: API service
backends/advanced/webui/src/services/api.ts
New obsidianApi (uploadZip/start/status/cancel) added (duplicated block present); chatApi.sendMessage gains includeObsidianMemory; session/statistics endpoints added.
Backend: API routing / modules
backends/advanced/src/advanced_omi_backend/routers/api_router.py, backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py
Registers and exports new obsidian_router under /api.
Backend: Obsidian API routes
backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
New endpoints: POST /obsidian/ingest, /obsidian/upload_zip, /obsidian/start, GET /obsidian/status; uses Redis pending keys, RQ enqueueing, ZIP handling, auth guards and error handling.
Backend: Worker jobs
backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py
New RQ job ingest_obsidian_vault_job and count_markdown_files: traverse vault, parse→chunk→embed→ingest loop, cancellation support, job metadata updates and error aggregation.
Backend: Obsidian service & utils
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py, .../services/neo4j_client.py, .../utils/file_utils.py, .../utils/model_utils.py
New ObsidianService (parsing, chunking, embedding, Neo4j ingestion, search) with lazy init; Neo4j client abstraction; zip extraction utility; model config lookup helper.
Backend: Memory/config/LLM changes
backends/advanced/src/advanced_omi_backend/services/memory/config.py, backends/advanced/src/advanced_omi_backend/llm_client.py, .../utils/config_utils.py
Adds load_config_yml and resolve_value; memory config rework and registry-driven defaults; LLM clients accept injected params (no env fallback).
Backend: Chat service integration
backends/advanced/src/advanced_omi_backend/chat_service.py, .../routers/modules/chat_routes.py
ChatService signatures updated to accept include_obsidian_memory; integrates Obsidian search when flagged; route schema extended to carry flag.
Backend: Embedding provider updates
backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py
New async helper generate_openai_embeddings; OpenAIProvider delegates to it and re-raises exceptions preserving traceback.
Backend: System routes & validation
backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py, .../controllers/system_controller.py
Memory config validation endpoints added; docstrings updated to reflect config.yml as source of truth.
Backend: Tests
backends/advanced/tests/test_obsidian_service.py
New unit tests for ObsidianService covering DB setup, chunking, embedding, ingestion and error cases (mocks for external deps).
Config & deployment / services/status/wizard
backends/advanced/pyproject.toml, backends/advanced/docker-compose.yml, services.py, status.py, wizard.py
Adds neo4j and httpx deps, mem0ai ref update; enables obsidian docker-compose profile when NEO4J_HOST present; status and wizard detect and report obsidian profile.

Sequence Diagram(s)

sequenceDiagram
    participant User as Frontend User
    participant UI as WebUI / Mobile UI
    participant API as Backend API
    participant Redis as Redis / RQ
    participant Worker as RQ Worker
    participant Neo4j as Neo4j DB

    Note over User,Neo4j: Obsidian Ingest — Upload → Queue → Worker → DB
    User->>UI: Upload vault.zip
    UI->>API: POST /api/obsidian/upload_zip (multipart)
    API->>API: extract zip, count .md, set pending key
    API-->>UI: { job_id, vault_path, total_files }

    User->>UI: Start ingestion
    UI->>API: POST /api/obsidian/start { job_id }
    API->>Redis: enqueue ingest_obsidian_vault_job(job_id, vault_path)
    API->>Redis: delete pending key
    API-->>UI: queued job_id

    loop Poll status
        UI->>API: GET /api/obsidian/status?job_id
        API->>Redis: fetch RQ job + meta
        API-->>UI: { status, processed, total, last_file, errors }
    end

    par Worker execution
        Redis->>Worker: run ingest_obsidian_vault_job
        Worker->>Worker: iterate .md files → parse → chunk → embed
        Worker->>Neo4j: MERGE nodes/relationships / write vectors
        Worker->>Redis: update job metadata (processed, last_file)
    end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • thestumonkey
  • AnkushMalaker

Poem

"I hopped through vaults both wide and deep,
Found notes in nests where shadows sleep.
I chunked and stitched with carrot glee,
Sent vectors hopping to Neo4j tree.
🥕 A little rabbit cheers — ingest complete!"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 63.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'feat: add obsidian ingestion support' clearly summarizes the main change—adding Obsidian vault ingestion functionality to the project.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (6)
backends/advanced/src/advanced_omi_backend/services/memory/config.py-165-174 (1)

165-174: Potential TypeError if embedding_dimensions is not set.

If embed_def.get('embedding_dimensions') returns None, the int() conversion on line 167 will raise a TypeError. Consider providing a default value or adding validation.

🔎 Proposed fix with default value
-    embedding_dims = int(resolve_value(embed_def.get('embedding_dimensions')))
+    raw_dims = embed_def.get('embedding_dimensions')
+    if raw_dims is None:
+        raise ValueError("'embedding_dimensions' must be specified in the embedding model config")
+    embedding_dims = int(resolve_value(raw_dims))

Or with a sensible default:

-    embedding_dims = int(resolve_value(embed_def.get('embedding_dimensions')))
+    embedding_dims = int(resolve_value(embed_def.get('embedding_dimensions', 1536)))
backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py-23-24 (1)

23-24: Reconsider the underscore prefixes in import aliases for public functions.

The imported functions load_config_yml and resolve_value are defined as public APIs in the source module (no underscore prefix), but the aliases use underscore prefixes (_load_root_config, _resolve_value), which conventionally marks them as private/internal. Either import them without the underscore aliases for clarity, or document why these public APIs are being treated as internal in this module.

extras/obsidian-injest/config.py-51-54 (1)

51-54: Missing config.yml doesn't raise — silent empty config.

If config.yml doesn't exist, config_data remains an empty dict and the code continues until it fails at get_model_config. Consider raising early with a clear error message.

🔎 Proposed fix
 config_data = {}
 if CONFIG_YML_PATH.exists():
     with open(CONFIG_YML_PATH, 'r') as f:
         config_data = yaml.safe_load(f)
+else:
+    raise FileNotFoundError(f"config.yml not found at {CONFIG_YML_PATH}")
backends/advanced/src/advanced_omi_backend/llm_client.py-57-62 (1)

57-62: Error message references environment variables but config is from YAML.

The error message mentions OPENAI_API_KEY, OPENAI_BASE_URL, and OPENAI_MODEL which are environment variable names, but the values now come from config.yml. Update the message to reflect the actual config source.

🔎 Proposed fix
         self.api_key = api_key
         self.base_url = base_url
         self.model = model
         if not self.api_key or not self.base_url or not self.model:
-            raise ValueError("OPENAI_API_KEY, OPENAI_BASE_URL, and OPENAI_MODEL must be set")
+            raise ValueError("api_key, base_url, and model must be provided (check config.yml)")
extras/obsidian-injest/config.py-84-85 (1)

84-85: Type annotation mismatch — .get() can return None.

NEO4J_USER and NEO4J_PASSWORD are typed as str but dict.get() returns Optional[str]. This could cause runtime issues if the values are missing.

🔎 Proposed fix
-NEO4J_USER: str = env_data.get("NEO4J_USER")
-NEO4J_PASSWORD: str = env_data.get("NEO4J_PASSWORD")
+NEO4J_USER: str = env_data.get("NEO4J_USER", "neo4j")
+NEO4J_PASSWORD: str = env_data.get("NEO4J_PASSWORD", "")

Or add validation similar to NEO4J_HOST if these are required.

backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py-192-200 (1)

192-200: Cancellation endpoint sets status but worker never checks it.

The /cancel endpoint sets status to "cancel_requested", but the _ingest_worker loop never checks this flag. Users calling this endpoint will receive a success response but ingestion will continue until completion. Consider either implementing cooperative cancellation in the worker or removing this endpoint to avoid user confusion.

Option 1: Implement cooperative cancellation in worker

Add a check in the worker loop:

             for file in files:
+                # Check for cancellation
+                with _jobs_lock:
+                    if _jobs.get(job_id, {}).get("status") == "cancel_requested":
+                        _jobs[job_id]["status"] = "cancelled"
+                        logger.info(f"Job {job_id} cancelled by user")
+                        return
                 if not file.endswith('.md'):
                     continue

Committable suggestion skipped: line range outside the PR's diff.

🧹 Nitpick comments (14)
backends/advanced/webui/src/components/MemorySettings.tsx (2)

4-4: Admin gating implemented correctly.

The component properly restricts access to admins using the auth context and returns null for non-admin users, effectively hiding the entire UI. This is a solid security pattern.

💡 Optional: Prevent unnecessary API call for non-admins

The useEffect hook at line 21 calls loadMemoryConfig() even when the user is not an admin (hooks execute before the early return). While this is likely protected by routing, you could optimize it by guarding the effect:

 useEffect(() => {
+  if (!isAdmin) return
   loadMemoryConfig()
-}, [])
+}, [isAdmin])

This prevents the API call from being made for non-admin users who might briefly access this component.

Also applies to: 18-18, 134-137


210-215: Contextual admin guidance is helpful.

The conditional helper text guides admins when authorization or endpoint availability errors occur, improving the troubleshooting experience.

💡 Optional: Use error codes instead of string matching

The current implementation uses string matching (error.includes('does not expose')) to determine which guidance to show. While this works, it's fragile—if the error message text changes (lines 37, 39), the condition will silently break.

Consider introducing error codes or flags:

const [errorCode, setErrorCode] = useState<'unauthorized' | 'no_endpoints' | 'generic' | null>(null)

// In loadMemoryConfig catch block:
if (status === 401) {
  setErrorCode('unauthorized')
  setError('Unauthorized: admin privileges required...')
} else if (status === 404 || status === 405) {
  setErrorCode('no_endpoints')
  setError('Backend does not expose memory configuration endpoints')
}

// In helper text:
{errorCode && ['unauthorized', 'no_endpoints'].includes(errorCode) && (
  // ... guidance
)}

This decouples the UI logic from the exact error message wording.

backends/advanced/webui/src/pages/Upload.tsx (2)

36-36: Consider adding a proper type for obsidianStatus.

Using any type loses type safety. Define an interface for the status response to improve maintainability.

🔎 Suggested type definition
interface ObsidianIngestionStatus {
  status: 'pending' | 'running' | 'completed' | 'failed'
  processed: number
  total: number
  percent: number
  last_file?: string
  error?: string
  errors?: string[]
}

// Then use:
const [obsidianStatus, setObsidianStatus] = useState<ObsidianIngestionStatus | null>(null)

190-212: Minor: Consider adding a mounted flag to prevent state updates after unmount.

If the component unmounts while a poll() request is in flight, setObsidianStatus or setObsidianPolling may be called on an unmounted component.

🔎 Suggested fix with mounted flag
  useEffect(() => {
    let interval: number | undefined
+   let mounted = true
    if (obsidianPolling && obsidianJobId) {
      const poll = async () => {
        try {
          const res = await obsidianApi.status(obsidianJobId)
+         if (!mounted) return
          setObsidianStatus(res.data)
          if (res.data.status === 'completed' || res.data.status === 'failed') {
            setObsidianPolling(false)
            setObsidianMessage(res.data.status === 'completed' ? '✅ Ingestion completed' : `❌ Failed: ${res.data.error || 'unknown error'}`)
          }
        } catch (err) {
+         if (!mounted) return
          setObsidianPolling(false)
          setObsidianError('Failed to fetch status')
        }
      }
      poll()
      interval = window.setInterval(poll, 1500)
    }
    return () => {
+     mounted = false
      if (interval) window.clearInterval(interval)
    }
  }, [obsidianPolling, obsidianJobId])
app/app/components/ObsidianIngest.tsx (2)

1-4: Remove unused import and blank line.

ActivityIndicator is imported but never used in the component. Also, there's an unnecessary blank line at the start of the file.

🔎 Proposed fix
-
-import React, { useState } from 'react';
-import { View, Text, TextInput, TouchableOpacity, StyleSheet, Alert, ActivityIndicator } from 'react-native';
+import React, { useState } from 'react';
+import { View, Text, TextInput, TouchableOpacity, StyleSheet, Alert } from 'react-native';

39-46: Consider adding a timeout for the fetch request.

The ingestion operation could take a while for large vaults. Without a timeout, the request could hang indefinitely if the server becomes unresponsive. Consider using AbortController with a timeout.

🔎 Proposed fix using AbortController
+        const controller = new AbortController();
+        const timeoutId = setTimeout(() => controller.abort(), 300000); // 5 min timeout
+
         const response = await fetch(`${baseUrl}/api/obsidian/ingest`, {
             method: 'POST',
             headers: {
                 'Content-Type': 'application/json',
                 'Authorization': `Bearer ${jwtToken}`
             },
-            body: JSON.stringify({ vault_path: vaultPath })
+            body: JSON.stringify({ vault_path: vaultPath }),
+            signal: controller.signal
         });
+
+        clearTimeout(timeoutId);
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)

90-91: Misleading function name.

The function is named build_memory_config_from_env but the docstring correctly states it builds configuration "strictly from config.yml". Consider renaming to build_memory_config_from_yml or build_memory_config for clarity.

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (2)

41-65: Duplicate utility functions — consider reusing from shared module.

load_env_file and resolve_value are duplicated across multiple files:

  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
  • extras/obsidian-injest/config.py
  • This file

Consider importing from the shared services/memory/config.py module instead of duplicating these functions.

🔎 Proposed refactor
-from advanced_omi_backend.services.memory.providers.llm_providers import _get_openai_client
-from advanced_omi_backend.services.memory.config import load_config_yml as load_root_config
+from advanced_omi_backend.services.memory.providers.llm_providers import _get_openai_client
+from advanced_omi_backend.services.memory.config import (
+    load_config_yml as load_root_config,
+    resolve_value,
+)
+from advanced_omi_backend.services.memory.config import load_env_file  # If added there

Then remove the local load_env_file and resolve_value definitions.

Also applies to: 67-81


107-113: Redundant checks after get_model_config already raises.

The inner get_model_config function raises ValueError if the config is not found (lines 101, 105), so the subsequent if not llm_config and if not embed_config checks on lines 108-113 are unreachable. Remove the redundant checks or adjust the helper to return None instead of raising.

🔎 Proposed fix — remove redundant checks
         llm_config = get_model_config('llm')
-        if not llm_config:
-            raise ValueError("Configuration for 'defaults.llm' not found in config.yml")

         embed_config = get_model_config('embedding')
-        if not embed_config:
-            raise ValueError("Configuration for 'defaults.embedding' not found in config.yml")
extras/obsidian-injest/search_query.py (1)

24-27: Consider adding error handling for embedding failures.

Unlike the ObsidianService.get_embedding() which returns None on failure, this version will propagate exceptions. For a CLI test tool this may be acceptable, but adding a try-except with logging.exception() would improve debuggability.

backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (1)

496-526: Consider atomic write for config updates.

update_memory_config_raw writes directly to the config file. If the write fails mid-way, the file could be corrupted. Consider writing to a temp file first, then renaming (atomic on most filesystems).

🔎 Proposed atomic write pattern
+import tempfile
+
 async def update_memory_config_raw(config_yaml: str):
     """Save memory YAML into config.yml and hot-reload memory service."""
     try:
         mem = _parse_memory_yaml(config_yaml)
         path, data = _find_config_yml_path()
         # Merge and write
         data["memory"] = mem
         # Ensure parents
         path.parent.mkdir(parents=True, exist_ok=True)
-        with open(path, "w") as f:
-            yaml.safe_dump(data, f, sort_keys=False)
+        # Write atomically via temp file
+        with tempfile.NamedTemporaryFile(
+            mode="w", dir=path.parent, delete=False, suffix=".tmp"
+        ) as tmp:
+            yaml.safe_dump(data, tmp, sort_keys=False)
+            tmp_path = Path(tmp.name)
+        tmp_path.replace(path)
extras/obsidian-injest/config.py (2)

16-31: Significant code duplication across files.

load_env_file, resolve_value, and get_model_config are duplicated in:

  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
  • This file

Since extras/obsidian-injest/ is a separate utility, some duplication may be intentional for standalone operation. However, if these utilities share the same codebase, consider extracting to a shared package.

Also applies to: 34-42, 59-67


124-139: Minor: __all__ is unsorted.

Static analysis suggests sorting __all__ for consistency. This is a minor style issue.

🔎 Sorted __all__
 __all__ = [
-    "VAULT_PATH",
-    "NEO4J_URI",
-    "NEO4J_USER",
-    "NEO4J_PASSWORD",
+    "CHUNK_CHAR_LIMIT",
+    "CHUNK_OVERLAP",
     "EMBEDDING_MODEL",
     "EMBEDDING_DIMENSIONS",
     "LLM_MODEL",
-    "CHUNK_CHAR_LIMIT",
-    "CHUNK_OVERLAP",
+    "LOG_FORMAT",
+    "LOG_LEVEL",
+    "NEO4J_PASSWORD",
+    "NEO4J_URI",
+    "NEO4J_USER",
     "OPENAI_BASE_URL",
     "OPENAI_API_KEY",
-    "LOG_LEVEL",
-    "LOG_FORMAT",
+    "VAULT_PATH",
     "client",
 ]
backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (1)

10-10: Remove unused import BackgroundTasks.

BackgroundTasks is imported but never used in this module.

Proposed fix
-from fastapi import APIRouter, Depends, HTTPException, BackgroundTasks, UploadFile, File, Body
+from fastapi import APIRouter, Depends, HTTPException, UploadFile, File, Body
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e79b70 and 8d7f262.

📒 Files selected for processing (20)
  • app/app/components/ObsidianIngest.tsx (1 hunks)
  • app/app/index.tsx (2 hunks)
  • backends/advanced/docker-compose.yml (5 hunks)
  • backends/advanced/pyproject.toml (1 hunks)
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (4 hunks)
  • backends/advanced/src/advanced_omi_backend/llm_client.py (3 hunks)
  • backends/advanced/src/advanced_omi_backend/memory_config_loader.py (0 hunks)
  • backends/advanced/src/advanced_omi_backend/routers/api_router.py (2 hunks)
  • backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py (2 hunks)
  • backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (5 hunks)
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/services/memory/config.py (2 hunks)
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (2 hunks)
  • backends/advanced/webui/src/components/MemorySettings.tsx (8 hunks)
  • backends/advanced/webui/src/pages/Upload.tsx (4 hunks)
  • backends/advanced/webui/src/services/api.ts (2 hunks)
  • extras/obsidian-injest/config.py (1 hunks)
  • extras/obsidian-injest/search_query.py (1 hunks)
💤 Files with no reviewable changes (1)
  • backends/advanced/src/advanced_omi_backend/memory_config_loader.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/routers/api_router.py
  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
  • extras/obsidian-injest/config.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py
  • extras/obsidian-injest/search_query.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py
  • backends/advanced/src/advanced_omi_backend/llm_client.py
app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use TypeScript following standard React Native conventions for mobile app development

Files:

  • app/app/index.tsx
  • app/app/components/ObsidianIngest.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/routers/api_router.py
  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
  • extras/obsidian-injest/config.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py
  • extras/obsidian-injest/search_query.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py
  • backends/advanced/src/advanced_omi_backend/llm_client.py
🧬 Code graph analysis (12)
backends/advanced/webui/src/components/MemorySettings.tsx (1)
backends/advanced/webui/src/contexts/AuthContext.tsx (1)
  • useAuth (115-121)
app/app/index.tsx (1)
app/app/components/ObsidianIngest.tsx (1)
  • ObsidianIngest (10-91)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (1)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (2)
  • load_config_yml (72-88)
  • resolve_value (57-70)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (3)
extras/obsidian-injest/config.py (1)
  • resolve_value (34-42)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • MemoryProvider (29-34)
backends/advanced/src/advanced_omi_backend/services/transcription/base.py (1)
  • name (65-67)
backends/advanced/webui/src/pages/Upload.tsx (1)
backends/advanced/webui/src/services/api.ts (1)
  • obsidianApi (230-247)
extras/obsidian-injest/config.py (2)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (3)
  • load_env_file (41-65)
  • resolve_value (67-81)
  • get_model_config (98-105)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)
  • resolve_value (57-70)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (3)
backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py (1)
  • _get_openai_client (45-69)
extras/obsidian-injest/config.py (3)
  • load_env_file (16-31)
  • resolve_value (34-42)
  • get_model_config (59-67)
extras/obsidian-injest/search_query.py (1)
  • get_embedding (24-27)
backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (1)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (2)
  • load_config_yml (72-88)
  • resolve_value (57-70)
extras/obsidian-injest/search_query.py (1)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (2)
  • get_embedding (199-216)
  • close (171-175)
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (3)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)
  • load_config_yml (72-88)
backends/advanced/src/advanced_omi_backend/services/memory/service_factory.py (2)
  • reset_memory_service (127-133)
  • get_memory_service (73-110)
backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py (3)
  • get_memory_config_raw (77-79)
  • validate_memory_config_raw (82-87)
  • update_memory_config_raw (90-95)
backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py (1)
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (4)
  • get_memory_config_raw (441-451)
  • validate_memory_config_raw (470-493)
  • update_memory_config_raw (496-526)
  • reload_memory_config (529-538)
backends/advanced/src/advanced_omi_backend/llm_client.py (2)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (2)
  • load_config_yml (72-88)
  • resolve_value (57-70)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)
  • resolve_value (67-81)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py

62-62: Local variable api_family is assigned to but never used

Remove assignment to unused variable api_family

(F841)


80-80: Use raise without specifying exception name

Remove exception name

(TRY201)

backends/advanced/src/advanced_omi_backend/services/memory/config.py

88-88: Avoid specifying long messages outside the exception class

(TRY003)


98-98: Avoid specifying long messages outside the exception class

(TRY003)


137-137: Avoid specifying long messages outside the exception class

(TRY003)


141-141: Avoid specifying long messages outside the exception class

(TRY003)


159-159: Avoid specifying long messages outside the exception class

(TRY003)

extras/obsidian-injest/config.py

71-71: Avoid specifying long messages outside the exception class

(TRY003)


75-75: Avoid specifying long messages outside the exception class

(TRY003)


80-80: Avoid specifying long messages outside the exception class

(TRY003)


124-139: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py

101-101: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


109-109: Avoid specifying long messages outside the exception class

(TRY003)


113-113: Avoid specifying long messages outside the exception class

(TRY003)


133-133: Avoid specifying long messages outside the exception class

(TRY003)


210-210: Multiple statements on one line (colon)

(E701)


214-214: Do not catch blind exception: Exception

(BLE001)


215-215: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


329-329: Probable use of insecure hash functions in hashlib: md5

(S324)


362-362: Avoid specifying long messages outside the exception class

(TRY003)


370-370: Unused blanket noqa directive

Remove unused noqa directive

(RUF100)


380-380: Do not catch blind exception: Exception

(BLE001)


381-381: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


382-382: Use explicit conversion flag

Replace with conversion flag

(RUF010)

backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py

100-100: Do not catch blind exception: Exception

(BLE001)


102-102: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


102-102: Avoid specifying long messages outside the exception class

(TRY003)


102-102: Undefined name e

(F821)


246-246: Do not catch blind exception: Exception

(BLE001)


248-248: Use explicit conversion flag

Replace with conversion flag

(RUF010)

backends/advanced/src/advanced_omi_backend/controllers/system_controller.py

448-448: Consider moving this statement to an else block

(TRY300)


449-449: Do not catch blind exception: Exception

(BLE001)


450-450: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


451-451: Use explicit conversion flag

Replace with conversion flag

(RUF010)


458-458: Prefer TypeError exception for invalid type

(TRY004)


458-458: Abstract raise to an inner function

(TRY301)


458-458: Avoid specifying long messages outside the exception class

(TRY003)


464-464: Prefer TypeError exception for invalid type

(TRY004)


464-464: Abstract raise to an inner function

(TRY301)


464-464: Avoid specifying long messages outside the exception class

(TRY003)


465-465: Consider moving this statement to an else block

(TRY300)


466-466: Do not catch blind exception: Exception

(BLE001)


467-467: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


467-467: Avoid specifying long messages outside the exception class

(TRY003)


488-488: Consider moving this statement to an else block

(TRY300)


491-491: Do not catch blind exception: Exception

(BLE001)


492-492: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


493-493: Use explicit conversion flag

Replace with conversion flag

(RUF010)


513-513: Do not catch blind exception: Exception

(BLE001)


524-524: Do not catch blind exception: Exception

(BLE001)


525-525: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


526-526: Use explicit conversion flag

Replace with conversion flag

(RUF010)


535-535: Consider moving this statement to an else block

(TRY300)


536-536: Do not catch blind exception: Exception

(BLE001)


537-537: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


538-538: Use explicit conversion flag

Replace with conversion flag

(RUF010)

backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py

27-27: Unused function argument: current_user

(ARG001)


27-27: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


38-38: Consider moving this statement to an else block

(TRY300)


39-39: Do not catch blind exception: Exception

(BLE001)


40-40: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


41-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


51-51: Loop control variable root not used within loop body

Rename unused root to _root

(B007)


64-64: Do not catch blind exception: Exception

(BLE001)


94-94: Do not catch blind exception: Exception

(BLE001)


95-95: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


98-98: Use explicit conversion flag

Replace with conversion flag

(RUF010)


103-103: Do not catch blind exception: Exception

(BLE001)


104-104: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


113-113: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


114-114: Unused function argument: current_user

(ARG001)


114-114: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


149-149: Do not catch blind exception: Exception

(BLE001)


150-150: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


151-151: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


157-157: Unused function argument: current_user

(ARG001)


157-157: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


179-179: Unused function argument: current_user

(ARG001)


179-179: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


193-193: Unused function argument: current_user

(ARG001)


193-193: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py

77-77: Unused function argument: current_user

(ARG001)


77-77: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


84-84: Unused function argument: current_user

(ARG001)


84-84: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


92-92: Unused function argument: current_user

(ARG001)


92-92: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


98-98: Unused function argument: current_user

(ARG001)


98-98: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

backends/advanced/src/advanced_omi_backend/llm_client.py

158-158: Avoid specifying long messages outside the exception class

(TRY003)


162-162: Avoid specifying long messages outside the exception class

(TRY003)


166-166: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (21)
backends/advanced/webui/src/components/MemorySettings.tsx (2)

34-42: Enhanced error handling improves UX.

The differentiated error messages (401 unauthorized, 404/405 endpoints not exposed, generic errors) help users quickly understand the root cause and take appropriate action.


224-224: Button disabling logic prevents futile operations.

Disabling validate and save actions when an error exists (e.g., unauthorized, endpoints unavailable) prevents users from attempting operations that will fail. Users can still reload or reset to recover.

Also applies to: 242-242

backends/advanced/docker-compose.yml (1)

11-11: LGTM!

The config.yml volume mounts enable the config-driven architecture for both the backend and workers containers.

Also applies to: 67-67

backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py (1)

23-23: LGTM!

The import and export follow the established pattern for router modules in this package.

Also applies to: 36-36

backends/advanced/src/advanced_omi_backend/routers/api_router.py (1)

18-18: LGTM!

The obsidian router is correctly imported and included in the main API router, consistent with other sub-routers.

Also applies to: 38-38

backends/advanced/webui/src/pages/Upload.tsx (1)

449-542: LGTM!

The Obsidian Vault Import UI is well-structured with appropriate loading states, progress indicators, and error handling. The isAdmin guard at line 451 is redundant (the component returns early at line 239 for non-admins) but serves as defensive programming.

app/app/index.tsx (1)

31-31: LGTM!

The ObsidianIngest component is correctly integrated with authentication-gated rendering. The component appropriately handles WebSocket-to-HTTP URL conversion internally for API calls.

Also applies to: 542-548

backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (1)

53-77: Config-driven provider selection looks correct.

The logic properly handles provider selection from config.yml with appropriate warnings for missing configurations. The transition from environment variables to config-driven architecture is consistent with the broader PR goals.

Verify that all deployment environments have been updated with config.yml containing the required STT configuration, as this change removes environment variable fallbacks.

backends/advanced/pyproject.toml (1)

13-13: LGTM!

The neo4j>=5.0.0 dependency is required for the ObsidianService Neo4j integration. The version constraint is appropriate for Neo4j 5.x and newer versions.

backends/advanced/webui/src/services/api.ts (1)

230-247: LGTM! Well-structured Obsidian API client.

The implementation follows existing patterns in the file (similar to uploadApi) and correctly handles:

  • Multipart form data upload with progress tracking
  • Reasonable 5-minute timeout for large vault uploads
  • Consistent endpoint naming matching the backend routes
backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py (1)

75-100: LGTM! Clean transition to config.yml-based configuration.

The endpoint updates correctly:

  • Added dedicated validation endpoint (/admin/memory/config/validate) that doesn't persist
  • Switched to media_type="text/plain" for accepting raw YAML content
  • Updated docstrings to reflect config.yml usage

Regarding the static analysis hints: The Depends() in argument defaults and "unused" current_user parameters are standard FastAPI patterns for dependency injection and authorization enforcement—these are false positives.

app/app/components/ObsidianIngest.tsx (1)

61-91: LGTM! Clean and accessible UI implementation.

The component structure is well-organized with appropriate:

  • Visual feedback for loading state via button text and disabled styling
  • Helpful instructions for users about the vault path requirement
  • Proper accessibility attributes (autoCapitalize, autoCorrect)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (2)

56-70: LGTM! Environment variable resolution logic.

The implementation correctly handles both ${VAR} and ${VAR:-default} syntax patterns and safely returns the original value for non-matching inputs.

Note: Similar logic exists in extras/obsidian-injest/config.py. Consider extracting to a shared utility if this pattern is used across more modules to avoid duplication.


72-88: LGTM! Robust config file discovery.

The multi-path search strategy appropriately handles Docker, development, and relative path scenarios. Using yaml.safe_load is the correct secure choice.

backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (1)

88-101: Consider graceful degradation vs failing the health check.

The current logic sets fallback values but then immediately raises an exception, so the fallbacks are never used. Decide on the intended behavior:

  1. Fail health check if config is missing - remove fallback assignments (lines 101)
  2. Gracefully degrade - log the error and continue with fallbacks (remove the raise)

The current implementation is contradictory.

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)

326-336: MD5 for chunk IDs is fine for non-cryptographic use.

The static analysis flags MD5 as insecure (S324), but here it's used for generating deterministic chunk identifiers, not for security purposes. This is acceptable. However, consider using hashlib.sha256 if you want to avoid lint warnings without suppression.

backends/advanced/src/advanced_omi_backend/llm_client.py (1)

150-172: Config-driven factory implementation looks good.

The factory correctly loads config, validates the api_family, resolves values, and constructs the client. Good use of shared utilities from memory/config.py instead of duplicating code.

extras/obsidian-injest/search_query.py (1)

108-117: Logger output won't appear without explicit handler setup.

The __main__ block uses logger.info() but since this is module-level logging and Python's default logger has no handlers, output won't appear unless config.py's _configure_logging() runs first. Since config is imported, this should work, but consider using print() for CLI output or verifying the logging setup.

backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (1)

420-438: Good: Consistent config path resolution matching other modules.

The _find_config_yml_path function follows the same search order as load_config_yml in memory/config.py, maintaining consistency across the codebase.

extras/obsidian-injest/config.py (1)

118-119: Module-level OpenAI client instantiation may crash imports.

Similar to obsidian_service.py, instantiating the OpenAI client at module load time means any configuration error will prevent the entire module from being imported. For a CLI utility this is likely acceptable, but worth noting.

backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (1)

178-189: LGTM!

The status endpoint correctly computes progress percentage and handles the job-not-found case appropriately.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backends/advanced/docker-compose.yml (1)

44-45: Document the steps to enable Neo4j for Obsidian ingestion.

The PR description states that Neo4j is required for Obsidian ingestion but is disabled by default. Users will need to manually uncomment the neo4j-mem0 service (lines 184-203) and its corresponding depends_on entry (lines 44-45) to use this feature.

Consider adding a comment block above the neo4j service or updating the README with clear instructions on:

  1. Which lines to uncomment
  2. How to configure NEO4J_PASSWORD in .env
  3. Any initialization steps required after enabling the service

Note: The past review comment about a critical mismatch between depends_on and service definition is now resolved since both are consistently commented out.

Also applies to: 184-203

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d7f262 and 8f1579e.

📒 Files selected for processing (1)
  • backends/advanced/docker-compose.yml (4 hunks)
🔇 Additional comments (1)
backends/advanced/docker-compose.yml (1)

186-186: LGTM! Configuration improvements for Neo4j service.

The additions to the commented neo4j-mem0 service definition are well-chosen:

  • hostname enables reliable service discovery
  • Listen address configurations on 0.0.0.0 allow proper network access
  • Memory heap settings (512m initial, 2G max) provide reasonable defaults

These will be valuable when users enable the Neo4j service.

Also applies to: 195-199

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (6)
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (2)

491-494: Use logging.exception() instead of logging.error() in except block.

This was flagged in a previous review. Per project learnings, logging.exception() should be used in except blocks to automatically log the full stack trace.


530-539: LGTM with minor note on logging.

The reload logic is straightforward and correctly uses reset_memory_service() followed by get_memory_service() to reinitialize. The logger.error vs logger.exception issue was already flagged in a previous review.

backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (2)

62-62: Remove unused api_family variable.

This issue was already flagged in a previous review. The api_family variable is assigned but never used.


78-80: Consider using logger.exception() instead of logger.warning() in the except block.

The bare raise is correct now. However, the past review comment also suggested using logger.exception() to automatically log the full stack trace. Based on learnings from this repository.

backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (1)

100-102: Exception handling has dead code and missing exception chaining.

Two issues remain:

  1. Dead code: Line 101 assigns default values, but line 102 immediately raises an exception, so those defaults are never used.

  2. Missing exception chaining: Per static analysis (B904) and project conventions, use raise ... from e to preserve the original context.

For a health check endpoint, consider gracefully degrading instead of failing:

🔎 Proposed fix (graceful degradation)
     except Exception as e:
+        logger.warning(f"Failed to load configuration from config.yml for health check: {e}")
         _llm_provider, _llm_model, _llm_base_url, _stt_name = "openai", None, None, None
-        raise ValueError(f"Failed to load configuration from config.yml for health check: {e}")
🔎 Alternative fix (if raising is intended)
     except Exception as e:
-        _llm_provider, _llm_model, _llm_base_url, _stt_name = "openai", None, None, None
-        raise ValueError(f"Failed to load configuration from config.yml for health check: {e}")
+        raise ValueError("Failed to load configuration from config.yml for health check") from e
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)

208-216: Use logging.exception() and fix multi-statement line.

Line 210 has multiple statements on one line. Per project learnings, use logging.exception() inside except blocks to automatically capture the full stack trace.

🔎 Proposed fix
     def get_embedding(self, text: str) -> Optional[List[float]]:
         ...
         try:
             clean_text = re.sub(r'\s+', ' ', text).strip()
-            if not clean_text: return None
+            if not clean_text:
+                return None
             
             response = self.client.embeddings.create(input=[clean_text], model=self.embedding_model)
             return response.data[0].embedding
         except Exception as e:
-            logger.error(f"Embedding failed for text length {len(text)}: {e}")
+            logger.exception("Embedding failed for text length %d", len(text))
             return None

Based on learnings, always use logging.exception() in except blocks to preserve the full stack trace for debugging.

🧹 Nitpick comments (7)
extras/obsidian-injest/search_query.py (1)

106-115: Consider adding error handling in the CLI test block.

While this is a testing script and crashing on errors might be acceptable, adding a try/except block would provide better user experience with informative error messages rather than stack traces.

🔎 Proposed enhancement
 if __name__ == "__main__":
     user_query: str = "How do I create a table in dataview?"
     
-    final_response = graph_rag_query(user_query)
-    
-    logger.info("%s", "=" * 50)
-    logger.info("QUESTION: %s", user_query)
-    logger.info("%s", "-" * 50)
-    logger.info("AI ANSWER:\n%s", final_response)
-    logger.info("%s", "=" * 50)
+    try:
+        final_response = graph_rag_query(user_query)
+        
+        logger.info("%s", "=" * 50)
+        logger.info("QUESTION: %s", user_query)
+        logger.info("%s", "-" * 50)
+        logger.info("AI ANSWER:\n%s", final_response)
+        logger.info("%s", "=" * 50)
+    except Exception as e:
+        logger.exception("Query failed")
+        raise
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (4)

19-23: Group third-party imports together per coding guidelines.

The yaml import (third-party) at line 20 is placed between local import blocks. Per coding guidelines, imports should be grouped: standard library, third-party, then local imports.

🔎 Proposed fix
 from advanced_omi_backend.models.user import User
 from advanced_omi_backend.task_manager import get_task_manager
 from fastapi.responses import JSONResponse
-import yaml
-
-from advanced_omi_backend.services.memory.config import load_config_yml as _load_root_config
+import yaml  # Move after fastapi.responses in third-party group
+
+from advanced_omi_backend.config import (
+    load_diarization_settings_from_file,
+    save_diarization_settings_to_file,
+)
+from advanced_omi_backend.models.user import User
+from advanced_omi_backend.task_manager import get_task_manager
+from advanced_omi_backend.services.memory.config import load_config_yml as _load_root_config
 from advanced_omi_backend.services.memory.service_factory import reset_memory_service, get_memory_service

Or simply reorganize so all third-party imports (fastapi, yaml) are together before local imports.


22-22: Unused import and code duplication with load_config_yml.

_load_root_config is imported at line 22 but never used. Additionally, _find_config_yml_path duplicates the path search logic from load_config_yml in services/memory/config.py.

Consider either:

  1. Removing the unused import and keeping the local implementation, or
  2. Refactoring load_config_yml to also return the resolved path, then reusing it here
🔎 Option 1: Remove unused import
-from advanced_omi_backend.services.memory.config import load_config_yml as _load_root_config
 from advanced_omi_backend.services.memory.service_factory import reset_memory_service, get_memory_service

Also applies to: 420-438


449-451: Simplify exception handling in except block.

logger.exception() automatically logs the full traceback, making the {e} in the message redundant. Also, use bare raise instead of raise e to re-raise the original exception cleanly.

🔎 Proposed fix
     except Exception as e:
-        logger.exception(f"Failed to load memory configuration: {e}")
-        raise e
+        logger.exception("Failed to load memory configuration")
+        raise

454-468: Consider TypeError for type checks and simplify exception handling.

  1. Lines 458 and 464 use ValueError for type validation, but TypeError is more semantically appropriate when validating types.
  2. Same as above: {e} is redundant in logger.exception() and raise e should be bare raise.
🔎 Proposed fix
 def _parse_memory_yaml(config_yaml: str) -> dict:
     try:
         doc = yaml.safe_load(config_yaml) or {}
         if not isinstance(doc, dict):
-            raise ValueError("YAML must define a mapping")
+            raise TypeError("YAML must define a mapping")
         if "memory" in doc:
             mem = doc.get("memory") or {}
         else:
             mem = doc
         if not isinstance(mem, dict):
-            raise ValueError("'memory' section must be a mapping")
+            raise TypeError("'memory' section must be a mapping")
         return mem
     except Exception as e:
-        logger.exception(f"Invalid YAML: {e}")
-        raise e
+        logger.exception("Invalid YAML")
+        raise
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (2)

41-81: Consolidate duplicated utility functions.

Both load_env_file and resolve_value are duplicated from extras/obsidian-injest/config.py. The resolve_value function also exists in backends/advanced/src/advanced_omi_backend/services/memory/config.py (line 71-87).

Consider moving these utilities to a shared location (e.g., a common config utilities module) to reduce duplication and improve maintainability.


328-336: Consider using SHA-256 or UUID instead of MD5 for chunk IDs.

While MD5 is acceptable for non-cryptographic ID generation, using a more modern hash function like SHA-256 or UUID would avoid static analysis warnings and follow current best practices.

🔎 Proposed alternatives

Option 1: Use SHA-256

-                chunk_id = hashlib.md5(f"{note_data['path']}_{i}".encode()).hexdigest()
+                chunk_id = hashlib.sha256(f"{note_data['path']}_{i}".encode()).hexdigest()

Option 2: Use UUID5 (namespace-based)

+import uuid
...
-                chunk_id = hashlib.md5(f"{note_data['path']}_{i}".encode()).hexdigest()
+                chunk_id = str(uuid.uuid5(uuid.NAMESPACE_DNS, f"{note_data['path']}_{i}"))
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f1579e and bd0a316.

📒 Files selected for processing (5)
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (4 hunks)
  • backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (5 hunks)
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (2 hunks)
  • extras/obsidian-injest/search_query.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
  • extras/obsidian-injest/search_query.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
  • extras/obsidian-injest/search_query.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py
🧬 Code graph analysis (4)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (4)
backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py (1)
  • _get_openai_client (45-69)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (2)
  • load_config_yml (72-88)
  • resolve_value (57-70)
extras/obsidian-injest/config.py (3)
  • load_env_file (16-31)
  • resolve_value (34-42)
  • get_model_config (59-67)
extras/obsidian-injest/search_query.py (1)
  • get_embedding (24-27)
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (3)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)
  • load_config_yml (72-88)
backends/advanced/src/advanced_omi_backend/services/memory/service_factory.py (2)
  • reset_memory_service (127-133)
  • get_memory_service (73-110)
backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py (4)
  • get_memory_config_raw (77-79)
  • validate_memory_config_raw (82-87)
  • update_memory_config_raw (90-95)
  • reload_memory_config (98-100)
extras/obsidian-injest/search_query.py (1)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)
  • get_embedding (199-216)
backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (1)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (2)
  • load_config_yml (72-88)
  • resolve_value (57-70)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py

62-62: Local variable api_family is assigned to but never used

Remove assignment to unused variable api_family

(F841)

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py

101-101: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


109-109: Avoid specifying long messages outside the exception class

(TRY003)


113-113: Avoid specifying long messages outside the exception class

(TRY003)


133-133: Avoid specifying long messages outside the exception class

(TRY003)


210-210: Multiple statements on one line (colon)

(E701)


214-214: Do not catch blind exception: Exception

(BLE001)


215-215: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


329-329: Probable use of insecure hash functions in hashlib: md5

(S324)


362-362: Avoid specifying long messages outside the exception class

(TRY003)


382-382: Use explicit conversion flag

Replace with conversion flag

(RUF010)

backends/advanced/src/advanced_omi_backend/controllers/system_controller.py

448-448: Consider moving this statement to an else block

(TRY300)


450-450: Redundant exception object included in logging.exception call

(TRY401)


451-451: Use raise without specifying exception name

Remove exception name

(TRY201)


458-458: Prefer TypeError exception for invalid type

(TRY004)


458-458: Abstract raise to an inner function

(TRY301)


458-458: Avoid specifying long messages outside the exception class

(TRY003)


464-464: Prefer TypeError exception for invalid type

(TRY004)


464-464: Abstract raise to an inner function

(TRY301)


464-464: Avoid specifying long messages outside the exception class

(TRY003)


465-465: Consider moving this statement to an else block

(TRY300)


467-467: Redundant exception object included in logging.exception call

(TRY401)


468-468: Use raise without specifying exception name

Remove exception name

(TRY201)


489-489: Consider moving this statement to an else block

(TRY300)


492-492: Do not catch blind exception: Exception

(BLE001)


493-493: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


494-494: Use explicit conversion flag

Replace with conversion flag

(RUF010)


514-514: Do not catch blind exception: Exception

(BLE001)


525-525: Do not catch blind exception: Exception

(BLE001)


526-526: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


527-527: Use explicit conversion flag

Replace with conversion flag

(RUF010)


536-536: Consider moving this statement to an else block

(TRY300)


537-537: Do not catch blind exception: Exception

(BLE001)


538-538: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


539-539: Use explicit conversion flag

Replace with conversion flag

(RUF010)

backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py

100-100: Do not catch blind exception: Exception

(BLE001)


102-102: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


102-102: Avoid specifying long messages outside the exception class

(TRY003)


246-246: Do not catch blind exception: Exception

(BLE001)


248-248: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (16)
extras/obsidian-injest/search_query.py (2)

1-22: LGTM! Clean module structure.

The module docstring clearly explains the CLI testing purpose, and imports are properly organized following the coding guidelines (standard library, third-party, then local imports).


55-104: LGTM! Neo4j driver resource leak fixed.

The function now correctly uses context managers for both the Neo4j driver (line 83) and session (line 84), ensuring they are always properly closed even when exceptions occur. This addresses the resource leak concern from the previous review.

The query logic is well-structured: vector search, graph traversal for tags and links, context formatting, and fallback handling for empty results.

backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (1)

53-77: Config-driven provider selection looks good.

The logic correctly handles:

  • Loading configuration with fallback to empty dict
  • Resolving provider-specific values via _resolve_value
  • Returning appropriate provider instance (streaming vs batch)
  • Warning and returning None when required values are missing
backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (2)

128-130: LGTM - Config-driven display fields properly wired.

The LLM configuration values are now correctly sourced from config.yml rather than environment variables, aligning with the broader migration to config-driven settings across the backend.


235-250: Provider field usage is consistent.

The _llm_provider is correctly used in the audioai service status for all code paths (success, timeout, and error). This provides consistent display of the configured LLM provider in health responses.

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (11)

1-26: LGTM! Well-structured module with clear documentation.

The module docstring is comprehensive, and imports are properly organized following the coding guidelines (grouped by standard library, third-party, and local imports).


28-39: LGTM! Clear type definitions.

The TypedDict definitions provide good type safety for note data and chunk payloads.


86-155: LGTM! Complex initialization appropriately handles multiple deployment scenarios.

The initialization logic properly handles both containerized (Docker) and local development environments with appropriate fallbacks for configuration loading. The inline get_model_config helper is reasonable given its single use case.


171-179: LGTM! Clean resource management.

The close and reset_driver methods properly manage the Neo4j driver lifecycle.


181-197: LGTM! Proper database schema initialization.

The method correctly creates constraints and vector indexes with appropriate idempotent IF NOT EXISTS clauses.


218-238: LGTM! Straightforward chunking implementation.

The character-based chunking with overlap is implemented correctly and handles edge cases appropriately.


240-285: LGTM! Robust file parsing with encoding fallbacks.

The method handles various encoding issues gracefully and correctly extracts Obsidian metadata including frontmatter, links, and tags.


287-309: LGTM! Resilient chunk processing.

The method appropriately handles embedding failures for individual chunks while continuing to process the rest, with clear logging.


369-384: LGTM! Past review issues have been addressed.

The method now correctly uses logger.exception() to capture full stack traces (line 381), and the unused noqa directive has been removed (line 370). The resilient error handling ensures the ingestion process continues even when individual files fail.


386-407: LGTM! Excellent lazy initialization pattern addressing past concerns.

The lazy initialization with proxy pattern elegantly solves the module-level instantiation issue flagged in previous reviews. This prevents configuration errors from crashing the application at import time while maintaining backward compatibility for callers expecting a module-level obsidian_service instance.


341-344: This code correctly creates placeholder Note nodes for unresolved links.

The MERGE clause with ON CREATE SET is the standard Neo4j pattern for idempotent upserts. When a link references a note that doesn't exist in the vault, this creates a placeholder Note node with a synthesized path, allowing the graph to represent all vault connections. This is intentional behavior, not a problem requiring correction.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (1)

30-47: provider_name parameter is declared but never used.

The function signature accepts provider_name and the docstring documents it as supporting explicit provider selection ('deepgram', 'parakeet'), but the implementation completely ignores this parameter and always relies on config.yml. This could break existing callers.

Either remove the parameter and update the docstring, or implement the explicit provider selection logic.

🔎 Option 1: Remove unused parameter
 def get_transcription_provider(
-    provider_name: Optional[str] = None,
     mode: Optional[str] = None,
 ) -> Optional[BaseTranscriptionProvider]:
     """
     Factory function to get the appropriate transcription provider.

     Args:
-        provider_name: Name of the provider ('deepgram', 'parakeet').
-                      If None, will auto-select based on available configuration.
         mode: Processing mode ('streaming', 'batch'). If None, defaults to 'batch'.
🧹 Nitpick comments (2)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (2)

10-10: Unused os import.

The os module is imported but no longer used since environment variable access is now delegated to _resolve_value.

🔎 Proposed fix
 import logging
-import os
 from typing import Optional

77-79: Use logger.exception() to log the full stack trace.

Based on learnings, prefer logging.exception() inside except blocks to automatically log the full stack trace. The bare raise is correct, but logger.warning loses traceback context.

🔎 Proposed fix
     except Exception as e:
-        logger.warning(f"Failed to load STT provider from config.yml: {e}")
+        logger.exception("Failed to load STT provider from config.yml")
         raise
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd0a316 and c9fcd1e.

📒 Files selected for processing (1)
  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (1)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (2)
  • load_config_yml (72-88)
  • resolve_value (57-70)
🔇 Additional comments (1)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (1)

86-93: LGTM!

The __all__ exports are consistent and maintain the expected public API surface.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (1)

57-83: Misleading warning when no STT is configured or model not found.

The warning on line 82 runs in two different scenarios with different meanings:

  1. When stt_name is None (no STT default configured) — logs "STT default 'None' configured..."
  2. When stt_name exists but no matching model is found in models list

These are distinct cases requiring different messages. Also, when stt_name is None, returning None silently (or with a debug log) may be more appropriate than a warning.

🔎 Proposed fix
         stt_name = defaults.get("stt")
         if stt_name:
             stt_def = next((m for m in models if m.get("name") == stt_name), None)
+            if stt_def is None:
+                logger.warning(f"STT default '{stt_name}' configured but no matching model definition found")
+                return None
             if stt_def:
                 provider = (stt_def.get("model_provider") or "").lower()
                 if provider == "deepgram":
                     api_key = str(_resolve_value(stt_def.get("api_key", "")))
                     if not api_key:
                         logger.warning("Deepgram selected in config.yml but api_key missing or empty")
                         return None
                     return DeepgramStreamingProvider(api_key) if mode == "streaming" else DeepgramProvider(api_key)
                 elif provider == "parakeet":
                     service_url = str(_resolve_value(stt_def.get("model_url", "")))
                     if not service_url:
                         logger.warning("Parakeet selected in config.yml but model_url missing or empty")
                         return None
                     return ParakeetStreamingProvider(service_url) if mode == "streaming" else ParakeetProvider(service_url)
                 else:
                     logger.warning(f"Unsupported STT provider in config.yml: {provider}")
                     return None
     except Exception as e:
         logger.warning(f"Failed to load STT provider from config.yml: {e}")
         raise
 
-    # If no config present, no provider configured
-    logger.warning(f"STT default '{stt_name}' configured but no matching model definition found")
+    # No STT default configured in config.yml
+    logger.debug("No STT default configured in config.yml")
     return None
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)

207-215: Use logging.exception() and fix multi-statement line.

Line 209 has multiple statements on one line. Per project learnings, use logging.exception() inside except blocks to automatically capture the full stack trace. Also, use % formatting with logger methods instead of f-strings.

🔎 Proposed fix
     def get_embedding(self, text: str) -> Optional[List[float]]:
         ...
         try:
             clean_text = re.sub(r'\s+', ' ', text).strip()
-            if not clean_text: return None
+            if not clean_text:
+                return None
             
             response = self.client.embeddings.create(input=[clean_text], model=self.embedding_model)
             return response.data[0].embedding
         except Exception as e:
-            logger.error(f"Embedding failed for text length {len(text)}: {e}")
+            logger.exception("Embedding failed for text length %d", len(text))
             return None

Based on learnings, use logging.exception() to capture full stack traces in except blocks.

🧹 Nitpick comments (6)
extras/obsidian-injest/search_query.py (2)

36-38: Prefer logging.exception with bare raise for better exception context.

The current pattern uses logger.error followed by raise e. Per Python best practices and learnings, use logging.exception (which automatically logs the full stack trace) with bare raise (which preserves the exception context).

🔎 Proposed refactor
     except Exception as e:
-        logger.error(f"Failed to generate embedding: {e}")
-        raise e
+        logger.exception("Failed to generate embedding")
+        raise

Based on learnings, prefer logging.exception() inside except blocks to automatically log the full stack trace, and use bare raise to preserve the original exception context.


74-76: Prefer logging.exception with bare raise for better exception context.

Same pattern as in get_embedding: use logging.exception with bare raise for better exception handling.

🔎 Proposed refactor
     except Exception as e:
-        logger.error(f"Failed to generate answer: {e}")
-        raise e
+        logger.exception("Failed to generate answer")
+        raise

Based on learnings, prefer logging.exception() inside except blocks to automatically log the full stack trace, and use bare raise to preserve the original exception context.

backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (2)

10-10: Unused os import.

The os module is no longer used in this file after the migration to config-driven selection. The resolve_value utility handles environment variable resolution internally.

🔎 Proposed fix
 import logging
-import os
 from typing import Optional

77-79: Use logger.exception() to capture full stack trace.

The current logger.warning() only logs the exception message, losing the full traceback. Based on learnings, prefer logging.exception() inside except blocks to automatically capture the complete stack trace for debugging.

🔎 Proposed fix
     except Exception as e:
-        logger.warning(f"Failed to load STT provider from config.yml: {e}")
+        logger.exception("Failed to load STT provider from config.yml")
         raise
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (2)

41-66: Consider using python-dotenv for .env parsing.

The manual .env parser reimplements functionality available in the widely-used python-dotenv library, which handles edge cases like multi-line values, variable expansion, and various quote styles more robustly.

🔎 Alternative using python-dotenv
+from dotenv import dotenv_values
+
-def load_env_file(filepath: Path) -> dict[str, str]:
-    """Load environment variables from a .env file.
-    
-    Args:
-        filepath: Path to the .env file to load.
-    
-    Returns:
-        Dictionary of key-value pairs from the .env file.
-    """
-    env_vars = {}
-    if filepath.exists():
-        with open(filepath, 'r') as f:
-            for line in f:
-                line = line.strip()
-                if not line or line.startswith('#'):
-                    continue
-                if '=' in line:
-                    parts = line.split('=', 1)
-                    key = parts[0].strip()
-                    value = parts[1].strip() if len(parts) > 1 else ""
-                    # Handle quotes
-                    if (value.startswith("'") and value.endswith("'")) or (value.startswith('"') and value.endswith('"')):
-                        value = value[1:-1]
-                    env_vars[key] = value
-    return env_vars
+def load_env_file(filepath: Path) -> dict[str, str]:
+    """Load environment variables from a .env file."""
+    return dict(dotenv_values(filepath)) if filepath.exists() else {}

380-381: Use % formatting with logger and remove redundant str() conversion.

Line 380 uses an f-string with logger.exception, but Python logging best practices recommend % formatting. Line 381 has a redundant str(e) conversion since exceptions are already stringified when interpolated.

🔎 Proposed fix
                     except Exception as e:
-                        logger.exception(f"Processing {file} failed")
-                        errors.append(f"{file}: {str(e)}")
+                        logger.exception("Processing %s failed", file)
+                        errors.append(f"{file}: {e}")
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9fcd1e and cb3c725.

📒 Files selected for processing (4)
  • backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (5 hunks)
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (2 hunks)
  • extras/obsidian-injest/search_query.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • extras/obsidian-injest/search_query.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • extras/obsidian-injest/search_query.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py
🧬 Code graph analysis (3)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (1)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (2)
  • load_config_yml (72-88)
  • resolve_value (57-70)
extras/obsidian-injest/search_query.py (1)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)
  • get_embedding (198-215)
backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (1)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (2)
  • load_config_yml (72-88)
  • resolve_value (57-70)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py

101-101: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


109-109: Avoid specifying long messages outside the exception class

(TRY003)


113-113: Avoid specifying long messages outside the exception class

(TRY003)


133-133: Avoid specifying long messages outside the exception class

(TRY003)


209-209: Multiple statements on one line (colon)

(E701)


213-213: Do not catch blind exception: Exception

(BLE001)


214-214: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


328-328: Probable use of insecure hash functions in hashlib: md5

(S324)


361-361: Avoid specifying long messages outside the exception class

(TRY003)


381-381: Use explicit conversion flag

Replace with conversion flag

(RUF010)

extras/obsidian-injest/search_query.py

37-37: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


38-38: Use raise without specifying exception name

Remove exception name

(TRY201)


75-75: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


76-76: Use raise without specifying exception name

Remove exception name

(TRY201)

backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py

101-101: Use raise without specifying exception name

Remove exception name

(TRY201)


245-245: Do not catch blind exception: Exception

(BLE001)


247-247: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (5)
extras/obsidian-injest/search_query.py (1)

105-117: Excellent use of nested context managers!

The Neo4j driver and session are now properly managed using context managers, ensuring resources are always released even if exceptions occur. This addresses the resource leak concern from the previous review.

backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (4)

23-24: LGTM!

The imports correctly reference the config loading utilities from the memory.config module, following the underscore naming convention for internal usage.


88-98: LGTM!

The config loading logic properly handles missing sections and uses defensive programming with .get() and fallback defaults. The use of resolve_value correctly supports environment variable substitution in config values.


119-119: LGTM!

The config-derived values are correctly used for health response display fields. The fallback to "not set" for transcription provider is appropriate when config doesn't specify it.

Also applies to: 127-129


234-234: LGTM!

The _llm_provider field is consistently added to all audioai service health responses, providing useful configuration visibility for debugging.

Also applies to: 241-241, 249-249

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backends/advanced/src/advanced_omi_backend/chat_service.py (1)

284-296: Replace silent exception handling with explicit error propagation.

The return type List[MemoryEntry] correctly matches the memory_service.search_memories return type. However, the exception handling violates coding guidelines by silently catching all exceptions and returning an empty list (lines 294-296). Instead of masking errors with return [], propagate the exception or raise an explicit error. This allows callers to handle failures properly rather than receiving empty data that could be mistaken for a successful but empty result.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb3c725 and f09c8ff.

📒 Files selected for processing (1)
  • backends/advanced/src/advanced_omi_backend/chat_service.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/chat_service.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/chat_service.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/chat_service.py (2)
backends/advanced/src/advanced_omi_backend/services/memory/base.py (1)
  • MemoryEntry (26-63)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
  • user_id (71-73)
🔇 Additional comments (3)
backends/advanced/src/advanced_omi_backend/chat_service.py (3)

26-26: LGTM! Import properly placed.

The MemoryEntry import is correctly positioned within the local imports section and is necessary for the type-safe memory handling refactoring.


307-307: LGTM! Correct attribute access for MemoryEntry.

The change from dictionary access to attribute access is correct for the MemoryEntry dataclass, and the defensive check for memory.id appropriately filters out any invalid IDs.


316-318: LGTM! Simplified content access with MemoryEntry.

The migration from memory.get("memory") to memory.content correctly uses the MemoryEntry dataclass attribute and simplifies the code by eliminating fallback dictionary key checks.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (2)

198-215: Previous review comment still applies.

This code segment was already reviewed with suggestions to split the multi-statement line (line 209) and use logging.exception() instead of logging.error() (line 214) for better stack trace logging.


385-397: Previous review comment about thread safety still applies.

The singleton initialization lacks thread synchronization. The check-then-set pattern on lines 395-396 has a race condition where multiple threads could create separate service instances concurrently, leading to multiple Neo4j driver connections and inconsistent state.

Based on the previous review, consider adding a threading lock with double-check pattern to ensure thread-safe initialization.

🧹 Nitpick comments (4)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (4)

41-65: Consider importing from shared module to eliminate duplication.

This load_env_file function duplicates the implementation in extras/obsidian-injest/config.py (lines 15-30). Consider importing from the shared module instead to follow the DRY principle and reduce maintenance overhead.

🔎 Proposed refactor

If extras/obsidian-injest/config.py is accessible, import the function:

 from neo4j import GraphDatabase, Driver
 from advanced_omi_backend.services.memory.providers.llm_providers import _get_openai_client
 from advanced_omi_backend.services.memory.config import load_config_yml as load_root_config
+# Import shared utilities
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent.parent / "extras" / "obsidian-injest"))
+from config import load_env_file, resolve_value

 logger = logging.getLogger(__name__)
 
-def load_env_file(filepath: Path) -> dict[str, str]:
-    """Load environment variables from a .env file.
-    
-    Args:
-        filepath: Path to the .env file to load.
-    
-    Returns:
-        Dictionary of key-value pairs from the .env file.
-    """
-    env_vars = {}
-    if filepath.exists():
-        with open(filepath, 'r') as f:
-            for line in f:
-                line = line.strip()
-                if not line or line.startswith('#'):
-                    continue
-                if '=' in line:
-                    parts = line.split('=', 1)
-                    key = parts[0].strip()
-                    value = parts[1].strip() if len(parts) > 1 else ""
-                    # Handle quotes
-                    if (value.startswith("'") and value.endswith("'")) or (value.startswith('"') and value.endswith('"')):
-                        value = value[1:-1]
-                    env_vars[key] = value
-    return env_vars
-
-def resolve_value(value: str | int | float) -> str | int | float:
-    """Resolve environment variable references in configuration values.
-    
-    Supports ${VAR} and ${VAR:-default} syntax. Returns the original value
-    if it's not a string or doesn't match the pattern.
-    """
-    if isinstance(value, str) and value.startswith("${") and value.endswith("}"):
-        content = value[2:-1]
-        if ":-" in content:
-            var_name, default_val = content.split(":-", 1)
-            # Try to get from real environment first, then default
-            return os.getenv(var_name, default_val)
-        else:
-            return os.getenv(content, "")
-    return value

Alternatively, move these utilities to a shared package within advanced_omi_backend that both modules can import.


146-147: Consider making chunking parameters configurable.

The chunk_char_limit and chunk_overlap are hardcoded. Consider loading these from config.yml to allow tuning without code changes, similar to how embedding and model configurations are handled.

🔎 Proposed enhancement

Add chunking configuration to config.yml and load it during initialization:

         self.openai_api_key = str(resolve_value(llm_config['api_key']))

         # Chunking - can have defaults
-        self.chunk_char_limit = 1000
-        self.chunk_overlap = 200
+        chunking_config = config_data.get('chunking', {})
+        self.chunk_char_limit = int(chunking_config.get('char_limit', 1000))
+        self.chunk_overlap = int(chunking_config.get('overlap', 200))

296-298: Consider lazy formatting for logging.

Line 297 uses an f-string in the log statement. For consistency with Python logging best practices, consider using lazy formatting with % style:

-        logger.info(
-            f"Processing: {note_data['path']} ({len(note_data['content'])} chars -> {len(text_chunks)} chunks)"
-        )
+        logger.info(
+            "Processing: %s (%d chars -> %d chunks)",
+            note_data['path'], len(note_data['content']), len(text_chunks)
+        )

368-383: Main issue addressed; consider lazy formatting.

The switch from logger.error to logger.exception on line 380 properly captures stack traces (addressing the previous review). As a minor refinement, consider using lazy formatting instead of an f-string:

                     except Exception as e:
-                        logger.exception(f"Processing {file} failed")
+                        logger.exception("Processing %s failed", file)
                         errors.append(f"{file}: {str(e)}")
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f09c8ff and 9b37c03.

📒 Files selected for processing (1)
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (4)
backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py (1)
  • _get_openai_client (45-69)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (2)
  • load_config_yml (72-88)
  • resolve_value (57-70)
extras/obsidian-injest/config.py (2)
  • load_env_file (16-31)
  • resolve_value (34-42)
extras/obsidian-injest/search_query.py (1)
  • get_embedding (24-38)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py

101-101: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


109-109: Avoid specifying long messages outside the exception class

(TRY003)


113-113: Avoid specifying long messages outside the exception class

(TRY003)


133-133: Avoid specifying long messages outside the exception class

(TRY003)


209-209: Multiple statements on one line (colon)

(E701)


213-213: Do not catch blind exception: Exception

(BLE001)


214-214: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


328-328: Probable use of insecure hash functions in hashlib: md5

(S324)


361-361: Avoid specifying long messages outside the exception class

(TRY003)


381-381: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (10)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (10)

1-26: LGTM!

The module docstring is comprehensive, and imports are properly organized following the coding guidelines (standard library, third-party, then local imports, all at the top of the file).


28-39: LGTM!

The TypedDict definitions provide clear type hints for note data and chunk payloads.


118-122: Comment clarity improved with TODO note.

The updated comment on line 120 now includes a TODO acknowledging the cleanup needed after Kubernetes migration, which addresses the previous concern about the misleading comment. The fallback path structure is now clearer.


157-168: LGTM!

The driver initialization correctly passes authentication credentials. The previously flagged misleading comment has been removed.


170-178: LGTM!

Resource cleanup is properly implemented with driver closure and reset functionality.


180-196: Verify the necessity of reset_driver() call.

Line 183 calls reset_driver() to "ensure we use current credentials," but credentials are set during __init__ and don't appear to change. This closes and recreates the driver connection unnecessarily on every setup_database() call. Consider removing this unless there's a specific use case where credentials change between initialization and setup.

If credentials can indeed change, please clarify the scenario; otherwise, consider simplifying:

     def setup_database(self) -> None:
         """Create database constraints and vector index for notes and chunks."""
-        # Reset driver to ensure we use current credentials
-        self.reset_driver()
         driver = self._get_driver()

217-237: LGTM!

The character-based chunking logic correctly implements overlapping windows and handles edge cases appropriately.


239-284: LGTM!

The parsing logic is robust with proper encoding fallbacks, frontmatter extraction, and regex patterns for links and tags. The Windows path fallback is a thoughtful addition.


310-343: LGTM!

The Neo4j ingestion logic properly handles notes, chunks, tags, and links with appropriate MERGE operations. The MD5 usage on line 328 is acceptable for generating chunk IDs (non-cryptographic purpose). The forward reference handling for links (lines 341-342) correctly creates placeholder nodes that will be populated when the target note is processed.


399-406: LGTM - clever lazy initialization pattern!

The _ObsidianServiceProxy provides backward-compatible module-level access while enabling lazy initialization. This addresses the earlier concern about module-level instantiation causing startup failures.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (2)

207-215: Use logging.exception() and split multi-statement line.

Per project learnings and static analysis, use logging.exception() inside except blocks and avoid multiple statements on one line.

🔎 Proposed fix
     def get_embedding(self, text: str) -> Optional[List[float]]:
         ...
         try:
             clean_text = re.sub(r'\s+', ' ', text).strip()
-            if not clean_text: return None
+            if not clean_text:
+                return None
             
             response = self.client.embeddings.create(input=[clean_text], model=self.embedding_model)
             return response.data[0].embedding
         except Exception as e:
-            logger.error(f"Embedding failed for text length {len(text)}: {e}")
+            logger.exception("Embedding failed for text length %d", len(text))
             return None

444-453: Singleton initialization is not thread-safe.

The check-then-set pattern has a race condition: multiple threads could pass the if _obsidian_service is None check simultaneously and create multiple instances, leading to resource leaks with multiple Neo4j driver connections.

🔎 Proposed fix using threading.Lock
+import threading
+
 # Lazy initialization to avoid startup failures
 _obsidian_service: Optional[ObsidianService] = None
+_service_lock = threading.Lock()

 def get_obsidian_service() -> ObsidianService:
     """Get or create the Obsidian service singleton.
     
     Returns:
         ObsidianService instance, creating it on first access.
     """
     global _obsidian_service
     if _obsidian_service is None:
-        _obsidian_service = ObsidianService()
+        with _service_lock:
+            # Double-check pattern after acquiring lock
+            if _obsidian_service is None:
+                _obsidian_service = ObsidianService()
     return _obsidian_service
🧹 Nitpick comments (3)
backends/advanced/webui/src/pages/Chat.tsx (1)

519-531: Consider disabling the checkbox while sending.

The checkbox remains interactive during message sending. While functionally safe (the state is captured at send time), it could confuse users if they toggle it mid-stream and expect the change to affect the current response.

🔎 Proposed fix
               <input
                 type="checkbox"
                 id="includeObsidian"
                 checked={includeObsidian}
                 onChange={(e) => setIncludeObsidian(e.target.checked)}
-                className="h-4 w-4 text-blue-600 rounded border-gray-300 focus:ring-blue-500"
+                className="h-4 w-4 text-blue-600 rounded border-gray-300 focus:ring-blue-500 disabled:opacity-50"
+                disabled={isSending}
               />
backends/advanced/tests/test_obsidian_service.py (1)

3-6: Consider using pytest with proper test discovery instead of sys.path manipulation.

The sys.path.append pattern is a common workaround but can lead to import issues and makes the test less portable. Consider configuring pytest with proper package discovery or using pip install -e . for development installs.

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)

436-438: Use logging.exception() for better stack traces.

Per project learnings, use logging.exception() inside except blocks to automatically capture the full stack trace.

🔎 Proposed fix
         except Exception as e:
-            logger.error(f"Obsidian search failed: {e}")
+            logger.exception("Obsidian search failed")

Based on learnings from previous reviews.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b37c03 and ecfa5c3.

📒 Files selected for processing (6)
  • backends/advanced/src/advanced_omi_backend/chat_service.py (6 hunks)
  • backends/advanced/src/advanced_omi_backend/routers/modules/chat_routes.py (2 hunks)
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1 hunks)
  • backends/advanced/tests/test_obsidian_service.py (1 hunks)
  • backends/advanced/webui/src/pages/Chat.tsx (3 hunks)
  • backends/advanced/webui/src/services/api.ts (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/chat_routes.py
  • backends/advanced/src/advanced_omi_backend/chat_service.py
  • backends/advanced/tests/test_obsidian_service.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/chat_routes.py
  • backends/advanced/src/advanced_omi_backend/chat_service.py
  • backends/advanced/tests/test_obsidian_service.py
🧬 Code graph analysis (2)
backends/advanced/src/advanced_omi_backend/chat_service.py (2)
backends/advanced/src/advanced_omi_backend/services/memory/base.py (1)
  • MemoryEntry (26-63)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (2)
  • get_obsidian_service (444-453)
  • search_obsidian (385-439)
backends/advanced/tests/test_obsidian_service.py (1)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (4)
  • search_obsidian (385-439)
  • setup_database (180-196)
  • character_chunker (217-237)
  • ingest_note_and_chunks (310-343)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py

101-101: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


109-109: Avoid specifying long messages outside the exception class

(TRY003)


113-113: Avoid specifying long messages outside the exception class

(TRY003)


133-133: Avoid specifying long messages outside the exception class

(TRY003)


209-209: Multiple statements on one line (colon)

(E701)


213-213: Do not catch blind exception: Exception

(BLE001)


214-214: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


328-328: Probable use of insecure hash functions in hashlib: md5

(S324)


361-361: Avoid specifying long messages outside the exception class

(TRY003)


381-381: Use explicit conversion flag

Replace with conversion flag

(RUF010)


436-436: Do not catch blind exception: Exception

(BLE001)


437-437: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/chat_service.py

334-334: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


335-335: Use raise without specifying exception name

Remove exception name

(TRY201)

🔇 Additional comments (13)
backends/advanced/src/advanced_omi_backend/routers/modules/chat_routes.py (2)

30-34: Clean addition of the Obsidian memory flag.

The include_obsidian_memory field is well-designed with a sensible False default, ensuring backward compatibility. The field description clearly communicates its purpose.


310-315: LGTM - flag properly propagated to service layer.

The include_obsidian_memory flag is correctly passed to generate_response_stream, maintaining the request's intent through the service call.

backends/advanced/webui/src/pages/Chat.tsx (1)

47-47: LGTM - state initialization is clean.

The includeObsidian state with false default aligns with the backend's default behavior.

backends/advanced/tests/test_obsidian_service.py (3)

12-51: Well-structured test setup with comprehensive mocking.

The setUp method properly mocks all external dependencies (config loading, OpenAI client, Neo4j driver, environment variables) and uses addCleanup for reliable teardown. The mock session context manager is correctly configured.


53-99: Comprehensive test for the search flow.

The test properly validates the embedding call, Neo4j query execution, and result formatting. The assertions check for proper parameter passing and output structure.


155-188: Good failure and edge case coverage.

The tests for embedding failure, database failure, and empty results ensure the service degrades gracefully and returns empty lists instead of raising exceptions to callers.

backends/advanced/webui/src/services/api.ts (2)

230-247: Clean API namespace for Obsidian operations.

The obsidianApi namespace is well-structured with appropriate methods for the full ingestion lifecycle. The 5-minute timeout for zip uploads is reasonable for large vaults.


271-278: LGTM - conditional inclusion handles the flag correctly.

The flag is only added when truthy, which aligns with the backend's default of False. The snake_case key matches the Pydantic model.

backends/advanced/src/advanced_omi_backend/chat_service.py (2)

26-27: Good type imports for better type safety.

Importing MemoryEntry and get_obsidian_service at the top of the file follows the coding guidelines and improves type annotations throughout the service.


285-320: Clean migration to typed MemoryEntry.

The refactoring from dictionary access (memory.get("id"), memory.get("memory")) to typed attribute access (memory.id, memory.content) improves type safety and aligns with the MemoryEntry dataclass.

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (3)

28-81: Clean utility functions and type definitions.

The NoteData and ChunkPayload TypedDicts provide clear contracts for data structures. The load_env_file and resolve_value utilities handle edge cases well (quotes, comments, default values).


455-462: Proxy pattern enables lazy module-level access.

The _ObsidianServiceProxy class provides backward-compatible module-level access (obsidian_service.method()) while maintaining lazy initialization. This is a clean solution to avoid eager instantiation at import time.


345-383: Vault ingestion with proper error handling.

The ingest_vault method correctly uses logger.exception for failed note processing and continues processing other notes, collecting errors for the final report. This provides good resilience for large vault ingestion.

Comment on lines 322 to 335
# Add Obsidian context if requested
if include_obsidian_memory:
try:
obsidian_service = get_obsidian_service()
obsidian_context = obsidian_service.search_obsidian(current_message)
if obsidian_context:
context_parts.append("# Relevant Obsidian Notes:")
for entry in obsidian_context:
context_parts.append(entry)
context_parts.append("")
logger.info(f"Added {len(obsidian_context)} Obsidian notes to context")
except Exception as e:
logger.error(f"Failed to get Obsidian context: {e}")
raise e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Use logging.exception() and bare raise for better stack traces.

Per project learnings and static analysis hints, use logging.exception() inside except blocks to automatically capture the full stack trace, and use bare raise to preserve the original exception context.

🔎 Proposed fix
         # Add Obsidian context if requested
         if include_obsidian_memory:
             try:
                 obsidian_service = get_obsidian_service()
                 obsidian_context = obsidian_service.search_obsidian(current_message)
                 if obsidian_context:
                     context_parts.append("# Relevant Obsidian Notes:")
                     for entry in obsidian_context:
                         context_parts.append(entry)
                     context_parts.append("")
                     logger.info(f"Added {len(obsidian_context)} Obsidian notes to context")
             except Exception as e:
-                logger.error(f"Failed to get Obsidian context: {e}")
-                raise e
+                logger.exception("Failed to get Obsidian context")
+                raise

Based on learnings from previous reviews in this project.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Add Obsidian context if requested
if include_obsidian_memory:
try:
obsidian_service = get_obsidian_service()
obsidian_context = obsidian_service.search_obsidian(current_message)
if obsidian_context:
context_parts.append("# Relevant Obsidian Notes:")
for entry in obsidian_context:
context_parts.append(entry)
context_parts.append("")
logger.info(f"Added {len(obsidian_context)} Obsidian notes to context")
except Exception as e:
logger.error(f"Failed to get Obsidian context: {e}")
raise e
# Add Obsidian context if requested
if include_obsidian_memory:
try:
obsidian_service = get_obsidian_service()
obsidian_context = obsidian_service.search_obsidian(current_message)
if obsidian_context:
context_parts.append("# Relevant Obsidian Notes:")
for entry in obsidian_context:
context_parts.append(entry)
context_parts.append("")
logger.info(f"Added {len(obsidian_context)} Obsidian notes to context")
except Exception as e:
logger.exception("Failed to get Obsidian context")
raise
🧰 Tools
🪛 Ruff (0.14.8)

334-334: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


335-335: Use raise without specifying exception name

Remove exception name

(TRY201)

🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/chat_service.py around lines 322
to 335, the except block logs the error with logger.error and re-raises using
raise e which loses the original traceback; replace logger.error(f"Failed to get
Obsidian context: {e}") with logger.exception("Failed to get Obsidian context")
to capture the full stack trace, and change raise e to a bare raise to preserve
the original exception context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single config has memory and llm models

Copy link
Contributor

@thestumonkey thestumonkey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has a lot of duplication instead of using standard functions we have already.

1: jobs. all jobs should be processed by queues and status shown on the queue management page using queue controller

2: upload (small issue): we should leverage existing upload functions we have to upload gdrive or audio to extend to upload zip.

3: service and processing. We currently extract memories from conversations, pass this to openmemory which uses neo4j as graph memory for storing inference and memories. The obsidian service is writing direct to neo4j, which creates multiple memory banks for chat to leverage instead of a single one.

4: there are some config changes here and use of llm models. this should be linked into the existing llm provider class

We should use openmemory to process the obsidian files directly, and use a type to send to it. (conversation or note).

Otherwise, if we don't want to do this, we should connect to neo4j via a proper client which isn't part of obsidian service.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it should live in utils/job_utils or queue_controller (which should maybe be job controller). this seems like an interface to redis and only the job prefix has anything to do with obsidian service

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also have a job base class which is used for processing jobs on redis using RQ. Please use this for processing jobs, as this is handled natively with RQ.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also functions in queue_routes that should provide what you need. This will then hook it into the queue management page so we can see progress.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, this service goes through the markdown files creates an LLM and writes directly to NEO4J.

This can be achieved using the current infra by just passing the text to the memory router add_memory
(which writes to neo4J using openmemory)

If there are specific obsidian characteristics, then I would make a model for obsidianmemory/memory and then use a custom prompt to process it using memory controller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this in extras? duplicates existing fucntions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this in exteras

@thestumonkey
Copy link
Contributor

Can you also provide some obsidian test data and put into tests/test_assets ?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

♻️ Duplicate comments (4)
backends/advanced/src/advanced_omi_backend/chat_service.py (1)

322-335: Use logging.exception() and bare raise for better stack traces.

This is the same issue flagged in a previous review. Use logging.exception() to capture the full stack trace and bare raise to preserve the original exception context.

🔎 Proposed fix
         # Add Obsidian context if requested
         if include_obsidian_memory:
             try:
                 obsidian_service = get_obsidian_service()
                 obsidian_context = await obsidian_service.search_obsidian(current_message)
                 if obsidian_context:
                     context_parts.append("# Relevant Obsidian Notes:")
                     for entry in obsidian_context:
                         context_parts.append(entry)
                     context_parts.append("")
                     logger.info(f"Added {len(obsidian_context)} Obsidian notes to context")
             except Exception as e:
-                logger.error(f"Failed to get Obsidian context: {e}")
-                raise e
+                logger.exception("Failed to get Obsidian context")
+                raise
backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (2)

46-48: Improve exception handling with stack trace and exception chaining.

Same issue flagged in previous review. Use logging.exception() and chain with from e.

🔎 Proposed fix
     except Exception as e:
-        logger.error(f"Ingestion failed: {e}")
-        raise HTTPException(status_code=500, detail=str(e))
+        logger.exception("Ingestion failed")
+        raise HTTPException(status_code=500, detail=str(e)) from e

111-113: Improve exception handling with stack trace and exception chaining.

Same issue flagged in previous review.

🔎 Proposed fix
     except Exception as e:
-        logger.error(f"Failed to process uploaded zip: {e}")
-        raise HTTPException(status_code=500, detail=f"Failed to process uploaded zip: {e}")
+        logger.exception("Failed to process uploaded zip")
+        raise HTTPException(status_code=500, detail=f"Failed to process uploaded zip: {e}") from e
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)

265-274: Use logging.exception() for automatic stack trace capture.

Per project learnings, logging.exception() should be used inside except blocks to automatically capture the full stack trace, improving debuggability.

Based on learnings, when re-raising exceptions, chain with raise ... from e.

🔎 Proposed fix
         try:
             vectors = await generate_openai_embeddings(
                 cleaned_chunks,
                 api_key=self.openai_api_key,
                 base_url=self.openai_base_url,
                 model=self.embedding_model,
             )
         except Exception as e:
-            logger.error(f"Embedding generation failed for {note_data['path']}: {e}")
+            logger.exception("Embedding generation failed for %s", note_data['path'])
             return []
🧹 Nitpick comments (12)
backends/advanced/src/advanced_omi_backend/utils/model_utils.py (1)

39-39: Optional: Consider extracting error messages.

Ruff's TRY003 rule suggests avoiding long messages directly in exception constructors. You could optionally extract these to constants or a separate error message module for better maintainability, though the current inline messages are clear and contextual.

Also applies to: 45-45

backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py (1)

277-279: Use logging.exception for better error diagnostics.

Replace memory_logger.error with memory_logger.exception to automatically capture the full stack trace.

Based on learnings, using logging.exception() inside except blocks improves debuggability by logging the complete traceback.

🔎 Proposed fix
         except Exception as e:
-            memory_logger.error(f"OpenAI embedding generation failed: {e}")
+            memory_logger.exception("OpenAI embedding generation failed")
             raise
backends/advanced/src/advanced_omi_backend/utils/file_utils.py (1)

66-81: Use logging.exception() in except blocks for full stack traces.

Per repository conventions, use logging.exception() instead of logging.error() inside except blocks to automatically capture the full stack trace for better debuggability. The exception chaining with from e is correctly implemented.

🔎 Proposed fix
     except zipfile.BadZipFile as e:
         error_msg = f"Invalid zip file: {zip_path} - {e}"
-        logger.error(error_msg)
+        logger.exception(error_msg)
         raise zipfile.BadZipFile(error_msg) from e
     except zipfile.LargeZipFile as e:
         error_msg = f"Zip file too large: {zip_path} - {e}"
-        logger.error(error_msg)
+        logger.exception(error_msg)
         raise ZipExtractionError(error_msg) from e
     except PermissionError as e:
         error_msg = f"Permission denied extracting zip file: {zip_path} - {e}"
-        logger.error(error_msg)
+        logger.exception(error_msg)
         raise ZipExtractionError(error_msg) from e
     except Exception as e:
         error_msg = f"Error extracting zip file {zip_path}: {e}"
-        logger.error(error_msg)
+        logger.exception(error_msg)
         raise ZipExtractionError(error_msg) from e

Based on learnings from previous reviews in this project.

backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py (3)

17-25: Rename unused loop variable root to _root.

The root variable in the loop is not used directly in the loop body (only dirs and files are used). While the in-place modification of dirs affects the walk, the root variable itself is unused.

🔎 Proposed fix
 def count_markdown_files(vault_path: str) -> int:
     """Recursively count markdown files in a vault."""
     count = 0
-    for root, dirs, files in os.walk(vault_path):
+    for _root, dirs, files in os.walk(vault_path):
         dirs[:] = [d for d in dirs if not d.startswith(".")]
         for filename in files:
             if filename.endswith(".md"):
                 count += 1
     return count

93-97: Use logging.exception() for per-file processing errors.

Per repository conventions, use logging.exception() instead of logger.error() to capture the full stack trace for debugging.

🔎 Proposed fix
             except Exception as exc:
-                logger.error("Processing %s failed: %s", filename, exc)
+                logger.exception("Processing %s failed", filename)
                 errors.append(f"{filename}: {exc}")
                 job.meta["errors"] = errors
                 job.save_meta()

Based on learnings from previous reviews in this project.


46-51: Remove redundant exception object from logging.exception call.

When using logging.exception(), the exception info is automatically included, making the exc parameter in the format string redundant.

🔎 Proposed fix
     except Exception as exc:
-        logger.exception("Database setup failed for job %s: %s", job.id, exc)
+        logger.exception("Database setup failed for job %s", job.id)
         job.meta["status"] = "failed"
         job.meta["error"] = f"Database setup failed: {exc}"
         job.save_meta()
         raise
backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (1)

152-154: Use logging.exception() and chain the exception.

🔎 Proposed fix
         except Exception as e:
-            logger.error(f"Failed to start job {job_id}: {e}")
-            raise HTTPException(status_code=500, detail=f"Failed to start job: {e}")
+            logger.exception(f"Failed to start job {job_id}")
+            raise HTTPException(status_code=500, detail=f"Failed to start job: {e}") from e

Based on learnings from previous reviews in this project.

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (5)

48-88: Consider consolidating with existing utility functions.

Per previous feedback, environment loading utilities already exist at upload_audio/load_env_file, scripts/lib/env_utils, and wizard.py. The resolve_value function is also generic enough to be shared across services. Consider moving these to a shared utilities module to reduce duplication and improve maintainability.

Based on learnings, this consolidation would align with the DRY principle and make these utilities available for other services.


118-118: Address technical debt: cleanup after k8s migration.

The inline comment indicates this path handling needs cleanup after k8s migration is complete and the root .env file is removed. Consider creating a tracking issue for this technical debt to ensure it's addressed.


277-277: Add explicit strict=True parameter to zip().

Without strict=True, mismatched lengths between original_chunks and vectors would silently truncate to the shorter sequence. If embedding generation partially fails, this could mask data loss.

🔎 Proposed fix
-        for orig_text, vector in zip(original_chunks, vectors):
+        for orig_text, vector in zip(original_chunks, vectors, strict=True):
             if vector:
                 chunk_payloads.append({"text": orig_text, "embedding": vector})
             else:
                 logger.warning(f"Embedding missing for chunk in {note_data['path']}")

302-302: Consider UUID-based chunk IDs for consistency.

While MD5 is acceptable here for generating deterministic chunk IDs (not cryptographic use), consider using hashlib.sha256() or uuid.uuid5() for consistency with security best practices and to avoid static analysis flags.


426-438: Consider adding thread safety to singleton initialization.

Per previous review, the check-then-set pattern has a theoretical race condition where multiple threads could simultaneously create service instances. While Python's GIL makes this less likely in practice, using a threading.Lock with double-check pattern would follow best practices.

🔎 Proposed fix with threading.Lock
+import threading
+
 # Lazy initialization to avoid startup failures
 _obsidian_service: Optional[ObsidianService] = None
+_service_lock = threading.Lock()

 def get_obsidian_service() -> ObsidianService:
     """Get or create the Obsidian service singleton.
     
     Returns:
         ObsidianService instance, creating it on first access.
     """
     global _obsidian_service
     if _obsidian_service is None:
-        _obsidian_service = ObsidianService()
+        with _service_lock:
+            # Double-check: another thread may have initialized it
+            if _obsidian_service is None:
+                _obsidian_service = ObsidianService()
     return _obsidian_service
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ecfa5c3 and 1f2e1d2.

📒 Files selected for processing (9)
  • backends/advanced/src/advanced_omi_backend/chat_service.py (7 hunks)
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py (2 hunks)
  • backends/advanced/src/advanced_omi_backend/services/neo4j_client.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/utils/file_utils.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/utils/model_utils.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py (1 hunks)
  • backends/advanced/tests/test_obsidian_service.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backends/advanced/tests/test_obsidian_service.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py
  • backends/advanced/src/advanced_omi_backend/utils/file_utils.py
  • backends/advanced/src/advanced_omi_backend/chat_service.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
  • backends/advanced/src/advanced_omi_backend/utils/model_utils.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py
  • backends/advanced/src/advanced_omi_backend/services/neo4j_client.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
🧠 Learnings (2)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py
  • backends/advanced/src/advanced_omi_backend/utils/file_utils.py
  • backends/advanced/src/advanced_omi_backend/chat_service.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
  • backends/advanced/src/advanced_omi_backend/utils/model_utils.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py
  • backends/advanced/src/advanced_omi_backend/services/neo4j_client.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
📚 Learning: 2025-12-20T01:11:51.486Z
Learnt from: CR
Repo: chronicler-ai/chronicle PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-20T01:11:51.486Z
Learning: Applies to **/*.py : Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures

Applied to files:

  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
🧬 Code graph analysis (4)
backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py (3)
backends/advanced/src/advanced_omi_backend/models/job.py (1)
  • async_job (203-290)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (4)
  • setup_database (158-173)
  • parse_obsidian_note (180-233)
  • chunking_and_embedding (235-283)
  • ingest_note_and_chunks (285-317)
backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (1)
  • get_status (171-225)
backends/advanced/src/advanced_omi_backend/chat_service.py (2)
backends/advanced/src/advanced_omi_backend/services/memory/base.py (1)
  • MemoryEntry (26-63)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (2)
  • get_obsidian_service (429-438)
  • search_obsidian (359-424)
backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (2)
backends/advanced/src/advanced_omi_backend/utils/file_utils.py (2)
  • extract_zip (21-81)
  • ZipExtractionError (16-18)
backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py (2)
  • count_markdown_files (17-25)
  • ingest_obsidian_vault_job (29-107)
backends/advanced/src/advanced_omi_backend/services/neo4j_client.py (1)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)
  • close (150-152)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py

20-20: Loop control variable root not used within loop body

Rename unused root to _root

(B007)


29-29: Unused function argument: job_id

(ARG001)


29-29: Unused function argument: redis_client

(ARG001)


47-47: Redundant exception object included in logging.exception call

(TRY401)


93-93: Do not catch blind exception: Exception

(BLE001)


94-94: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/utils/file_utils.py

65-65: Consider moving this statement to an else block

(TRY300)


68-68: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


72-72: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


76-76: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


80-80: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/chat_service.py

334-334: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


335-335: Use raise without specifying exception name

Remove exception name

(TRY201)

backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py

34-34: Unused function argument: current_user

(ARG001)


34-34: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


45-45: Consider moving this statement to an else block

(TRY300)


46-46: Do not catch blind exception: Exception

(BLE001)


47-47: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


48-48: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


53-53: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


54-54: Unused function argument: current_user

(ARG001)


54-54: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


82-82: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


83-83: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


89-89: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


90-90: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


92-92: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


93-93: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


111-111: Do not catch blind exception: Exception

(BLE001)


112-112: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


113-113: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


121-121: Use raise without specifying exception name

Remove exception name

(TRY201)


127-127: Unused function argument: current_user

(ARG001)


127-127: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


151-151: Consider moving this statement to an else block

(TRY300)


152-152: Do not catch blind exception: Exception

(BLE001)


153-153: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


154-154: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


167-167: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


171-171: Unused function argument: current_user

(ARG001)


171-171: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


222-222: Do not use bare except

(E722)


222-223: try-except-pass detected, consider logging the exception

(S110)


225-225: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

backends/advanced/src/advanced_omi_backend/utils/model_utils.py

39-39: Avoid specifying long messages outside the exception class

(TRY003)


45-45: Avoid specifying long messages outside the exception class

(TRY003)

backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py

278-278: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py

107-107: Avoid specifying long messages outside the exception class

(TRY003)


111-111: Avoid specifying long messages outside the exception class

(TRY003)


131-131: Avoid specifying long messages outside the exception class

(TRY003)


272-272: Do not catch blind exception: Exception

(BLE001)


273-273: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


277-277: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


302-302: Probable use of insecure hash functions in hashlib: md5

(S324)


335-335: Avoid specifying long messages outside the exception class

(TRY003)


355-355: Use explicit conversion flag

Replace with conversion flag

(RUF010)


379-379: Redundant exception object included in logging.exception call

(TRY401)


380-380: Use raise without specifying exception name

Remove exception name

(TRY201)


421-421: Do not catch blind exception: Exception

(BLE001)


422-422: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🔇 Additional comments (9)
backends/advanced/src/advanced_omi_backend/utils/model_utils.py (1)

37-45: LGTM!

The implementation logic is clear and correct. It properly validates the configuration structure, safely navigates nested dictionaries, and provides explicit error handling with informative messages when defaults or model definitions are missing.

backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py (1)

271-279: AI summary inconsistency with exception handling.

The AI summary states that "exceptions are logged and swallowed (no explicit return value provided on error)," but the code at line 279 contains a raise statement that re-raises the exception. The exception is not swallowed—it propagates to the caller.

backends/advanced/src/advanced_omi_backend/utils/file_utils.py (1)

1-18: Clean module structure with proper imports.

Good implementation with clear docstring, standard library imports at the top, and a well-defined custom exception class. The function signature with type hints and comprehensive docstring is helpful.

backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py (1)

74-80: Good implementation of cancellation support.

The cancellation check pattern with job.refresh() and status check allows users to cancel long-running jobs gracefully. The metadata is properly updated before returning.

backends/advanced/src/advanced_omi_backend/services/neo4j_client.py (1)

7-33: Clean implementation of lazy driver initialization.

The Neo4jClient class properly implements lazy driver initialization with get_driver(), and the close() and reset() methods handle cleanup appropriately. The session factory method with optional access mode is well-designed.

backends/advanced/src/advanced_omi_backend/chat_service.py (2)

285-297: Correct migration to MemoryEntry type.

The return type change from List[Dict] to List[MemoryEntry] and the corresponding attribute access pattern (.id, .content) properly align with the MemoryEntry dataclass definition.


299-320: Clean integration of optional Obsidian memory parameter.

The include_obsidian_memory parameter is properly threaded through the call chain and the context formatting logic handles both memory types appropriately.

backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (2)

124-151: Good integration with RQ queue system.

The /start endpoint properly enqueues jobs to the RQ queue with appropriate parameters (job_id, job_timeout, description). The pending state is correctly managed by removing the Redis key after successful enqueueing.


170-203: Well-structured status reporting with fallback logic.

The status endpoint correctly checks RQ first and falls back to pending state. The status mapping from RQ terminology (startedrunning, canceledcancelled) and progress calculation are properly implemented.

Comment on lines 93 to 148
def __init__(self):
"""Initialize the Obsidian service with configuration from config.yml and environment."""
# Resolve paths relative to this file
# backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
self.CURRENT_DIR = Path(__file__).parent.resolve()

# Load configuration strictly from standard locations
# Prefer /app/config.yml inside containers (mounted by docker-compose)
# Fallbacks handled by shared utility
config_data = load_root_config()

# Get model configurations using shared utility
llm_config = get_model_config(config_data, 'llm')
if not llm_config:
raise ValueError("Configuration for 'defaults.llm' not found in config.yml")

embed_config = get_model_config(config_data, 'embedding')
if not embed_config:
raise ValueError("Configuration for 'defaults.embedding' not found in config.yml")

# Neo4j Connection - Prefer environment variables passed by Docker Compose
neo4j_host = os.getenv("NEO4J_HOST")
# Load .env file as fallback (for local dev or if env vars not set)
candidate_env_files = [
Path("/app/.env"),
self.CURRENT_DIR.parent.parent.parent.parent / ".env", # Project root .env file ToDo cleanup needed after k8s is migrated and there is no .env file in the project root.
self.CURRENT_DIR.parent.parent.parent.parent / "backends" / "advanced" / ".env", # repo path
]
env_data = {}
for p in candidate_env_files:
if p.exists():
env_data.update(load_env_file(p))

# Use env var first, then fallback to .env file
if not neo4j_host:
neo4j_host = env_data.get("NEO4J_HOST")

if not neo4j_host:
raise KeyError("NEO4J_HOST not found in environment or .env")

self.neo4j_uri = f"bolt://{neo4j_host}:7687"
self.neo4j_user = os.getenv("NEO4J_USER") or env_data.get("NEO4J_USER", "neo4j")
self.neo4j_password = os.getenv("NEO4J_PASSWORD") or env_data.get("NEO4J_PASSWORD", "")

# Models / API - Loaded strictly from config.yml
self.embedding_model = str(resolve_value(embed_config['model_name']))
self.embedding_dimensions = int(resolve_value(embed_config['embedding_dimensions']))
self.openai_base_url = str(resolve_value(llm_config['model_url']))
self.openai_api_key = str(resolve_value(llm_config['api_key']))

# Chunking - uses shared spaCy/text fallback utility
self.chunk_word_limit = 120

self.neo4j_client = Neo4jClient(self.neo4j_uri, self.neo4j_user, self.neo4j_password)
self.read_interface = Neo4jReadInterface(self.neo4j_client)
self.write_interface = Neo4jWriteInterface(self.neo4j_client)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Re-evaluate architectural approach to leverage existing infrastructure.

Per maintainer feedback, this service appears to duplicate existing functionality. The current memory router's add_memory already writes to Neo4j via OpenMemory. Consider refactoring to use the existing memory infrastructure (services/memory/providers/llm_provider and memory router) rather than implementing a parallel ingestion system. If Obsidian-specific characteristics require custom handling, create an ObsidianMemory model and use a custom prompt through the memory controller.

This approach would reduce code duplication, maintain consistency with the existing memory system, and leverage proven infrastructure.

🧰 Tools
🪛 Ruff (0.14.8)

107-107: Avoid specifying long messages outside the exception class

(TRY003)


111-111: Avoid specifying long messages outside the exception class

(TRY003)


131-131: Avoid specifying long messages outside the exception class

(TRY003)



@async_job(redis=True, beanie=False)
async def ingest_obsidian_vault_job(job_id: str, vault_path: str, redis_client=None) -> dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for the function definition and call sites
cd /repo 2>/dev/null || cd . 

# Find the file
find . -name "obsidian_jobs.py" -type f 2>/dev/null | head -5

Repository: chronicler-ai/chronicle

Length of output: 136


🏁 Script executed:

#!/bin/bash
# Look for the function and how it's called
rg "ingest_obsidian_vault_job" --type=py -B2 -A10 2>/dev/null | head -100

Repository: chronicler-ai/chronicle

Length of output: 4187


🏁 Script executed:

#!/bin/bash
# Check decorator pattern and redis_client usage
rg "@async_job|redis_client" --type=py -B1 -A3 2>/dev/null | head -100

Repository: chronicler-ai/chronicle

Length of output: 9797


🏁 Script executed:

#!/bin/bash
# Read the obsidian_jobs.py file to see full function implementation
cat -n backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py | head -150

Repository: chronicler-ai/chronicle

Length of output: 4161


Remove unused job_id and redis_client parameters.

The job_id parameter is never used; the function always retrieves the job ID via get_current_job().id instead. The redis_client parameter is injected by the @async_job decorator but is never referenced in the function body. Consider removing both parameters or documenting their intended purpose.

Additionally, fix logging inconsistencies: line 47 uses logger.exception() with exc in the format string (redundant), while line 94 uses logger.error() with the same pattern. Use logger.exception() without the exc argument (which automatically includes traceback) or use logger.error() consistently.

🧰 Tools
🪛 Ruff (0.14.8)

29-29: Unused function argument: job_id

(ARG001)


29-29: Unused function argument: redis_client

(ARG001)

🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py around
lines 29, 47 and 94: remove the unused function parameters `job_id` and
`redis_client` from the async def ingest_obsidian_vault_job signature (the
function already uses get_current_job().id), and update any decorators/call
sites that inject `redis_client` so they no longer expect those args; also fix
logging: at line 47 remove the redundant `exc` interpolation and call
logger.exception("...") (which includes traceback automatically), and change the
error at line 94 to use logger.exception("...") (or use logger.error without
interpolating the exception) so logging is consistent and does not duplicate the
exception object in the formatted message.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (8)
backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py (3)

28-29: Remove unused function parameters.

The job_id and redis_client parameters are never used in the function body. The function always retrieves the job ID via get_current_job().id instead of using the job_id parameter. Based on learnings, remove unused parameters.

🔎 Proposed fix
 @async_job(redis=True, beanie=False)
-async def ingest_obsidian_vault_job(job_id: str, vault_path: str, redis_client=None) -> dict: # type: ignore
+async def ingest_obsidian_vault_job(vault_path: str) -> dict: # type: ignore
     """
     Long-running ingestion job enqueued on the default RQ queue.
     """

46-47: Fix redundant exception logging.

The logger.exception() call automatically includes exception details, so interpolating exc in the format string is redundant. Based on learnings from this repository, use logger.exception() without the exception argument.

🔎 Proposed fix
     except Exception as exc:
-        logger.exception("Database setup failed for job %s: %s", job.id, exc)
+        logger.exception("Database setup failed for job %s", job.id)
         job.meta["status"] = "failed"
         job.meta["error"] = f"Database setup failed: {exc}"
         job.save_meta()

93-94: Use logger.exception() for automatic stack trace capture.

Per repository learnings, prefer logging.exception() inside except blocks to automatically log the full stack trace for better debuggability.

🔎 Proposed fix
             except Exception as exc:
-                logger.error("Processing %s failed: %s", filename, exc)
+                logger.exception("Processing %s failed", filename)
                 errors.append(f"{filename}: {exc}")
                 job.meta["errors"] = errors
                 job.save_meta()
backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (3)

46-48: Use logger.exception() and chain exceptions properly.

Per repository learnings, use logging.exception() inside except blocks to capture the full stack trace, and chain exceptions with from e to preserve context.

🔎 Proposed fix
     except Exception as e:
-        logger.error(f"Ingestion failed: {e}")
-        raise HTTPException(status_code=500, detail=str(e))
+        logger.exception("Ingestion failed")
+        raise HTTPException(status_code=500, detail=str(e)) from e

111-113: Use logger.exception() and add exception chaining.

Per repository learnings, use logging.exception() to capture stack traces and chain exceptions with from e.

🔎 Proposed fix
     except Exception as e:
-        logger.error(f"Failed to process uploaded zip: {e}")
-        raise HTTPException(status_code=500, detail=f"Failed to process uploaded zip: {e}")
+        logger.exception("Failed to process uploaded zip")
+        raise HTTPException(status_code=500, detail=f"Failed to process uploaded zip: {e}") from e

221-223: Replace bare except with specific exception handling.

Per coding guidelines, always raise errors explicitly with proper exceptions. The bare except violates this principle. Catch the specific exception (json.JSONDecodeError) and add exception chaining.

🔎 Proposed fix
             try:
                 job_data = json.loads(pending_data)
                 return {
                     "job_id": job_id,
                     "status": "ready",
                     "total": job_data.get("total", 0),
                     "processed": 0,
                     "percent": 0,
                     "errors": [],
                     "vault_path": job_data.get("vault_path")
                 }
-            except:
-                raise HTTPException(status_code=500, detail="Failed to get job status")
+            except json.JSONDecodeError as e:
+                logger.exception("Failed to parse pending job data for %s", job_id)
+                raise HTTPException(status_code=500, detail="Failed to get job status") from e
         raise HTTPException(status_code=404, detail="Job not found")
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (2)

420-421: Use logger.exception() for search failures.

Per repository learnings, use logging.exception() inside except blocks to automatically capture the full stack trace.

🔎 Proposed fix
         except Exception as e:
-            logger.error(f"Obsidian search failed: {e}")
+            logger.exception("Obsidian search failed")

428-437: Add thread-safe singleton initialization.

The check-then-set pattern has a race condition under concurrent access. Multiple threads could simultaneously pass the if _obsidian_service is None check and create multiple instances.

🔎 Proposed fix using threading.Lock
+import threading
+
 # Lazy initialization to avoid startup failures
 _obsidian_service: Optional[ObsidianService] = None
+_service_lock = threading.Lock()

 def get_obsidian_service() -> ObsidianService:
     """Get or create the Obsidian service singleton.
     
     Returns:
         ObsidianService instance, creating it on first access.
     """
     global _obsidian_service
     if _obsidian_service is None:
-        _obsidian_service = ObsidianService()
+        with _service_lock:
+            # Double-check pattern: another thread may have initialized it
+            if _obsidian_service is None:
+                _obsidian_service = ObsidianService()
     return _obsidian_service
🧹 Nitpick comments (4)
backends/advanced/src/advanced_omi_backend/utils/model_utils.py (1)

37-45: Logic is correct and error handling is appropriate.

The function correctly retrieves the default model name, searches for the corresponding model definition, and raises informative ValueError exceptions when configurations are missing. The error messages clearly indicate what went wrong, which aids debugging.

Optional: Consider Ruff TRY003 suggestion.

Static analysis flags the long error messages (lines 39, 45) as a stylistic concern. For simple validation errors like these, inline messages are perfectly acceptable. If you prefer to follow Ruff's recommendation strictly, you could create custom exception classes, but this is entirely optional.

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (3)

48-72: Consider consolidating environment loading utilities.

Per maintainer feedback, there are existing env loaders in upload_audio/load_env_file, root/scripts/lib/env_utils, and wizard.py. Consider consolidating to avoid duplication.

Based on learnings: Maintainer noted consolidation opportunity.


378-379: Remove redundant exception handler.

The exception is caught and immediately re-raised without any handling. This is redundant. Either handle the exception or remove the try-except block.

🔎 Proposed fix
-        try:
-            clean_query = self._clean_text(query)
-            vectors = await generate_openai_embeddings(
-                [clean_query],
-                api_key=self.openai_api_key,
-                base_url=self.openai_base_url,
-                model=self.embedding_model,
-            )
-            query_vector = vectors[0] if vectors else None
-        except Exception:
-            raise
+        clean_query = self._clean_text(query)
+        vectors = await generate_openai_embeddings(
+            [clean_query],
+            api_key=self.openai_api_key,
+            base_url=self.openai_base_url,
+            model=self.embedding_model,
+        )
+        query_vector = vectors[0] if vectors else None

90-149: Consider leveraging existing memory infrastructure.

Per maintainer feedback, this service appears to duplicate existing functionality. The memory router's add_memory already writes to Neo4j via OpenMemory. Consider refactoring to use services/memory/providers/llm_provider and the memory router rather than implementing a parallel ingestion system. If Obsidian-specific characteristics require custom handling, create an ObsidianMemory model and use a custom prompt through the memory controller.

Based on learnings: Maintainer suggested architectural alignment with existing memory infrastructure.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f2e1d2 and a75217e.

⛔ Files ignored due to path filters (1)
  • tests/test_assets/obsidian_dataview_example_vault.zip is excluded by !**/*.zip
📒 Files selected for processing (5)
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/services/neo4j_client.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/utils/model_utils.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/services/neo4j_client.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
  • backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py
  • backends/advanced/src/advanced_omi_backend/utils/model_utils.py
🧠 Learnings (2)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/services/neo4j_client.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
  • backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py
  • backends/advanced/src/advanced_omi_backend/utils/model_utils.py
📚 Learning: 2025-12-20T01:11:51.486Z
Learnt from: CR
Repo: chronicler-ai/chronicle PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-20T01:11:51.486Z
Learning: Applies to **/*.py : Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures

Applied to files:

  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
🧬 Code graph analysis (2)
backends/advanced/src/advanced_omi_backend/services/neo4j_client.py (1)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)
  • close (150-152)
backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (3)
backends/advanced/src/advanced_omi_backend/utils/file_utils.py (2)
  • extract_zip (21-81)
  • ZipExtractionError (16-18)
backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py (2)
  • count_markdown_files (17-25)
  • ingest_obsidian_vault_job (29-107)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)
  • ingest_vault (319-357)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py

107-107: Avoid specifying long messages outside the exception class

(TRY003)


111-111: Avoid specifying long messages outside the exception class

(TRY003)


131-131: Avoid specifying long messages outside the exception class

(TRY003)


272-272: Do not catch blind exception: Exception

(BLE001)


273-273: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


277-277: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


302-302: Probable use of insecure hash functions in hashlib: md5

(S324)


335-335: Avoid specifying long messages outside the exception class

(TRY003)


355-355: Use explicit conversion flag

Replace with conversion flag

(RUF010)


378-379: Remove exception handler; error is immediately re-raised

(TRY203)


420-420: Do not catch blind exception: Exception

(BLE001)


421-421: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py

34-34: Unused function argument: current_user

(ARG001)


34-34: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


45-45: Consider moving this statement to an else block

(TRY300)


46-46: Do not catch blind exception: Exception

(BLE001)


47-47: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


48-48: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


53-53: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


54-54: Unused function argument: current_user

(ARG001)


54-54: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


82-82: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


83-83: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


89-89: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


90-90: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


92-92: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


93-93: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


111-111: Do not catch blind exception: Exception

(BLE001)


112-112: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


113-113: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


119-119: Do not catch blind exception: Exception

(BLE001)


126-126: Unused function argument: current_user

(ARG001)


126-126: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


150-150: Consider moving this statement to an else block

(TRY300)


151-151: Do not catch blind exception: Exception

(BLE001)


152-152: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


153-153: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


166-166: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


170-170: Unused function argument: current_user

(ARG001)


170-170: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


221-221: Do not use bare except

(E722)


222-222: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


223-223: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py

20-20: Loop control variable root not used within loop body

Rename unused root to _root

(B007)


29-29: Unused function argument: job_id

(ARG001)


29-29: Unused function argument: redis_client

(ARG001)


47-47: Redundant exception object included in logging.exception call

(TRY401)


93-93: Do not catch blind exception: Exception

(BLE001)


94-94: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/utils/model_utils.py

39-39: Avoid specifying long messages outside the exception class

(TRY003)


45-45: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
backends/advanced/src/advanced_omi_backend/utils/model_utils.py (1)

11-36: LGTM! Past issue resolved.

The return type has been correctly updated to Dict[str, Any] (non-optional), and the docstring accurately reflects that the function either returns a model configuration dictionary or raises a ValueError. The documentation is thorough and includes a helpful usage example.

backends/advanced/src/advanced_omi_backend/services/neo4j_client.py (1)

1-63: LGTM! Previous issues have been resolved.

The implementation correctly:

  • Uses READ_ACCESS and WRITE_ACCESS constants from the neo4j module (lines 4, 58, 63) instead of string literals
  • Consumes the result data before the session closes (line 53) by calling result.data(), preventing issues with closed Result objects

The Neo4j client abstraction is well-designed and thread-safe for session management.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (5)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (4)

281-283: Use logger.exception() for automatic stack trace capture.

Per project learnings, use logging.exception() inside except blocks to automatically capture the full stack trace for better debuggability.

🔎 Proposed fix
         except Exception as e:
-            logger.error(f"Embedding generation failed for {note_data['path']}: {e}")
+            logger.exception("Embedding generation failed for %s", note_data['path'])
             return []

387-389: Use logger.exception() for embedding failure.

Per project learnings, use logging.exception() inside except blocks to automatically capture the full stack trace.

🔎 Proposed fix
         except Exception as exc:
-            logger.error("Obsidian search embedding failed: %s", exc)
+            logger.exception("Obsidian search embedding failed")
             raise ObsidianSearchError("embedding", str(exc)) from exc

430-432: Use logger.exception() for database failure.

Per project learnings, use logging.exception() inside except blocks to capture the full stack trace.

🔎 Proposed fix
         except Exception as e:
-            logger.error(f"Obsidian search failed: {e}")
+            logger.exception("Obsidian search failed")
             raise ObsidianSearchError("database", str(e)) from e

439-448: Singleton initialization is not thread-safe.

The check-then-set pattern has a race condition where multiple threads could create separate instances. Consider using a threading.Lock with double-check pattern if concurrent access is expected.

🔎 Proposed fix

Add at module level:

import threading
_service_lock = threading.Lock()

Then update the function:

 def get_obsidian_service() -> ObsidianService:
     global _obsidian_service
     if _obsidian_service is None:
-        _obsidian_service = ObsidianService()
+        with _service_lock:
+            if _obsidian_service is None:
+                _obsidian_service = ObsidianService()
     return _obsidian_service
backends/advanced/src/advanced_omi_backend/chat_service.py (1)

344-346: Use logging.exception() and bare raise for better stack traces.

Per project learnings, use logging.exception() inside except blocks for automatic stack trace capture, and use bare raise to preserve the original exception context.

🔎 Proposed fix
             except Exception as e:
-                logger.error(f"Failed to get Obsidian context: {e}")
-                raise e
+                logger.exception("Failed to get Obsidian context")
+                raise
🧹 Nitpick comments (4)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (4)

57-97: Consider consolidating with existing env utilities.

Per prior feedback, the codebase already has env loaders in upload_audio/load_env_file, scripts/lib/env_utils, and wizard.py. Consider importing from an existing utility or consolidating these functions into a shared module to reduce duplication. Similarly, resolve_value could be moved to a shared utils module for reuse by other services.


286-288: Add strict=True to zip() to catch length mismatches.

If original_chunks and vectors have different lengths (e.g., API returns fewer embeddings), the current code silently drops items. Adding strict=True will raise a ValueError making such issues explicit.

🔎 Proposed fix
-        for orig_text, vector in zip(original_chunks, vectors):
+        for orig_text, vector in zip(original_chunks, vectors, strict=True):

310-311: Consider using SHA256 instead of MD5 for chunk IDs.

While MD5 is sufficient for generating unique chunk identifiers (not used for security), SHA256 is a more modern choice that avoids static analysis warnings and is future-proof.

🔎 Proposed fix
-                chunk_id = hashlib.md5(f"{note_data['path']}_{i}".encode()).hexdigest()
+                chunk_id = hashlib.sha256(f"{note_data['path']}_{i}".encode()).hexdigest()[:32]

362-364: Remove redundant exception object from logger.exception().

logger.exception() automatically includes the exception info, so {e} in the message is redundant.

🔎 Proposed fix
                     except Exception as e:
-                        logger.exception(f"Processing {file} failed")
+                        logger.exception("Processing %s failed", file)
                         errors.append(f"{file}: {str(e)}")
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a75217e and 0dfe250.

⛔ Files ignored due to path filters (1)
  • backends/advanced/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • backends/advanced/pyproject.toml (1 hunks)
  • backends/advanced/src/advanced_omi_backend/chat_service.py (7 hunks)
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1 hunks)
  • backends/advanced/tests/test_obsidian_service.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • backends/advanced/tests/test_obsidian_service.py
  • backends/advanced/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/chat_service.py
🧠 Learnings (2)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/chat_service.py
📚 Learning: 2025-12-20T01:11:51.486Z
Learnt from: CR
Repo: chronicler-ai/chronicle PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-20T01:11:51.486Z
Learning: Applies to **/*.py : Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures

Applied to files:

  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/chat_service.py (2)
backends/advanced/src/advanced_omi_backend/services/memory/base.py (1)
  • MemoryEntry (26-63)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (3)
  • get_obsidian_service (439-448)
  • ObsidianSearchError (51-55)
  • search_obsidian (368-434)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py

116-116: Avoid specifying long messages outside the exception class

(TRY003)


120-120: Avoid specifying long messages outside the exception class

(TRY003)


140-140: Avoid specifying long messages outside the exception class

(TRY003)


281-281: Do not catch blind exception: Exception

(BLE001)


282-282: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


286-286: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


311-311: Probable use of insecure hash functions in hashlib: md5

(S324)


344-344: Avoid specifying long messages outside the exception class

(TRY003)


364-364: Use explicit conversion flag

Replace with conversion flag

(RUF010)


388-388: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


431-431: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/chat_service.py

338-342: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


345-345: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


346-346: Use raise without specifying exception name

Remove exception name

(TRY201)

🔇 Additional comments (5)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (2)

1-56: LGTM!

The module structure, imports, and type definitions are well-organized. The ObsidianSearchError exception with stage differentiation provides useful error context for callers.


102-158: LGTM with note on configuration approach.

The initialization properly uses the shared load_root_config() utility and falls back to environment variables for Neo4j credentials. The descriptive exception messages for missing configuration are appropriate for startup validation.

backends/advanced/src/advanced_omi_backend/chat_service.py (3)

26-30: LGTM!

The new imports are properly organized within the local imports section and introduce the necessary types for Obsidian integration.


288-323: LGTM!

The return type change to List[MemoryEntry] and the corresponding attribute access updates (memory.id, memory.content) are cleaner and provide better type safety than the previous dict-based approach.


363-384: LGTM!

The include_obsidian_memory parameter is properly added to the signature and correctly propagated to format_conversation_context. The integration follows the existing patterns in the codebase.

@AnkushMalaker
Copy link
Collaborator

@0xrushi I think all the places where the config was read has a conflict due to the config refactor PR being merged

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (1)

134-136: Inconsistency: Config display values don't use new variables.

Lines 134-136 still reference the module-level _llm_def variable (defined at lines 45-49), not the newly created local variables _llm_provider, _llm_model, and _llm_base_url from the try block above. This is inconsistent with the AI summary stating that display values are now sourced from registry-based defaults.

If the intent is to use the new config-derived values for health display, these lines should be updated to:

-            "llm_provider": (_llm_def.model_provider if _llm_def else None),
-            "llm_model": (_llm_def.model_name if _llm_def else None),
-            "llm_base_url": (_llm_def.model_url if _llm_def else None),
+            "llm_provider": _llm_provider,
+            "llm_model": _llm_model,
+            "llm_base_url": _llm_base_url,
backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py (1)

81-104: Critical: Duplicate route path causes FastAPI conflict.

Lines 81-87 define POST /admin/memory/config/validate with function validate_memory_config_raw, and lines 98-104 define the exact same route POST /admin/memory/config/validate with function validate_memory_config.

FastAPI does not allow duplicate route paths with the same HTTP method. This will cause a routing conflict—either FastAPI will fail to start, raise a validation error, or only register the first endpoint.

Based on the relevant code snippets from system_controller.py, these appear to be two distinct validation functions. One path needs to be renamed.

🔎 Proposed fix

Rename one of the endpoints to differentiate them:

-@router.post("/admin/memory/config/validate")
+@router.post("/admin/memory/config/validate-raw")
 async def validate_memory_config_raw(
     config_yaml: str = Body(..., media_type="text/plain"),
     current_user: User = Depends(current_superuser),
 ):
     """Validate posted memory YAML (does not save). Admin only."""
     return await system_controller.validate_memory_config_raw(config_yaml)

Alternatively, consolidate into a single validation endpoint if the functionality overlaps.

♻️ Duplicate comments (2)
backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (1)

95-108: Critical: Multiple issues including undefined functions and dead code.

This code block has several critical problems:

  1. Undefined functions: _load_root_config() (line 97) and _resolve_value() (lines 104-105) are not imported or defined, causing NameError at runtime.

  2. Dead code pattern (duplicate of past review): Lines 107-108 set fallback values then immediately re-raise the exception, making line 107 unreachable dead code. Past review comments already flagged this exact pattern.

  3. Unused variables: The new variables _llm_provider, _llm_model, and _llm_base_url (lines 103-105) are created but never used. Lines 134-136 still reference the module-level _llm_def variable, not these new local variables.

  4. Exception chaining: Line 108 should use raise instead of raise e, and should chain with from e per project learnings.

🔎 Proposed fix

First, ensure _load_root_config and _resolve_value are imported at the top of the file. Then apply this fix:

     # Load model config once for display fields
     try:
         _cfg = _load_root_config() or {}
         _defaults = _cfg.get("defaults", {}) or {}
         _models = _cfg.get("models", []) or []
         _llm_name = _defaults.get("llm")
         _stt_name = _defaults.get("stt")
         _llm_def = next((m for m in _models if m.get("name") == _llm_name), None)
         _llm_provider = (_llm_def.get("model_provider") if _llm_def else None) or "openai"
         _llm_model = str(_resolve_value(_llm_def.get("model_name", ""))) if _llm_def else None
         _llm_base_url = str(_resolve_value(_llm_def.get("model_url", ""))) if _llm_def else None
     except Exception as e:
+        logger.warning(f"Failed to load configuration from config.yml for health check: {e}")
         _llm_provider, _llm_model, _llm_base_url, _stt_name = "openai", None, None, None
-        raise e

Based on learnings, graceful degradation is preferred for health endpoints, and exception chaining with from e preserves context when re-raising.

Run this script to verify the required imports exist:

#!/bin/bash
# Search for definitions of _load_root_config and _resolve_value
rg -nP --type=py -C3 'def _load_root_config|def _resolve_value'
backends/advanced/src/advanced_omi_backend/chat_service.py (1)

336-345: Use logging.exception() and bare raise for better stack traces.

Both exception handlers should use logging.exception() to automatically capture the full stack trace, and the generic Exception handler should use bare raise to preserve the original exception context.

🔎 Proposed fix
             except ObsidianSearchError as exc:
-                logger.error(
-                    "Failed to get Obsidian context (%s stage): %s",
+                logger.exception(
+                    "Failed to get Obsidian context (%s stage)",
                     exc.stage,
-                    exc,
                 )
                 raise
             except Exception as e:
-                logger.error(f"Failed to get Obsidian context: {e}")
-                raise e
+                logger.exception("Failed to get Obsidian context")
+                raise

Based on learnings from previous reviews in this project and confirmed by static analysis (TRY400, TRY201).

🧹 Nitpick comments (1)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)

59-72: Code duplication: resolve_value exists in multiple files.

This function is duplicated in backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (lines 82-96). Consider consolidating into a shared utility module to maintain a single source of truth.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0dfe250 and fa7c532.

⛔ Files ignored due to path filters (1)
  • backends/advanced/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • backends/advanced/docker-compose.yml
  • backends/advanced/src/advanced_omi_backend/chat_service.py
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
  • backends/advanced/src/advanced_omi_backend/llm_client.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py
  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
  • backends/advanced/docker-compose.yml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py
  • backends/advanced/src/advanced_omi_backend/chat_service.py
  • backends/advanced/src/advanced_omi_backend/llm_client.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py
  • backends/advanced/src/advanced_omi_backend/chat_service.py
  • backends/advanced/src/advanced_omi_backend/llm_client.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py
🧬 Code graph analysis (3)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (2)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)
  • resolve_value (83-97)
backends/advanced/init.py (1)
  • load_config_yml (129-143)
backends/advanced/src/advanced_omi_backend/chat_service.py (2)
backends/advanced/src/advanced_omi_backend/services/memory/base.py (1)
  • MemoryEntry (26-63)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (3)
  • get_obsidian_service (439-448)
  • ObsidianSearchError (51-55)
  • search_obsidian (368-434)
backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py (1)
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (2)
  • get_memory_config_raw (259-279)
  • update_memory_config_raw (282-317)
🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/services/memory/config.py

90-90: Avoid specifying long messages outside the exception class

(TRY003)


281-281: Use raise without specifying exception name

Remove exception name

(TRY201)

backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py

284-284: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/chat_service.py

337-341: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


344-344: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


345-345: Use raise without specifying exception name

Remove exception name

(TRY201)

backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py

77-77: Unused function argument: current_user

(ARG001)


77-77: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


84-84: Unused function argument: current_user

(ARG001)


84-84: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


92-92: Unused function argument: current_user

(ARG001)


92-92: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py

97-97: Undefined name _load_root_config

(F821)


104-104: Undefined name _resolve_value

(F821)


105-105: Undefined name _resolve_value

(F821)


108-108: Use raise without specifying exception name

Remove exception name

(TRY201)

🔇 Additional comments (11)
backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py (1)

75-92: LGTM! Well-structured async embedding helper.

The new generate_openai_embeddings function cleanly extracts the embedding generation logic and reuses the existing _get_openai_client helper for consistent client creation with optional Langfuse tracing.

backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)

74-90: LGTM! Config loading with multiple fallback paths.

The function appropriately searches multiple standard locations for config.yml with sensible defaults for both Docker and development environments.

backends/advanced/src/advanced_omi_backend/llm_client.py (1)

59-64: LGTM! Config-driven initialization without environment fallbacks.

The constructor correctly requires all parameters to be explicitly provided (removing environment variable fallbacks), which aligns with the config-driven architecture. The validation on lines 63-64 provides clear error messaging for missing configuration.

backends/advanced/src/advanced_omi_backend/chat_service.py (8)

26-30: LGTM: Clean import additions.

The new imports for MemoryEntry and Obsidian service components are properly placed and grouped with local imports, following the project's coding guidelines.


287-287: LGTM: Type safety improvement.

Changing the return type to List[MemoryEntry] improves type safety and aligns with the memory service API, enabling cleaner attribute-based access to memory data.


302-302: LGTM: Backward-compatible signature change.

The addition of include_obsidian_memory: bool = False is backward compatible and provides a clean feature toggle for Obsidian integration.


310-310: LGTM: Cleaner attribute access.

The migration from dict-style access (memory.get("id"), memory.get("memory")) to direct attribute access (memory.id, memory.content) is cleaner and type-safe with the MemoryEntry dataclass.

Also applies to: 319-319


324-335: LGTM: Well-structured Obsidian integration.

The core logic for integrating Obsidian vault search is clean:

  • Properly awaits the async search
  • Defensively checks for results before appending
  • Maintains consistent formatting with memory context
  • Includes helpful logging for observability

363-363: LGTM: Consistent parameter propagation.

The addition of include_obsidian_memory: bool = False to generate_response_stream properly propagates the feature flag through the call chain and maintains backward compatibility.


381-383: LGTM: Correct parameter forwarding.

The include_obsidian_memory parameter is correctly forwarded from generate_response_stream to format_conversation_context, enabling the Obsidian integration when requested.


577-583: LGTM: Clean service lifecycle logging.

The cleanup log message provides good visibility into the service lifecycle and is symmetric with the initialization log.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py (1)

92-115: Critical: Duplicate route registration.

Both validate_memory_config_raw (line 92) and validate_memory_config (line 109) are registered to the same path and HTTP method: POST /admin/memory/config/validate. FastAPI cannot handle duplicate routes and will either fail at startup or unpredictably override one handler with the other.

Looking at the pattern established by other endpoints:

  • get_memory_config_raw uses /admin/memory/config/raw (GET)
  • update_memory_config_raw uses /admin/memory/config/raw (POST)

The newly added validate_memory_config_raw should either:

  1. Be removed entirely if validate_memory_config (which accepts a structured model) is sufficient, or
  2. Use a different path like /admin/memory/config/raw/validate to follow the established naming pattern
🔎 Recommended fix: remove duplicate or use distinct path

Option 1: Remove the duplicate validate_memory_config_raw endpoint if the structured model approach is sufficient:

-@router.post("/admin/memory/config/validate")
-async def validate_memory_config_raw(
-    config_yaml: str = Body(..., media_type="text/plain"),
-    current_user: User = Depends(current_superuser),
-):
-    """Validate posted memory YAML (does not save). Admin only."""
-    return await system_controller.validate_memory_config_raw(config_yaml)
-

Option 2: Use a distinct path following the established pattern:

-@router.post("/admin/memory/config/validate")
+@router.post("/admin/memory/config/raw/validate")
 async def validate_memory_config_raw(
     config_yaml: str = Body(..., media_type="text/plain"),
     current_user: User = Depends(current_superuser),
 ):
     """Validate posted memory YAML (does not save). Admin only."""
     return await system_controller.validate_memory_config_raw(config_yaml)
♻️ Duplicate comments (11)
backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py (1)

86-86: Already flagged: Misleading comment.

This comment was already identified in a previous review as misleading because the endpoints still exist below.

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (5)

102-157: Re-evaluate architectural approach per maintainer feedback.

Per maintainer feedback, this service appears to duplicate existing functionality. The current memory router's add_memory already writes to Neo4j via OpenMemory. Consider refactoring to use the existing memory infrastructure (services/memory/providers/llm_provider and memory router) rather than implementing a parallel ingestion system.

If Obsidian-specific characteristics require custom handling, consider creating an ObsidianMemory model and using a custom prompt through the memory controller. This approach would reduce code duplication and leverage proven infrastructure.

Based on maintainer feedback: "This can be achieved using the current infra by just passing the text to the memory router add_memory (which writes to neo4J using openmemory)."


281-283: Remove redundant exception from log message.

logger.exception() automatically includes exception details, so the f"Embedding generation failed for {note_data['path']}: {e}" is redundant.

🔎 Proposed fix
         except Exception as e:
-            logger.exception(f"Embedding generation failed for {note_data['path']}: {e}")
+            logger.exception("Embedding generation failed for %s", note_data['path'])
             return []

Based on learnings from this repository.


362-364: Remove redundant exception from log message.

logger.exception() automatically includes exception details, so the f"Processing {file} failed" message is sufficient without the redundant exception reference.

🔎 Proposed fix
                     except Exception as e:
-                        logger.exception(f"Processing {file} failed")
+                        logger.exception("Processing %s failed", file)
                         errors.append(f"{file}: {str(e)}")

Based on learnings from this repository.


445-448: Lazy initialization is not thread-safe.

The check-then-set pattern has a race condition: multiple threads could simultaneously pass the if _obsidian_service is None check and create multiple service instances, each with separate Neo4j driver connections.

🔎 Proposed fix using double-checked locking
+import threading
+
 # Lazy initialization to avoid startup failures
 _obsidian_service: Optional[ObsidianService] = None
+_service_lock = threading.Lock()

 def get_obsidian_service() -> ObsidianService:
     """Get or create the Obsidian service singleton.
     
     Returns:
         ObsidianService instance, creating it on first access.
     """
     global _obsidian_service
     if _obsidian_service is None:
-        _obsidian_service = ObsidianService()
+        with _service_lock:
+            # Double-check pattern
+            if _obsidian_service is None:
+                _obsidian_service = ObsidianService()
     return _obsidian_service

430-432: Use logger.exception() for better stack traces.

Per repository learnings, use logger.exception() instead of logger.error() to capture the full stack trace for better debuggability.

🔎 Proposed fix
         except Exception as e:
-            logger.error(f"Obsidian search failed: {e}")
+            logger.exception("Obsidian search failed")
             raise ObsidianSearchError("database", str(e)) from e

Based on learnings from this repository.

backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (5)

46-48: Use logger.exception() and add exception chaining.

Per repository learnings and past review comments, replace logger.error() with logger.exception() to capture the full stack trace, and chain the exception with from e when re-raising.

🔎 Proposed fix
     except Exception as e:
-        logger.error(f"Ingestion failed: {e}")
-        raise HTTPException(status_code=500, detail=str(e))
+        logger.exception("Ingestion failed")
+        raise HTTPException(status_code=500, detail=str(e)) from e

Based on learnings from this repository.


82-83: Use logger.exception() and add exception chaining.

Per repository learnings, use logger.exception() inside except blocks to capture the full stack trace, and chain exceptions with from e.

🔎 Proposed fix
         except IOError as e:
-            logger.error(f"Error writing zip file {zip_path}: {e}")
-            raise HTTPException(status_code=500, detail=f"Failed to save uploaded zip: {e}")
+            logger.exception("Error writing zip file %s", zip_path)
+            raise HTTPException(status_code=500, detail=f"Failed to save uploaded zip: {e}") from e

Based on learnings from this repository.


88-93: Fix redundant exception in log and add exception chaining.

Line 89: logger.exception() automatically includes exception details, so the f"Invalid zip file: {e}" is redundant. Line 92: Use logger.exception() instead of logger.error(). Both lines need exception chaining.

🔎 Proposed fix
         except zipfile.BadZipFile as e:
-            logger.exception(f"Invalid zip file: {e}")
-            raise HTTPException(status_code=400, detail=f"Invalid zip file: {e}")
+            logger.exception("Invalid zip file")
+            raise HTTPException(status_code=400, detail=f"Invalid zip file: {e}") from e
         except ZipExtractionError as e:
-            logger.error(f"Error extracting zip file: {e}")
-            raise HTTPException(status_code=500, detail=f"Failed to extract zip file: {e}")
+            logger.exception("Error extracting zip file")
+            raise HTTPException(status_code=500, detail=f"Failed to extract zip file: {e}") from e

Based on learnings from this repository.


111-113: Fix redundant exception in log and add exception chaining.

logger.exception() automatically includes exception details, so the f"Failed to process uploaded zip: {e}" is redundant. Also add exception chaining.

🔎 Proposed fix
     except Exception as e:
-        logger.exception(f"Failed to process uploaded zip: {e}")
-        raise HTTPException(status_code=500, detail=f"Failed to process uploaded zip: {e}")
+        logger.exception("Failed to process uploaded zip")
+        raise HTTPException(status_code=500, detail=f"Failed to process uploaded zip: {e}") from e

Based on learnings from this repository.


151-153: Fix redundant exception in log and add exception chaining.

logger.exception() automatically includes exception details, so the f"Failed to start job {job_id}: {e}" is redundant. Also add exception chaining.

🔎 Proposed fix
         except Exception as e:
-            logger.exception(f"Failed to start job {job_id}: {e}")
-            raise HTTPException(status_code=500, detail=f"Failed to start job: {e}")
+            logger.exception("Failed to start job %s", job_id)
+            raise HTTPException(status_code=500, detail=f"Failed to start job: {e}") from e

Based on learnings from this repository.

🧹 Nitpick comments (2)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (2)

57-97: Consider consolidating env loading with existing utilities.

Per maintainer feedback, there are existing env loaders in upload_audio/load_env_file, root/scripts/lib/env_utils, and wizard.py. Consider consolidating these utilities rather than duplicating env loading logic in this service.

Based on maintainer feedback.


286-291: Add strict=True to zip() for safety.

Without strict=True, if original_chunks and vectors have different lengths, zip() will silently truncate to the shorter length, potentially losing chunks. This could happen if embedding generation partially fails.

🔎 Proposed fix
         chunk_payloads: List[ChunkPayload] = []
-        for orig_text, vector in zip(original_chunks, vectors):
+        for orig_text, vector in zip(original_chunks, vectors, strict=True):
             if vector:
                 chunk_payloads.append({"text": orig_text, "embedding": vector})
             else:
                 logger.warning(f"Embedding missing for chunk in {note_data['path']}")

Note: This requires Python 3.10+. If targeting older Python, keep the current behavior but consider adding a length check before the loop.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa7c532 and 4572bb7.

📒 Files selected for processing (5)
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/utils/model_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/utils/model_utils.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
🧠 Learnings (2)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/utils/model_utils.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
📚 Learning: 2025-12-20T01:11:51.495Z
Learnt from: CR
Repo: chronicler-ai/chronicle PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-20T01:11:51.495Z
Learning: Applies to **/*.py : Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures

Applied to files:

  • backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py
🧬 Code graph analysis (3)
backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py (1)
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (2)
  • get_memory_config_raw (260-280)
  • update_memory_config_raw (283-318)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (4)
backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py (1)
  • generate_openai_embeddings (75-91)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)
  • load_config_yml (74-90)
backends/advanced/src/advanced_omi_backend/utils/model_utils.py (1)
  • get_model_config (11-45)
backends/advanced/src/advanced_omi_backend/services/neo4j_client.py (5)
  • Neo4jClient (7-32)
  • close (26-29)
  • reset (31-32)
  • session (20-24)
  • session (42-43)
backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py (3)
backends/advanced/src/advanced_omi_backend/utils/file_utils.py (2)
  • extract_zip (21-81)
  • ZipExtractionError (16-18)
backends/advanced/src/advanced_omi_backend/workers/obsidian_jobs.py (2)
  • count_markdown_files (17-25)
  • ingest_obsidian_vault_job (29-107)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)
  • ingest_vault (328-366)
🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/utils/model_utils.py

39-39: Avoid specifying long messages outside the exception class

(TRY003)


45-45: Avoid specifying long messages outside the exception class

(TRY003)

backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py

88-88: Unused function argument: current_user

(ARG001)


88-88: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


95-95: Unused function argument: current_user

(ARG001)


95-95: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


103-103: Unused function argument: current_user

(ARG001)


103-103: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py

116-116: Avoid specifying long messages outside the exception class

(TRY003)


120-120: Avoid specifying long messages outside the exception class

(TRY003)


140-140: Avoid specifying long messages outside the exception class

(TRY003)


282-282: Redundant exception object included in logging.exception call

(TRY401)


286-286: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


311-311: Probable use of insecure hash functions in hashlib: md5

(S324)


344-344: Avoid specifying long messages outside the exception class

(TRY003)


364-364: Use explicit conversion flag

Replace with conversion flag

(RUF010)


388-388: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


431-431: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/routers/modules/obsidian_routes.py

34-34: Unused function argument: current_user

(ARG001)


34-34: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


45-45: Consider moving this statement to an else block

(TRY300)


46-46: Do not catch blind exception: Exception

(BLE001)


47-47: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


48-48: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


53-53: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


54-54: Unused function argument: current_user

(ARG001)


54-54: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


82-82: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


83-83: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


89-89: Redundant exception object included in logging.exception call

(TRY401)


90-90: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


92-92: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


93-93: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


112-112: Redundant exception object included in logging.exception call

(TRY401)


113-113: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


119-119: Do not catch blind exception: Exception

(BLE001)


126-126: Unused function argument: current_user

(ARG001)


126-126: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


150-150: Consider moving this statement to an else block

(TRY300)


152-152: Redundant exception object included in logging.exception call

(TRY401)


153-153: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


166-166: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


170-170: Unused function argument: current_user

(ARG001)


170-170: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


221-221: Do not use bare except

(E722)


222-222: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


223-223: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (3)
backends/advanced/src/advanced_omi_backend/utils/model_utils.py (1)

1-46: LGTM! Well-structured utility with proper error handling.

The implementation correctly looks up model configurations from the config data structure, with appropriate error handling that raises ValueError when defaults or models are not found. The function signature accurately reflects that it always returns a dictionary (addressed in previous reviews), and the docstring provides clear usage examples.

The code follows the project's coding guidelines:

  • All imports at the top after the docstring
  • Explicit error raising rather than silent failures
  • Avoids defensive hasattr() checks, relying on the documented contract that config_data has the expected structure

Note: The Ruff TRY003 hints about "long messages outside exception class" can be safely ignored—inline error messages are standard practice for simple utilities like this, and custom exception classes would reduce readability without adding value.

backends/advanced/src/advanced_omi_backend/routers/modules/system_routes.py (2)

89-89: Docstring updates look good.

The updated docstrings correctly reflect that config.yml is the source of truth for memory configuration, which aligns with the implementation shown in the system_controller snippets.

Also applies to: 105-105, 120-120


92-98: Verify that validation methods exist in system_controller.

Ensure that both validate_memory_config_raw and validate_memory_config methods are implemented in system_controller. The provided code snippets only show get_memory_config_raw and update_memory_config_raw.

#!/bin/bash
# Description: Verify that the validation methods exist in system_controller

# Search for the validation method definitions in system_controller
rg -n -A 5 'async def validate_memory_config' --type=py -g '**/system_controller.py'

Also applies to: 109-115

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)

316-316: Use bare raise to preserve exception context.

This issue was flagged in a previous review. When re-raising exceptions, use bare raise instead of raise e to preserve the full traceback.

🔎 Proposed fix
     except Exception as e:
         memory_logger.exception(
             f"Failed to get embedding dimensions from registry for model '{embedding_model}'"
         )
-        raise e
+        raise

Based on learnings, bare raise preserves the original exception context and full traceback.

🧹 Nitpick comments (1)
backends/advanced/pyproject.toml (1)

59-59: Consider pinning to a specific commit SHA instead of the "main" branch.

Using rev = "main" instead of a specific commit SHA or tag means different builds at different times will pull different versions of mem0ai, reducing build reproducibility and potentially introducing unexpected changes.

🔎 Suggested approach to pin to a specific commit

After verifying that PR #3250 is merged to main, pin to the specific commit SHA:

-mem0ai = { git = "https://github.com/AnkushMalaker/mem0.git", rev = "main" }
+mem0ai = { git = "https://github.com/AnkushMalaker/mem0.git", rev = "<commit-sha>" }

Replace <commit-sha> with the actual commit hash from the main branch that includes the fix.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4572bb7 and 1bb02c1.

⛔ Files ignored due to path filters (1)
  • backends/advanced/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • backends/advanced/docker-compose.yml
  • backends/advanced/pyproject.toml
  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backends/advanced/docker-compose.yml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (2)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (1)
  • resolve_value (83-97)
backends/advanced/init.py (1)
  • load_config_yml (129-143)
🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/services/memory/config.py

90-90: Avoid specifying long messages outside the exception class

(TRY003)


316-316: Use raise without specifying exception name

Remove exception name

(TRY201)

@AnkushMalaker
Copy link
Collaborator

Can you also provide some obsidian test data and put into tests/test_assets ?

@thestumonkey https://github.com/s-blu/obsidian_dataview_example_vault
This is the example vault used
And the edits made in the screenshot, ie, add a fake birthday name, then ingest the vault and ask the specific question

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (4)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)

315-315: Use bare raise to preserve exception context.

As noted in previous review, use bare raise instead of raise e to preserve the full traceback.

🔎 Proposed fix
     except Exception as e:
         memory_logger.exception(
             f"Failed to get embedding dimensions from registry for model '{embedding_model}'"
         )
-        raise e
+        raise

Based on learnings, when re-raising exceptions, use bare raise to preserve the original context.

backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (3)

406-408: Use logger.exception() for automatic stack trace capture.

Per project learnings, use logger.exception() inside except blocks to automatically capture the full stack trace for better debuggability, even when re-raising the exception.

🔎 Proposed fix
         except Exception as exc:
-            logger.error("Obsidian search embedding failed: %s", exc)
+            logger.exception("Obsidian search embedding failed")
             raise ObsidianSearchError("embedding", str(exc)) from exc

Based on learnings from this repository.


449-451: Use logger.exception() for better stack traces.

Per repository learnings, use logger.exception() instead of logger.error() inside except blocks to capture the full stack trace for better debuggability.

🔎 Proposed fix
         except Exception as e:
-            logger.error(f"Obsidian search failed: {e}")
+            logger.exception("Obsidian search failed")
             raise ObsidianSearchError("database", str(e)) from e

Based on learnings from this repository.


456-469: Singleton initialization is not thread-safe.

The check-then-set pattern in get_obsidian_service() has a race condition. In a concurrent environment (e.g., FastAPI handling multiple requests), multiple threads could simultaneously pass the if _obsidian_service is None check and create multiple service instances, each with separate Neo4j driver connections. This could lead to resource leaks and inconsistent state.

🔎 Proposed fix using threading.Lock
+import threading
+
 # Lazy initialization to avoid startup failures
 _obsidian_service: Optional[ObsidianService] = None
+_service_lock = threading.Lock()


 def get_obsidian_service() -> ObsidianService:
     """Get or create the Obsidian service singleton.
     
     Returns:
         ObsidianService instance, creating it on first access.
     """
     global _obsidian_service
     if _obsidian_service is None:
-        _obsidian_service = ObsidianService()
+        with _service_lock:
+            # Double-check pattern: another thread may have initialized it
+            if _obsidian_service is None:
+                _obsidian_service = ObsidianService()
     return _obsidian_service
🧹 Nitpick comments (6)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)

102-102: Add explicit Optional type hint for api_key parameter.

The api_key parameter has a default value of None but lacks an explicit Optional type hint, which violates PEP 484.

🔎 Proposed fix
 def create_mycelia_config(
-    api_url: str = "http://localhost:8080", api_key: str = None, timeout: int = 30, **kwargs
+    api_url: str = "http://localhost:8080", api_key: Optional[str] = None, timeout: int = 30, **kwargs
 ) -> Dict[str, Any]:
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (5)

64-91: Consider consolidating with existing env loaders.

Per maintainer feedback, there are existing env loaders in upload_audio/load_env_file, root/scripts/lib/env_utils, and wizard.py. This function duplicates that functionality. Consider using or consolidating with existing utilities to reduce code duplication.


186-189: Consider moving to shared utils for reusability.

Per maintainer feedback, this text normalization function could be useful for other LLM services and isn't specific to Obsidian. Consider moving it to a shared utilities module (e.g., utils/text_utils.py or services/memory/utils.py) so other services can leverage it.


283-285: Remove redundant exception object from log message.

logger.exception() automatically includes exception details in the log output, so including {e} in the message is redundant.

🔎 Proposed fix
         except Exception as e:
-            logger.exception(f"Embedding generation failed for {note_data['path']}: {e}")
+            logger.exception("Embedding generation failed for %s", note_data['path'])
             return []

Based on learnings from this repository.


288-290: Add strict parameter to zip() for safety.

The zip() call lacks an explicit strict= parameter. If generate_openai_embeddings returns fewer vectors than expected (due to API failures or rate limiting), the silent truncation could cause data loss without any error signal. Using strict=True ensures lengths match and raises an error on mismatch.

🔎 Proposed fix
         chunk_payloads: List[ChunkPayload] = []
-        for orig_text, vector in zip(original_chunks, vectors):
+        for orig_text, vector in zip(original_chunks, vectors, strict=True):
             if vector:
                 chunk_payloads.append({"text": orig_text, "embedding": vector})
             else:

383-383: Consider using conversion flag instead of explicit str().

For consistency with modern Python string formatting conventions, consider using the !s conversion flag instead of explicit str() call.

🔎 Proposed fix
-                        errors.append(f"{file}: {str(e)}")
+                        errors.append(f"{file}: {e!s}")
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bb02c1 and 38ea008.

⛔ Files ignored due to path filters (1)
  • backends/advanced/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • backends/advanced/pyproject.toml
  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/utils/config_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backends/advanced/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • backends/advanced/src/advanced_omi_backend/utils/config_utils.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/utils/config_utils.py
  • backends/advanced/src/advanced_omi_backend/services/obsidian_service.py
  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py (5)
backends/advanced/src/advanced_omi_backend/services/memory/providers/llm_providers.py (2)
  • generate_openai_embeddings (75-91)
  • chunk_text_with_spacy (102-155)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)
  • load_config_yml (67-83)
backends/advanced/src/advanced_omi_backend/utils/model_utils.py (1)
  • get_model_config (11-45)
backends/advanced/src/advanced_omi_backend/utils/config_utils.py (1)
  • resolve_value (7-20)
backends/advanced/src/advanced_omi_backend/services/neo4j_client.py (7)
  • Neo4jClient (7-32)
  • Neo4jReadInterface (56-58)
  • Neo4jWriteInterface (61-63)
  • close (26-29)
  • reset (31-32)
  • session (20-24)
  • session (42-43)
🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/services/obsidian_service.py

110-110: Avoid specifying long messages outside the exception class

(TRY003)


114-114: Avoid specifying long messages outside the exception class

(TRY003)


138-138: Avoid specifying long messages outside the exception class

(TRY003)


284-284: Redundant exception object included in logging.exception call

(TRY401)


288-288: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


318-318: Probable use of insecure hash functions in hashlib: md5

(S324)


363-363: Avoid specifying long messages outside the exception class

(TRY003)


383-383: Use explicit conversion flag

Replace with conversion flag

(RUF010)


407-407: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


450-450: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/services/memory/config.py

83-83: Avoid specifying long messages outside the exception class

(TRY003)


102-102: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


315-315: Use raise without specifying exception name

Remove exception name

(TRY201)

🔇 Additional comments (4)
backends/advanced/src/advanced_omi_backend/utils/config_utils.py (1)

7-20: Well-implemented utility function that resolves code duplication.

This extraction addresses the duplication issue previously flagged in past reviews. The implementation cleanly handles both ${VAR} and ${VAR:-default} patterns for environment variable resolution.

backends/advanced/src/advanced_omi_backend/services/memory/config.py (3)

67-83: Clean implementation for multi-environment config loading.

The function appropriately searches standard locations (Docker, relative paths) and provides helpful error messages when config.yml is not found.


115-132: Excellent use of keyword-only parameters and sensible defaults.

The function is well-structured with clear parameter organization and appropriate default values for OpenAI-compatible configurations.


135-147: Clean configuration helper with appropriate defaults.

The function provides sensible defaults for Qdrant vector store setup, including the standard embedding dimension for OpenAI models.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
status.py (1)

71-71: Remove redundant import inside function.

The dotenv_values function is already imported at the module level (Line 21). This mid-function import violates the coding guidelines which state: "ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files."

As per coding guidelines, all imports should be at the top of the file.

🔎 Proposed fix
         # Handle speaker-recognition profiles
         if service_name == 'speaker-recognition':
-            from dotenv import dotenv_values
             env_file = service_path / '.env'
🧹 Nitpick comments (2)
wizard.py (2)

400-400: Remove unnecessary f-string prefix.

This string has no placeholders, so the f prefix is unnecessary.

🔎 Proposed fix
-    console.print(f"\n🎊 [bold green]Setup Complete![/bold green]")
+    console.print("\n🎊 [bold green]Setup Complete![/bold green]")

408-410: Remove unnecessary f-string prefix.

This string has no placeholders, so the f prefix is unnecessary.

🔎 Proposed fix
-        console.print(f"\n📚 [bold cyan]Obsidian Integration Detected[/bold cyan]")
+        console.print("\n📚 [bold cyan]Obsidian Integration Detected[/bold cyan]")
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 38ea008 and d3cbf57.

📒 Files selected for processing (4)
  • backends/advanced/docker-compose.yml
  • services.py
  • status.py
  • wizard.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backends/advanced/docker-compose.yml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Black formatter with 100-character line length for Python code
Use isort for Python import organization
ALL imports must be at the top of the file after the docstring - never import modules in the middle of functions or files
Group imports in Python files: standard library, third-party, then local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors in Python, never silently ignore - use explicit error handling with proper exceptions rather than silent failures
Avoid defensive hasattr() checks in Python - research and understand input/response or class structure instead

Files:

  • services.py
  • wizard.py
  • status.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • services.py
  • wizard.py
  • status.py
🪛 Ruff (0.14.10)
wizard.py

400-400: f-string without any placeholders

Remove extraneous f prefix

(F541)


408-408: f-string without any placeholders

Remove extraneous f prefix

(F541)

status.py

60-60: Local variable dotenv_values referenced before assignment

(F823)

🔇 Additional comments (3)
services.py (1)

77-86: LGTM! Obsidian/Neo4j profile detection implemented correctly.

The logic properly reads the environment file, validates the NEO4J_HOST value against placeholders, and conditionally enables the obsidian profile. The implementation follows the established pattern used for HTTPS profile detection above.

Minor note: The empty string '' in the exclusion list on Line 84 is redundant since the condition if neo4j_host already filters out empty strings, but this doesn't affect correctness.

wizard.py (1)

390-397: LGTM! Obsidian/Neo4j detection logic is sound.

The implementation correctly detects Obsidian configuration by reading the NEO4J_HOST value from the backend .env file and checking against known placeholders. The logic appropriately gates the check to only run when the advanced backend was selected and successfully configured.

status.py (1)

48-67: LGTM! Multiple profile detection implemented correctly.

The refactored logic properly accumulates multiple profiles (https and obsidian) and constructs the docker compose command with all applicable --profile flags. The command construction on Line 67 correctly flattens the profile list into the command arguments.

AnkushMalaker and others added 4 commits December 24, 2025 15:52
- Removed the deprecated `validate_memory_config_raw` endpoint and replaced it with a new endpoint that accepts plain text for validation.
- Updated the existing `validate_memory_config` endpoint to clarify that it now accepts JSON input.
- Adjusted the API call in the frontend to point to the new validation endpoint.
- Updated the health check function to load model configuration from the models registry instead of the root config.
- Improved error handling by logging warnings when model configuration loading fails.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants