Skip to content

audio upload extension with gdrive credentials#175

Closed
01PrathamS wants to merge 11 commits intoSimpleOpenSoftware:mainfrom
01PrathamS:audio_upload_extend
Closed

audio upload extension with gdrive credentials#175
01PrathamS wants to merge 11 commits intoSimpleOpenSoftware:mainfrom
01PrathamS:audio_upload_extend

Conversation

@01PrathamS
Copy link
Contributor

@01PrathamS 01PrathamS commented Dec 5, 2025

Feature: Google Drive Public Link Audio Upload

This PR introduces the ability to import and process audio files directly from a publicly shared Google Drive folder link, leveraging Google Cloud Service Account credentials for access.


🛠️ Implementation Details

1. Service Account Setup

A new service account JSON key file has been added to the data/ folder. This key was generated from:

Google Cloud ConsoleIAMService AccountsKeysCreate new key

This credential is used by the backend service to authorize access to Google Drive files.

2. Code Flow Overview

The flow is initiated by the frontend submission of a public Google Drive URL/folder ID and is handled entirely by the new backend logic.

  1. Client Request: The user submits a publicly shareable Google Drive folder ID/URL to the frontend.
  2. API Endpoint: The client calls the new endpoint:
    @router.post("/upload_audio_from_url")
    async def upload_audio_from_drive_folder(...)
  3. Drive Download: Inside this endpoint, the function download_audio_files_from_drive(drive_folder_id) is called. This function:
    • Authenticates using the newly added service account JSON key.
    • Iterates through the specified Drive folder.
    • Downloads all found .wav files.
  4. Audio Processing: The list of downloaded file objects (files) is then passed directly to the existing audio processing utility:
    audio_controller.upload_and_process_audio_files(current_user, files, device_name, auto_generate_client)
    This ensures that audio files from Drive follow the exact same pipeline (transcription, memory extraction, etc.) as files uploaded directly via the previously written /upload router.

Testing Notes

  • Ensure the Service Account has the necessary "Drive API" and "Google Drive Sharing" permissions to access the folder.
  • The Drive folder must be configured with a public sharing link.

Future Enhancements

  • Error Handling: Implement more granular exception handling for Drive-specific failures (e.g., file not found, permission denied, API rate limits).

addressing #151

Summary by CodeRabbit

  • New Features
    • Added Google Drive integration to upload audio files directly from Google Drive folders via the web UI and API endpoint.
    • New "Paste Google Drive Folder ID" input field in the Upload page for easy folder-based audio submission.

✏️ Tip: You can customize this high-level summary in your review settings.

@AnkushMalaker
Copy link
Collaborator

@coderabbitai review

1 similar comment
@AnkushMalaker
Copy link
Collaborator

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 5, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 5, 2025

Walkthrough

Adds Google Drive audio file integration to the application by introducing dependencies, configuration properties, utilities for downloading audio files from Drive folders, a Google Drive API client, an API endpoint, and frontend UI controls to upload audio from Google Drive.

Changes

Cohort / File(s) Summary
Dependencies & Configuration
backends/advanced/pyproject.toml, backends/advanced/src/advanced_omi_backend/app_config.py
Added three Google authentication dependencies (google-api-python-client, google-auth-oauthlib, google-auth-httplib2). Extended AppConfig with gdrive_credentials_path and gdrive_scopes attributes.
Google Drive Utilities
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py, backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py
Created new module to download and wrap audio files from Google Drive folders as Starlette UploadFile objects; includes audio format filtering, duplicate detection, and error handling. Created singleton Google Drive API client manager with credential loading and caching.
Audio Model & Processing
backends/advanced/src/advanced_omi_backend/models/audio_file.py, backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py, backends/advanced/src/advanced_omi_backend/utils/audio_utils.py
Added source field to AudioFile model. Updated upload_and_process_audio_files controller method to accept source parameter and conditionally use file.audio_uuid for "gdrive" source. Updated write_audio_file utility to persist source field.
API Route
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
Added new POST endpoint /upload_audio_from_gdrive requiring superuser authentication; downloads audio files from Drive folder and passes them to audio processor with source="gdrive".
Frontend API & UI
backends/advanced/webui/src/services/api.ts, backends/advanced/webui/src/pages/Upload.tsx
Added uploadFromGDriveFolder API method with 5-minute timeout. Extended Upload component with Google Drive folder ID input, submit button, and status tracking; refactored existing state updates to functional form.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Frontend as React UI
    participant API as Backend API
    participant GDrive as Google Drive API
    participant Controller as Audio Controller
    participant DB as Database

    User->>Frontend: Paste Drive Folder ID + Click Submit
    Frontend->>API: POST /api/audio/upload_audio_from_gdrive
    
    API->>API: Load Google Drive Client
    API->>GDrive: Query folder for audio files
    GDrive-->>API: File list
    
    loop For each audio file
        API->>GDrive: Download file stream
        GDrive-->>API: File content
        API->>API: Wrap as UploadFile<br/>(store Drive ID as audio_uuid)
    end
    
    API->>Controller: upload_and_process_audio_files<br/>(files, source="gdrive")
    Controller->>DB: Store AudioFile with source="gdrive"
    Controller->>DB: Process audio (transcription, etc.)
    Controller-->>API: Success
    
    API-->>Frontend: Upload completed
    Frontend->>User: Show success message
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas requiring attention:
    • gdrive_audio_utils.py: File streaming/memory management (spooled temp file), duplicate detection logic via AudioFile.find_one, error handling and wrapping behavior
    • gdrive_audio_client.py: Credential file validation, service account authentication, singleton cache correctness
    • Audio controller & utils: Source parameter propagation through call chain, conditional audio_uuid handling
    • Frontend: State management for Google Drive upload tracking, form validation and error states

Possibly related PRs

Suggested reviewers

  • AnkushMalaker

Poem

🐰 Hop, hop, hooray! A Drive full of sound,
Now fetched through the clouds without making a bound.
Files wrapped up and ready to sing,
From folder to audio, what joy it will bring! 🎵

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'audio upload extension with gdrive credentials' directly relates to the main change—adding Google Drive audio upload capability with service account authentication to the existing audio upload system.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ff28cb and 8e5a6b2.

📒 Files selected for processing (1)
  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use uv run python command instead of direct python or python3 for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensive hasattr() checks in Python

Files:

  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
  • user_id (71-73)
🔇 Additional comments (3)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (3)

43-43: LGTM: Source parameter addition.

The new source parameter with default value "upload" maintains backward compatibility while enabling differentiation between upload sources (e.g., gdrive vs. direct upload).


85-93: Previous concern resolved: audio_uuid handling is now robust.

The code now properly handles the case where audio_uuid might be missing from gdrive files by logging an error and falling back to a generated UUID, preventing potential downstream issues. This addresses the concern raised in the previous review.


109-109: LGTM: Source propagation to write_audio_file.

The source parameter is correctly passed to write_audio_file, enabling proper persistence of the audio source metadata.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (6)
backends/advanced/webui/src/pages/Upload.tsx (2)

197-219: Update placeholder text to reflect Google Drive folder usage.

The placeholder https://example.com/audio.wav is misleading since this feature accepts a Google Drive folder ID, not a direct audio URL. Consider updating the UI label and placeholder to be clearer.

   <label className="block mb-2 font-medium text-gray-900 dark:text-gray-100">
-    Paste audio URL:
+    Google Drive Folder ID:
   </label>

   <div className="flex space-x-2">
     <input
       type="text"
       value={audioUrl}
       onChange={(e) => setAudioUrl(e.target.value)}
-      placeholder="https://example.com/audio.wav"
+      placeholder="e.g., 1ABC123xyz (from folders/1ABC123xyz in Drive URL)"
       className="flex-1 px-3 py-2 border rounded-lg dark:bg-gray-800 dark:text-gray-100"
     />

     <button
       onClick={handleUrlSubmit}
       disabled={isUploading || !audioUrl}
       className="px-4 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed"
     >
-      {isUploading ? 'Submitting...' : 'Submit URL'}
+      {isUploading ? 'Submitting...' : 'Import from Drive'}
     </button>

31-63: Shared isUploading state may cause UX confusion.

The isUploading state is shared between URL upload and file upload flows. When uploading from URL, the file upload "Upload All" button will also appear disabled, and vice versa. Consider using a separate state for URL upload or a more granular approach.

+ const [isUrlUploading, setIsUrlUploading] = useState(false)

  const handleUrlSubmit = async () => {
    if (!audioUrl) return

-   setIsUploading(true)
+   setIsUrlUploading(true)
    setUrlUploadStatus({ type: null, message: '' })

    try {
      // ... existing code ...
    } finally {
-     setIsUploading(false)
+     setIsUrlUploading(false)
    }
  }

Then update the button disabled logic accordingly.

backends/advanced/src/advanced_omi_backend/app_config.py (1)

88-90: Make credentials path configurable via environment variable.

The hardcoded path "data/gdrive_service_account.json" limits deployment flexibility. Consider making this configurable, consistent with other service configurations in this file.

- self.gdrive_credentials_path = "data/gdrive_service_account.json"
+ self.gdrive_credentials_path = os.getenv(
+     "GDRIVE_CREDENTIALS_PATH", "data/gdrive_service_account.json"
+ )
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (3)

29-31: Prefix unused variable with underscore.

The status variable from downloader.next_chunk() is never used.

-        status, done = downloader.next_chunk()
+        _status, done = downloader.next_chunk()

84-87: Use exception chaining for better debugging.

When re-raising as AudioValidationError, preserve the original exception context using from e for better stack traces.

     except Exception as e:
         if isinstance(e, AudioValidationError):
             raise
-        raise AudioValidationError(f"Google Drive API Error: {repr(e)}")
+        raise AudioValidationError(f"Google Drive API Error: {e!r}") from e

20-46: Async function performs blocking I/O.

download_and_wrap_drive_file is declared async but calls synchronous Google API methods (service.files().get_media(), downloader.next_chunk()) which will block the event loop. For a backend handling concurrent requests, this could cause latency issues.

Consider running the blocking operations in a thread pool executor, or document that this is acceptable for the expected load.

import asyncio
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)

async def download_and_wrap_drive_file(service, file_item):
    loop = asyncio.get_event_loop()
    # Run blocking download in thread pool
    content = await loop.run_in_executor(
        _executor, 
        lambda: _download_file_sync(service, file_item)
    )
    # ... rest of the function
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ae5946 and d5b9518.

📒 Files selected for processing (6)
  • backends/advanced/pyproject.toml (1 hunks)
  • backends/advanced/src/advanced_omi_backend/app_config.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1 hunks)
  • backends/advanced/webui/src/pages/Upload.tsx (3 hunks)
  • backends/advanced/webui/src/services/api.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
backends/advanced/src/advanced_omi_backend/app_config.py (2)
  • get_app_config (117-119)
  • get_gdrive_service (92-110)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (3)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
  • download_audio_files_from_drive (52-87)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
  • User (54-101)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
  • upload_and_process_audio_files (37-192)
backends/advanced/webui/src/pages/Upload.tsx (1)
backends/advanced/webui/src/services/api.ts (1)
  • uploadApi (167-191)
🪛 Ruff (0.14.7)
backends/advanced/src/advanced_omi_backend/app_config.py

100-102: Avoid specifying long messages outside the exception class

(TRY003)

backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py

31-31: Unpacked variable status is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


36-36: Avoid specifying long messages outside the exception class

(TRY003)


54-54: Avoid specifying long messages outside the exception class

(TRY003)


76-76: Abstract raise to an inner function

(TRY301)


76-76: Avoid specifying long messages outside the exception class

(TRY003)


82-82: Consider moving this statement to an else block

(TRY300)


87-87: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


87-87: Avoid specifying long messages outside the exception class

(TRY003)


87-87: Use explicit conversion flag

Replace with conversion flag

(RUF010)

backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py

23-23: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🔇 Additional comments (3)
backends/advanced/webui/src/services/api.ts (1)

180-190: LGTM!

The API method correctly maps drive_folder_id to the url query parameter to match the backend's alias, uses appropriate timeout, and follows the existing pattern.

backends/advanced/pyproject.toml (1)

27-29: Verify google-auth-oauthlib is required.

The code uses google.oauth2.service_account.Credentials for service account authentication, which is provided by google-auth (a transitive dependency of google-api-python-client). The google-auth-oauthlib package is designed for OAuth 2.0 user consent flows and may not be necessary for service account authentication.

Confirm whether google-auth-oauthlib is actively used elsewhere in the codebase before including it as a dependency.

backends/advanced/src/advanced_omi_backend/app_config.py (1)

92-110: Verify service account credentials file is not committed to the repository.

Service account JSON keys contain sensitive credentials that should never be committed to version control per Google's official security best practices. Ensure data/gdrive_service_account.json is in .gitignore and document the expected file location and setup process in README or deployment documentation so developers know where to obtain/place credentials securely.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (1)

14-31: Add AudioValidationError handling and a docstring for the Drive upload endpoint

download_audio_files_from_drive raises AudioValidationError for invalid folder IDs, missing audio files, and Drive API errors. Right now those bubble up as 500s instead of a client‑visible 400, and this new endpoint lacks a docstring while /upload has one.

I suggest:

  • Importing AudioValidationError.
  • Wrapping the download call in try/except and translating to HTTPException(status_code=400, detail=str(e)).
  • Adding a brief docstring describing behavior, permissions, and expected input.

Example:

-from advanced_omi_backend.utils.gdrive_audio_utils import download_audio_files_from_drive
+from advanced_omi_backend.utils.gdrive_audio_utils import (
+    download_audio_files_from_drive,
+    AudioValidationError,
+)

 @router.post("/upload_audio_from_url")
 async def upload_audio_from_drive_folder(
-    drive_folder_id: str = Query(...,alias="url", description="Google Drive Folder ID containing audio files (e.g., the string after /folders/ in the URL)"),
+    drive_folder_id: str = Query(
+        ...,
+        alias="url",
+        description="Google Drive folder ID containing audio files (string after /folders/ in the URL)",
+    ),
     current_user: User = Depends(current_superuser),
     device_name: str = Query(default="upload"),
     auto_generate_client: bool = Query(default=True),
 ):
-    files = await download_audio_files_from_drive(drive_folder_id)
-
-    return await audio_controller.upload_and_process_audio_files(
-        current_user, files, device_name, auto_generate_client
-    )
+    """
+    Upload and process audio files from a Google Drive folder. Admin only.
+
+    The folder must be shared with the configured service account; only supported
+    audio file types will be processed.
+    """
+    try:
+        files = await download_audio_files_from_drive(drive_folder_id)
+    except AudioValidationError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+
+    return await audio_controller.upload_and_process_audio_files(
+        current_user, files, device_name, auto_generate_client
+    )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5b9518 and 9392989.

📒 Files selected for processing (3)
  • backends/advanced/pyproject.toml (1 hunks)
  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (1 hunks)
  • backends/advanced/webui/src/services/api.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backends/advanced/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use uv run python command instead of direct python or python3 for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensive hasattr() checks in Python

Files:

  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (2)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
  • get_audio_chunk_dir (122-124)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
  • download_audio_files_from_drive (52-87)
🪛 Ruff (0.14.7)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py

23-23: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🔇 Additional comments (2)
backends/advanced/webui/src/services/api.ts (2)

182-193: uploadAudioFiles refactor preserves behavior

The switch to a concise arrow function keeps the same endpoint, headers, timeout, and progress handling while simplifying the implementation. No functional issues spotted.


194-204: Confirm folder ID vs full URL handling for Drive uploads

The new uploadAudioFromUrl method is wired correctly to the backend (/api/audio/upload_audio_from_url with url, device_name, and auto_generate_client as query params and a 5‑minute timeout).

Please double‑check that the value passed as drive_folder_id is always the raw folder ID (the string after /folders/…), since the backend’s drive_folder_id is used directly as a Drive parent ID. If the UI or docs promise support for pasting a full share URL, you’ll want either:

  • to parse the ID on the frontend before calling this API, or
  • to update the backend/utilities to accept and normalize full URLs.

Copy link
Collaborator

@AnkushMalaker AnkushMalaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make changes according to coderabbit comments? As required

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (2)

1-8: Imports are well-organized.

The import grouping follows the coding guidelines correctly (standard library, third-party, local). However, atexit is imported but not used in the current implementation.

Consider removing the unused import:

-import atexit

10-11: AUDIO_EXTENSIONS correctly includes .m4a now.

Good fix addressing the past review comment. However, FOLDER_MIMETYPE is defined but never used in this module.

If FOLDER_MIMETYPE isn't needed elsewhere, consider removing it:

-FOLDER_MIMETYPE = "application/vnd.google-apps.folder"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9392989 and 5b5ea64.

📒 Files selected for processing (2)
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1 hunks)
  • backends/advanced/webui/src/pages/Upload.tsx (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backends/advanced/webui/src/pages/Upload.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use uv run python command instead of direct python or python3 for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensive hasattr() checks in Python

Files:

  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
backends/advanced/src/advanced_omi_backend/app_config.py (2)
  • get_app_config (117-119)
  • get_gdrive_service (92-110)
🪛 Ruff (0.14.7)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py

32-32: Unpacked variable status is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


37-37: Avoid specifying long messages outside the exception class

(TRY003)


65-65: Avoid specifying long messages outside the exception class

(TRY003)


87-87: Abstract raise to an inner function

(TRY301)


87-87: Avoid specifying long messages outside the exception class

(TRY003)


93-93: Consider moving this statement to an else block

(TRY300)


98-98: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


98-98: Avoid specifying long messages outside the exception class

(TRY003)


98-98: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)

14-15: LGTM!

Clean exception class for domain-specific validation errors.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (2)

1-11: Remove unused import and constant.

The atexit module (line 4) and FOLDER_MIMETYPE constant (line 11) are not used anywhere in this file. Based on past review comments, atexit was needed in an earlier version but is no longer required with SpooledTemporaryFile.

Apply this diff:

 import os
 import io
 import tempfile
-import atexit
 from typing import List
 from starlette.datastructures import UploadFile as StarletteUploadFile
 from googleapiclient.http import MediaIoBaseDownload
 from advanced_omi_backend.app_config import get_app_config

 AUDIO_EXTENSIONS = (".wav", ".mp3", ".flac", ".ogg", ".m4a")
-FOLDER_MIMETYPE = "application/vnd.google-apps.folder"

18-52: Simplify and add type hints.

Two optional improvements:

  1. Unnecessary close wrapper (lines 41-50): The wrapper only calls original_close() then passes. Since SpooledTemporaryFile auto-cleans when closed and StarletteUploadFile.close() properly closes the underlying file, this wrapper adds no value.

  2. Missing type hints: For consistency with download_audio_files_from_drive, consider adding type hints to parameters and return type.

Apply this diff:

-def download_and_wrap_drive_file(service, file_item):
+def download_and_wrap_drive_file(service, file_item: dict) -> StarletteUploadFile:
+    """Download a Google Drive file and wrap it as a Starlette UploadFile.
+    
+    Args:
+        service: Google Drive API service instance
+        file_item: Dict with 'id' and 'name' keys from Drive API response
+    
+    Returns:
+        StarletteUploadFile wrapper around the downloaded content
+    """
     file_id = file_item["id"]
     name = file_item["name"]
 
     request = service.files().get_media(fileId=file_id)
 
     fh = io.BytesIO()
     downloader = MediaIoBaseDownload(fh, request)
 
     done = False
     while not done:
         _status, done = downloader.next_chunk()
 
     content = fh.getvalue()
 
     if not content:
         raise AudioValidationError(f"Downloaded Google Drive file '{name}' was empty")
 
     tmp_file = tempfile.SpooledTemporaryFile(max_size=10*1024*1024)  # 10 MB
     tmp_file.write(content)
     tmp_file.seek(0)
-    upload_file = StarletteUploadFile(filename=name, file=tmp_file)
-
-    original_close = upload_file.close
-
-    def wrapped_close():
-        try:
-            original_close()
-        finally:
-            # SpooledTemporaryFile auto-cleans when closed; no unlink needed
-            pass
-
-    upload_file.close = wrapped_close
-
-    return upload_file
+    return StarletteUploadFile(filename=name, file=tmp_file)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b5ea64 and 5abd99d.

📒 Files selected for processing (1)
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use uv run python command instead of direct python or python3 for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensive hasattr() checks in Python

Files:

  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
backends/advanced/src/advanced_omi_backend/app_config.py (2)
  • get_app_config (117-119)
  • get_gdrive_service (92-110)
🪛 Ruff (0.14.7)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py

34-34: Avoid specifying long messages outside the exception class

(TRY003)


59-59: Avoid specifying long messages outside the exception class

(TRY003)


81-81: Abstract raise to an inner function

(TRY301)


81-81: Avoid specifying long messages outside the exception class

(TRY003)


88-88: Consider moving this statement to an else block

(TRY300)


93-93: Avoid specifying long messages outside the exception class

(TRY003)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (2)

4-4: Remove unused atexit import.

The atexit module is imported but never used in this file. It was likely part of an earlier cleanup implementation before switching to SpooledTemporaryFile.

Apply this diff:

-import atexit

67-72: Consider adding pagination for large folders.

The Drive API's files().list() method returns paginated results (default page size is 100 files). If a Drive folder contains more than 100 files, only the first page will be processed, potentially missing audio files in subsequent pages. While this may be uncommon for typical use cases, it could silently drop files in larger folders.

Consider adding pagination handling:

         response = service.files().list(
             q=query,
             fields="files(id, name, mimeType)",
             includeItemsFromAllDrives=False,
             supportsAllDrives=False,
         ).execute()

         all_files = response.get("files", [])
+        
+        # Handle pagination if folder has many files
+        while "nextPageToken" in response:
+            response = service.files().list(
+                q=query,
+                fields="files(id, name, mimeType)",
+                includeItemsFromAllDrives=False,
+                supportsAllDrives=False,
+                pageToken=response["nextPageToken"],
+            ).execute()
+            all_files.extend(response.get("files", []))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5abd99d and 3d00bac.

📒 Files selected for processing (1)
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use uv run python command instead of direct python or python3 for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensive hasattr() checks in Python

Files:

  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
backends/advanced/src/advanced_omi_backend/app_config.py (2)
  • get_app_config (117-119)
  • get_gdrive_service (92-110)
🪛 Ruff (0.14.7)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py

34-34: Avoid specifying long messages outside the exception class

(TRY003)


59-59: Avoid specifying long messages outside the exception class

(TRY003)


82-82: Abstract raise to an inner function

(TRY301)


82-82: Avoid specifying long messages outside the exception class

(TRY003)


89-89: Consider moving this statement to an else block

(TRY300)


94-94: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (2)

43-48: Cleanup pattern correctly implemented.

The wrapped close function correctly handles cleanup for SpooledTemporaryFile, which auto-cleans both in-memory and on-disk backing files when closed. The wrapper pattern provides a clear extension point if additional cleanup logic is needed in the future.


64-65: Query injection properly mitigated.

The folder ID is correctly escaped before being used in the Drive API query string, following Google's recommended practices. Escaping backslashes first, then single quotes, prevents query injection attacks while maintaining correct query syntax.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)

20-55: Simplify download helper: make it synchronous and drop the close wrapper.

download_and_wrap_drive_file is async but performs only synchronous Google API and file operations, and the custom wrapped_close never awaits the original async UploadFile.close, which can be surprising if .close() is ever awaited downstream.

Consider simplifying this helper to be a plain synchronous function that streams directly into the spooled file and relies on Starlette’s default close implementation:

-async def download_and_wrap_drive_file(service, file_item):
+def download_and_wrap_drive_file(service, file_item):
     file_id = file_item["id"]
     name = file_item["name"]
 
-    request = service.files().get_media(fileId=file_id)
-
-    fh = io.BytesIO()
-    downloader = MediaIoBaseDownload(fh, request)
-
-    done = False
-    while not done:
-        _status, done = downloader.next_chunk()
-
-    content = fh.getvalue()
-
-    if not content:
-        raise AudioValidationError(f"Downloaded Google Drive file '{name}' was empty")
-
-    tmp_file = tempfile.SpooledTemporaryFile(max_size=10*1024*1024)  # 10 MB
-    tmp_file.write(content)
-    tmp_file.seek(0)
-    upload_file = StarletteUploadFile(filename=name, file=tmp_file)
-
-    original_close = upload_file.close
-
-    def wrapped_close():
-        try:
-            original_close()
-        finally:
-            # SpooledTemporaryFile auto-cleans when closed; no unlink needed
-            pass
-
-    upload_file.close = wrapped_close
-
-    return upload_file
+    tmp_file = tempfile.SpooledTemporaryFile(max_size=10 * 1024 * 1024)  # 10 MB
+    downloader = MediaIoBaseDownload(tmp_file, request=service.files().get_media(fileId=file_id))
+
+    done = False
+    while not done:
+        _status, done = downloader.next_chunk()
+
+    tmp_file.seek(0)
+    return StarletteUploadFile(filename=name, file=tmp_file)

And in the caller (line 99), drop the await once this helper is synchronous:

wrapped_file = download_and_wrap_drive_file(service, item)

This removes the unnecessary async wrapper, avoids holding duplicate copies of the file in memory, and keeps UploadFile.close semantics intact.

🧹 Nitpick comments (2)
backends/advanced/src/advanced_omi_backend/utils/audio_utils.py (1)

102-112: gdrive_file_id plumbing into AudioFile looks correct; consider docstring update.

Adding gdrive_file_id as a trailing keyword argument keeps existing call sites safe, and persisting it into AudioFile will support de-duplication and Drive lookups as intended. You may want to extend the write_audio_file docstring to document this new parameter for future maintainers.

Also applies to: 198-209

backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)

59-123: Minor cleanups in folder download/de‑dup logic.

A few small tweaks will make this code clearer and closer to project guidelines:

  • Use direct attribute assignment instead of setattr with a constant name (Ruff B010):
-            wrapped_file = await download_and_wrap_drive_file(service, item)
-            #  Attach the file_id to the UploadFile object for later use
-            setattr(wrapped_file, "gdrive_file_id", file_id) 
+            wrapped_file = await download_and_wrap_drive_file(service, item)
+            # Attach the Drive file ID for downstream processing
+            wrapped_file.gdrive_file_id = file_id
  • Consider batching the is_drive_file_already_processed check if you expect large folders, e.g., by fetching all existing AudioFile documents for the set of Drive IDs in one query and filtering in memory, to avoid N round‑trips.

  • The function-scope import in is_drive_file_already_processed violates the “no mid-file imports” guideline unless it’s strictly for a circular-import workaround. If there’s no cycle here, prefer a top-level import; if there is, add a short comment explaining why this needs to stay lazy:

-from advanced_omi_backend.models.audio_file import AudioFile     
+from advanced_omi_backend.models.audio_file import AudioFile

moved to the top of the module (or keep it here with an explanatory comment if circular imports require it).

As per coding guidelines, …

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d00bac and b036185.

📒 Files selected for processing (4)
  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (3 hunks)
  • backends/advanced/src/advanced_omi_backend/models/audio_file.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/utils/audio_utils.py (2 hunks)
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use uv run python command instead of direct python or python3 for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensive hasattr() checks in Python

Files:

  • backends/advanced/src/advanced_omi_backend/models/audio_file.py
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
  • backends/advanced/src/advanced_omi_backend/utils/audio_utils.py
  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/models/audio_file.py
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
  • backends/advanced/src/advanced_omi_backend/utils/audio_utils.py
  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py

36-36: Avoid specifying long messages outside the exception class

(TRY003)


61-61: Avoid specifying long messages outside the exception class

(TRY003)


84-84: Abstract raise to an inner function

(TRY301)


84-84: Avoid specifying long messages outside the exception class

(TRY003)


101-101: Do not call setattr with a constant attribute value. It is not any safer than normal property access.

Replace setattr with assignment

(B010)


105-105: Abstract raise to an inner function

(TRY301)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


107-107: Consider moving this statement to an else block

(TRY300)


112-112: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)

81-85: Drive ID propagation through upload pipeline is sound.

Using getattr(file, "gdrive_file_id", None) cleanly keeps Drive-specific metadata optional while preserving existing behavior for standard uploads, and forwarding it into write_audio_file wires the model change end-to-end without breaking callers.

Also applies to: 98-110, 123-140

backends/advanced/src/advanced_omi_backend/models/audio_file.py (1)

54-65: Model and index extension for gdrive_file_id aligns with usage.

Adding gdrive_file_id plus an index is consistent with the new de-duplication check and should keep is_drive_file_already_processed performant. No behavioral regressions expected.

self.gdrive_scopes = ["https://www.googleapis.com/auth/drive.readonly"]
self._gdrive_service = None

def get_gdrive_service(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably shouldn't be in app_config. can have a gdrive service in services or have a google_client in clients. Then we initialise it at the point of use

router = APIRouter(prefix="/audio", tags=["audio"])


@router.post("/upload_audio_from_url")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this router name suggests I can use any url, but looks like it's just Gdrive?
Probably the way we want to do this would be to upload from any url, but provide credentials in the header that we get from a google_auth client.

I'm not super familiar with how gdrive works though

📝 Upload Instructions
</h3>
<ul className="text-sm text-yellow-700 dark:text-yellow-300 space-y-1">
<li>• Hello world</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assume this is test, remove

},
}),

uploadAudioFromUrl: (payload: { drive_folder_id: string; device_name?: string; auto_generate_client?: boolean }) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As before, we either need a generic uploadAudiofromURL that gets called from uploadAudiofromGdrive, or change the name to be from gdrive

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
backends/advanced/src/advanced_omi_backend/app_config.py (1)

88-89: Make GDrive credentials path configurable via environment

Right now the Drive credentials path and scopes are hard-coded. That works locally but is brittle across environments (Docker, k8s secrets, different mount paths).

Consider something like:

-        self.gdrive_credentials_path = "data/gdrive_service_account.json"
-        self.gdrive_scopes = ["https://www.googleapis.com/auth/drive.readonly"]
+        self.gdrive_credentials_path = os.getenv(
+            "GDRIVE_CREDENTIALS_PATH", "data/gdrive_service_account.json"
+        )
+        self.gdrive_scopes = [
+            scope.strip()
+            for scope in os.getenv(
+                "GDRIVE_SCOPES",
+                "https://www.googleapis.com/auth/drive.readonly",
+            ).split(",")
+            if scope.strip()
+        ]

This keeps sane defaults while allowing deployments to override path/scopes via env vars.

backends/advanced/webui/src/pages/Upload.tsx (3)

18-60: Avoid reusing stale uploadProgress for GDrive submissions

handleGDriveSubmit sets isUploading but never touches uploadProgress, so if a prior file upload ended at 100%, the GDrive flow will instantly show 100% progress even though it’s just starting.

Either reset progress at the start of a GDrive submission, or track a separate progress flag for folder uploads, e.g.:

  const handleGDriveSubmit = async () => {
     if (!gdriveFolderId) return

-    setIsUploading(true)
+    setIsUploading(true)
+    setUploadProgress(0)

or introduce a dedicated isGdriveUploading boolean and gate the progress UI on that instead of the shared isUploading.

Also applies to: 189-225


80-85: Use functional setFiles updates for all mutations

removeFile and clearCompleted still close over files:

setFiles(files.filter((f) => f.id !== id))
setFiles(files.filter((f) => f.status === 'pending' || f.status === 'uploading'))

For consistency with the rest of the component and to avoid stale-closure issues under concurrent renders, prefer functional updates:

const removeFile = (id: string) => {
-  setFiles(files.filter((f) => f.id !== id))
+  setFiles((prev) => prev.filter((f) => f.id !== id))
}

const clearCompleted = () => {
-  setFiles(files.filter((f) => f.status === 'pending' || f.status === 'uploading'))
+  setFiles((prev) =>
+    prev.filter((f) => f.status === 'pending' || f.status === 'uploading'),
+  )
}

Also applies to: 140-142


144-150: Reuse shared formatFileSize helper instead of duplicating logic

This local formatFileSize duplicates the implementation in utils/fileHash.ts. To keep formatting consistent across the app and avoid future divergence, consider importing and using the shared helper here instead of maintaining two copies.

backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (2)

22-57: Consider making download_and_wrap_drive_file a plain function

download_and_wrap_drive_file performs only synchronous work (HTTP/file I/O, no await), yet it’s declared async and awaited. That’s functionally correct, but a bit misleading and adds coroutine overhead without giving you any extra concurrency.

Unless you plan to introduce actual async operations inside it, consider:

-async def download_and_wrap_drive_file(service, file_item):
+def download_and_wrap_drive_file(service, file_item):
@@
-            # synchronous call now (but make the parent function async)
-            wrapped_file = await download_and_wrap_drive_file(service, item)
+            # synchronous call
+            wrapped_file = download_and_wrap_drive_file(service, item)

download_audio_files_from_drive can stay async (it already awaits DB checks), while this helper remains a straightforward synchronous utility.

Also applies to: 100-104


111-114: Optionally log full stack on unexpected Drive errors

The except Exception as e block correctly re-raises as AudioValidationError with raise ... from e. To make debugging production issues easier, you might also log the stack trace here:

     except Exception as e:
         if isinstance(e, AudioValidationError):
             raise
-        raise AudioValidationError(f"Google Drive API Error: {e}") from e
+        logger.exception("Google Drive API Error while listing/downloading files")
+        raise AudioValidationError(f"Google Drive API Error: {e}") from e

This preserves the current behavior while adding observability for operators.

Based on learnings, this follows the preferred logging.exception + raise ... from e pattern.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b036185 and cff1a4c.

📒 Files selected for processing (6)
  • backends/advanced/src/advanced_omi_backend/app_config.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1 hunks)
  • backends/advanced/webui/src/pages/Upload.tsx (8 hunks)
  • backends/advanced/webui/src/services/api.ts (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use uv run python command instead of direct python or python3 for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensive hasattr() checks in Python

Files:

  • backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
  • backends/advanced/src/advanced_omi_backend/app_config.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
  • backends/advanced/src/advanced_omi_backend/app_config.py
🧬 Code graph analysis (4)
backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py (1)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
  • get_app_config (96-98)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (4)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
  • get_app_config (96-98)
backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py (1)
  • get_google_drive_client (8-29)
backends/advanced/src/advanced_omi_backend/utils/audio_utils.py (1)
  • AudioValidationError (28-30)
backends/advanced/src/advanced_omi_backend/models/audio_file.py (1)
  • AudioFile (19-65)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (4)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
  • get_audio_chunk_dir (101-103)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
  • download_audio_files_from_drive (61-114)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
  • User (51-98)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
  • upload_and_process_audio_files (37-213)
backends/advanced/webui/src/pages/Upload.tsx (1)
backends/advanced/webui/src/services/api.ts (1)
  • uploadApi (181-203)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py

18-20: Avoid specifying long messages outside the exception class

(TRY003)

backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py

38-38: Avoid specifying long messages outside the exception class

(TRY003)


63-63: Avoid specifying long messages outside the exception class

(TRY003)


86-86: Abstract raise to an inner function

(TRY301)


86-86: Avoid specifying long messages outside the exception class

(TRY003)


103-103: Do not call setattr with a constant attribute value. It is not any safer than normal property access.

Replace setattr with assignment

(B010)


107-107: Abstract raise to an inner function

(TRY301)


107-107: Avoid specifying long messages outside the exception class

(TRY003)


109-109: Consider moving this statement to an else block

(TRY300)


114-114: Avoid specifying long messages outside the exception class

(TRY003)

backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py

23-23: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🔇 Additional comments (2)
backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py (1)

1-29: Drive client singleton looks sound

The client factory cleanly validates the credentials path, builds a Drive v3 client with scoped service-account creds, and caches it for reuse. No functional issues stand out here.

backends/advanced/webui/src/services/api.ts (1)

181-203: GDrive upload API wiring looks consistent with existing upload flow

The new uploadFromGDriveFolder mirrors the existing upload conventions (same base client, 5‑minute timeout, query params), and the route path matches the new backend endpoint. No changes needed here.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (2)

20-54: Simplify download_and_wrap_drive_file and drop the no-op close wrapper

The core logic looks fine, but the wrapped_close indirection no longer adds any behavior—SpooledTemporaryFile is already cleaned up on close, and the wrapper just calls original_close() then pass. You can simplify by returning the UploadFile directly.

-    tmp_file = tempfile.SpooledTemporaryFile(max_size=10*1024*1024)  # 10 MB
-    tmp_file.write(content)
-    tmp_file.seek(0)
-    upload_file = StarletteUploadFile(filename=name, file=tmp_file)
-
-    original_close = upload_file.close
-
-    def wrapped_close():
-        try:
-            original_close()
-        finally:
-            # SpooledTemporaryFile auto-cleans when closed; no unlink needed
-            pass
-
-    upload_file.close = wrapped_close
-
-    return upload_file
+    tmp_file = tempfile.SpooledTemporaryFile(max_size=10 * 1024 * 1024)  # 10 MB
+    tmp_file.write(content)
+    tmp_file.seek(0)
+    upload_file = StarletteUploadFile(filename=name, file=tmp_file)
+    return upload_file

59-112: Tighten Drive-folder error handling and fix stale comment

The overall flow (ID validation, extension filtering, skip of already-processed files) looks solid, but two things are worth adjusting:

  1. The comment at line 98 still says “synchronous call now”, but download_and_wrap_drive_file is async again. This is misleading for future readers.
  2. The broad except Exception as e converts all non-AudioValidationError failures (including transient Drive API or network issues) into AudioValidationError, which the router maps to HTTP 400. That blurs the line between user-facing validation errors and genuine server/infra problems.

Based on learnings, it would also help debuggability to log the underlying exception with audio_logger.exception(...) before re‑raising, and to keep exception chaining as you already do with from e.

For example:

-        for item in audio_files_metadata:
+        for item in audio_files_metadata:
             file_id = item["id"] # Get the Google Drive File ID
@@
-            #  Check if the file is already processed
+            # Check if the file is already processed
@@
-            # synchronous call now (but make the parent function async)
-            wrapped_file = await download_and_wrap_drive_file(service, item)
+            # Download and wrap the Drive file
+            wrapped_file = await download_and_wrap_drive_file(service, item)
@@
-    except Exception as e:
-        if isinstance(e, AudioValidationError):
-            raise
-        raise AudioValidationError(f"Google Drive API Error: {e}") from e
+    except AudioValidationError:
+        # Propagate validation-style errors unchanged
+        raise
+    except Exception as e:
+        # Log unexpected Drive/API issues and surface as a wrapped error
+        audio_logger.exception("Google Drive API error while listing/downloading files")
+        raise AudioValidationError(f"Google Drive API Error: {e}") from e

Based on learnings, this preserves stack traces, keeps validation vs. infrastructure errors distinguishable, and improves observability.

backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (1)

14-34: New /upload_audio_from_gdrive route wiring looks good; add exception chaining

The endpoint correctly restricts access, delegates to download_audio_files_from_drive, and reuses the existing upload pipeline. To keep traceback context (and satisfy Ruff B904), you can chain the HTTPException from the original AudioValidationError.

-    try: 
-        files = await download_audio_files_from_drive(gdrive_folder_id)
-    except AudioValidationError as e: 
-        raise HTTPException(status_code=400, detail=str(e))
+    try:
+        files = await download_audio_files_from_drive(gdrive_folder_id)
+    except AudioValidationError as e:
+        raise HTTPException(status_code=400, detail=str(e)) from e

Based on learnings, this preserves the original error context while still returning a clean 400 to clients.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cff1a4c and 6534288.

📒 Files selected for processing (2)
  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use uv run python command instead of direct python or python3 for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensive hasattr() checks in Python

Files:

  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
🧬 Code graph analysis (2)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (4)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
  • get_audio_chunk_dir (101-103)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
  • download_audio_files_from_drive (59-112)
backends/advanced/src/advanced_omi_backend/utils/audio_utils.py (1)
  • AudioValidationError (28-30)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
  • upload_and_process_audio_files (37-213)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (3)
backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py (1)
  • get_google_drive_client (8-29)
backends/advanced/src/advanced_omi_backend/models/audio_file.py (1)
  • AudioFile (19-65)
backends/advanced/src/advanced_omi_backend/utils/audio_utils.py (1)
  • AudioValidationError (28-30)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py

23-23: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


30-30: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py

36-36: Avoid specifying long messages outside the exception class

(TRY003)


61-61: Avoid specifying long messages outside the exception class

(TRY003)


84-84: Abstract raise to an inner function

(TRY301)


84-84: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Abstract raise to an inner function

(TRY301)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


107-107: Consider moving this statement to an else block

(TRY300)


112-112: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)

115-122: Drive de-duplication helper is consistent with the model

is_drive_file_already_processed cleanly checks AudioFile.gdrive_file_id and short-circuits on falsy IDs. Given the indexed gdrive_file_id field in AudioFile, this is a good, efficient way to avoid reprocessing.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)

20-54: Remove unnecessary async decorator.

download_and_wrap_drive_file is declared async but performs no await operations—the Google Drive API calls are synchronous. This adds unnecessary overhead and is misleading to callers.

-async def download_and_wrap_drive_file(service, file_item):
+def download_and_wrap_drive_file(service, file_item):
     file_id = file_item["id"]
     name = file_item["name"]

Also update the call site on line 104:

-            wrapped_file = await download_and_wrap_drive_file(service, item)
+            wrapped_file = download_and_wrap_drive_file(service, item)

Note: This was flagged in a previous review and marked as addressed, but the current code still has the issue.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6534288 and 1ff28cb.

📒 Files selected for processing (6)
  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (4 hunks)
  • backends/advanced/src/advanced_omi_backend/models/audio_file.py (2 hunks)
  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/utils/audio_utils.py (2 hunks)
  • backends/advanced/src/advanced_omi_backend/utils/dropbox_audio_utils.py (1 hunks)
  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backends/advanced/src/advanced_omi_backend/utils/audio_utils.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use uv run python command instead of direct python or python3 for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensive hasattr() checks in Python

Files:

  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
  • backends/advanced/src/advanced_omi_backend/utils/dropbox_audio_utils.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
  • backends/advanced/src/advanced_omi_backend/models/audio_file.py
  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.

Applied to files:

  • backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
  • backends/advanced/src/advanced_omi_backend/utils/dropbox_audio_utils.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
  • backends/advanced/src/advanced_omi_backend/models/audio_file.py
  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
🧬 Code graph analysis (2)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (4)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
  • get_audio_chunk_dir (101-103)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
  • download_audio_files_from_drive (59-117)
backends/advanced/src/advanced_omi_backend/utils/audio_utils.py (1)
  • AudioValidationError (28-30)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
  • upload_and_process_audio_files (37-216)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
  • user_id (71-73)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py

36-36: Avoid specifying long messages outside the exception class

(TRY003)


61-61: Avoid specifying long messages outside the exception class

(TRY003)


84-84: Abstract raise to an inner function

(TRY301)


84-84: Avoid specifying long messages outside the exception class

(TRY003)


110-110: Abstract raise to an inner function

(TRY301)


110-110: Avoid specifying long messages outside the exception class

(TRY003)


112-112: Consider moving this statement to an else block

(TRY300)


117-117: Avoid specifying long messages outside the exception class

(TRY003)

backends/advanced/src/advanced_omi_backend/utils/dropbox_audio_utils.py

21-21: Avoid specifying long messages outside the exception class

(TRY003)


31-31: Probable use of requests call without timeout

(S113)


37-39: Avoid specifying long messages outside the exception class

(TRY003)


43-43: Avoid specifying long messages outside the exception class

(TRY003)


65-65: Avoid specifying long messages outside the exception class

(TRY003)


69-69: Avoid specifying long messages outside the exception class

(TRY003)


85-85: Probable use of requests call without timeout

(S113)


92-92: Abstract raise to an inner function

(TRY301)


92-92: Avoid specifying long messages outside the exception class

(TRY003)


103-103: Abstract raise to an inner function

(TRY301)


103-103: Avoid specifying long messages outside the exception class

(TRY003)


123-123: Do not call setattr with a constant attribute value. It is not any safer than normal property access.

Replace setattr with assignment

(B010)


128-130: Abstract raise to an inner function

(TRY301)


128-130: Avoid specifying long messages outside the exception class

(TRY003)


132-132: Consider moving this statement to an else block

(TRY300)


137-137: Avoid specifying long messages outside the exception class

(TRY003)

backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py

23-23: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


30-30: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (3)
backends/advanced/src/advanced_omi_backend/models/audio_file.py (1)

32-35: LGTM! Clean addition of source field.

The new source field is properly typed, indexed, and documented. The default value "upload" maintains backward compatibility with existing code paths.

backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (1)

20-34: LGTM! Well-structured endpoint with proper error handling.

The new Google Drive upload endpoint correctly:

  • Maps AudioValidationError to HTTP 400 for user-facing validation issues
  • Requires superuser authentication
  • Passes source="gdrive" to the controller for proper audio_uuid handling

The Ruff warning about Depends in default arguments is a false positive—this is the standard FastAPI pattern for dependency injection.

backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)

66-67: Excellent security and deduplication implementation.

The code properly:

  • Escapes the folder_id to prevent Drive API query injection (lines 66-67)
  • Checks for already-processed files using audio_uuid + source pattern (lines 93-96)
  • Attaches the Drive file ID as audio_uuid for downstream processing (line 106)

This ensures both security and prevents duplicate processing of Drive files.

Also applies to: 93-96, 106-106

Copy link
Collaborator

@AnkushMalaker AnkushMalaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@thestumonkey thestumonkey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AnkushMalaker
Copy link
Collaborator

@01PrathamS Thanks! this is merged to dev!
It'll be included in the release when its merged to main :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants