audio upload extension with gdrive credentials#175
audio upload extension with gdrive credentials#17501PrathamS wants to merge 11 commits intoSimpleOpenSoftware:mainfrom
Conversation
|
@coderabbitai review |
1 similar comment
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
WalkthroughAdds Google Drive audio file integration to the application by introducing dependencies, configuration properties, utilities for downloading audio files from Drive folders, a Google Drive API client, an API endpoint, and frontend UI controls to upload audio from Google Drive. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant Frontend as React UI
participant API as Backend API
participant GDrive as Google Drive API
participant Controller as Audio Controller
participant DB as Database
User->>Frontend: Paste Drive Folder ID + Click Submit
Frontend->>API: POST /api/audio/upload_audio_from_gdrive
API->>API: Load Google Drive Client
API->>GDrive: Query folder for audio files
GDrive-->>API: File list
loop For each audio file
API->>GDrive: Download file stream
GDrive-->>API: File content
API->>API: Wrap as UploadFile<br/>(store Drive ID as audio_uuid)
end
API->>Controller: upload_and_process_audio_files<br/>(files, source="gdrive")
Controller->>DB: Store AudioFile with source="gdrive"
Controller->>DB: Process audio (transcription, etc.)
Controller-->>API: Success
API-->>Frontend: Upload completed
Frontend->>User: Show success message
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (1)📚 Learning: 2025-12-08T23:52:34.959ZApplied to files:
🧬 Code graph analysis (1)backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
🔇 Additional comments (3)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (6)
backends/advanced/webui/src/pages/Upload.tsx (2)
197-219: Update placeholder text to reflect Google Drive folder usage.The placeholder
https://example.com/audio.wavis misleading since this feature accepts a Google Drive folder ID, not a direct audio URL. Consider updating the UI label and placeholder to be clearer.<label className="block mb-2 font-medium text-gray-900 dark:text-gray-100"> - Paste audio URL: + Google Drive Folder ID: </label> <div className="flex space-x-2"> <input type="text" value={audioUrl} onChange={(e) => setAudioUrl(e.target.value)} - placeholder="https://example.com/audio.wav" + placeholder="e.g., 1ABC123xyz (from folders/1ABC123xyz in Drive URL)" className="flex-1 px-3 py-2 border rounded-lg dark:bg-gray-800 dark:text-gray-100" /> <button onClick={handleUrlSubmit} disabled={isUploading || !audioUrl} className="px-4 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed" > - {isUploading ? 'Submitting...' : 'Submit URL'} + {isUploading ? 'Submitting...' : 'Import from Drive'} </button>
31-63: SharedisUploadingstate may cause UX confusion.The
isUploadingstate is shared between URL upload and file upload flows. When uploading from URL, the file upload "Upload All" button will also appear disabled, and vice versa. Consider using a separate state for URL upload or a more granular approach.+ const [isUrlUploading, setIsUrlUploading] = useState(false) const handleUrlSubmit = async () => { if (!audioUrl) return - setIsUploading(true) + setIsUrlUploading(true) setUrlUploadStatus({ type: null, message: '' }) try { // ... existing code ... } finally { - setIsUploading(false) + setIsUrlUploading(false) } }Then update the button disabled logic accordingly.
backends/advanced/src/advanced_omi_backend/app_config.py (1)
88-90: Make credentials path configurable via environment variable.The hardcoded path
"data/gdrive_service_account.json"limits deployment flexibility. Consider making this configurable, consistent with other service configurations in this file.- self.gdrive_credentials_path = "data/gdrive_service_account.json" + self.gdrive_credentials_path = os.getenv( + "GDRIVE_CREDENTIALS_PATH", "data/gdrive_service_account.json" + )backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (3)
29-31: Prefix unused variable with underscore.The
statusvariable fromdownloader.next_chunk()is never used.- status, done = downloader.next_chunk() + _status, done = downloader.next_chunk()
84-87: Use exception chaining for better debugging.When re-raising as
AudioValidationError, preserve the original exception context usingfrom efor better stack traces.except Exception as e: if isinstance(e, AudioValidationError): raise - raise AudioValidationError(f"Google Drive API Error: {repr(e)}") + raise AudioValidationError(f"Google Drive API Error: {e!r}") from e
20-46: Async function performs blocking I/O.
download_and_wrap_drive_fileis declaredasyncbut calls synchronous Google API methods (service.files().get_media(),downloader.next_chunk()) which will block the event loop. For a backend handling concurrent requests, this could cause latency issues.Consider running the blocking operations in a thread pool executor, or document that this is acceptable for the expected load.
import asyncio from concurrent.futures import ThreadPoolExecutor _executor = ThreadPoolExecutor(max_workers=4) async def download_and_wrap_drive_file(service, file_item): loop = asyncio.get_event_loop() # Run blocking download in thread pool content = await loop.run_in_executor( _executor, lambda: _download_file_sync(service, file_item) ) # ... rest of the function
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
backends/advanced/pyproject.toml(1 hunks)backends/advanced/src/advanced_omi_backend/app_config.py(1 hunks)backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py(1 hunks)backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py(1 hunks)backends/advanced/webui/src/pages/Upload.tsx(3 hunks)backends/advanced/webui/src/services/api.ts(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
backends/advanced/src/advanced_omi_backend/app_config.py (2)
get_app_config(117-119)get_gdrive_service(92-110)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (3)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
download_audio_files_from_drive(52-87)backends/advanced/src/advanced_omi_backend/models/user.py (1)
User(54-101)backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
upload_and_process_audio_files(37-192)
backends/advanced/webui/src/pages/Upload.tsx (1)
backends/advanced/webui/src/services/api.ts (1)
uploadApi(167-191)
🪛 Ruff (0.14.7)
backends/advanced/src/advanced_omi_backend/app_config.py
100-102: Avoid specifying long messages outside the exception class
(TRY003)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
31-31: Unpacked variable status is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
36-36: Avoid specifying long messages outside the exception class
(TRY003)
54-54: Avoid specifying long messages outside the exception class
(TRY003)
76-76: Abstract raise to an inner function
(TRY301)
76-76: Avoid specifying long messages outside the exception class
(TRY003)
82-82: Consider moving this statement to an else block
(TRY300)
87-87: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
87-87: Avoid specifying long messages outside the exception class
(TRY003)
87-87: Use explicit conversion flag
Replace with conversion flag
(RUF010)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
23-23: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🔇 Additional comments (3)
backends/advanced/webui/src/services/api.ts (1)
180-190: LGTM!The API method correctly maps
drive_folder_idto theurlquery parameter to match the backend's alias, uses appropriate timeout, and follows the existing pattern.backends/advanced/pyproject.toml (1)
27-29: Verifygoogle-auth-oauthlibis required.The code uses
google.oauth2.service_account.Credentialsfor service account authentication, which is provided bygoogle-auth(a transitive dependency ofgoogle-api-python-client). Thegoogle-auth-oauthlibpackage is designed for OAuth 2.0 user consent flows and may not be necessary for service account authentication.Confirm whether
google-auth-oauthlibis actively used elsewhere in the codebase before including it as a dependency.backends/advanced/src/advanced_omi_backend/app_config.py (1)
92-110: Verify service account credentials file is not committed to the repository.Service account JSON keys contain sensitive credentials that should never be committed to version control per Google's official security best practices. Ensure
data/gdrive_service_account.jsonis in.gitignoreand document the expected file location and setup process in README or deployment documentation so developers know where to obtain/place credentials securely.
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
Outdated
Show resolved
Hide resolved
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (1)
14-31: AddAudioValidationErrorhandling and a docstring for the Drive upload endpoint
download_audio_files_from_driveraisesAudioValidationErrorfor invalid folder IDs, missing audio files, and Drive API errors. Right now those bubble up as 500s instead of a client‑visible 400, and this new endpoint lacks a docstring while/uploadhas one.I suggest:
- Importing
AudioValidationError.- Wrapping the download call in
try/exceptand translating toHTTPException(status_code=400, detail=str(e)).- Adding a brief docstring describing behavior, permissions, and expected input.
Example:
-from advanced_omi_backend.utils.gdrive_audio_utils import download_audio_files_from_drive +from advanced_omi_backend.utils.gdrive_audio_utils import ( + download_audio_files_from_drive, + AudioValidationError, +) @router.post("/upload_audio_from_url") async def upload_audio_from_drive_folder( - drive_folder_id: str = Query(...,alias="url", description="Google Drive Folder ID containing audio files (e.g., the string after /folders/ in the URL)"), + drive_folder_id: str = Query( + ..., + alias="url", + description="Google Drive folder ID containing audio files (string after /folders/ in the URL)", + ), current_user: User = Depends(current_superuser), device_name: str = Query(default="upload"), auto_generate_client: bool = Query(default=True), ): - files = await download_audio_files_from_drive(drive_folder_id) - - return await audio_controller.upload_and_process_audio_files( - current_user, files, device_name, auto_generate_client - ) + """ + Upload and process audio files from a Google Drive folder. Admin only. + + The folder must be shared with the configured service account; only supported + audio file types will be processed. + """ + try: + files = await download_audio_files_from_drive(drive_folder_id) + except AudioValidationError as e: + raise HTTPException(status_code=400, detail=str(e)) + + return await audio_controller.upload_and_process_audio_files( + current_user, files, device_name, auto_generate_client + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backends/advanced/pyproject.toml(1 hunks)backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py(1 hunks)backends/advanced/webui/src/services/api.ts(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backends/advanced/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Useuv run pythoncommand instead of directpythonorpython3for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensivehasattr()checks in Python
Files:
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (2)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
get_audio_chunk_dir(122-124)backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
download_audio_files_from_drive(52-87)
🪛 Ruff (0.14.7)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
23-23: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🔇 Additional comments (2)
backends/advanced/webui/src/services/api.ts (2)
182-193:uploadAudioFilesrefactor preserves behaviorThe switch to a concise arrow function keeps the same endpoint, headers, timeout, and progress handling while simplifying the implementation. No functional issues spotted.
194-204: Confirm folder ID vs full URL handling for Drive uploadsThe new
uploadAudioFromUrlmethod is wired correctly to the backend (/api/audio/upload_audio_from_urlwithurl,device_name, andauto_generate_clientas query params and a 5‑minute timeout).Please double‑check that the value passed as
drive_folder_idis always the raw folder ID (the string after/folders/…), since the backend’sdrive_folder_idis used directly as a Drive parent ID. If the UI or docs promise support for pasting a full share URL, you’ll want either:
- to parse the ID on the frontend before calling this API, or
- to update the backend/utilities to accept and normalize full URLs.
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
Outdated
Show resolved
Hide resolved
AnkushMalaker
left a comment
There was a problem hiding this comment.
Can you make changes according to coderabbit comments? As required
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (2)
1-8: Imports are well-organized.The import grouping follows the coding guidelines correctly (standard library, third-party, local). However,
atexitis imported but not used in the current implementation.Consider removing the unused import:
-import atexit
10-11: AUDIO_EXTENSIONS correctly includes.m4anow.Good fix addressing the past review comment. However,
FOLDER_MIMETYPEis defined but never used in this module.If
FOLDER_MIMETYPEisn't needed elsewhere, consider removing it:-FOLDER_MIMETYPE = "application/vnd.google-apps.folder"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py(1 hunks)backends/advanced/webui/src/pages/Upload.tsx(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backends/advanced/webui/src/pages/Upload.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Useuv run pythoncommand instead of directpythonorpython3for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensivehasattr()checks in Python
Files:
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
backends/advanced/src/advanced_omi_backend/app_config.py (2)
get_app_config(117-119)get_gdrive_service(92-110)
🪛 Ruff (0.14.7)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
32-32: Unpacked variable status is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
37-37: Avoid specifying long messages outside the exception class
(TRY003)
65-65: Avoid specifying long messages outside the exception class
(TRY003)
87-87: Abstract raise to an inner function
(TRY301)
87-87: Avoid specifying long messages outside the exception class
(TRY003)
93-93: Consider moving this statement to an else block
(TRY300)
98-98: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
98-98: Avoid specifying long messages outside the exception class
(TRY003)
98-98: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🔇 Additional comments (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
14-15: LGTM!Clean exception class for domain-specific validation errors.
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (2)
1-11: Remove unused import and constant.The
atexitmodule (line 4) andFOLDER_MIMETYPEconstant (line 11) are not used anywhere in this file. Based on past review comments,atexitwas needed in an earlier version but is no longer required withSpooledTemporaryFile.Apply this diff:
import os import io import tempfile -import atexit from typing import List from starlette.datastructures import UploadFile as StarletteUploadFile from googleapiclient.http import MediaIoBaseDownload from advanced_omi_backend.app_config import get_app_config AUDIO_EXTENSIONS = (".wav", ".mp3", ".flac", ".ogg", ".m4a") -FOLDER_MIMETYPE = "application/vnd.google-apps.folder"
18-52: Simplify and add type hints.Two optional improvements:
Unnecessary close wrapper (lines 41-50): The wrapper only calls
original_close()then passes. SinceSpooledTemporaryFileauto-cleans when closed andStarletteUploadFile.close()properly closes the underlying file, this wrapper adds no value.Missing type hints: For consistency with
download_audio_files_from_drive, consider adding type hints to parameters and return type.Apply this diff:
-def download_and_wrap_drive_file(service, file_item): +def download_and_wrap_drive_file(service, file_item: dict) -> StarletteUploadFile: + """Download a Google Drive file and wrap it as a Starlette UploadFile. + + Args: + service: Google Drive API service instance + file_item: Dict with 'id' and 'name' keys from Drive API response + + Returns: + StarletteUploadFile wrapper around the downloaded content + """ file_id = file_item["id"] name = file_item["name"] request = service.files().get_media(fileId=file_id) fh = io.BytesIO() downloader = MediaIoBaseDownload(fh, request) done = False while not done: _status, done = downloader.next_chunk() content = fh.getvalue() if not content: raise AudioValidationError(f"Downloaded Google Drive file '{name}' was empty") tmp_file = tempfile.SpooledTemporaryFile(max_size=10*1024*1024) # 10 MB tmp_file.write(content) tmp_file.seek(0) - upload_file = StarletteUploadFile(filename=name, file=tmp_file) - - original_close = upload_file.close - - def wrapped_close(): - try: - original_close() - finally: - # SpooledTemporaryFile auto-cleans when closed; no unlink needed - pass - - upload_file.close = wrapped_close - - return upload_file + return StarletteUploadFile(filename=name, file=tmp_file)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Useuv run pythoncommand instead of directpythonorpython3for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensivehasattr()checks in Python
Files:
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
backends/advanced/src/advanced_omi_backend/app_config.py (2)
get_app_config(117-119)get_gdrive_service(92-110)
🪛 Ruff (0.14.7)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
34-34: Avoid specifying long messages outside the exception class
(TRY003)
59-59: Avoid specifying long messages outside the exception class
(TRY003)
81-81: Abstract raise to an inner function
(TRY301)
81-81: Avoid specifying long messages outside the exception class
(TRY003)
88-88: Consider moving this statement to an else block
(TRY300)
93-93: Avoid specifying long messages outside the exception class
(TRY003)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (2)
4-4: Remove unusedatexitimport.The
atexitmodule is imported but never used in this file. It was likely part of an earlier cleanup implementation before switching toSpooledTemporaryFile.Apply this diff:
-import atexit
67-72: Consider adding pagination for large folders.The Drive API's
files().list()method returns paginated results (default page size is 100 files). If a Drive folder contains more than 100 files, only the first page will be processed, potentially missing audio files in subsequent pages. While this may be uncommon for typical use cases, it could silently drop files in larger folders.Consider adding pagination handling:
response = service.files().list( q=query, fields="files(id, name, mimeType)", includeItemsFromAllDrives=False, supportsAllDrives=False, ).execute() all_files = response.get("files", []) + + # Handle pagination if folder has many files + while "nextPageToken" in response: + response = service.files().list( + q=query, + fields="files(id, name, mimeType)", + includeItemsFromAllDrives=False, + supportsAllDrives=False, + pageToken=response["nextPageToken"], + ).execute() + all_files.extend(response.get("files", []))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Useuv run pythoncommand instead of directpythonorpython3for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensivehasattr()checks in Python
Files:
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
🧬 Code graph analysis (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
backends/advanced/src/advanced_omi_backend/app_config.py (2)
get_app_config(117-119)get_gdrive_service(92-110)
🪛 Ruff (0.14.7)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
34-34: Avoid specifying long messages outside the exception class
(TRY003)
59-59: Avoid specifying long messages outside the exception class
(TRY003)
82-82: Abstract raise to an inner function
(TRY301)
82-82: Avoid specifying long messages outside the exception class
(TRY003)
89-89: Consider moving this statement to an else block
(TRY300)
94-94: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (2)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (2)
43-48: Cleanup pattern correctly implemented.The wrapped close function correctly handles cleanup for
SpooledTemporaryFile, which auto-cleans both in-memory and on-disk backing files when closed. The wrapper pattern provides a clear extension point if additional cleanup logic is needed in the future.
64-65: Query injection properly mitigated.The folder ID is correctly escaped before being used in the Drive API query string, following Google's recommended practices. Escaping backslashes first, then single quotes, prevents query injection attacks while maintaining correct query syntax.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
20-55: Simplify download helper: make it synchronous and drop the close wrapper.
download_and_wrap_drive_fileisasyncbut performs only synchronous Google API and file operations, and the customwrapped_closenever awaits the original asyncUploadFile.close, which can be surprising if.close()is ever awaited downstream.Consider simplifying this helper to be a plain synchronous function that streams directly into the spooled file and relies on Starlette’s default
closeimplementation:-async def download_and_wrap_drive_file(service, file_item): +def download_and_wrap_drive_file(service, file_item): file_id = file_item["id"] name = file_item["name"] - request = service.files().get_media(fileId=file_id) - - fh = io.BytesIO() - downloader = MediaIoBaseDownload(fh, request) - - done = False - while not done: - _status, done = downloader.next_chunk() - - content = fh.getvalue() - - if not content: - raise AudioValidationError(f"Downloaded Google Drive file '{name}' was empty") - - tmp_file = tempfile.SpooledTemporaryFile(max_size=10*1024*1024) # 10 MB - tmp_file.write(content) - tmp_file.seek(0) - upload_file = StarletteUploadFile(filename=name, file=tmp_file) - - original_close = upload_file.close - - def wrapped_close(): - try: - original_close() - finally: - # SpooledTemporaryFile auto-cleans when closed; no unlink needed - pass - - upload_file.close = wrapped_close - - return upload_file + tmp_file = tempfile.SpooledTemporaryFile(max_size=10 * 1024 * 1024) # 10 MB + downloader = MediaIoBaseDownload(tmp_file, request=service.files().get_media(fileId=file_id)) + + done = False + while not done: + _status, done = downloader.next_chunk() + + tmp_file.seek(0) + return StarletteUploadFile(filename=name, file=tmp_file)And in the caller (line 99), drop the
awaitonce this helper is synchronous:wrapped_file = download_and_wrap_drive_file(service, item)This removes the unnecessary async wrapper, avoids holding duplicate copies of the file in memory, and keeps
UploadFile.closesemantics intact.
🧹 Nitpick comments (2)
backends/advanced/src/advanced_omi_backend/utils/audio_utils.py (1)
102-112: gdrive_file_id plumbing into AudioFile looks correct; consider docstring update.Adding
gdrive_file_idas a trailing keyword argument keeps existing call sites safe, and persisting it intoAudioFilewill support de-duplication and Drive lookups as intended. You may want to extend thewrite_audio_filedocstring to document this new parameter for future maintainers.Also applies to: 198-209
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
59-123: Minor cleanups in folder download/de‑dup logic.A few small tweaks will make this code clearer and closer to project guidelines:
- Use direct attribute assignment instead of
setattrwith a constant name (Ruff B010):- wrapped_file = await download_and_wrap_drive_file(service, item) - # Attach the file_id to the UploadFile object for later use - setattr(wrapped_file, "gdrive_file_id", file_id) + wrapped_file = await download_and_wrap_drive_file(service, item) + # Attach the Drive file ID for downstream processing + wrapped_file.gdrive_file_id = file_id
Consider batching the
is_drive_file_already_processedcheck if you expect large folders, e.g., by fetching all existingAudioFiledocuments for the set of Drive IDs in one query and filtering in memory, to avoid N round‑trips.The function-scope import in
is_drive_file_already_processedviolates the “no mid-file imports” guideline unless it’s strictly for a circular-import workaround. If there’s no cycle here, prefer a top-level import; if there is, add a short comment explaining why this needs to stay lazy:-from advanced_omi_backend.models.audio_file import AudioFile +from advanced_omi_backend.models.audio_file import AudioFilemoved to the top of the module (or keep it here with an explanatory comment if circular imports require it).
As per coding guidelines, …
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py(3 hunks)backends/advanced/src/advanced_omi_backend/models/audio_file.py(1 hunks)backends/advanced/src/advanced_omi_backend/utils/audio_utils.py(2 hunks)backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Useuv run pythoncommand instead of directpythonorpython3for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensivehasattr()checks in Python
Files:
backends/advanced/src/advanced_omi_backend/models/audio_file.pybackends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.pybackends/advanced/src/advanced_omi_backend/utils/audio_utils.pybackends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.
Applied to files:
backends/advanced/src/advanced_omi_backend/models/audio_file.pybackends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.pybackends/advanced/src/advanced_omi_backend/utils/audio_utils.pybackends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
36-36: Avoid specifying long messages outside the exception class
(TRY003)
61-61: Avoid specifying long messages outside the exception class
(TRY003)
84-84: Abstract raise to an inner function
(TRY301)
84-84: Avoid specifying long messages outside the exception class
(TRY003)
101-101: Do not call setattr with a constant attribute value. It is not any safer than normal property access.
Replace setattr with assignment
(B010)
105-105: Abstract raise to an inner function
(TRY301)
105-105: Avoid specifying long messages outside the exception class
(TRY003)
107-107: Consider moving this statement to an else block
(TRY300)
112-112: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (2)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
81-85: Drive ID propagation through upload pipeline is sound.Using
getattr(file, "gdrive_file_id", None)cleanly keeps Drive-specific metadata optional while preserving existing behavior for standard uploads, and forwarding it intowrite_audio_filewires the model change end-to-end without breaking callers.Also applies to: 98-110, 123-140
backends/advanced/src/advanced_omi_backend/models/audio_file.py (1)
54-65: Model and index extension for gdrive_file_id aligns with usage.Adding
gdrive_file_idplus an index is consistent with the new de-duplication check and should keepis_drive_file_already_processedperformant. No behavioral regressions expected.
| self.gdrive_scopes = ["https://www.googleapis.com/auth/drive.readonly"] | ||
| self._gdrive_service = None | ||
|
|
||
| def get_gdrive_service(self): |
There was a problem hiding this comment.
This probably shouldn't be in app_config. can have a gdrive service in services or have a google_client in clients. Then we initialise it at the point of use
| router = APIRouter(prefix="/audio", tags=["audio"]) | ||
|
|
||
|
|
||
| @router.post("/upload_audio_from_url") |
There was a problem hiding this comment.
this router name suggests I can use any url, but looks like it's just Gdrive?
Probably the way we want to do this would be to upload from any url, but provide credentials in the header that we get from a google_auth client.
I'm not super familiar with how gdrive works though
| 📝 Upload Instructions | ||
| </h3> | ||
| <ul className="text-sm text-yellow-700 dark:text-yellow-300 space-y-1"> | ||
| <li>• Hello world</li> |
There was a problem hiding this comment.
assume this is test, remove
| }, | ||
| }), | ||
|
|
||
| uploadAudioFromUrl: (payload: { drive_folder_id: string; device_name?: string; auto_generate_client?: boolean }) => |
There was a problem hiding this comment.
As before, we either need a generic uploadAudiofromURL that gets called from uploadAudiofromGdrive, or change the name to be from gdrive
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (6)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
88-89: Make GDrive credentials path configurable via environmentRight now the Drive credentials path and scopes are hard-coded. That works locally but is brittle across environments (Docker, k8s secrets, different mount paths).
Consider something like:
- self.gdrive_credentials_path = "data/gdrive_service_account.json" - self.gdrive_scopes = ["https://www.googleapis.com/auth/drive.readonly"] + self.gdrive_credentials_path = os.getenv( + "GDRIVE_CREDENTIALS_PATH", "data/gdrive_service_account.json" + ) + self.gdrive_scopes = [ + scope.strip() + for scope in os.getenv( + "GDRIVE_SCOPES", + "https://www.googleapis.com/auth/drive.readonly", + ).split(",") + if scope.strip() + ]This keeps sane defaults while allowing deployments to override path/scopes via env vars.
backends/advanced/webui/src/pages/Upload.tsx (3)
18-60: Avoid reusing stale uploadProgress for GDrive submissions
handleGDriveSubmitsetsisUploadingbut never touchesuploadProgress, so if a prior file upload ended at 100%, the GDrive flow will instantly show 100% progress even though it’s just starting.Either reset progress at the start of a GDrive submission, or track a separate progress flag for folder uploads, e.g.:
const handleGDriveSubmit = async () => { if (!gdriveFolderId) return - setIsUploading(true) + setIsUploading(true) + setUploadProgress(0)or introduce a dedicated
isGdriveUploadingboolean and gate the progress UI on that instead of the sharedisUploading.Also applies to: 189-225
80-85: Use functionalsetFilesupdates for all mutations
removeFileandclearCompletedstill close overfiles:setFiles(files.filter((f) => f.id !== id)) setFiles(files.filter((f) => f.status === 'pending' || f.status === 'uploading'))For consistency with the rest of the component and to avoid stale-closure issues under concurrent renders, prefer functional updates:
const removeFile = (id: string) => { - setFiles(files.filter((f) => f.id !== id)) + setFiles((prev) => prev.filter((f) => f.id !== id)) } const clearCompleted = () => { - setFiles(files.filter((f) => f.status === 'pending' || f.status === 'uploading')) + setFiles((prev) => + prev.filter((f) => f.status === 'pending' || f.status === 'uploading'), + ) }Also applies to: 140-142
144-150: Reuse sharedformatFileSizehelper instead of duplicating logicThis local
formatFileSizeduplicates the implementation inutils/fileHash.ts. To keep formatting consistent across the app and avoid future divergence, consider importing and using the shared helper here instead of maintaining two copies.backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (2)
22-57: Consider makingdownload_and_wrap_drive_filea plain function
download_and_wrap_drive_fileperforms only synchronous work (HTTP/file I/O, noawait), yet it’s declaredasyncand awaited. That’s functionally correct, but a bit misleading and adds coroutine overhead without giving you any extra concurrency.Unless you plan to introduce actual async operations inside it, consider:
-async def download_and_wrap_drive_file(service, file_item): +def download_and_wrap_drive_file(service, file_item): @@ - # synchronous call now (but make the parent function async) - wrapped_file = await download_and_wrap_drive_file(service, item) + # synchronous call + wrapped_file = download_and_wrap_drive_file(service, item)
download_audio_files_from_drivecan stay async (it already awaits DB checks), while this helper remains a straightforward synchronous utility.Also applies to: 100-104
111-114: Optionally log full stack on unexpected Drive errorsThe
except Exception as eblock correctly re-raises asAudioValidationErrorwithraise ... from e. To make debugging production issues easier, you might also log the stack trace here:except Exception as e: if isinstance(e, AudioValidationError): raise - raise AudioValidationError(f"Google Drive API Error: {e}") from e + logger.exception("Google Drive API Error while listing/downloading files") + raise AudioValidationError(f"Google Drive API Error: {e}") from eThis preserves the current behavior while adding observability for operators.
Based on learnings, this follows the preferred
logging.exception+raise ... from epattern.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
backends/advanced/src/advanced_omi_backend/app_config.py(1 hunks)backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py(1 hunks)backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py(1 hunks)backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py(1 hunks)backends/advanced/webui/src/pages/Upload.tsx(8 hunks)backends/advanced/webui/src/services/api.ts(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Useuv run pythoncommand instead of directpythonorpython3for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensivehasattr()checks in Python
Files:
backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.pybackends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.pybackends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.pybackends/advanced/src/advanced_omi_backend/app_config.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.
Applied to files:
backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.pybackends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.pybackends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.pybackends/advanced/src/advanced_omi_backend/app_config.py
🧬 Code graph analysis (4)
backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py (1)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
get_app_config(96-98)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (4)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
get_app_config(96-98)backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py (1)
get_google_drive_client(8-29)backends/advanced/src/advanced_omi_backend/utils/audio_utils.py (1)
AudioValidationError(28-30)backends/advanced/src/advanced_omi_backend/models/audio_file.py (1)
AudioFile(19-65)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (4)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
get_audio_chunk_dir(101-103)backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
download_audio_files_from_drive(61-114)backends/advanced/src/advanced_omi_backend/models/user.py (1)
User(51-98)backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
upload_and_process_audio_files(37-213)
backends/advanced/webui/src/pages/Upload.tsx (1)
backends/advanced/webui/src/services/api.ts (1)
uploadApi(181-203)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py
18-20: Avoid specifying long messages outside the exception class
(TRY003)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
38-38: Avoid specifying long messages outside the exception class
(TRY003)
63-63: Avoid specifying long messages outside the exception class
(TRY003)
86-86: Abstract raise to an inner function
(TRY301)
86-86: Avoid specifying long messages outside the exception class
(TRY003)
103-103: Do not call setattr with a constant attribute value. It is not any safer than normal property access.
Replace setattr with assignment
(B010)
107-107: Abstract raise to an inner function
(TRY301)
107-107: Avoid specifying long messages outside the exception class
(TRY003)
109-109: Consider moving this statement to an else block
(TRY300)
114-114: Avoid specifying long messages outside the exception class
(TRY003)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
23-23: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🔇 Additional comments (2)
backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py (1)
1-29: Drive client singleton looks soundThe client factory cleanly validates the credentials path, builds a Drive v3 client with scoped service-account creds, and caches it for reuse. No functional issues stand out here.
backends/advanced/webui/src/services/api.ts (1)
181-203: GDrive upload API wiring looks consistent with existing upload flowThe new
uploadFromGDriveFoldermirrors the existing upload conventions (same base client, 5‑minute timeout, query params), and the route path matches the new backend endpoint. No changes needed here.
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
Show resolved
Hide resolved
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (2)
20-54: Simplifydownload_and_wrap_drive_fileand drop the no-opclosewrapperThe core logic looks fine, but the
wrapped_closeindirection no longer adds any behavior—SpooledTemporaryFileis already cleaned up on close, and the wrapper just callsoriginal_close()thenpass. You can simplify by returning theUploadFiledirectly.- tmp_file = tempfile.SpooledTemporaryFile(max_size=10*1024*1024) # 10 MB - tmp_file.write(content) - tmp_file.seek(0) - upload_file = StarletteUploadFile(filename=name, file=tmp_file) - - original_close = upload_file.close - - def wrapped_close(): - try: - original_close() - finally: - # SpooledTemporaryFile auto-cleans when closed; no unlink needed - pass - - upload_file.close = wrapped_close - - return upload_file + tmp_file = tempfile.SpooledTemporaryFile(max_size=10 * 1024 * 1024) # 10 MB + tmp_file.write(content) + tmp_file.seek(0) + upload_file = StarletteUploadFile(filename=name, file=tmp_file) + return upload_file
59-112: Tighten Drive-folder error handling and fix stale commentThe overall flow (ID validation, extension filtering, skip of already-processed files) looks solid, but two things are worth adjusting:
- The comment at line 98 still says “synchronous call now”, but
download_and_wrap_drive_fileis async again. This is misleading for future readers.- The broad
except Exception as econverts all non-AudioValidationErrorfailures (including transient Drive API or network issues) intoAudioValidationError, which the router maps to HTTP 400. That blurs the line between user-facing validation errors and genuine server/infra problems.Based on learnings, it would also help debuggability to log the underlying exception with
audio_logger.exception(...)before re‑raising, and to keep exception chaining as you already do withfrom e.For example:
- for item in audio_files_metadata: + for item in audio_files_metadata: file_id = item["id"] # Get the Google Drive File ID @@ - # Check if the file is already processed + # Check if the file is already processed @@ - # synchronous call now (but make the parent function async) - wrapped_file = await download_and_wrap_drive_file(service, item) + # Download and wrap the Drive file + wrapped_file = await download_and_wrap_drive_file(service, item) @@ - except Exception as e: - if isinstance(e, AudioValidationError): - raise - raise AudioValidationError(f"Google Drive API Error: {e}") from e + except AudioValidationError: + # Propagate validation-style errors unchanged + raise + except Exception as e: + # Log unexpected Drive/API issues and surface as a wrapped error + audio_logger.exception("Google Drive API error while listing/downloading files") + raise AudioValidationError(f"Google Drive API Error: {e}") from eBased on learnings, this preserves stack traces, keeps validation vs. infrastructure errors distinguishable, and improves observability.
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (1)
14-34: New/upload_audio_from_gdriveroute wiring looks good; add exception chainingThe endpoint correctly restricts access, delegates to
download_audio_files_from_drive, and reuses the existing upload pipeline. To keep traceback context (and satisfy Ruff B904), you can chain the HTTPException from the originalAudioValidationError.- try: - files = await download_audio_files_from_drive(gdrive_folder_id) - except AudioValidationError as e: - raise HTTPException(status_code=400, detail=str(e)) + try: + files = await download_audio_files_from_drive(gdrive_folder_id) + except AudioValidationError as e: + raise HTTPException(status_code=400, detail=str(e)) from eBased on learnings, this preserves the original error context while still returning a clean 400 to clients.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py(1 hunks)backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Useuv run pythoncommand instead of directpythonorpython3for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensivehasattr()checks in Python
Files:
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.pybackends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.
Applied to files:
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.pybackends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
🧬 Code graph analysis (2)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (4)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
get_audio_chunk_dir(101-103)backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
download_audio_files_from_drive(59-112)backends/advanced/src/advanced_omi_backend/utils/audio_utils.py (1)
AudioValidationError(28-30)backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
upload_and_process_audio_files(37-213)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (3)
backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py (1)
get_google_drive_client(8-29)backends/advanced/src/advanced_omi_backend/models/audio_file.py (1)
AudioFile(19-65)backends/advanced/src/advanced_omi_backend/utils/audio_utils.py (1)
AudioValidationError(28-30)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
23-23: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
30-30: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
36-36: Avoid specifying long messages outside the exception class
(TRY003)
61-61: Avoid specifying long messages outside the exception class
(TRY003)
84-84: Abstract raise to an inner function
(TRY301)
84-84: Avoid specifying long messages outside the exception class
(TRY003)
105-105: Abstract raise to an inner function
(TRY301)
105-105: Avoid specifying long messages outside the exception class
(TRY003)
107-107: Consider moving this statement to an else block
(TRY300)
112-112: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
115-122: Drive de-duplication helper is consistent with the model
is_drive_file_already_processedcleanly checksAudioFile.gdrive_file_idand short-circuits on falsy IDs. Given the indexedgdrive_file_idfield inAudioFile, this is a good, efficient way to avoid reprocessing.
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
20-54: Remove unnecessary async decorator.
download_and_wrap_drive_fileis declaredasyncbut performs noawaitoperations—the Google Drive API calls are synchronous. This adds unnecessary overhead and is misleading to callers.-async def download_and_wrap_drive_file(service, file_item): +def download_and_wrap_drive_file(service, file_item): file_id = file_item["id"] name = file_item["name"]Also update the call site on line 104:
- wrapped_file = await download_and_wrap_drive_file(service, item) + wrapped_file = download_and_wrap_drive_file(service, item)Note: This was flagged in a previous review and marked as addressed, but the current code still has the issue.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py(4 hunks)backends/advanced/src/advanced_omi_backend/models/audio_file.py(2 hunks)backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py(1 hunks)backends/advanced/src/advanced_omi_backend/utils/audio_utils.py(2 hunks)backends/advanced/src/advanced_omi_backend/utils/dropbox_audio_utils.py(1 hunks)backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backends/advanced/src/advanced_omi_backend/utils/audio_utils.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Useuv run pythoncommand instead of directpythonorpython3for executing Python scripts
Use Black formatter with 100-character line length for Python code style
Use isort for Python import organization
NEVER import modules in the middle of functions or files - ALL imports must be at the top of the file after the docstring
Group imports in order: standard library, third-party, local imports
Use lazy imports sparingly and only when absolutely necessary for circular import issues in Python
Always raise errors explicitly in Python code, never silently ignore errors or failures
Use explicit error handling with proper exceptions in Python rather than silent failures
Understand data structures and research input/response or class structure instead of adding defensivehasattr()checks in Python
Files:
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.pybackends/advanced/src/advanced_omi_backend/utils/dropbox_audio_utils.pybackends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.pybackends/advanced/src/advanced_omi_backend/models/audio_file.pybackends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
🧠 Learnings (1)
📚 Learning: 2025-12-08T23:52:34.959Z
Learnt from: AnkushMalaker
Repo: chronicler-ai/chronicle PR: 178
File: backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py:218-223
Timestamp: 2025-12-08T23:52:34.959Z
Learning: In Python code (chronicle project), prefer logging.exception() inside except blocks to automatically log the full stack trace. When re-raising exceptions, always chain with 'raise ... from e' to preserve the original context; use 'raise ... from None' only if you explicitly want to suppress the context. This improves debuggability across Python files.
Applied to files:
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.pybackends/advanced/src/advanced_omi_backend/utils/dropbox_audio_utils.pybackends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.pybackends/advanced/src/advanced_omi_backend/models/audio_file.pybackends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
🧬 Code graph analysis (2)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (4)
backends/advanced/src/advanced_omi_backend/app_config.py (1)
get_audio_chunk_dir(101-103)backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
download_audio_files_from_drive(59-117)backends/advanced/src/advanced_omi_backend/utils/audio_utils.py (1)
AudioValidationError(28-30)backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
upload_and_process_audio_files(37-216)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
user_id(71-73)
🪛 Ruff (0.14.8)
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
36-36: Avoid specifying long messages outside the exception class
(TRY003)
61-61: Avoid specifying long messages outside the exception class
(TRY003)
84-84: Abstract raise to an inner function
(TRY301)
84-84: Avoid specifying long messages outside the exception class
(TRY003)
110-110: Abstract raise to an inner function
(TRY301)
110-110: Avoid specifying long messages outside the exception class
(TRY003)
112-112: Consider moving this statement to an else block
(TRY300)
117-117: Avoid specifying long messages outside the exception class
(TRY003)
backends/advanced/src/advanced_omi_backend/utils/dropbox_audio_utils.py
21-21: Avoid specifying long messages outside the exception class
(TRY003)
31-31: Probable use of requests call without timeout
(S113)
37-39: Avoid specifying long messages outside the exception class
(TRY003)
43-43: Avoid specifying long messages outside the exception class
(TRY003)
65-65: Avoid specifying long messages outside the exception class
(TRY003)
69-69: Avoid specifying long messages outside the exception class
(TRY003)
85-85: Probable use of requests call without timeout
(S113)
92-92: Abstract raise to an inner function
(TRY301)
92-92: Avoid specifying long messages outside the exception class
(TRY003)
103-103: Abstract raise to an inner function
(TRY301)
103-103: Avoid specifying long messages outside the exception class
(TRY003)
123-123: Do not call setattr with a constant attribute value. It is not any safer than normal property access.
Replace setattr with assignment
(B010)
128-130: Abstract raise to an inner function
(TRY301)
128-130: Avoid specifying long messages outside the exception class
(TRY003)
132-132: Consider moving this statement to an else block
(TRY300)
137-137: Avoid specifying long messages outside the exception class
(TRY003)
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
23-23: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
30-30: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (3)
backends/advanced/src/advanced_omi_backend/models/audio_file.py (1)
32-35: LGTM! Clean addition of source field.The new
sourcefield is properly typed, indexed, and documented. The default value "upload" maintains backward compatibility with existing code paths.backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py (1)
20-34: LGTM! Well-structured endpoint with proper error handling.The new Google Drive upload endpoint correctly:
- Maps
AudioValidationErrorto HTTP 400 for user-facing validation issues- Requires superuser authentication
- Passes
source="gdrive"to the controller for proper audio_uuid handlingThe Ruff warning about
Dependsin default arguments is a false positive—this is the standard FastAPI pattern for dependency injection.backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py (1)
66-67: Excellent security and deduplication implementation.The code properly:
- Escapes the
folder_idto prevent Drive API query injection (lines 66-67)- Checks for already-processed files using
audio_uuid+sourcepattern (lines 93-96)- Attaches the Drive file ID as
audio_uuidfor downstream processing (line 106)This ensures both security and prevents duplicate processing of Drive files.
Also applies to: 93-96, 106-106
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
Show resolved
Hide resolved
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
Outdated
Show resolved
Hide resolved
backends/advanced/src/advanced_omi_backend/utils/dropbox_audio_utils.py
Outdated
Show resolved
Hide resolved
backends/advanced/src/advanced_omi_backend/utils/dropbox_audio_utils.py
Outdated
Show resolved
Hide resolved
backends/advanced/src/advanced_omi_backend/utils/dropbox_audio_utils.py
Outdated
Show resolved
Hide resolved
backends/advanced/src/advanced_omi_backend/utils/dropbox_audio_utils.py
Outdated
Show resolved
Hide resolved
|
@01PrathamS Thanks! this is merged to |
Feature: Google Drive Public Link Audio Upload
This PR introduces the ability to import and process audio files directly from a publicly shared Google Drive folder link, leveraging Google Cloud Service Account credentials for access.
🛠️ Implementation Details
1. Service Account Setup
A new service account JSON key file has been added to the
data/folder. This key was generated from:This credential is used by the backend service to authorize access to Google Drive files.
2. Code Flow Overview
The flow is initiated by the frontend submission of a public Google Drive URL/folder ID and is handled entirely by the new backend logic.
download_audio_files_from_drive(drive_folder_id)is called. This function:.wavfiles.files) is then passed directly to the existing audio processing utility:/uploadrouter.Testing Notes
Future Enhancements
addressing #151
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.