audio upload extension with gdrive credentials #175
Closed
01PrathamS wants to merge 11 commits into SimpleOpenSoftware:main from 01PrathamS:audio_upload_extend
44beeac audio upload extension with gdrive credentials (01PrathamS)
d5b9518 FIX: API parameters (01PrathamS)
9392989 Merge branch 'main' into audio_upload_extend (AnkushMalaker)
5b5ea64 UPDATE: tmp files cleanup n code refactored as per review (01PrathamS)
5abd99d REFACTOR: minor refactor as per review (01PrathamS)
3d00bac REFACTOR: minor update as per review (01PrathamS)
b036185 UPDATE: gdrive sync logic (01PrathamS)
cff1a4c REFACTOR: code update as per gdrive and update credential client (01PrathamS)
6534288 REFACTOR: validation updated - as per review from CR (01PrathamS)
1ff28cb UPDATE: code has been refactore for UUID for diffrent audio upload so… (01PrathamS)
8e5a6b2 REFACTOR: updated code as per review (01PrathamS)
backends/advanced/src/advanced_omi_backend/clients/gdrive_audio_client.py
29 changes: 29 additions & 0 deletions
@@ -0,0 +1,29 @@

    import os
    from google.oauth2.service_account import Credentials
    from googleapiclient.discovery import build
    from advanced_omi_backend.app_config import get_app_config

    _drive_client_cache = None

    def get_google_drive_client():
        """Singleton Google Drive client."""
        global _drive_client_cache

        if _drive_client_cache:
            return _drive_client_cache

        config = get_app_config()

        if not os.path.exists(config.gdrive_credentials_path):
            raise FileNotFoundError(
                f"Missing Google Drive credentials at {config.gdrive_credentials_path}"
            )

        creds = Credentials.from_service_account_file(
            config.gdrive_credentials_path,
            scopes=config.gdrive_scopes
        )

        _drive_client_cache = build("drive", "v3", credentials=creds)

        return _drive_client_cache
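
Review note on the caching in `get_google_drive_client`: the check-then-build sequence is not guarded, so two threads arriving at the same time could each build a client, and `if _drive_client_cache:` tests truthiness rather than `is not None`. Whether that matters depends on how the backend calls this function, which this diff does not show. A minimal sketch of a lock-guarded variant follows; `get_cached_client` and the `factory` argument are hypothetical names used only for illustration, and the factory stands in for the real `build("drive", "v3", ...)` call so the snippet runs without Google libraries.

```python
import threading
from typing import Callable, Optional

_client_cache: Optional[object] = None
_client_lock = threading.Lock()


def get_cached_client(factory: Callable[[], object]) -> object:
    """Return a process-wide client, building it at most once.

    `factory` is a hypothetical stand-in for the code that loads credentials
    and builds the real Drive client.
    """
    global _client_cache
    # Fast path: compare against None so a falsy client object is still reused.
    if _client_cache is not None:
        return _client_cache
    with _client_lock:
        # Re-check inside the lock; another thread may have built it meanwhile.
        if _client_cache is None:
            _client_cache = factory()
        return _client_cache
```

With this shape, repeated calls return the same object and the factory runs once, even if several threads ask for the client at startup.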
backends/advanced/src/advanced_omi_backend/utils/gdrive_audio_utils.py
119 changes: 119 additions & 0 deletions
@@ -0,0 +1,119 @@

    import io
    import tempfile
    from typing import List
    import logging
    from starlette.datastructures import UploadFile as StarletteUploadFile
    from googleapiclient.http import MediaIoBaseDownload
    from advanced_omi_backend.clients.gdrive_audio_client import get_google_drive_client
    from advanced_omi_backend.models.audio_file import AudioFile
    from advanced_omi_backend.utils.audio_utils import AudioValidationError


    logger = logging.getLogger(__name__)
    audio_logger = logging.getLogger("audio_processing")

    AUDIO_EXTENSIONS = (".wav", ".mp3", ".flac", ".ogg", ".m4a")
    FOLDER_MIMETYPE = "application/vnd.google-apps.folder"



    async def download_and_wrap_drive_file(service, file_item):
        file_id = file_item["id"]
        name = file_item["name"]

        request = service.files().get_media(fileId=file_id)

        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)

        done = False
        while not done:
            _status, done = downloader.next_chunk()

        content = fh.getvalue()

        if not content:
            raise AudioValidationError(f"Downloaded Google Drive file '{name}' was empty")

        tmp_file = tempfile.SpooledTemporaryFile(max_size=10*1024*1024) # 10 MB
        tmp_file.write(content)
        tmp_file.seek(0)
        upload_file = StarletteUploadFile(filename=name, file=tmp_file)

        original_close = upload_file.close

        def wrapped_close():
            try:
                original_close()
            finally:
                # SpooledTemporaryFile auto-cleans when closed; no unlink needed
                pass

        upload_file.close = wrapped_close

        return upload_file

    # -------------------------------------------------------------
    # LIST + DOWNLOAD FILES IN FOLDER (OAUTH)
    # -------------------------------------------------------------
    async def download_audio_files_from_drive(folder_id: str) -> List[StarletteUploadFile]:
        if not folder_id:
            raise AudioValidationError("Google Drive folder ID is required.")

        service = get_google_drive_client()

        try:
            escaped_folder_id = folder_id.replace("\\", "\\\\").replace("'", "\\'")
            query = f"'{escaped_folder_id}' in parents and trashed = false"

            response = service.files().list(
                q=query,
                fields="files(id, name, mimeType)",
                includeItemsFromAllDrives=False,
                supportsAllDrives=False,
            ).execute()

            all_files = response.get("files", [])

            audio_files_metadata = [
                f for f in all_files
                if f["name"].lower().endswith(AUDIO_EXTENSIONS)
            ]

            if not audio_files_metadata:
                raise AudioValidationError("No audio files found in folder.")

            wrapped_files = []
            skipped_count = 0

            for item in audio_files_metadata:
                file_id = item["id"] # Get the Google Drive File ID

                # Check if the file is already processed
                existing = await AudioFile.find_one({
                    "audio_uuid": file_id,
                    "source": "gdrive"
                })

                if existing:
                    audio_logger.info(f"Skipping already processed file: {item['name']}")
                    skipped_count += 1
                    continue

                # synchronous call now (but make the parent function async)
                wrapped_file = await download_and_wrap_drive_file(service, item)
                # Attach the file_id to the UploadFile object for later use
                wrapped_file.audio_uuid = file_id
                wrapped_files.append(wrapped_file)

            if not wrapped_files and skipped_count > 0:
                raise AudioValidationError(f"All {skipped_count} files in the folder have already been processed.")

            return wrapped_files

        except Exception as e:
            if isinstance(e, AudioValidationError):
                raise
            raise AudioValidationError(f"Google Drive API Error: {e}") from e
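
Review note on the listing step: `service.files().list(...)` is executed once and only `files(...)` is requested in `fields`, so a folder holding more entries than one page would be read only partially (Drive API v3 returns results in pages and signals more with `nextPageToken`). A possible way to follow the pages is sketched below. `iter_drive_files` and `FakeDriveService` are hypothetical names, not part of this PR; the fake only mimics the `files().list(...).execute()` call chain so the sketch runs without credentials.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_drive_files(service: Any, query: str) -> Iterator[Dict[str, str]]:
    """Yield every file matching `query`, following nextPageToken until exhausted."""
    page_token: Optional[str] = None
    while True:
        params: Dict[str, Any] = {
            "q": query,
            # nextPageToken has to be requested explicitly when `fields` is set.
            "fields": "nextPageToken, files(id, name, mimeType)",
        }
        if page_token:
            params["pageToken"] = page_token
        response = service.files().list(**params).execute()
        yield from response.get("files", [])
        page_token = response.get("nextPageToken")
        if not page_token:
            return


class FakeDriveService:
    """Hypothetical in-memory stand-in for the Drive client, for illustration only."""

    def __init__(self, pages: List[List[Dict[str, str]]]):
        self._pages = pages
        self._requested: Optional[str] = None

    def files(self) -> "FakeDriveService":
        return self

    def list(self, **params: Any) -> "FakeDriveService":
        # Remember which page was asked for; the token is just the page index.
        self._requested = params.get("pageToken")
        return self

    def execute(self) -> Dict[str, Any]:
        index = int(self._requested) if self._requested else 0
        response: Dict[str, Any] = {"files": self._pages[index]}
        if index + 1 < len(self._pages):
            response["nextPageToken"] = str(index + 1)
        return response
```

In the function under review, the list comprehension over `all_files` could then consume `iter_drive_files(service, query)` instead of a single response.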
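
Review note on the download step: `download_and_wrap_drive_file` is declared `async`, but `downloader.next_chunk()` is a blocking call, so the event loop is stalled for the duration of each download. One option is to push the blocking loop into a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below assumes nothing about the backend; `blocking_download` is a hypothetical stand-in for the `MediaIoBaseDownload` loop. The google-api-python-client documentation also notes that its underlying `httplib2` transport is not thread-safe, so a real change would likely need a separate client or HTTP object per worker thread rather than the shared singleton.

```python
import asyncio
import io
import time
from typing import List


def blocking_download(name: str) -> bytes:
    """Hypothetical stand-in for a blocking, chunked download loop."""
    buffer = io.BytesIO()
    for chunk in (b"RIFF", b"....", b"WAVE"):
        time.sleep(0.01)  # simulates blocking network I/O per chunk
        buffer.write(chunk)
    return buffer.getvalue()


async def download_without_blocking(name: str) -> bytes:
    # asyncio.to_thread runs the callable in a worker thread, so the event
    # loop stays free to serve other coroutines while the download runs.
    return await asyncio.to_thread(blocking_download, name)


async def main() -> List[bytes]:
    # Both downloads proceed concurrently in separate worker threads.
    return await asyncio.gather(
        download_without_blocking("a.wav"),
        download_without_blocking("b.wav"),
    )


if __name__ == "__main__":
    print([len(data) for data in asyncio.run(main())])
```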