29 changes: 27 additions & 2 deletions README.md
@@ -19,17 +19,42 @@ pip install -r requirements.txt

## Run

Run the development server (Uvicorn):

```bash
uvicorn app.main:app --reload --port 8000
```

API docs: `http://127.0.0.1:8000/docs`

## Environment Variables

ν”„λ‘œμ νŠΈ λ£¨νŠΈμ— `.env` νŒŒμΌμ„ μƒμ„±ν•˜κ³  λ‹€μŒ 값을 μ±„μš°μ„Έμš”.

### AWS S3 Settings
```
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=ap-northeast-2
S3_BUCKET_NAME=your-bucket
S3_PREFIX=voices
```

### Google Cloud Speech-to-Text Settings
```
# Path to the service account key file
GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service-account-key.json
```

`.env` is loaded automatically in `app/__init__.py`.
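Because `load_dotenv()` runs at import time, any module under `app` can read these values with plain `os.getenv`. A minimal sketch (the bucket name below is the placeholder from the section above, not a real value):

```python
import os

# Stand-in for what load_dotenv() would populate from .env
os.environ.setdefault("S3_BUCKET_NAME", "your-bucket")
os.environ.setdefault("S3_PREFIX", "voices")

bucket = os.getenv("S3_BUCKET_NAME")
prefix = os.getenv("S3_PREFIX", "voices")
print(bucket, prefix)
```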

## ν”„λ‘œμ νŠΈ ꡬ쑰

```
caring-voice/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── main.py   # FastAPI entry point and endpoints
β”œβ”€β”€ .gitignore
β”œβ”€β”€ README.md
β”œβ”€β”€ requirements.txt
3 changes: 3 additions & 0 deletions app/__init__.py
@@ -0,0 +1,3 @@
from dotenv import load_dotenv # type: ignore
load_dotenv()

11 changes: 11 additions & 0 deletions app/constants.py
@@ -0,0 +1,11 @@
import os

# Default base prefix for uploads (can be overridden via the S3_PREFIX env var)
VOICE_BASE_PREFIX = os.getenv("S3_PREFIX", "voices")

# Default folder name (used when the request does not specify a folder)
DEFAULT_UPLOAD_FOLDER = "voiceFile"

# # If needed, define a set of allowed folders (e.g. for validation)
# ALLOWED_FOLDERS = {"raw", "processed", "public"}
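The env-var override in `constants.py` can be sketched as follows (setting the variable in-process stands in for a `.env` entry):

```python
import os

os.environ["S3_PREFIX"] = "custom-voices"  # stand-in for a .env override

# Same lookup as in constants.py: the env var wins, "voices" is the fallback
VOICE_BASE_PREFIX = os.getenv("S3_PREFIX", "voices")
print(VOICE_BASE_PREFIX)  # custom-voices

del os.environ["S3_PREFIX"]
fallback = os.getenv("S3_PREFIX", "voices")
print(fallback)  # voices
```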

117 changes: 117 additions & 0 deletions app/emotion_service.py
@@ -0,0 +1,117 @@
import io
import os
import tempfile
from typing import Dict, Any
import librosa
import torch
from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2FeatureExtractor
import numpy as np


class EmotionAnalyzer:
    def __init__(self):
        self.model = None
        self.feature_extractor = None
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self._load_model()

    def _load_model(self):
        """Load the Hugging Face model."""
        model_name = "jungjongho/wav2vec2-xlsr-korean-speech-emotion-recognition"

        try:
            self.model = Wav2Vec2ForSequenceClassification.from_pretrained(model_name)
            self.feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
            self.model.to(self.device)
            self.model.eval()
        except Exception as e:
            print(f"Model load failed: {e}")
            self.model = None
            self.feature_extractor = None

    def analyze_emotion(self, audio_file) -> Dict[str, Any]:
        """
        Analyze the emotion in a voice file.

        Args:
            audio_file: the uploaded voice file (FastAPI UploadFile)

        Returns:
            Dict: emotion analysis result
        """
        if not self.model or not self.feature_extractor:
            return {
                "error": "Model is not loaded",
                "emotion": "unknown",
                "confidence": 0.0
            }

        try:
            # Save to a temporary file
            with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_file:
                content = audio_file.file.read()
                tmp_file.write(content)
                tmp_file_path = tmp_file.name

            # Load the audio (resampled to 16 kHz)
            audio, sr = librosa.load(tmp_file_path, sr=16000)
Comment on lines +49 to +57

⚠️ Potential issue | πŸ”΄ Critical

🧩 Analysis chain

Verify whether a file-pointer reset is needed

Calling `audio_file.file.read()` exhausts the file pointer. If this function is called more than once on the same UploadFile object, problems can occur.

Run the following script to check whether the file is processed more than once:

If it is, consider resetting the file pointer like this:

```diff
         try:
             # Save to a temporary file
             with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_file:
                 content = audio_file.file.read()
+                audio_file.file.seek(0)  # reset the file pointer
                 tmp_file.write(content)
                 tmp_file_path = tmp_file.name
```

🏁 Script executed:

```bash
#!/bin/bash
# Description: Check if the audio_file is used multiple times in the codebase

# Search for multiple calls to analyze_emotion or analyze_voice_emotion on the same file object
rg -n -C5 'analyze_voice_emotion|analyze_emotion' --type=py -g '!**/test/**'
```

Length of output: 3131


File-pointer exhaustion confirmed - immediate fix required

Code verification shows the problem actually occurs: the same UploadFile object is passed to several functions in sequence, exhausting the file pointer:

  • /upload endpoint (app/main.py 32-37): upload_fileobj() reads file.file, after which analyze_voice_emotion(file) is called with the pointer already at EOF.

  • /voices/upload-and-analyze endpoint (app/main.py 110-118): upload_fileobj() β†’ analyze_voice_emotion(file) β†’ transcribe_voice(file, language_code) attempts to read the same file object three times.

Required fix (app/emotion_service.py 50-51):

```diff
         try:
             # Save to a temporary file
             with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_file:
                 content = audio_file.file.read()
+                audio_file.file.seek(0)  # reset so later consumers can read too
                 tmp_file.write(content)
                 tmp_file_path = tmp_file.name
```
πŸ€– Prompt for AI Agents
In app/emotion_service.py around lines 48 to 56, the code reads directly from
the UploadFile.file which exhausts the file pointer and breaks downstream
readers; instead, read the entire uploaded file into memory once (bytes), then
create and use independent file-like objects (e.g., io.BytesIO or temporary
files) for each consumer or reset the pointer before each read. Update the
function to read bytes = await audio_file.read() (or file.file.read()), use that
bytes buffer to write the temp WAV or pass BytesIO copies into
librosa/transcription functions, ensure any temporary files are closed/removed,
and avoid reusing the original UploadFile.file without seeking.
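The exhaustion the review describes is easy to reproduce with a plain in-memory stream, which behaves like the `UploadFile.file` object here:

```python
import io

buf = io.BytesIO(b"fake wav bytes")

first = buf.read()    # consumes the stream
second = buf.read()   # pointer at EOF -> empty bytes
buf.seek(0)           # reset, as the review suggests
third = buf.read()    # full content is readable again

print(first == third, second)
```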


            # Feature extraction
            inputs = self.feature_extractor(
                audio,
                sampling_rate=16000,
                return_tensors="pt",
                padding=True
            )

            # Move to GPU
            inputs = {k: v.to(self.device) for k, v in inputs.items()}

            # Inference
            with torch.no_grad():
                outputs = self.model(**inputs)
                predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

            # Emotion labels (may need adjustment depending on the model)
            emotion_labels = ["neutral", "happy", "sad", "angry", "fear", "surprise", "disgust"]

            # Emotion with the highest probability
            predicted_class = torch.argmax(predictions, dim=-1).item()
            confidence = predictions[0][predicted_class].item()
            emotion = emotion_labels[predicted_class] if predicted_class < len(emotion_labels) else "unknown"

            # Probability for every emotion
            emotion_scores = {
                emotion_labels[i]: predictions[0][i].item()
                for i in range(min(len(emotion_labels), predictions.shape[1]))
            }

            return {
                "emotion": emotion,
                "confidence": confidence,
                "emotion_scores": emotion_scores,
                "audio_duration": len(audio) / sr,
                "sample_rate": sr
            }

        except Exception as e:
            return {
                "error": f"Error during analysis: {str(e)}",
                "emotion": "unknown",
                "confidence": 0.0
            }
        finally:
            # Clean up the temporary file
            try:
                os.unlink(tmp_file_path)
            except OSError as e:
                print(f"Failed to delete temporary file: {tmp_file_path}, error: {e}")


# Global instance
emotion_analyzer = EmotionAnalyzer()


def analyze_voice_emotion(audio_file) -> Dict[str, Any]:
    """Voice emotion analysis function."""
    return emotion_analyzer.analyze_emotion(audio_file)
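The post-processing step in `analyze_emotion` (softmax over the logits, argmax for the top label) can be sketched without the model, using the service's hard-coded label list and invented logits:

```python
import math

# Labels as hard-coded in the service; logits are made up for illustration
labels = ["neutral", "happy", "sad", "angry", "fear", "surprise", "disgust"]
logits = [0.2, 2.5, 0.1, 0.3, 0.0, 0.1, 0.0]

exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]                    # softmax
top = max(range(len(probs)), key=probs.__getitem__)  # argmax

scores = dict(zip(labels, probs))                    # like emotion_scores
print(labels[top])  # happy
```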
139 changes: 138 additions & 1 deletion app/main.py
@@ -1,7 +1,144 @@
import os
from typing import Optional
from fastapi import FastAPI, UploadFile, File, HTTPException, Form
from fastapi.responses import JSONResponse
from .s3_service import upload_fileobj, list_bucket_objects
from .constants import VOICE_BASE_PREFIX, DEFAULT_UPLOAD_FOLDER
from .emotion_service import analyze_voice_emotion
from .stt_service import transcribe_voice

app = FastAPI(title="Caring API")

@app.get("/health")
def health():
    return {"status": "ok"}


# POST : upload voice
@app.post("/voices/upload")
async def upload_voice(
    file: UploadFile = File(...),
    folder: Optional[str] = Form(default=None),  # e.g. "raw" or "user123/session1"
):
    bucket = os.getenv("S3_BUCKET_NAME")
    if not bucket:
        raise HTTPException(status_code=500, detail="S3_BUCKET_NAME not configured")

    # Key: optional prefix/YYYYMMDD_originalname
    base_prefix = VOICE_BASE_PREFIX.rstrip("/")
    effective_prefix = f"{base_prefix}/{folder or DEFAULT_UPLOAD_FOLDER}".rstrip("/")
    filename = os.path.basename(file.filename or "upload.wav")
    key = f"{effective_prefix}/{filename}"

    # Upload the file to S3
    # Store the Content-Type
    upload_fileobj(bucket=bucket, key=key, fileobj=file.file, content_type=file.content_type)
    # Reset the pointer for downstream consumers
    try:
        file.file.seek(0)
    except Exception:
        pass

    # Run emotion analysis
    emotion_result = analyze_voice_emotion(file)

    # No DB yet, so return the bucket's file list
    names = list_bucket_objects(bucket=bucket, prefix=effective_prefix)
    return {
        "uploaded": key,
        "files": names,
        "emotion_analysis": emotion_result
    }


# GET : query my voice histories
@app.get("/voices")
async def list_voices(skip: int = 0, limit: int = 50, folder: Optional[str] = None):
    bucket = os.getenv("S3_BUCKET_NAME")
    if not bucket:
        raise HTTPException(status_code=500, detail="S3_BUCKET_NAME not configured")
    base_prefix = VOICE_BASE_PREFIX.rstrip("/")
    effective_prefix = f"{base_prefix}/{folder or DEFAULT_UPLOAD_FOLDER}".rstrip("/")

    keys = list_bucket_objects(bucket=bucket, prefix=effective_prefix)
    # Pagination-like behavior: apply a simple slice
    sliced = keys[skip: skip + limit]
    return {"items": sliced, "count": len(sliced), "next": skip + len(sliced)}


# GET : query specific voice & show result
@app.get("/voices/{voice_id}")
async def get_voice(voice_id: str):
    # Internal logic omitted; return dummy details
    result = {
        "voice_id": voice_id,
        "filename": f"{voice_id}.wav",
        "status": "processed",
        "duration_sec": 12.34,
        "analysis": {"pitch_mean": 220.5, "energy": 0.82}
    }
    return JSONResponse(content=result)


# POST : analyze emotion from uploaded voice file
@app.post("/voices/analyze-emotion")
async def analyze_emotion(file: UploadFile = File(...)):
    """Analyze the emotion in a voice file."""
    emotion_result = analyze_voice_emotion(file)
    return emotion_result


# POST : convert speech to text using Google STT
@app.post("/voices/transcribe")
async def transcribe_speech(
    file: UploadFile = File(...),
    language_code: str = "ko-KR"
):
    """Convert a voice file to text."""
    stt_result = transcribe_voice(file, language_code)
    return stt_result


# POST : upload voice with both emotion analysis and STT
@app.post("/voices/upload-with-analysis")
async def upload_voice_with_analysis(
    file: UploadFile = File(...),
    folder: Optional[str] = Form(default=None),
    language_code: str = Form(default="ko-KR")
):
    """Upload a voice file and run both emotion analysis and STT."""
    bucket = os.getenv("S3_BUCKET_NAME")
    if not bucket:
        raise HTTPException(status_code=500, detail="S3_BUCKET_NAME not configured")

    # S3 upload
    base_prefix = VOICE_BASE_PREFIX.rstrip("/")
    effective_prefix = f"{base_prefix}/{folder or DEFAULT_UPLOAD_FOLDER}".rstrip("/")
    filename = os.path.basename(file.filename or "upload.wav")
    key = f"{effective_prefix}/{filename}"
    upload_fileobj(bucket=bucket, key=key, fileobj=file.file, content_type=file.content_type)
    try:
        file.file.seek(0)
    except Exception:
        pass

    # Emotion analysis
    emotion_result = analyze_voice_emotion(file)
    try:
        file.file.seek(0)
    except Exception:
        pass

    # STT conversion
    stt_result = transcribe_voice(file, language_code)

    # List the files
    names = list_bucket_objects(bucket=bucket, prefix=effective_prefix)

    return {
        "uploaded": key,
        "files": names,
        "emotion_analysis": emotion_result,
        "transcription": stt_result
    }
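The key construction shared by both upload endpoints can be isolated into a small sketch. The `build_key` helper is hypothetical (the endpoints inline this logic); the constants match `app/constants.py`:

```python
import os

VOICE_BASE_PREFIX = "voices"
DEFAULT_UPLOAD_FOLDER = "voiceFile"

def build_key(folder, filename):
    # Mirrors the endpoint logic: base prefix, optional folder, basename fallback
    base = VOICE_BASE_PREFIX.rstrip("/")
    prefix = f"{base}/{folder or DEFAULT_UPLOAD_FOLDER}".rstrip("/")
    return f"{prefix}/{os.path.basename(filename or 'upload.wav')}"

print(build_key(None, "a.wav"))                # voices/voiceFile/a.wav
print(build_key("user123/session1", "b.wav"))  # voices/user123/session1/b.wav
print(build_key("raw", None))                  # voices/raw/upload.wav
```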
40 changes: 40 additions & 0 deletions app/s3_service.py
@@ -0,0 +1,40 @@
import os
from typing import List

import boto3  # type: ignore
from botocore.client import Config  # type: ignore


def get_s3_client():
    region = os.getenv("AWS_REGION", "ap-northeast-2")
    kwargs = {
        "region_name": region,
        "config": Config(signature_version="s3v4"),
    }
    access_key = os.getenv("AWS_ACCESS_KEY_ID")
    secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
    session_token = os.getenv("AWS_SESSION_TOKEN")
    if access_key and secret_key:
        kwargs["aws_access_key_id"] = access_key
        kwargs["aws_secret_access_key"] = secret_key
        if session_token:
            kwargs["aws_session_token"] = session_token
    return boto3.client("s3", **kwargs)


def upload_fileobj(bucket: str, key: str, fileobj) -> str:
    s3 = get_s3_client()
    s3.upload_fileobj(fileobj, bucket, key)
    return key
Comment on lines +25 to +28

⚠️ Potential issue | πŸ”΄ Critical

Missing content_type parameter causes a runtime error

app/main.py calls this function with a content_type argument at lines 36 and 120, but the current function signature has no such parameter, so a TypeError is raised.

Apply the following diff so the Content-Type is stored as S3 metadata:

```diff
-def upload_fileobj(bucket: str, key: str, fileobj) -> str:
+def upload_fileobj(bucket: str, key: str, fileobj, content_type: str = None) -> str:
     s3 = get_s3_client()
-    s3.upload_fileobj(fileobj, bucket, key)
+    extra_args = {}
+    if content_type:
+        extra_args["ContentType"] = content_type
+    s3.upload_fileobj(fileobj, bucket, key, ExtraArgs=extra_args if extra_args else None)
     return key
```
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```python
def upload_fileobj(bucket: str, key: str, fileobj) -> str:
    s3 = get_s3_client()
    s3.upload_fileobj(fileobj, bucket, key)
    return key
```

```python
def upload_fileobj(bucket: str, key: str, fileobj, content_type: str = None) -> str:
    s3 = get_s3_client()
    extra_args = {}
    if content_type:
        extra_args["ContentType"] = content_type
    s3.upload_fileobj(fileobj, bucket, key, ExtraArgs=extra_args if extra_args else None)
    return key
```
πŸ€– Prompt for AI Agents
In app/s3_service.py around lines 25 to 28, the upload_fileobj function is
missing the content_type parameter expected by callers, causing a TypeError and
also failing to set the S3 object's Content-Type; add a content_type:
Optional[str] = None parameter to the function signature and, when content_type
is provided, pass it to s3.upload_fileobj via the ExtraArgs argument
(ExtraArgs={'ContentType': content_type}) so the Content-Type is stored in S3
metadata; keep returning the key as before.



def list_bucket_objects(bucket: str, prefix: str = "") -> List[str]:
    s3 = get_s3_client()
    paginator = s3.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []) or []:
            keys.append(obj["Key"])
    return keys
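The accumulation loop in `list_bucket_objects` can be exercised without S3 by feeding it stub pages shaped like boto3's `list_objects_v2` paginator output (the keys below are invented):

```python
# Stub pages standing in for the boto3 paginator's output
pages = [
    {"Contents": [{"Key": "voices/a.wav"}, {"Key": "voices/b.wav"}]},
    {},  # a page without "Contents" is skipped safely by the `or []` guard
    {"Contents": [{"Key": "voices/c.wav"}]},
]

keys = []
for page in pages:
    for obj in page.get("Contents", []) or []:
        keys.append(obj["Key"])

print(keys)
```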

