Draft
Changes from all commits
Commits
96 commits
a20c1d7
Refactor configuration management in wizard and ChronicleSetup
AnkushMalaker Jan 3, 2026
ad4b1f9
Fix string formatting for error message in ChronicleSetup
AnkushMalaker Jan 3, 2026
ff061e0
Enhance chat configuration management and UI integration
AnkushMalaker Jan 3, 2026
bdcb547
Merge branch 'dev' into feat/edit-chat-system-prompt
AnkushMalaker Jan 3, 2026
5f8d868
Refactor backend shutdown process and enhance chat service configurat…
AnkushMalaker Jan 3, 2026
5a3f8be
Implement plugin system for enhanced functionality and configuration …
AnkushMalaker Jan 3, 2026
32d541f
Enhance configuration management and plugin system integration
AnkushMalaker Jan 3, 2026
251010a
Implement Redis integration for client-user mapping and enhance wake …
AnkushMalaker Jan 3, 2026
eceb633
Refactor Deepgram worker management and enhance text normalization
AnkushMalaker Jan 3, 2026
916135e
Add original prompt retrieval and restoration in chat configuration test
AnkushMalaker Jan 3, 2026
944fc62
Refactor test execution and enhance documentation for integration tests
AnkushMalaker Jan 3, 2026
952d471
Enhance test environment cleanup and improve Deepgram worker management
AnkushMalaker Jan 6, 2026
4eb1ca9
Refactor worker management and introduce orchestrator for improved pr…
AnkushMalaker Jan 6, 2026
5cffe17
oops
AnkushMalaker Jan 6, 2026
8f44c4b
oops2
AnkushMalaker Jan 6, 2026
7e05de9
Remove legacy test runner script and update worker orchestration
AnkushMalaker Jan 6, 2026
112a280
Add bulk restart mechanism for RQ worker registration loss
AnkushMalaker Jan 6, 2026
0d82c8e
Enhance plugin architecture with event-driven system and test integra…
AnkushMalaker Jan 6, 2026
df79524
Enhance Docker configurations and startup script for test mode
AnkushMalaker Jan 6, 2026
668dfea
Refactor test scripts for improved reliability and clarity
AnkushMalaker Jan 7, 2026
197a610
remove mistral deadcode; notebooks untouched
AnkushMalaker Jan 7, 2026
c128147
Merge branch 'fix/plugin-tests' into fix/audio-pipe
AnkushMalaker Jan 7, 2026
10dfe2b
Merge branch 'dev' into fix/audio-pipe-2
AnkushMalaker Jan 10, 2026
a65b1bf
Refactor audio streaming endpoints and improve documentation
AnkushMalaker Jan 10, 2026
60aa8e1
Enhance testing infrastructure and API routes for plugin events
AnkushMalaker Jan 11, 2026
61d72a5
Add audio pipeline architecture documentation and improve audio persi…
AnkushMalaker Jan 11, 2026
9ab28ed
Add test container setup and teardown scripts
AnkushMalaker Jan 11, 2026
387385f
Update worker count validation and websocket disconnect tests
AnkushMalaker Jan 12, 2026
32e2e47
Refactor audio storage to MongoDB chunks and enhance cleanup settings…
AnkushMalaker Jan 12, 2026
e662f46
Refactor audio processing to utilize MongoDB chunks and enhance job h…
AnkushMalaker Jan 12, 2026
d143fe7
Refactor speaker recognition client to use in-memory audio data
AnkushMalaker Jan 12, 2026
126df5b
Add mock providers and update testing workflows for API-independent e…
AnkushMalaker Jan 12, 2026
5bc9908
Enhance testing documentation and workflows for API key separation
AnkushMalaker Jan 12, 2026
f89a3de
Update test configurations and documentation for API key management
AnkushMalaker Jan 12, 2026
8ca401c
Add optional service profile to Docker Compose test configuration
AnkushMalaker Jan 12, 2026
55fd469
Refactor audio processing and job handling for transcription workflows
AnkushMalaker Jan 12, 2026
2f89970
Remove unnecessary network aliases from speaker service in Docker Com…
AnkushMalaker Jan 12, 2026
f1afd8b
Add network aliases for speaker service in Docker Compose configuration
AnkushMalaker Jan 12, 2026
912c1cd
Refactor Conversation model to use string for provider field
AnkushMalaker Jan 13, 2026
6e64173
Enhance configuration and model handling for waveform data
AnkushMalaker Jan 14, 2026
498f42c
Add SDK testing scripts for authentication, conversation retrieval, a…
AnkushMalaker Jan 14, 2026
ae40334
Enhance audio processing and conversation handling for large files
AnkushMalaker Jan 14, 2026
078c602
archive
AnkushMalaker Jan 14, 2026
f76a47d
Implement annotation system and enhance audio processing capabilities
AnkushMalaker Jan 14, 2026
425746f
Implement OmegaConf-based configuration management for backend settings
AnkushMalaker Jan 14, 2026
b660f65
Refactor .env.template and remove unused diarization configuration
AnkushMalaker Jan 14, 2026
0eafc22
Implement legacy environment variable syntax support in configuration…
AnkushMalaker Jan 14, 2026
096cde3
Add plugins configuration path retrieval and refactor usage
AnkushMalaker Jan 14, 2026
edb66eb
Unify plugin terminology and fix memory job dependencies
AnkushMalaker Jan 14, 2026
8ef5c71
Update Docker Compose configuration and enhance system routes
AnkushMalaker Jan 16, 2026
f08acd5
circular import
AnkushMalaker Jan 17, 2026
0d21016
Refactor testing infrastructure and enhance container management
AnkushMalaker Jan 17, 2026
e2fe9c9
Add Email Summarizer Plugin and SMTP Email Service
AnkushMalaker Jan 17, 2026
d8c72e8
Refactor plugin management and introduce Email Summarizer setup
AnkushMalaker Jan 17, 2026
252f323
Merge branch 'feature/email-summarizer-plugin' into pre-release-candi…
AnkushMalaker Jan 17, 2026
f1dc63c
Enhance plugin configuration and documentation
AnkushMalaker Jan 17, 2026
623a0a9
Refactor plugin setup process to allow interactive user input
AnkushMalaker Jan 17, 2026
98195df
Add shared setup utilities for interactive configuration
AnkushMalaker Jan 17, 2026
accb880
Enhance plugin security architecture and configuration management
AnkushMalaker Jan 17, 2026
0a0a70e
Refactor backend components for improved functionality and stability
AnkushMalaker Jan 17, 2026
a2962cb
Refactor plugin setup timing to enhance configuration flow
AnkushMalaker Jan 17, 2026
1d63565
Refactor save_diarization_settings_controller to improve validation a…
AnkushMalaker Jan 17, 2026
8898b43
Refactor audio processing and conversation management for improved de…
AnkushMalaker Jan 18, 2026
d7f716d
Refactor audio and email handling for improved functionality and secu…
AnkushMalaker Jan 18, 2026
86d0975
Refactor audio upload functionality to remove unused parameters
AnkushMalaker Jan 18, 2026
0803f54
Refactor Email Summarizer plugin configuration for improved clarity a…
AnkushMalaker Jan 18, 2026
4f49de6
Update API key configuration in config.yml.template to use environmen…
roshatron2 Jan 18, 2026
152b24c
Refactor Redis job queue cleanup process for improved success tracking
AnkushMalaker Jan 18, 2026
e2886e9
fix tests
AnkushMalaker Jan 18, 2026
73f2d43
Update CI workflows to use 'docker compose' for log retrieval and add…
AnkushMalaker Jan 18, 2026
13ccb75
test fixes
AnkushMalaker Jan 18, 2026
ed6176c
FIX StreamingTranscriptionConsumer to support cumulative audio timest…
AnkushMalaker Jan 19, 2026
dbdf06c
Enhance test container setup and improve error messages in integratio…
AnkushMalaker Jan 19, 2026
fb8225d
Improve WebSocket closing logic and enhance integration test teardown
AnkushMalaker Jan 19, 2026
a4681c5
Refactor job status handling to align with RQ standards
AnkushMalaker Jan 19, 2026
8b4d783
Update test configurations and improve audio inactivity handling
AnkushMalaker Jan 19, 2026
42eb911
Refactor audio processing and enhance error handling
AnkushMalaker Jan 20, 2026
24a3419
Enhance Docker command handling and configuration management
AnkushMalaker Jan 20, 2026
c1b84ae
Enhance configuration loading to support custom config file paths
AnkushMalaker Jan 20, 2026
3a55ac9
Update test scripts to use TEST_CONFIG_FILE for configuration management
AnkushMalaker Jan 20, 2026
157e1c7
Refactor audio upload response handling and improve error reporting
AnkushMalaker Jan 21, 2026
0915493
Add sneeze detection model evaluation and data preparation scripts
0xrushi Jan 22, 2026
f08fbc4
Refactor audio processing and job handling to improve transcription m…
AnkushMalaker Jan 22, 2026
62ee36d
Enhance integration tests for plugin events and improve error handling
AnkushMalaker Jan 22, 2026
64887a2
Enhance speaker recognition testing and audio processing
AnkushMalaker Jan 23, 2026
ef719b1
Refactor audio chunk retrieval and enhance logging in audio processing
AnkushMalaker Jan 23, 2026
bd1cd84
Refactor mock speaker recognition client and improve testing structure
AnkushMalaker Jan 23, 2026
0dfd900
Enhance conversation model to include word-level timestamps and impro…
AnkushMalaker Jan 23, 2026
7df727f
Implement speaker reprocessing feature and enhance timeout calculation
AnkushMalaker Jan 23, 2026
5ebaa79
Add user-loop service and anomaly detection features
0xrushi Jan 30, 2026
84e93bd
Merge branch 'refactor/use-standard-redis-job-status' into feat/swipe…
0xrushi Jan 30, 2026
3cf0145
Remove deprecated event detection script and update README for traini…
0xrushi Jan 30, 2026
6700d7c
Merge branch 'feat/swipe-anomaly' of github.com:0xrushi/friend-lite i…
0xrushi Jan 30, 2026
ab98363
Enhance anomaly detection and user loop features
0xrushi Jan 30, 2026
3842b71
Refactor user loop integration tests and enhance conversation management
0xrushi Jan 30, 2026
2cb6347
Merge branch 'dev' into feat/swipe-anomaly
AnkushMalaker Feb 8, 2026
202 changes: 202 additions & 0 deletions Docs/whisper-adapter-finetuning/README.md
@@ -0,0 +1,202 @@
# Whisper Sneeze Adapter Training

This project fine-tunes OpenAI's Whisper model to transcribe sneezes in audio/video content using LoRA adapters. The model learns to recognize and transcribe sneezes as the token "SNEEZE" in transcriptions.

## Prerequisites

- Python 3.10+
- CUDA-capable GPU (recommended for training)
- Access to Google Gemini API (for generating transcripts)

## Installation

1. Create a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```

2. Install dependencies:
```bash
pip install torch torchaudio
pip install transformers datasets evaluate
pip install unsloth[colab-new]
pip install librosa soundfile jiwer
pip install tqdm
```

## Workflow

### Step 1: Prepare Your Video

1. Record or obtain a video file containing sneezes (e.g., `girls_sneezing.mp4`). The example video can be downloaded with:
```bash
yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" --merge-output-format mp4 -o "girls_sneezing.mp4" https://youtu.be/36b4248j5UE
```

### Step 2: Generate Transcript with Gemini

1. Upload your video to Google Gemini (or use Gemini API)
2. Request a transcript with sneezes marked using the format: `<sneeze>`
3. Generate a JSONL file named `sneeze_data.jsonl` with the following format:

```jsonl
{"start": 0.0, "end": 5.0, "text": "Ugh, I really need to sneeze. Stuck? Yeah, it's right there."}
{"start": 5.0, "end": 11.0, "text": "Close one. <sneeze> Bless you. Thanks."}
{"start": 12.0, "end": 17.0, "text": "Ugh, I can feel it. I really need to sneeze so bad. Go on, let it out."}
```

**Format requirements:**
- Each line is a JSON object
- `start`: Start time in seconds (float)
- `end`: End time in seconds (float)
- `text`: Transcription text with sneezes marked as `<sneeze>`

**Example Gemini prompt:**
```
Please transcribe this video and create a JSONL file where each line contains:
- start: start time in seconds
- end: end time in seconds
- text: the transcription with sneezes marked as <sneeze>
Format as JSONL (one JSON object per line).
```
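Before moving on, it can help to sanity-check the generated manifest. A minimal stdlib validator for the format above (a sketch; `validate_line` is a hypothetical helper, not part of this repo):

```python
import json

def validate_line(line: str) -> dict:
    """Parse one JSONL line and check the required fields."""
    obj = json.loads(line)
    for key in ("start", "end"):
        if not isinstance(obj.get(key), (int, float)):
            raise ValueError(f"{key!r} must be a number, got {obj.get(key)!r}")
    if not isinstance(obj.get("text"), str):
        raise ValueError("'text' must be a string")
    if obj["end"] <= obj["start"]:
        raise ValueError("'end' must be greater than 'start'")
    return obj
```

Running every line of `sneeze_data.jsonl` through this before Step 3 catches malformed timestamps early, instead of failing mid-way through audio extraction.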

### Step 3: Prepare Training Data

Run the data preparation script to extract audio chunks and create train/test splits:

```bash
python prepare_sneeze_data.py
```

This script will:
- Extract audio from your video file (`girls_sneezing.mp4`)
- Create audio chunks from the segments in `sneeze_data.jsonl`
- Save chunks to `sneeze_chunks/` directory
- Split data into `train.jsonl` (60%) and `test.jsonl` (40%)

**Requirements:**
- `sneeze_data.jsonl` must exist in the project root
- Video file must be named `girls_sneezing.mp4`

### Step 4: Train the Model

Train the Whisper model with LoRA adapters:

```bash
python train_sneeze.py
```

This will:
- Load the base Whisper Large v3 model
- Apply LoRA adapters (only about 2% of the parameters are trained)
- Fine-tune on your sneeze data
- Save the adapter to `sneeze_lora_adapter_unsloth/`

**Training configuration:**
- Model: `unsloth/whisper-large-v3`
- LoRA rank: 64
- Batch size: 1 (with gradient accumulation: 4)
- Max steps: 200
- Learning rate: 1e-4

**Note:** Training requires a GPU with sufficient VRAM. Set `load_in_4bit=True` in the script to reduce memory usage if VRAM is limited.

### Step 5: Evaluate the Model

Evaluate the trained model on the test set:

```bash
python evaluate_sneeze_model.py
```

This will:
- Load the base model and merge the LoRA adapter
- Run inference on test samples
- Calculate Word Error Rate (WER)
- Report sneeze detection recall and false positives

## Results

### Training Results

Training was performed on a Tesla T4 GPU with the following configuration:
- **Model**: `unsloth/whisper-large-v3`
- **Trainable Parameters**: 31,457,280 of 1,574,947,840 (2.00%)
- **Training Time**: 12.04 minutes
- **Peak Memory Usage**: 8.896 GB (60.35% of max memory)
- **Training Samples**: 49 samples
- **Test Samples**: 4 samples
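The reported 2.00% trainable fraction is consistent with rank-64 LoRA adapters on two projection matrices per attention block (the exact target modules are an assumption here; Unsloth's defaults may differ):

```python
# Back-of-the-envelope check of the reported trainable-parameter count.
d_model = 1280                       # whisper-large-v3 hidden size
r = 64                               # LoRA rank
attn_blocks = 32 + 32 * 2            # encoder self-attn + decoder self- and cross-attn
targets_per_block = 2                # assumption: e.g. q_proj and v_proj
params_per_matrix = 2 * r * d_model  # A is (d x r), B is (r x d)

trainable = attn_blocks * targets_per_block * params_per_matrix
total = 1_574_947_840                # reported total (base model plus adapters)

print(trainable)                          # 31457280, matching the report
print(round(100 * trainable / total, 2))  # 2.0
```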

**Training Loss Progression:**
| Step | Training Loss | Validation Loss | WER |
|------|---------------|-----------------|-----|
| 20 | 1.646100 | 1.869532 | 50.0% |
| 40 | 0.832500 | 1.004385 | 30.0% |
| 60 | 0.304600 | 0.354044 | 30.0% |
| 80 | 0.067700 | 0.051606 | 0.0% |
| 100 | 0.017600 | 0.162433 | 10.0% |
| 120 | 0.003400 | 0.006127 | 0.0% |
| 140 | 0.002000 | 0.004151 | 0.0% |
| 160 | 0.001400 | 0.003399 | 0.0% |
| 180 | 0.001300 | 0.003005 | 0.0% |
| 200 | 0.001000 | 0.002856 | 0.0% |

**Final Metrics:**
- Final Training Loss: 0.001000
- Final Validation Loss: 0.002856
- Final Validation WER: 0.0%

### Evaluation Results

Evaluation was performed on 10 test samples (4 containing sneezes):

**Overall Performance:**
- **Word Error Rate (WER)**: 0.3217 (32.17%)
- **Sneeze Recall**: 2/4 (50.0%)
- **False Positives**: 0

**Missed Sneezes:**
1. Reference: "Take your time, it'll come. SNEEZE Oh wow. Excuse me."
Prediction: "Take your time. It'll come. Oh, wow."

2. Reference: "It's right there but... False alarm? No, it's stuck. SNEEZE Bless you."
Prediction: "It's right there, but... False alarm? No! It stopped..."

**Analysis:**
- The model reached 0.0% WER on the validation set by the end of training, indicating it fit the training distribution well.
- On the test set, the model achieved 50% sneeze recall, successfully detecting 2 out of 4 sneezes.
- No false positives were detected, showing the model is conservative in its sneeze predictions.
- The 32.17% WER on the test set suggests room for improvement, particularly in detecting sneezes in more varied contexts.
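WER is the word-level edit distance between reference and prediction, divided by the number of reference words. A minimal reimplementation for intuition (jiwer applies additional text normalization, so its numbers can differ slightly):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,  # deletion
                          d[i][j - 1] + 1,  # insertion
                          sub)              # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

A missed `SNEEZE` token counts as a single deletion, so on short utterances one missed sneeze alone adds several percentage points of WER.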

## Project Structure

```
whisper-adapter-finetuning/
├── prepare_sneeze_data.py # Data preparation script
├── train_sneeze.py # Training script
├── evaluate_sneeze_model.py # Evaluation script
├── sneeze_data.jsonl # Input transcript with sneezes
├── train.jsonl # Training manifest
├── test.jsonl # Test manifest
├── sneeze_chunks/ # Extracted audio chunks
└── sneeze_lora_adapter_unsloth/ # Trained adapter (created after training)
```

## Output Files

- `train.jsonl`: Training dataset manifest
- `test.jsonl`: Test dataset manifest
- `sneeze_chunks/`: Directory with extracted audio chunks
- `sneeze_lora_adapter_unsloth/`: Trained LoRA adapter weights

## Notes

- The model replaces `<sneeze>` tags with `SNEEZE` during training
- LoRA adapters are memory-efficient and only update a small portion of model weights
- The evaluation script merges the adapter into the base model for inference
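The tag replacement mentioned above is a single normalization step shared by training and evaluation; a sketch (`normalize_transcript` is a hypothetical helper name):

```python
def normalize_transcript(text: str) -> str:
    # The <sneeze> tag from the Gemini transcript becomes the literal
    # token SNEEZE that the model is trained to emit.
    return text.replace("<sneeze>", "SNEEZE")
```

For example, `normalize_transcript("Close one. <sneeze> Bless you.")` returns `"Close one. SNEEZE Bless you."`.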

## Conclusion

Despite training on only 13 examples and evaluating on 10 test samples, the fine-tuned model transcribes sneezes with 50% recall and zero false positives. This demonstrates the effectiveness of LoRA adapters for efficient fine-tuning on specialized tasks with limited data.
115 changes: 115 additions & 0 deletions Docs/whisper-adapter-finetuning/evaluate_sneeze_model.py
@@ -0,0 +1,115 @@
import os
import json
import torch
import librosa
import jiwer
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
from tqdm import tqdm

# --- CONFIGURATION (MUST MATCH YOUR TRAINING) ---
BASE_MODEL_ID = "openai/whisper-large-v3"
ADAPTER_PATH = "sneeze_lora_adapter_unsloth"  # The folder Unsloth created
TEST_MANIFEST = "test.jsonl"

def main():
    # 1. Setup device
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # 2. Load base model (Large v3)
    print(f"Loading base model: {BASE_MODEL_ID}")
    processor = WhisperProcessor.from_pretrained(BASE_MODEL_ID)
    model = WhisperForConditionalGeneration.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.float16 if device == "cuda" else torch.float32
    )

    # 3. Load and MERGE adapter
    if os.path.exists(ADAPTER_PATH):
        print(f"Loading LoRA adapter from: {ADAPTER_PATH}")
        model = PeftModel.from_pretrained(model, ADAPTER_PATH)
        print("Merging LoRA weights...")
        model = model.merge_and_unload()
    else:
        print(f"❌ ERROR: Adapter {ADAPTER_PATH} not found!")
        return

    model.to(device)
    model.eval()

    # 4. Run evaluation
    evaluate_dataset(model, processor, device, TEST_MANIFEST)

def evaluate_dataset(model, processor, device, manifest_path):
    if not os.path.exists(manifest_path):
        print(f"Manifest {manifest_path} not found.")
        return

    samples = []
    with open(manifest_path, 'r') as f:
        for line in f:
            samples.append(json.loads(line))

    print(f"Testing on {len(samples)} samples...")

    predictions = []
    references = []
    sneeze_stats = {"total": 0, "detected": 0, "fp": 0}

    for sample in tqdm(samples):
        path = sample['audio']
        ref_text = sample['text'].replace("<sneeze>", "SNEEZE")

        try:
            audio, _ = librosa.load(path, sr=16000)
        except Exception as e:
            # Skip unreadable audio files rather than aborting the whole run
            print(f"Skipping {path}: {e}")
            continue

        # Process audio
        inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
        input_features = inputs.input_features.to(device)

        # Match the model's half precision when on GPU
        if device == "cuda":
            input_features = input_features.half()

        # Generate
        with torch.no_grad():
            generated_ids = model.generate(
                input_features=input_features,  # Use input_features, not inputs
                language="en",
                task="transcribe",
                max_new_tokens=256
            )

        pred = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()

        predictions.append(pred)
        references.append(ref_text)

        # Stats
        has_sneeze_ref = "SNEEZE" in ref_text
        has_sneeze_pred = "SNEEZE" in pred

        if has_sneeze_ref:
            sneeze_stats["total"] += 1
            if has_sneeze_pred:
                sneeze_stats["detected"] += 1
            else:
                print(f"\n❌ MISSED SNEEZE\nRef: {ref_text}\nPrd: {pred}")
        elif has_sneeze_pred:
            sneeze_stats["fp"] += 1
            print(f"\n⚠️ FALSE POSITIVE\nRef: {ref_text}\nPrd: {pred}")

    # Results
    wer = jiwer.wer(references, predictions)
    print("\n" + "=" * 40)
    print(f"Word Error Rate: {wer:.4f}")
    if sneeze_stats["total"] > 0:
        recall = (sneeze_stats["detected"] / sneeze_stats["total"]) * 100
        print(f"Sneeze Recall: {sneeze_stats['detected']}/{sneeze_stats['total']} ({recall:.1f}%)")
    print(f"False Positives: {sneeze_stats['fp']}")
    print("=" * 40)

if __name__ == "__main__":
    main()