This repository contains a multi-stage pipeline for processing, transcribing, and deduplicating Discord voice session recordings into clean text transcripts. It includes tools for audio capture, filtering, transcription, clustering-based deduplication, and final text output.
The pipeline operates in the following phases:
- Phase 0 – Discord Audio Capture: captures user audio streams as individual `.wav` files and generates session logs.
- Phase 1 – Audio Validation and Filtering: filters audio by silence and duration constraints, and rescues bursty utterances with VAD.
- Phase 2 – Whisper Transcription: transcribes accepted audio files to text using a CTranslate2-based Whisper model.
- Phase 3 – Deduplication by Clustering: clusters transcriptions and deduplicates based on similarity, canonical form, and scoring.
- Output – A cleaned `.txt` transcript preserving character, flow, and session integrity.

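To make the Phase 1 checks concrete, here is a minimal sketch of a silence and duration filter for 16-bit PCM `.wav` files, using only the Python standard library. The threshold values and the `classify_wav` and `rms_16bit` names are assumptions made for illustration; they are not taken from `dedupe_audit.py`, and the VAD-based rescue step is not shown.

```python
import math
import struct
import wave

# Hypothetical thresholds; the repository's real values are not stated in this README.
MIN_DURATION_S = 0.5
MAX_DURATION_S = 30.0
SILENCE_RMS = 200.0  # on the 16-bit PCM amplitude scale (0..32767)


def rms_16bit(frames: bytes) -> float:
    """Root-mean-square amplitude of little-endian 16-bit PCM samples."""
    count = len(frames) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frames[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def classify_wav(path: str) -> str:
    """Return 'accept' or a rejection reason for one 16-bit PCM .wav file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            return "reject:unsupported_sample_width"
        n_frames = wav.getnframes()
        duration = n_frames / wav.getframerate()
        frames = wav.readframes(n_frames)
    if duration < MIN_DURATION_S:
        return "reject:too_short"
    if duration > MAX_DURATION_S:
        return "reject:too_long"
    if rms_16bit(frames) < SILENCE_RMS:
        return "reject:silent"
    return "accept"
```

A file rejected as too short by a check like this would be a natural candidate for the burst rescue pass, since short sharp utterances are exactly what a plain duration limit discards.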
| Script | Purpose |
|---|---|
| `index.ts` | Captures Discord voice as per-user `.wav` files |
| `dedupe_audit.py` | Filters raw audio: silence, noise, duplicates, duration |
| `burst_scope.py` | Rescues short sharp utterances from false VAD rejection |
| `transcribe_accepted.py` | Transcribes accepted `.wav` files into enriched JSONL |
| `dedupe_transcript.py` | Deduplicates transcribed JSONL using clustering |

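The clustering-based deduplication of Phase 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the algorithm in `dedupe_transcript.py`: it swaps in a greedy single-pass clustering built on the standard library's `difflib.SequenceMatcher`, and the record fields `text` and `score`, the threshold, and all function names are hypothetical.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.85  # hypothetical cut-off for "same utterance"


def canonical(text: str) -> str:
    """Canonical form: lowercase, punctuation dropped, whitespace collapsed."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def cluster(records, threshold=SIMILARITY_THRESHOLD):
    """Greedy clustering: each record joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for rec in records:
        key = canonical(rec["text"])
        for group in clusters:
            representative = canonical(group[0]["text"])
            if SequenceMatcher(None, key, representative).ratio() >= threshold:
                group.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def dedupe(records, threshold=SIMILARITY_THRESHOLD):
    """Keep the highest-scoring record of each cluster; clusters stay in
    order of first appearance."""
    return [
        max(group, key=lambda r: r.get("score", 0.0))
        for group in cluster(records, threshold)
    ]
```

Comparing canonical forms rather than raw text means that two transcriptions differing only in case or punctuation land in the same cluster, and the score then decides which one survives.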
- Clone the repo and install required Python and Node.js dependencies.
- Configure `.env` with your Discord bot credentials.
- Run each phase in sequence:
  - `index.ts` to capture audio.
  - `dedupe_audit.py` to filter audio.
  - `transcribe_accepted.py` to transcribe.
  - `dedupe_transcript.py` to deduplicate.
- Review the final transcript output.
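Running the phases in sequence can be scripted. The sketch below is one possible driver, not part of the repository: the command lines are assumptions (in particular, `npx ts-node index.ts` is a guess at how the capture script is launched, and the Python scripts may need arguments), and `PHASES` and `run_pipeline` are hypothetical names.

```python
import subprocess

# Assumed invocations; adjust to however each script is actually run.
PHASES = [
    ("capture", ["npx", "ts-node", "index.ts"]),
    ("filter", ["python", "dedupe_audit.py"]),
    ("transcribe", ["python", "transcribe_accepted.py"]),
    ("deduplicate", ["python", "dedupe_transcript.py"]),
]


def run_pipeline(phases=PHASES, runner=subprocess.run):
    """Run each phase in order and stop at the first non-zero exit code.

    `runner` is injectable so the sequencing can be exercised without
    launching real processes.
    """
    completed = []
    for name, command in phases:
        result = runner(command)
        if result.returncode != 0:
            raise RuntimeError(
                f"phase {name!r} failed with exit code {result.returncode}"
            )
        completed.append(name)
    return completed
```

Calling `run_pipeline()` launches the four commands one after another. Because the capture phase presumably runs for the length of a voice session, it may be more practical to run it separately and pass only the later phases, for example `run_pipeline(PHASES[1:])`.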
- Built specifically for GPU-accelerated transcription with faster-whisper.
- Designed and tested on a GeForce RTX 5090 with a custom-built CTranslate2 backend.
- Provided "as is", with no guarantees; it's up to you to configure and compile any needed dependencies.

Python dependencies: `pip install -r requirements.txt` (further dependencies may be required)

Node.js dependencies: `npm install` (further dependencies may be required)

See also Pipeline Document