
transcribe

Transcribe recordings including speaker recognition (diarization) and timestamps.

Powered by OpenAI Whisper and pyannote-audio.

Setup

Prerequisites

Steps

  1. Install via pip:
    pip install git+https://github.com/sueskind/transcribe
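  2. Accept the user conditions of pyannote/segmentation on Hugging Face
  3. Accept the user conditions of pyannote/speaker-diarization on Hugging Face
  4. Create a Hugging Face access token at https://huggingface.co/settings/tokens and pass it to transcribe with -t/--token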

Usage

Example:

$ transcribe pulpfiction.mp3 -t <token>
Speaker 1 (00:00:00):
They don't call it a quarter pounder with cheese?

Speaker 2 (00:00:06):
No, they got the metric system there, they wouldn't know what a quarter pounder is.

Speaker 1 (00:00:13):
What do they call it?

Speaker 2 (00:00:17):
Royale with cheese.
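The output above pairs each chunk of text with a speaker and a start time. One plausible way to produce it is to give every transcript segment the speaker whose diarization turns overlap it the most, then merge consecutive segments from the same speaker. The following is a minimal sketch of that idea, not the repository's actual code. It assumes transcript segments shaped like Whisper's `segments` output (dicts with `start`, `end` and `text`, times in seconds) and speaker turns as (start, end, label) triples such as one could collect from pyannote's `itertracks(yield_label=True)`. The `Turn` class, the helper names and the demo timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical container for one diarization turn."""
    start: float   # seconds
    end: float     # seconds
    speaker: str   # raw diarization label, e.g. "SPEAKER_00" (assumed format)


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(segment, turns):
    """Return the label whose turns overlap the segment longest, or None if none overlap."""
    totals = {}
    for turn in turns:
        shared = overlap(segment["start"], segment["end"], turn.start, turn.end)
        totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
    if not totals or max(totals.values()) == 0.0:
        return None
    return max(totals, key=totals.get)


def hhmmss(seconds):
    """Format a time in seconds as HH:MM:SS, dropping the fractional part."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


def render(segments, turns):
    """Merge consecutive segments per speaker and format them like the example output."""
    names = {}   # raw label -> "Speaker N", numbered in order of first appearance
    blocks = []  # each entry: [name, start_seconds, [texts]]
    for seg in segments:
        text = seg["text"].strip()
        label = assign_speaker(seg, turns)
        if label is None and blocks:
            blocks[-1][2].append(text)  # no overlapping turn: attach to the previous block
            continue
        name = names.setdefault(label, f"Speaker {len(names) + 1}")
        if blocks and blocks[-1][0] == name:
            blocks[-1][2].append(text)  # same speaker keeps talking
        else:
            blocks.append([name, seg["start"], [text]])
    return "\n\n".join(
        f"{name} ({hhmmss(start)}):\n{' '.join(texts)}" for name, start, texts in blocks
    )


# Hypothetical timings, chosen to reproduce the example dialogue above.
segments = [
    {"start": 0.0, "end": 5.0, "text": " They don't call it a quarter pounder with cheese?"},
    {"start": 6.0, "end": 12.0, "text": " No, they got the metric system there, they wouldn't know what a quarter pounder is."},
    {"start": 13.0, "end": 16.0, "text": " What do they call it?"},
    {"start": 17.0, "end": 19.0, "text": " Royale with cheese."},
]
turns = [
    Turn(0.0, 5.5, "SPEAKER_00"),
    Turn(5.8, 12.5, "SPEAKER_01"),
    Turn(12.8, 16.2, "SPEAKER_00"),
    Turn(16.9, 19.5, "SPEAKER_01"),
]

if __name__ == "__main__":
    print(render(segments, turns))
```

Running it prints the same four speaker blocks as the example. Numbering speakers by first appearance is what makes the first voice "Speaker 1" regardless of which raw label the diarization model happens to assign.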

Use --help to show all options:

$ transcribe --help
Usage: transcribe [OPTIONS] AUDIO

  Transcribe and diarize (recognize speakers) recorded audio.

Arguments:
  AUDIO  Path to the input audio file.  [required]

Options:
  -t, --token TEXT  Hugging face access token.  [required]
  -o, --out TEXT    Path to the output file. Print to stdout if not set.
  --device TEXT     Device to run the models on.  [default: cuda]
  --language TEXT   Spoken language in the recording.  [default: en]
  --help            Show this message and exit.
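To process several recordings, the CLI can be driven from a script using only the options listed above. The sketch below is a hypothetical helper, not part of the package: `build_command` and `transcribe_folder` are illustrative names, reading the token from an `HF_TOKEN` environment variable is an assumption (the CLI itself only takes `-t/--token`), and passing `cpu` to `--device` assumes the value is handed to PyTorch as a device string.

```python
import os
import subprocess
from pathlib import Path


def build_command(audio, token, out=None, device="cuda", language="en"):
    """Assemble argv for the transcribe CLI using only the options shown by --help."""
    cmd = ["transcribe", str(audio), "--token", token, "--device", device, "--language", language]
    if out is not None:
        cmd += ["--out", str(out)]  # without --out the transcript goes to stdout
    return cmd


def transcribe_folder(folder, device="cpu", language="en"):
    """Hypothetical batch helper: transcribe every .mp3 in `folder` to <name>.txt."""
    token = os.environ["HF_TOKEN"]  # assumption: the user exported the token under this name
    for audio in sorted(Path(folder).glob("*.mp3")):
        out = audio.with_suffix(".txt")
        cmd = build_command(audio, token, out=out, device=device, language=language)
        subprocess.run(cmd, check=True)  # raise if the CLI exits with an error
```

For example, `transcribe_folder("recordings", device="cpu")` would write `recordings/interview.txt` for `recordings/interview.mp3`. Keeping the token in an environment variable also keeps it out of your shell history.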
