Transcribe architecture has two primary components that require model selection
- Speech to Text
- LLM Responses
Speech to text aspect can be done locally or online using the Whisper API. Online Speech to Text requires use of -api
option on command line.
Local Speech to Text requires model selection. By default the base
model for English is used. This model is downloaded on the first invocation of the application. There are many more models available, though they vary by size and compute power required.
See the help of transcribe python main.py -h
for further details on local transcription models.
-m {tiny,base,small,medium,large-v1,large-v2,large-v3,large}, --model {tiny,base,small,medium,large-v1,large-v2,large-v3,large}
Specify the LLM to use for transcription.
By default tiny english model is part of the install.
tiny multi-lingual model has to be downloaded from the link https://drive.google.com/file/d/1M4AFutTmQROaE9xk2jPc5Y4oFRibHhEh/view?usp=drive_link
base english model has to be downloaded from the link https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt
base multi-lingual model has to be downloaded from the link https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt
small english model has to be downloaded from the link https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
small multi-lingual model has to be downloaded from the link https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt
The models below require higher computing power:
medium english model has to be downloaded from the link https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt
medium multi-lingual model has to be downloaded from the link https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt
large model has to be downloaded from the link https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt
large-v1 model has to be downloaded from the link https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt
large-v2 model has to be downloaded from the link https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt
large-v3 model has to be downloaded from the link https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt
Online Speech to Text option is enabled using --api
option. This option does not require model selection as the appropriate model is selected by the API behind the scenes.
The quality, cost and speed of responses from LLM depends on the model chosen. Out of the box transcribe uses gpt-3.5-turbo-0301
model as specified in parameters.yaml
ai_model: gpt-3.5-turbo-0301
The model can be changed by altering the config in parameters.yaml
or override.yaml
file.
Details of all models for OpenAI are available at https://platform.openai.com/docs/models/continuous-model-upgrades