Automated Speech Recognition

This plugin adds the automated speech recognition task, in both batched and non-batched form, which transcribes speech in audio data into text.

Installation

This plugin requires core and simple_plugin_manager as dependencies. For source, local, and containerized installation instructions, see the main README.

To execute the task, install a plugin that includes a model supporting the automated speech recognition task.

Currently, the following models support automated speech recognition:

To expose the task as an endpoint, the fastapi plugin and one of the following APIs should be installed:

Usage

As input this task requires the following fields:

| Key | Required | Batched | Type | Description |
| --- | --- | --- | --- | --- |
| audio | Yes | Yes | audio data | The audio containing the speech to transcribe |
| config | No | Yes | dict | Any additional arguments (see the model documentation) |

As output this task returns the following fields:

| Key | Batched | Type | Description |
| --- | --- | --- | --- |
| transcript | Yes | string | The generated text |

When the task is executed in batched mode, the batched fields are given (or returned) as lists, whereas the non-batched fields are given as simple values.
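
To make the difference between the two modes concrete, here is a minimal Python sketch that converts between them. The helper names `to_batched` and `from_batched` are hypothetical and not part of the plugin. The sketch also assumes that a batched `config` is a list with one dict per audio item and that a missing `config` can be sent as an empty dict; the tables above only say that the field is batched and optional.

```python
from typing import Any


def to_batched(items: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Combine several non-batched inputs into one batched input.

    Every batched field becomes a list with one entry per item.
    Assumption: an item without "config" is sent as an empty dict.
    """
    return {
        "audio": [item["audio"] for item in items],
        "config": [item.get("config", {}) for item in items],
    }


def from_batched(result: dict[str, list[str]]) -> list[dict[str, str]]:
    """Split a batched result into one non-batched result per item."""
    return [{"transcript": text} for text in result["transcript"]]
```

For example, `from_batched({"transcript": ["Hello", "World"]})` yields two separate results, each with a single `transcript` string.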

Example

For the /api/tasks/automated-speech-recognition/{model}/jobs endpoint defined in base_api the following request body serves as an example:

{
  "audio": "data:audio/wav;base64,UklGRhDUCwB...",
  "config": {}
}

This returns:

{
  "id": "05032039-e7dd-4ca5-8941-ebff9f5f2e09",
  "status": "starting"
}
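
The `audio` value in the request body above is a base64 data URI. As a rough sketch of how such a body could be built in Python, using only the standard library: the helper names `audio_to_data_uri` and `build_request_body` are hypothetical, and the `audio/wav` MIME type is taken from the example rather than from a documented list of supported formats.

```python
import base64
from typing import Any, Optional


def audio_to_data_uri(raw: bytes, mime_type: str = "audio/wav") -> str:
    """Encode raw audio bytes as a data URI like the one in the example."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_request_body(raw: bytes, config: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a non-batched request body with the two documented fields."""
    return {
        "audio": audio_to_data_uri(raw),
        # "config" is optional; an empty dict mirrors the example request.
        "config": config if config is not None else {},
    }
```

Reading a file with `pathlib.Path("speech.wav").read_bytes()` (a placeholder file name) and passing the bytes to `build_request_body` gives a dict that can be serialized with `json.dumps`.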

Later, the result can be retrieved using /api/tasks/automated-speech-recognition/jobs/05032039-e7dd-4ca5-8941-ebff9f5f2e09:

{
  "status": "complete",
  "result": {
    "transcript": "Hello"
  }
}
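
Because the job starts asynchronously, a client has to poll until the status becomes `complete`. The following Python sketch shows one way to do that with the standard library. The function names, the polling interval, and the timeout are assumptions. Only the `starting` and `complete` statuses appear in this README, so the sketch treats every other status as "not finished yet" and relies on the timeout to stop.

```python
import json
import time
import urllib.request
from typing import Any, Callable

# Path taken from the example above; the base URL passed in is a placeholder.
JOB_PATH = "/api/tasks/automated-speech-recognition/jobs/{job_id}"


def http_fetch_job(base_url: str, job_id: str) -> dict[str, Any]:
    """GET the job document and decode it as JSON."""
    url = base_url.rstrip("/") + JOB_PATH.format(job_id=job_id)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_result(
    fetch_job: Callable[[], dict[str, Any]],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> dict[str, Any]:
    """Poll fetch_job until the job is complete, then return its result."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if job.get("status") == "complete":
            return job["result"]
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not complete before the timeout")
        time.sleep(poll_interval)
```

A call could look like `wait_for_result(lambda: http_fetch_job("http://localhost:8000", job_id))`, where the host and port are hypothetical. Passing the fetch step in as a callable keeps the polling loop independent of the HTTP client.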

Back to Main README