LinTO-diarization

LinTO-diarization is the LinTO service for speaker diarization, with some ability to guess the number of speakers and identify some speakers if samples of their voice are provided.

LinTO-diarization can either be used as a standalone diarization service or deployed as a micro-service.

Prerequisites
Deploy
- HTTP
- MicroService
Usage
- HTTP API
- Using celery
License

Prerequisites

Docker

The diarization service requires Docker up and running.

(micro-service) Service broker and shared folder

In job mode, the diarization service's only entry point is tasks posted on a Redis message broker. Furthermore, to prevent large audio files from transiting through the message broker, the service uses a shared storage folder mounted on /opt/audio.

Deploy

linto-diarization can be deployed:

As a standalone diarization service through an HTTP API.
As a micro-service connected to a message broker.

1- First step is to build or pull the image:

git clone https://github.com/linto-ai/linto-diarization.git
cd linto-diarization
docker build . -t linto-diarization-pyannote:latest -f pyannote/Dockerfile

or

docker pull lintoai/linto-diarization-pyannote

HTTP

1- Fill the .env

An example of .env file is provided in pyannote/.envdefault.

Parameters:

| Variables | Description | Example |
|---|---|---|
| `SERVING_MODE` | (Required) Specify launch mode | `http` |
| `CONCURRENCY` | Number of additional workers besides the main worker | `0` \| `1` \| `2` \| ... |
| `DEVICE` | Device to use for the embedding model (by default, GPU/CUDA is used if it is available, CPU otherwise) | `cpu` \| `cuda` \| `cuda:0` |
| `DEVICE_CLUSTERING` | Device to use for clustering (same as `DEVICE` by default) | `cpu` \| `cuda` \| `cuda:0` |
| `DEVICE_IDENTIFICATION` | Device to use for speaker identification, if it is enabled (same as `DEVICE` by default) | `cpu` \| `cuda` \| `cuda:0` |
| `NUM_THREADS` | Maximum number of threads to use for anything running on CPU | `1` \| `4` \| ... |
| `CUDA_VISIBLE_DEVICES` | GPU device index to use when running on GPU/CUDA. We also recommend setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` on multi-GPU machines | `0` \| `1` \| `2` \| ... |
| `SPEAKER_SAMPLES_FOLDER` | (default: `/opt/speaker_samples`) Folder containing the audio samples of target speakers | `/path/to/folder` |
| `SPEAKER_PRECOMPUTED_FOLDER` | (default: `/opt/speaker_precomputed`) Folder where precomputed embeddings of target speakers are stored | `/path/to/folder` |
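The device fallbacks in the table can be easier to follow as code. The sketch below is only an illustration of those rules, not the service's implementation: `resolve_devices` is a hypothetical helper, and treating an empty value the same as an unset one is an assumption.

```python
from typing import Dict, Mapping


def resolve_devices(env: Mapping[str, str], cuda_available: bool) -> Dict[str, str]:
    """Apply the documented fallback rules to a mapping of environment variables."""
    # DEVICE: an explicit value wins, otherwise CUDA if available, CPU otherwise.
    device = env.get("DEVICE") or ("cuda" if cuda_available else "cpu")
    return {
        "embedding": device,
        # Both of these fall back to DEVICE when unset (or empty, by assumption).
        "clustering": env.get("DEVICE_CLUSTERING") or device,
        "identification": env.get("DEVICE_IDENTIFICATION") or device,
    }


print(resolve_devices({"DEVICE_CLUSTERING": "cpu"}, cuda_available=True))
# {'embedding': 'cuda', 'clustering': 'cpu', 'identification': 'cuda'}
```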

2- Run the container

This will run a container providing an HTTP API bound to the host's <HOST_SERVING_PORT> port (for instance 8080):

docker run --rm \
-v <SHARED_FOLDER>:/opt/audio \
-p <HOST_SERVING_PORT>:80 \
--env-file .env \
linto-diarization-pyannote:latest

If you want to enable speaker identification, you have to provide samples of the target speakers' voices, either in separate folders with the name of the speaker as the folder name, or in separate files with the name of the speaker as the file name. Then the parent folder of the samples must be mounted as a volume in the container under /opt/speaker_samples (or a custom folder set with the SPEAKER_SAMPLES_FOLDER environment variable).

docker run ... -v <</path/to/speaker/samples/folder>>:/opt/speaker_samples
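To check a samples folder before mounting it, a small script can list the speaker names that the two layouts above imply. This is a sketch under assumptions: `list_speaker_names` is a hypothetical helper, and stripping the file extension and skipping hidden entries are guesses at reasonable behaviour, not documented rules of the service.

```python
import tempfile
from pathlib import Path
from typing import List


def list_speaker_names(samples_folder: str) -> List[str]:
    """Return the speaker names implied by sub-folder names and file names."""
    names = set()
    for entry in Path(samples_folder).iterdir():
        if entry.name.startswith("."):
            continue  # assumption: hidden entries are not speaker samples
        if entry.is_dir():
            names.add(entry.name)  # one folder per speaker
        elif entry.is_file():
            names.add(entry.stem)  # one file per speaker, extension dropped
    return sorted(names)


# Demo on a throwaway folder mixing both layouts.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "alice").mkdir()
    (Path(folder) / "bob.wav").touch()
    print(list_speaker_names(folder))  # ['alice', 'bob']
```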

When speaker identification is enabled, you can also mount a volume (empty at first) on /opt/speaker_precomputed (or a custom folder set with the SPEAKER_PRECOMPUTED_FOLDER environment variable), where the precomputed speaker embeddings are stored. This avoids repeating the initialisation at each new docker run, as long as the set of target speakers remains the same or only grows.

docker run ... -v <</path/to/precomputed/embeddings/folder>>:/opt/speaker_precomputed

You may also want to add --gpus all to enable GPU capabilities, and set CUDA_VISIBLE_DEVICES if several GPU cards are available.

Using celery

LinTO-diarization can be deployed as a micro-service using Celery. Used this way, the container spawns a Celery worker that waits for diarization tasks on a message broker.

You need a message broker up and running at SERVICES_BROKER.

1- Fill the .env

An example of .env file is provided in pyannote/.envdefault.

Parameters:

Parameters are the same as for the HTTP API, with the addition of the following:

| Variables | Description | Example |
|---|---|---|
| `SERVING_MODE` | (Required) Specify launch mode | `task` |
| `SERVICES_BROKER` | Service broker URI | `redis://my_redis_broker:6379` |
| `BROKER_PASS` | Service broker password (leave empty if there is no password) | `my_password` |
| `QUEUE_NAME` | Override the generated queue's name (see Queue name below) | `my_queue` |
| `SERVICE_NAME` | Service's name | `diarization-ml` |
| `LANGUAGE` | Language code as a BCP-47 code | `en-US` or * or languages separated by "\|" |
| `MODEL_INFO` | Human-readable description of the model | `Multilingual diarization model` |

2- Fill the docker-compose.yml

#docker-compose.yml

version: '3.7'

services:
  diarization-service:
    image: linto-diarization-pyannote:latest
    volumes:
      - /path/to/shared/folder:/opt/audio
    env_file: .env
    deploy:
      replicas: 1
    networks:
      - your-net

networks:
  your-net:
    external: true

3- Deploy the stack with docker stack deploy

docker stack deploy --resolve-image always --compose-file docker-compose.yml your_stack

Queue name:

By default the service queue name is generated as SERVICE_NAME.

The queue name can be overridden using the QUEUE_NAME env variable.
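The rule is a simple fallback, sketched below. `resolve_queue_name` is a hypothetical helper, and treating an empty QUEUE_NAME as "not set" is an assumption rather than documented behaviour.

```python
from typing import Mapping


def resolve_queue_name(env: Mapping[str, str]) -> str:
    """QUEUE_NAME wins when set; otherwise the queue is named after SERVICE_NAME."""
    return env.get("QUEUE_NAME") or env["SERVICE_NAME"]


print(resolve_queue_name({"SERVICE_NAME": "diarization-ml"}))
# diarization-ml
print(resolve_queue_name({"SERVICE_NAME": "diarization-ml", "QUEUE_NAME": "my_queue"}))
# my_queue
```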

Service discovery:

As a micro-service, the instance registers itself in the service registry for discovery. The service information is stored as a JSON object in Redis's db0 under the id service:{HOST_NAME}.

The following information is registered:

{
  "service_name": $SERVICE_NAME,
  "host_name": $HOST_NAME,
  "service_type": "diarization",
  "service_language": $LANGUAGE,
  "queue_name": $QUEUE_NAME,
  "version": "1.2.0", # This repository's version
  "info": $MODEL_INFO,
  "last_alive": 65478213,
  "concurrency": 1
}
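The record above is a template rather than literal JSON: the `$VARIABLES` and the `#` comment are placeholders. As a rough illustration of its shape, the sketch below assembles an equivalent key and payload from plain values. `build_service_record` is a hypothetical helper and not the service's own code. The meaning of `last_alive` is not specified here, so it is simply passed through by the caller, and the queue-name fallback follows the rule described under "Queue name".

```python
import json
from typing import Mapping, Tuple


def build_service_record(env: Mapping[str, str], host_name: str, version: str,
                         last_alive: int, concurrency: int) -> Tuple[str, str]:
    """Return (registry key, JSON payload) shaped like the record shown above."""
    record = {
        "service_name": env["SERVICE_NAME"],
        "host_name": host_name,
        "service_type": "diarization",
        "service_language": env["LANGUAGE"],
        # Assumption: falls back to SERVICE_NAME when QUEUE_NAME is not set.
        "queue_name": env.get("QUEUE_NAME") or env["SERVICE_NAME"],
        "version": version,
        "info": env["MODEL_INFO"],
        "last_alive": last_alive,
        "concurrency": concurrency,
    }
    return f"service:{host_name}", json.dumps(record)


key, payload = build_service_record(
    {"SERVICE_NAME": "diarization-ml", "LANGUAGE": "en-US",
     "MODEL_INFO": "Multilingual diarization model"},
    host_name="worker-1", version="1.2.0", last_alive=65478213, concurrency=1,
)
print(key)  # service:worker-1
```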

Usage

HTTP API

/healthcheck

Returns the state of the API

Method: GET

Returns "1" if healthcheck passes.

/diarization

Diarization API

Input arguments are:

file: A WAV file
speaker_count: (integer - optional) Number of speakers. If empty, the number of speakers is estimated automatically.
max_speaker: (integer - optional) Maximum number of speakers if speaker_count is unknown.
speaker_names: (string - optional) Target speaker names for speaker identification (only applies if speaker samples are provided). Possible values are:
- empty string "": no speaker identification
- wild card "*": speaker identification for all speakers
- list of speaker names in JSON format (ex: '["speaker1", ..., "speakerN"]') or separated by | (ex: "speaker1|...|speakerN"): speaker identification for the listed speakers only
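The three accepted forms of speaker_names can be told apart with a few lines of code. The sketch below is one possible reading of the rules above, not the service's parser: `parse_speaker_names` is a hypothetical helper, and detecting the JSON form by a leading `[` is an assumption.

```python
import json
from typing import List, Optional, Union


def parse_speaker_names(spec: str) -> Optional[Union[str, List[str]]]:
    """Return None (no identification), "*" (all speakers), or a list of names."""
    spec = spec.strip()
    if spec == "":
        return None  # empty string: no speaker identification
    if spec == "*":
        return "*"  # wild card: identify all known speakers
    if spec.startswith("["):
        names = json.loads(spec)  # JSON list form
        if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
            raise ValueError("speaker_names must be a JSON list of strings")
        return names
    # Otherwise, names separated by "|"
    return [name.strip() for name in spec.split("|") if name.strip()]


print(parse_speaker_names('["alice", "bob"]'))  # ['alice', 'bob']
print(parse_speaker_names("alice|bob"))         # ['alice', 'bob']
```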

The response (application/json) is a JSON object structured as follows:

{
  "speakers": [
      {"spk_id": "spk5", "duration": 2.2, "nbr_seg": 1},
      ...
  ],
  "segments": [
      {"seg_id": 1, "spk_id": "spk5", "seg_begin": 0.0, "seg_end": 2.2},
      ...
  ]
}

/docs

The /docs route offers an OpenAPI/Swagger interface.

Through the message broker

Diarization worker accepts requests with the following arguments:

file: (str) Relative path of the file in the shared folder.
speaker_count: (int, default None) Fixed number of speakers.
max_speaker: (int, default None) Maximum number of speakers if speaker_count=None.
speaker_names: (string, optional) Target speaker names for speaker identification (only applies if speaker samples are provided). Possible values are:
- empty string "": no speaker identification
- wild card "*": speaker identification for all speakers
- list of speaker names in JSON format (ex: '["speaker1", ..., "speakerN"]') or separated by | (ex: "speaker1|...|speakerN"): speaker identification for the listed speakers only
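Because file is a path relative to the shared folder, a client first has to place the audio there. The sketch below shows one way to do that and to collect the arguments listed above. `stage_audio` and `build_task_args` are hypothetical helpers. How the arguments are passed to the worker (positional or keyword) and the name of the Celery task are not stated here, so the resulting dict is only illustrative.

```python
import shutil
from pathlib import Path
from typing import Any, Dict, Optional


def stage_audio(local_file: str, shared_folder: str, subdir: str = "") -> str:
    """Copy an audio file into the shared folder; return its path relative to it."""
    target_dir = Path(shared_folder) / subdir
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(local_file).name
    shutil.copy2(local_file, target)
    return target.relative_to(shared_folder).as_posix()


def build_task_args(relative_path: str, speaker_count: Optional[int] = None,
                    max_speaker: Optional[int] = None,
                    speaker_names: str = "") -> Dict[str, Any]:
    """Collect the documented request arguments in one place."""
    return {
        "file": relative_path,
        "speaker_count": speaker_count,
        "max_speaker": max_speaker,
        "speaker_names": speaker_names,
    }
```

For example, `build_task_args(stage_audio("meeting.wav", "/path/to/shared/folder"), max_speaker=4)` would copy the file next to the other shared audio and describe a request where the number of speakers is estimated, capped at four.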

Return format

On a successful diarization, the returned object is a JSON object structured as follows:

{
  "speakers": [
      {"spk_id": "spk5", "duration": 2.0, "nbr_seg": 1},
      ...
  ],
  "segments": [
      {"seg_id": 1, "spk_id": "spk5", "seg_begin": 0.0, "seg_end": 2.0},
      ...
  ]
}

The speakers field contains an array of speakers, each with its overall duration and number of segments.
The segments field contains each audio segment with the associated speaker id, start time, and end time.
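As an illustration of how the two fields relate, the per-speaker totals can be recomputed from the segments. This sketch assumes that a speaker's duration is the sum of the lengths of their segments, which matches the example above but is not guaranteed (the service may round values differently). `summarize_segments` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict


def summarize_segments(result: Dict[str, Any]) -> Dict[str, Dict[str, float]]:
    """Recompute duration and segment count per speaker from the segments list."""
    totals = defaultdict(lambda: {"duration": 0.0, "nbr_seg": 0})
    for seg in result["segments"]:
        entry = totals[seg["spk_id"]]
        entry["duration"] += seg["seg_end"] - seg["seg_begin"]
        entry["nbr_seg"] += 1
    return dict(totals)


example = {
    "speakers": [{"spk_id": "spk5", "duration": 2.0, "nbr_seg": 1}],
    "segments": [{"seg_id": 1, "spk_id": "spk5", "seg_begin": 0.0, "seg_end": 2.0}],
}
print(summarize_segments(example))
# {'spk5': {'duration': 2.0, 'nbr_seg': 1}}
```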

Test

Curl

You can test your HTTP API using curl:

curl -X POST "http://YOUR_SERVICE:PORT/diarization" -H  "accept: application/json" -H  "Content-Type: multipart/form-data" -F "file=@YOUR_FILE.wav;type=audio/x-wav" -F "speaker_count=NUMBER_OF_SPEAKERS"
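The same request can be sent from Python. The sketch below mirrors the curl command and assumes the third-party `requests` package is installed. `build_form_fields` and `diarize` are hypothetical helper names, and the 600-second timeout is an arbitrary choice for long audio files.

```python
import os
from typing import Any, Dict, Optional


def build_form_fields(speaker_count: Optional[int] = None,
                      max_speaker: Optional[int] = None,
                      speaker_names: Optional[str] = None) -> Dict[str, str]:
    """Keep only the optional form fields that were actually provided."""
    options = {"speaker_count": speaker_count, "max_speaker": max_speaker,
               "speaker_names": speaker_names}
    return {name: str(value) for name, value in options.items() if value is not None}


def diarize(base_url: str, wav_path: str, **options: Any) -> Dict[str, Any]:
    """POST a WAV file to <base_url>/diarization and return the parsed JSON."""
    import requests  # third-party dependency, imported only when a call is made

    with open(wav_path, "rb") as audio:
        response = requests.post(
            f"{base_url}/diarization",
            headers={"accept": "application/json"},
            files={"file": (os.path.basename(wav_path), audio, "audio/x-wav")},
            data=build_form_fields(**options),
            timeout=600,
        )
    response.raise_for_status()
    return response.json()
```

A call such as `diarize("http://YOUR_SERVICE:PORT", "YOUR_FILE.wav", speaker_count=2)` would then return the speakers and segments described under Return format.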

License

This project is developed under the AGPLv3 License (see LICENSE).

Acknowledgment

PyAnnote diarization framework (License MIT).
