
Commit 36d1d4e

Merge branch 'main' into document-start-recognition-timeout
2 parents: 53f6200 + 2924c47

35 files changed (+1364 additions, -456 deletions)

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -26,3 +26,5 @@ docs/api-ref/batch
 static/*.yaml
 .vercel
 .env*.local
+
+.venv

docs/api-ref/realtime-transcription-websocket.mdx

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,8 @@ sequenceDiagram
 API-->>Client: ChannelAudioAdded (optional)
 API-->>Client: AddPartialTranscript (optional)
 API-->>Client: AddTranscript (optional)
+Client->>API: GetSpeakers (optional)
+API-->>Client: SpeakersResult (optional)
 end
 Client->>API: EndOfChannel (optional)
 Client->>API: EndOfStream
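
The two added diagram lines describe an optional request/response pair that sits alongside the transcript messages, before the client ends the stream. As a rough illustration only, the sketch below prints the client-side JSON payloads in the order the diagram shows. It assumes GetSpeakers needs no fields beyond `message` (this commit does not show its schema) and that EndOfStream carries `last_seq_no`, the sequence number of the last audio chunk sent. Both function names are hypothetical.

```bash
# Hypothetical helpers (not part of any Speechmatics SDK) that print the JSON
# control messages a client would send over the realtime WebSocket.

# Assumption: GetSpeakers needs no fields besides "message".
get_speakers_msg() {
  printf '{"message": "GetSpeakers"}\n'
}

# EndOfStream signals that no more audio follows; last_seq_no is the
# sequence number of the last audio chunk the client sent.
end_of_stream_msg() {
  # Reject anything that is not a non-negative integer before embedding it.
  case "$1" in
    ''|*[!0-9]*)
      echo "last_seq_no must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"message": "EndOfStream", "last_seq_no": %s}\n' "$1"
}
```

A client following the diagram would send `get_speakers_msg` output, wait for SpeakersResult, and only then send `end_of_stream_msg <n>`.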

docs/deployments/index.md

Lines changed: 23 additions & 23 deletions
@@ -1,6 +1,6 @@
 ---
 title: Deployments — Overview
-description: 'Learn about the different ways to use our APIs, including cloud services and on-prem containers.'
+description: "Learn about the different ways to use our APIs, including cloud services and on-prem containers."
 ---
 
 # Overview
@@ -10,6 +10,7 @@ description: 'Learn about the different ways to use our APIs, including cloud se
 Leverage Speechmatics’ cloud services for easy, scalable, and fully managed speech-to-text and translation capabilities.
 
 The best way to get started using Speechmatics' cloud services is:
+
 - Create an account in our [Portal](https://portal.speechmatics.com/)
 - Check out our [Realtime Transcription](/speech-to-text/realtime/quickstart.mdx)
 - Check out our [Batch Transcription](/speech-to-text/batch/quickstart.mdx)
@@ -22,30 +23,29 @@ Deploy Speechmatics services in your own environment using containers. This opti
 - [Language ID Container](/deployments/container/language-id): Identify the language spoken in your audio using the Language ID container.
 - [Translation Container](/deployments/container/gpu-translation): Translate audio from one language to another using the Translation container.
 
-
 ## Feature Availability
 
 Feature availability varies depending on the deployment method you choose. Below is a table summarizing the speech to text feature availability for each deployment method and processing mode.
 
-| Feature | Modes | Deployments |
-|-----------------------------------------------|----------------------|----------------------------|
+| Feature | Modes | Deployments |
+| ------------------------------------------------------------------------------------- | --------------- | ------------- |
 | [Multi-lingual speech to text](/speech-to-text/languages#multilingual-speech-to-text) | Batch, Realtime | SaaS, On-prem |
-| [Alignment](/speech-to-text/batch/alignment) | Batch | SaaS |
-| [Audio Events](/speech-to-text/features/audio-events) | Batch, Realtime | SaaS, On-prem |
-| [Audio Filtering](/speech-to-text/features/audio-filtering) | Batch, Realtime | SaaS, On-prem |
-| [Auto Chapters](/speech-to-text/batch/speech-intelligence/auto-chapters) | Batch | SaaS |
-| [Custom Dictionary](/speech-to-text/features/custom-dictionary) | Batch, Realtime | SaaS, On-prem |
-| [Diarization](/speech-to-text/features/diarization) | Batch, Realtime | SaaS, On-prem |
-| [Disfluencies and Word Replacement](/speech-to-text/formatting#disfluencies) | Batch, Realtime | SaaS, On-prem |
-| [End-of-Turn](/speech-to-text/realtime/end-of-turn) | Realtime | SaaS, On-prem |
-| [Feature Discovery](/speech-to-text/features/feature-discovery) | Batch, Realtime | SaaS |
-| [Fetch URL](/speech-to-text/batch/input#fetch-url) | Batch | SaaS, On-Prem |
-| [Language Identification](/speech-to-text/batch/language-identification) | Batch | SaaS |
-| [Notifications](/speech-to-text/batch/notifications.md) | Batch | SaaS, On-prem |
-| [Numeral Formatting](/speech-to-text/formatting#smart-formatting) | Batch, Realtime | SaaS, On-prem |
-| [Punctuation Settings](/speech-to-text/formatting#punctuation) | Batch, Realtime | SaaS, On-prem |
-| [Sentiment Analysis](/speech-to-text/batch/speech-intelligence/sentiment-analysis) | Batch | SaaS, On-prem |
-| [Summarization](/speech-to-text/batch/speech-intelligence/summarization) | Batch | SaaS |
-| [Topic Detection](/speech-to-text/batch/speech-intelligence/topic-detection) | Batch | SaaS |
-| [Tracking](/speech-to-text/batch/output#tracking-metadata) | Batch, Realtime | SaaS, On-prem |
-| [Translation](/speech-to-text/features/translation) | Batch, Realtime | SaaS, On-prem |
+| [Alignment](/speech-to-text/batch/alignment) | Batch | SaaS |
+| [Audio Events](/speech-to-text/features/audio-events) | Batch, Realtime | SaaS, On-prem |
+| [Audio Filtering](/speech-to-text/features/audio-filtering) | Batch, Realtime | SaaS, On-prem |
+| [Auto Chapters](/speech-to-text/batch/speech-intelligence/auto-chapters) | Batch | SaaS |
+| [Custom Dictionary](/speech-to-text/features/custom-dictionary) | Batch, Realtime | SaaS, On-prem |
+| [Diarization](/speech-to-text/features/diarization) | Batch, Realtime | SaaS, On-prem |
+| [Disfluencies and Word Replacement](/speech-to-text/formatting#disfluencies) | Batch, Realtime | SaaS, On-prem |
+| [End-of-Turn](/speech-to-text/realtime/turn-detection) | Realtime | SaaS, On-prem |
+| [Feature Discovery](/speech-to-text/features/feature-discovery) | Batch, Realtime | SaaS |
+| [Fetch URL](/speech-to-text/batch/input#fetch-url) | Batch | SaaS, On-Prem |
+| [Language Identification](/speech-to-text/batch/language-identification) | Batch | SaaS |
+| [Notifications](/speech-to-text/batch/notifications.md) | Batch | SaaS, On-prem |
+| [Numeral Formatting](/speech-to-text/formatting#smart-formatting) | Batch, Realtime | SaaS, On-prem |
+| [Punctuation Settings](/speech-to-text/formatting#punctuation) | Batch, Realtime | SaaS, On-prem |
+| [Sentiment Analysis](/speech-to-text/batch/speech-intelligence/sentiment-analysis) | Batch | SaaS, On-prem |
+| [Summarization](/speech-to-text/batch/speech-intelligence/summarization) | Batch | SaaS |
+| [Topic Detection](/speech-to-text/batch/speech-intelligence/topic-detection) | Batch | SaaS |
+| [Tracking](/speech-to-text/batch/output#tracking-metadata) | Batch, Realtime | SaaS, On-prem |
+| [Translation](/speech-to-text/features/translation) | Batch, Realtime | SaaS, On-prem |

docs/deployments/kubernetes/realtime.mdx

Lines changed: 7 additions & 0 deletions
@@ -171,6 +171,13 @@ Run the following command to uninstall Speechmatics from the cluster:
 helm uninstall speechmatics-realtime
 ```
 
+Depending on the configuration setup, you may also need to remove PVCs created from the redis deployment:
+
+```bash
+# Delete any left-over PVCs with `kubectl delete pvc`
+kubectl get pvc | grep redis-data
+```
+
 ## FAQ
 
 *Why should I use the sm-realtime Helm chart over a Docker container deployment?*
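
The added snippet lists the leftover Redis claims but leaves the actual `kubectl delete pvc` step to the reader. A minimal sketch that finishes the job, assuming the Redis claims are exactly the PVCs whose names contain `redis-data` in kubectl's current namespace (add `-n <namespace>` to both kubectl calls otherwise). It defaults to a dry run because deleting a PVC can discard the data behind it. `delete_redis_pvcs` is a hypothetical helper name.

```bash
# Hypothetical helper: remove PVCs left behind by the Redis deployment after
# `helm uninstall speechmatics-realtime`.
delete_redis_pvcs() {
  local dry_run="${1:-true}"   # default: only print; pass "false" to delete
  local pvc
  # `-o name` prints one entry per line, e.g. persistentvolumeclaim/<name>
  kubectl get pvc -o name | grep 'redis-data' | while read -r pvc; do
    if [ "$dry_run" = "false" ]; then
      kubectl delete "$pvc"
    else
      echo "would delete: $pvc"
    fi
  done
}
```

Run `delete_redis_pvcs` first to review what matches, then `delete_redis_pvcs false` to delete.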

docs/deployments/virtual-appliance/administration/adding-languages.mdx

Lines changed: 13 additions & 3 deletions
@@ -4,6 +4,10 @@ description: Add languages to a Virtual Appliance deployment
 
 # Adding Languages
 
+:::warning
+As of 08 October 2025 we have moved our public containers from artifactory to Azure Container Registry (ACR), please [reach out to Support](https://support.speechmatics.com) to update your credentials.
+:::
+
 :::note
 Adding languages is currently only supported for `batch` mode, if you need extra realtime languages please [reach out to Support](https://support.speechmatics.com).
 :::
@@ -15,21 +19,27 @@ Log into the appliance using SSH with the username `smadmin`, as described in [R
 You will need to know which version of the images are compatible with your appliance, this will be documented here, but until then [reach out to Support](https://support.speechmatics.com).
 Language codes are the standard two-letter ISO codes used in configuring transcription, for example: "de", "fr".
 
+Appliance versions up to and including `6.2.1` will pull from our artifactory repository by default, as of 08 October 2025 we will deprecate public access to this repository (see above). If running an appliance version `6.2.1` or earlier, you will need to configure your appliance to pull from the ACR repository instead.
+
+```bash
+export SM_DOCKER_PUBLIC=speechmaticspublic.azurecr.io
+```
+
 Access to the Speechmatics repository is needed to download new languages, these credentials can be set in the environment but need to be base64 encoded, alternately if unset the user will be prompted for the username and password after running the configure languages script. If you need access to the Speechmatics container repository please [reach out to Support](https://support.speechmatics.com).
 
 ```bash
-export CRICTL_AUTH=$(echo -n <username>:<password> | base64)
+export CRICTL_AUTH="$(echo -n <username>:<password> | base64)"
 ```
 
 ### Add a language:
 ```bash
-sudo CRICTL_AUTH=$(echo -n <username>:<password> | base64) BUILD_MODE=batch configure_languages.sh <version> <language-code> [<language-code> ...]
+sudo CRICTL_AUTH="$(echo -n <username>:<password> | base64)" SM_DOCKER_PUBLIC=speechmaticspublic.azurecr.io BUILD_MODE=batch configure_languages.sh <version> <language-code> [<language-code> ...]
 ```
 
 ### Remove a language:
 
 ```bash
-sudo CRICTL_AUTH=$(echo -n <username>:<password> | base64) BUILD_MODE=batch configure_languages.sh -r <language-code> [<language-code> ...]
+sudo CRICTL_AUTH="$(echo -n <username>:<password> | base64)" SM_DOCKER_PUBLIC=speechmaticspublic.azurecr.io BUILD_MODE=batch configure_languages.sh -r <language-code> [<language-code> ...]
 ```
 
 :::note
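
The updated instructions hinge on two values: whether the appliance is at version `6.2.1` or earlier (and so must have `SM_DOCKER_PUBLIC` pointed at ACR), and a base64-encoded `<username>:<password>` pair for `CRICTL_AUTH`. A sketch of both as shell functions, assuming GNU `sort -V` for version ordering. The function names are hypothetical and not part of `configure_languages.sh`.

```bash
# Hypothetical helper: base64-encode "<username>:<password>" for CRICTL_AUTH.
# printf '%s' adds no trailing newline (the same reason the docs use
# `echo -n`), and tr strips the line wrapping GNU base64 applies at 76 columns.
crictl_auth() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Hypothetical helper: succeeds when the appliance version is 6.2.1 or
# earlier, i.e. when SM_DOCKER_PUBLIC must be overridden. With `sort -V`,
# the smaller version sorts first, so the check passes when $1 comes out on top.
needs_acr_override() {
  [ "$(printf '%s\n' "$1" 6.2.1 | sort -V | head -n 1)" = "$1" ]
}
```

For example, `needs_acr_override 6.2.1 && export SM_DOCKER_PUBLIC=speechmaticspublic.azurecr.io`, then `export CRICTL_AUTH="$(crictl_auth "$SM_USER" "$SM_PASS")"`, where `SM_USER` and `SM_PASS` are placeholder variables holding your repository credentials.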

docs/get-started/authentication.mdx

Lines changed: 27 additions & 8 deletions
@@ -33,10 +33,18 @@ curl -X GET "https://asr.api.speechmatics.com/v2/jobs/" \
 </TabItem>
 <TabItem value="real-time" label="Real-Time Transcription" default>
 
-Your API key must be provided in the WebSocket connection request header. For example:
+### Server-side using header + API key
+
+For server-side calls, your API key must be provided in the header of the [Websocket connection request](/api-ref/realtime-transcription-websocket#handshake-responses).
+
+### Client-side using temporary key
+
+For client-side calls, we recommend using [temporary keys](#temporary-keys) (JWTs) to authenticate your requests. These can be passed as a query param to the [Websocket connection request](/api-ref/realtime-transcription-websocket#handshake-responses) URL.
+
+For example:
 
 ```bash
-wss://eu2.rt.speechmatics.com/v2
+wss://eu2.rt.speechmatics.com/v2?jwt=$TEMP_KEY
 ```
 
 </TabItem>
@@ -54,7 +62,7 @@ Speechmatics Batch SaaS supports the following endpoints for production use:
 | Enterprise | EU | EU2 | **eu2.asr.api.speechmatics.com** |
 | All | US | US1 | **us.asr.api.speechmatics.com**<br/>**us1.asr.api.speechmatics.com** |
 | Enterprise | US | US2 | **us2.asr.api.speechmatics.com** |
-| Enterprise | AU | AU1 | **au1.asr.api.speechmatics.com** |
+| All | AU | AU1 | **au1.asr.api.speechmatics.com** |
 
 Speechmatics Real-Time SaaS supports the following endpoints for production use:
 
@@ -64,20 +72,22 @@ Speechmatics Real-Time SaaS supports the following endpoints for production use:
 | All | EU | EU1 | neu.rt.speechmatics.com |
 | All | US | US1 | wus.rt.speechmatics.com |
 
-Note that for Enterprise customers with access to the Portal, only jobs created in the EU1 environment will appear in the [Jobs list](https://portal.speechmatics.com/jobs).
+All production environments are active and highly available. You can use multiple environments to balance requests or provide a failover in the event of disruption to one environment.
 
-All production environments are active and highly available. Multiple environments can be used to balance requests or provide a failover in the event of disruption to one environment.
+Jobs are created in the environment corresponding to the endpoint used. You must use the same endpoint for all requests relating to a specific job.
 
-Note that jobs are created in the environment corresponding to the endpoint used. You must use the same endpoint for all requests relating to a specific job.
+If you attempt to use an endpoint for a region you are not contracted to use, that request will be unsuccessful. If you want to use a different region, please contact your account manager.
 
-If you attempt to use an endpoint for a region you are not contracted to use, that request will be unsuccessful. If you want to use a different region, please contact your Account Manager.
+:::warning
+The EU2 and US2 environments are provided for enterprise customer high availability and failover purposes only. Jobs created in these environments will not be visible in the portal.
+:::
 
 ## Temporary keys
 
 Speechmatics allows you to generate temporary keys to authenticate your requests instead of your long-lived API key. This can improve end-user experience by reducing latency, and can also reduce development effort and complexity.
 
 :::info
-If you are an enterprise customer and would like to use temporary keys, reach out to [Support](https://support.speechmatics.com) or speak to your Account Manager.
+If you are an enterprise customer and would like to use temporary keys, reach out to [support](https://support.speechmatics.com) or speak to your account manager.
 :::
 
 ### Real-time transcription
@@ -117,6 +127,15 @@ curl -L -X POST "https://mp.speechmatics.com/v1/api_keys?type=batch" \
 -d '{"ttl": 60, "client_ref": "USER123"}'
 ```
 
+The response JSON looks like this:
+
+```json
+{
+"apikey_id": null,
+"key_value": "eyJhbG..."
+}
+```
+
 To authorize a request using a temporary key, simply use it in place of the API key in the Authorization header.
 
 For example, if you created a temporary key associated with a given `client_ref` you can retrieve those jobs as follows:
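
The client-side flow added in this file comes down to two steps: read `key_value` from the temporary-key response, then append it to the realtime URL as the `jwt` query parameter. A sketch of the second step, assuming the host is one of the realtime endpoints listed above. The character check is an addition of this sketch rather than documented API behaviour, and `rt_url_with_temp_key` is a hypothetical name.

```bash
# Hypothetical helper: build the realtime WebSocket URL with a temporary key
# passed as the `jwt` query parameter.
rt_url_with_temp_key() {
  local host="$1" key="$2"
  # A JWT is base64url segments joined by dots, none of which need
  # percent-encoding. Refuse an empty key or any other character rather
  # than emit a malformed URL.
  case "$key" in
    ''|*[!A-Za-z0-9._-]*)
      echo "unexpected characters in temporary key" >&2
      return 1
      ;;
  esac
  printf 'wss://%s/v2?jwt=%s\n' "$host" "$key"
}
```

The first step could be done by piping the `curl` response through `jq -r '.key_value'` (assuming `jq` is installed) into `TEMP_KEY`, then calling `rt_url_with_temp_key eu2.rt.speechmatics.com "$TEMP_KEY"`.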

docs/get-started/sidebar.ts

Lines changed: 13 additions & 0 deletions
@@ -61,6 +61,19 @@ export default {
 },
 ],
 },
+{
+type: "category",
+label: "Text to speech",
+collapsible: true,
+collapsed: true,
+items: [
+{
+type: "link",
+href: "https://github.com/speechmatics/speechmatics-python-sdk/tree/main/sdk/tts",
+label: "Python",
+},
+],
+},
 {
 type: "category",
 label: "Voice agents – Flow",

docs/speech-to-text/batch/assets/transcript-response-example.json

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 "word_delimiter": " ",
 "writing_direction": "left-to-right"
 },
-"orchestrator_version": "2025.06.28651+1eb4127132.HEAD",
+"orchestrator_version": "2025.06.28+1eb4127132+13.4.0",
 "transcription_config": {
 "language": "en",
 "operating_point": "enhanced"

docs/speech-to-text/batch/batch_diarization.mdx renamed to docs/speech-to-text/batch/batch-diarization.mdx

Lines changed: 1 addition & 5 deletions
@@ -52,7 +52,7 @@ The feature is disabled by default. To enable speaker diarization, `diarization`
 When diarization is enabled, each `word` and `punctuation` object in the transcript includes a `speaker` property that identifies who spoke it. There are two types of labels:
 
 - `S#` – S stands for speaker, and `#` is a sequential number identifying each speaker. S1 appears first in the results, followed by S2, S3, and so on.
-- `UU` – Used when the speaker cannot be identified or diarization is not applied (i.e. running batch mode on CPU operating points), for example, if background noise is transcribed as speech but no speaker can be determined.
+- `UU` – Used when the speaker cannot be identified or diarization is not applied, for example, if background noise is transcribed as speech but no speaker can be determined.
 
 ```json
 "results": [
@@ -184,10 +184,6 @@ This adjustment only works when punctuation is enabled. Disabling punctuation vi
 
 Adjusting punctuation sensitivity can also affect how accurately speakers are identified.
 
-### Speaker diarization Timeout
-
-Speaker diarization will time out if it takes too long to run for a particular audio file. Currently, the timeout is set to 5 minutes or 0.5 \* the audio duration, whichever is longer. For example, with a 2 hour audio file, the timeout is 1 hour. If a timeout happens, the transcript will still be returned and all speaker labels in the output will be labelled as UU.
-
 ### Speaker change (legacy)
 
 The speaker change detection feature was removed in July 2024. The `speaker_change` and `channel_and_speaker_change` parameters are no longer supported. Use the [speaker diarization](#speaker-diarization) feature for speaker labeling.

docs/speech-to-text/batch/sidebar.ts

Lines changed: 5 additions & 1 deletion
@@ -20,7 +20,11 @@ export default {
 },
 {
 type: "doc",
-id: "speech-to-text/batch/batch_diarization",
+id: "speech-to-text/batch/batch-diarization",
+},
+{
+type: "doc",
+id: "speech-to-text/batch/speaker-identification",
 },
 {
 type: "category",
