@chrisdavidmills chrisdavidmills commented Sep 15, 2025

Description

Chrome 142 desktop adds support for the on-device speech recognition functionality of the Web Speech API. See https://chromestatus.com/feature/6090916291674112.

This PR documents this new functionality as well as documenting a few related bits and making some needed updates.

Specifically, it:

  • Updates the "Using..." guide to add details of the new features
  • Adds docs for the new SpeechRecognition available() and install() methods.
  • Adds docs for the new SpeechRecognition phrases and processLocally properties.
  • Adds docs for the new SpeechRecognitionPhrase interface.
  • Adds docs for the new on-device-speech-recognition Permissions Policy directive.
  • Updates the SpeechRecognition.start() page to include the MediaStreamTrack parameter version.
  • Clearly marks the SpeechGrammar and SpeechGrammarList interfaces as deprecated and explains the story behind them.
  • Removes the SpeechRecognitionEvent emma and interpretation properties, as they have not been supported for about 5 years.
  • Adds ref pages for the SpeechRecognitionEvent and SpeechRecognitionErrorEvent constructors.

See mdn/dom-examples#332 for a related demo addition.

Motivation

Additional details

Related issues and pull requests

@chrisdavidmills chrisdavidmills requested review from a team as code owners September 15, 2025 11:38
@chrisdavidmills chrisdavidmills requested review from bsmth and removed request for a team September 15, 2025 11:38
@chrisdavidmills chrisdavidmills marked this pull request as draft September 15, 2025 11:38
@github-actions github-actions bot added Content:WebAPI Web API docs Content:HTTP HTTP docs size/m [PR only] 51-500 LoC changed labels Sep 15, 2025

github-actions bot commented Sep 15, 2025

Preview URLs (48 pages)
Flaws (7)

Note! 44 documents with no flaws that don't need to be listed. 🎉

URL: /en-US/docs/Web/API/SpeechRecognition/available_static
Title: SpeechRecognition: available() static method
Flaw count: 1

  • macros:
    • Macro produces link /en-US/docs/Web/API/Promise which is a redirect

URL: /en-US/docs/Web/API/SpeechRecognition/grammars
Title: SpeechRecognition: grammars property
Flaw count: 2

  • unknown:
    • Error serializing baseline for numeric-seperators: missing field `description`
    • Error serializing baseline for single-color-gradients: missing field `description`

URL: /en-US/docs/Web/API/SpeechRecognition/install_static
Title: SpeechRecognition: install() static method
Flaw count: 1

  • macros:
    • Macro produces link /en-US/docs/Web/API/Promise which is a redirect

URL: /en-US/docs/Web/HTTP/Reference/Headers/Permissions-Policy
Title: Permissions-Policy header
Flaw count: 3

  • unknown:
    • No generic content config found
    • no blog root
    • no blog root
External URLs (10)

URL: /en-US/docs/Web/API/SpeechGrammar
Title: SpeechGrammar


URL: /en-US/docs/Web/API/SpeechGrammarList
Title: SpeechGrammarList


URL: /en-US/docs/Web/API/SpeechRecognition/available_static
Title: SpeechRecognition: available() static method


URL: /en-US/docs/Web/API/SpeechRecognition/install_static
Title: SpeechRecognition: install() static method


URL: /en-US/docs/Web/API/SpeechRecognition/phrases
Title: SpeechRecognition: phrases property


URL: /en-US/docs/Web/API/SpeechRecognition/processLocally
Title: SpeechRecognition: processLocally property


URL: /en-US/docs/Web/API/SpeechRecognition/start
Title: SpeechRecognition: start() method


URL: /en-US/docs/Web/API/SpeechRecognitionPhrase
Title: SpeechRecognitionPhrase


URL: /en-US/docs/Web/API/Web_Speech_API/Using_the_Web_Speech_API
Title: Using the Web Speech API

(comment last updated: 2025-09-19 11:23:48)

@github-actions github-actions bot added size/l [PR only] 501-1000 LoC changed and removed size/m [PR only] 51-500 LoC changed labels Sep 15, 2025

bsmth commented Sep 16, 2025

Removing myself as reviewer for now

@bsmth bsmth removed their request for review September 16, 2025 07:51
@github-actions github-actions bot added size/xl [PR only] >1000 LoC changed and removed size/l [PR only] 501-1000 LoC changed labels Sep 16, 2025
@chrisdavidmills chrisdavidmills changed the title Document new web speech api features Technical review: Document new web speech api features Sep 17, 2025

{{APIRef("Web Speech API")}}

The **`available()`** static method of the [Web Speech API](/en-US/docs/Web/API/Web_Speech_API) checks whether specified languages are available for speech recognition either locally on the user's computer, or via a remote service.

Technically, the API can be used to check whether speech recognition is available for a given set of options. Right now, the only option other than language is processLocally, which can be used to guarantee that local recognition is used but cannot be used to guarantee that a remote service is used. If processLocally is false, speech recognition may happen anywhere.

Contributor Author

OK, I think I get the issue here — the initial description was a little bit ambiguous. I've cut it down to just say "...checks whether specified languages are available for speech recognition", and then made sure that I explain clearly what the processLocally option does in the Parameters description below. I added a note to clarify that you can't use available() to determine whether a remote service is guaranteed to support the specified languages.

Let me know what you think of the updates.

Looks good to me!
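
The distinction discussed in this thread can be illustrated with a minimal sketch. It assumes `available()` accepts an options object with `langs` and `processLocally`, as described in this PR. The helper name `checkAvailability`, its return shape, and the `SR` parameter (the `SpeechRecognition` constructor, passed in so it can be stubbed) are hypothetical and exist only for illustration.

```js
// Hypothetical helper, not part of the Web Speech API.
// `SR` is the SpeechRecognition constructor, passed in so it can be stubbed.
async function checkAvailability(SR, langs, processLocally = false) {
  const status = await SR.available({ langs, processLocally });
  return {
    status,
    // Only a check made with processLocally: true says anything about where
    // recognition happens. With false, "available" may mean on-device or remote.
    guaranteedOnDevice: processLocally && status === "available",
  };
}
```

In a browser this might be called as `checkAvailability(SpeechRecognition, ["en-US"], true)`. There is deliberately no "guaranteed remote" flag, since the API offers no way to request that.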

A {{domxref("Promise")}} that resolves with an enumerated value indicating the availability of the specified languages for speech recognition. Possible values are:

- `available`
- : Indicates that the specified languages are available. If `processLocally` is set to `true`, `available` means that the required language packs have been downloaded and installed on the user's computer. If `processLocally` is set to `false`, `available` means that speech recognition is availale for those languages either on-device or remotely.

availale->available

Contributor Author

Good spot; fixed!

- `processLocally` {{optional_inline}}
- : A boolean that specifies whether you are checking availability of the specified languages for [on-device speech recognition](/en-US/docs/Web/API/Web_Speech_API/Using_the_Web_Speech_API#on-device_speech_recognition) (`true`) or availability of the specified languages for on-device or remote speech recognition (`false`). Defaults to `false`.

### Return value

It might be a good idea to document what happens when multiple languages are specified with different availability: https://webaudio.github.io/web-speech-api/#availability-algorithm

Contributor Author

@chrisdavidmills chrisdavidmills Sep 19, 2025

Can you clarify what you think needs explaining?

In the description of the different possible return values, I've explained that available is returned if support for all the languages is available, and unavailable is returned if at least one language is not supported.

I've done a bit of restructuring to make this clearer. Am I misunderstanding something?

The available() method takes in a collection of languages, so if a user calls it with en-US and fr-FR for example, en-US might be "available" but fr-FR might be "downloading", so the method would resolve with "downloading" since it only returns a single status for all of the languages specified. The section on the availability algorithm in the spec describes this behavior.
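
The single-status behavior described above can be sketched as a small pure function. This is an illustration, not the spec's algorithm text: it assumes the four status values are ordered from best to worst as `available`, `downloading`, `downloadable`, `unavailable`, and that the combined result is the worst status among the requested languages. The function name `combineAvailability` is hypothetical; consult the availability algorithm in the spec for the authoritative definition.

```js
// Assumed ordering from best to worst; verify against the spec before relying on it.
const STATUS_ORDER = ["available", "downloading", "downloadable", "unavailable"];

// Hypothetical helper: reduce per-language statuses to the one status
// that available() would report for the whole set.
function combineAvailability(statuses) {
  if (statuses.length === 0) {
    // Sketch-level choice: the empty case is not modeled here.
    throw new RangeError("at least one status is required");
  }
  let worst = 0;
  for (const status of statuses) {
    const rank = STATUS_ORDER.indexOf(status);
    if (rank === -1) {
      throw new TypeError(`unknown status: ${status}`);
    }
    worst = Math.max(worst, rank);
  }
  return STATUS_ORDER[worst];
}
```

With `en-US` reported as `available` and `fr-FR` as `downloading`, the combined result is `downloading`, matching the example in the comment above.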


### Parameters

- `options`

@evanbliu evanbliu Sep 17, 2025

Technically, the install method takes in the same SpeechRecognitionOptions parameter as the available method. The behavior is undefined in the spec when install is called with processLocally=false, though; we should probably fix that at some point.

Contributor Author

Yeah, the way the spec is written makes it sound like "both take the same options object, but install() doesn't use the processLocally property", which made me think "in that case they should be defined as two separate dictionaries".

I decided to just not draw attention to it for now.
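
For context, a check-then-install flow might look like the sketch below. It assumes `available()` and `install()` both accept the same `{ langs, processLocally }` options object, as noted in this thread, and that `install()` resolves with a boolean. Because the behavior of `install()` with `processLocally` set to `false` is undefined, the sketch always passes `true`. The helper name `ensureOnDeviceLanguages` and the `SR` parameter are hypothetical.

```js
// Hypothetical helper, not part of the Web Speech API.
// `SR` is the SpeechRecognition constructor, passed in so it can be stubbed.
// Resolves to true when the languages are (or become) usable on-device.
async function ensureOnDeviceLanguages(SR, langs) {
  // Always request local processing; install() with processLocally: false
  // has no defined behavior according to the review discussion.
  const options = { langs, processLocally: true };
  const status = await SR.available(options);

  if (status === "available") {
    return true; // language packs already installed
  }
  if (status === "unavailable") {
    return false; // at least one language cannot be used on-device
  }
  // "downloadable" or "downloading": ask the browser to install the packs.
  return SR.install(options);
}
```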


```js
const phraseData = [
  { phrase: "azure", boost: 10.0 },
```

According to the spec, "A float representing approximately the natural log of the number of times more likely the website thinks this phrase is than what the speech recognition model knows. A valid boost must be a float value inside the range [0.0, 10.0], with a default value of 1.0 if not specified. A boost of 0.0 means the phrase is not boosted at all, and a higher boost means the phrase is more likely to appear. A boost of 10.0 means the phrase is extremely likely to appear and should be rarely set."

A boost of 10.0 for the phrase "azure" might result in phrases erroneously recognized as "azure".

Contributor Author

OK, understood. I've put this value down to 5.0 in all the source code listings. Is that more sensible?

I've also added a note to the SpeechRecognitionPhrase() constructor and boost property pages to warn about setting your boost values too high.

Sounds good!
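
To make the scale of `boost` concrete, the sketch below converts a boost value into the approximate likelihood multiplier implied by the spec wording quoted above (boost is roughly the natural log of "how many times more likely"). The function name `boostToLikelihoodMultiplier` is hypothetical, and the result is only an approximation of how an engine might weigh a phrase.

```js
// Hypothetical helper: approximate "times more likely" implied by a boost value,
// based on the spec's description of boost as roughly a natural log.
function boostToLikelihoodMultiplier(boost) {
  if (typeof boost !== "number" || Number.isNaN(boost) || boost < 0.0 || boost > 10.0) {
    // The quoted spec text limits valid boosts to the range [0.0, 10.0].
    throw new RangeError("boost must be a number in the range [0.0, 10.0]");
  }
  return Math.exp(boost);
}
```

Under this reading, a boost of 5.0 corresponds to a multiplier of roughly 148, while 10.0 corresponds to roughly 22,000, which is why the maximum value should rarely be used.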

- `audio-capture`
- : Audio capture failed.
- `bad-grammar`
- : There was an error in the speech recognition grammar or semantic tags, or the chosen

FYI, the bad-grammar error no longer appears in the Web Speech API spec.

Contributor Author

Good to know. I've put the icons for deprecated and non-standard next to the error name, and added a note saying that it has been removed, along with the concept of grammar.

Do you know if it is still available in the browser implementation? If not, we could even just remove it altogether

Yeah, it still exists in the Chromium implementation.
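
Since the code is no longer in the spec but can still be reported by Chromium, error handlers may want to treat it as a legacy case. The sketch below covers only the two codes shown in this diff excerpt; the function name `describeRecognitionError` and the message strings are hypothetical.

```js
// Hypothetical helper mapping an error code to a human-readable message.
// Only the codes visible in this excerpt are handled explicitly.
function describeRecognitionError(code) {
  switch (code) {
    case "audio-capture":
      return "Audio capture failed.";
    case "bad-grammar":
      // Removed from the spec along with grammars, but still implemented in Chromium.
      return "Grammar error (legacy error code).";
    default:
      return `Speech recognition error: ${code}`;
  }
}
```

In a page, this could be wired up with `recognition.addEventListener("error", (event) => console.log(describeRecognitionError(event.error)))`.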

The Web Speech API has a main controller interface for this — {{domxref("SpeechRecognition")}} — plus several related interfaces for representing results, etc.

The Web Speech API has a main controller interface for this — {{domxref("SpeechRecognition")}} — plus a number of closely-related interfaces for representing grammar, results, etc. Generally, the default speech recognition system available on the device will be used for the speech recognition — most modern OSes have a speech recognition system for issuing voice commands. Think about Dictation on macOS, Siri on iOS, Cortana on Windows 10, Android Speech, etc.
Generally, the default speech recognition system available on the device will be used for the speech recognition — most modern OSes have a speech recognition system for issuing voice commands. Think about Dictation on macOS or Cortana on Windows. On some browsers, such as Chrome, using Speech Recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline.

I think "Cortana" was rebranded as "Microsoft Copilot"

Chrome recently launched on-device Web Speech, so it would be more accurate to say that speech recognition "may" involve server-based recognition. Or it might be better to use a different example, such as Safari, which always uses server-side speech recognition.

Contributor Author

I've replaced Cortana with Copilot — shows how long ago these docs were originally written!

I've restructured this bit to avoid the server-based versus on-device issue you highlighted. Let me know what you think.

LGTM!

@chrisdavidmills chrisdavidmills marked this pull request as ready for review September 19, 2025 11:21
@chrisdavidmills chrisdavidmills requested review from a team as code owners September 19, 2025 11:21
@chrisdavidmills chrisdavidmills requested review from dipikabh and removed request for evanbliu and a team September 19, 2025 11:21
@sideshowbarker sideshowbarker removed the request for review from a team September 21, 2025 05:27
Labels
Content:HTTP HTTP docs Content:WebAPI Web API docs size/xl [PR only] >1000 LoC changed

3 participants