Technical review: Document new web speech api features #41145
Conversation
Removing myself as reviewer for now
{{APIRef("Web Speech API")}}
The **`available()`** static method of the [Web Speech API](/en-US/docs/Web/API/Web_Speech_API) checks whether specified languages are available for speech recognition either locally on the user's computer, or via a remote service.
Technically, the API can be used to check if speech recognition is available matching a set of options. Right now, the only option other than language is processLocally, which can be used to guarantee that local recognition is used but cannot be used to guarantee that a remote service is used. If processLocally is false, speech recognition may happen anywhere.
OK, I think I get the issue here — the initial description was a little bit ambiguous. I've cut it down to just say "...checks whether specified languages are available for speech recognition", and then made sure that I explain clearly what the `processLocally` option does in the Parameters description below. I added a note to clarify that you can't use `available()` to determine whether a remote service is guaranteed to support the specified languages.

Let me know what you think of the updates.
Looks good to me!
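For readers skimming this thread, here's a minimal sketch of the call being discussed. The `langs` and `processLocally` option names are taken from the pages under review; treat this as illustrative rather than definitive:

```js
// Check whether en-US speech recognition can run entirely on-device.
SpeechRecognition.available({ langs: ["en-US"], processLocally: true }).then(
  (status) => {
    // status is an enumerated string; see the return values below.
    console.log(`en-US on-device recognition: ${status}`);
  },
);
```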
A {{domxref("Promise")}} that resolves with an enumerated value indicating the availability of the specified languages for speech recognition. Possible values are:

- `available`
- : Indicates that the specified languages are available. If `processLocally` is set to `true`, `available` means that the required language packs have been downloaded and installed on the user's computer. If `processLocally` is set to `false`, `available` means that speech recognition is availale for those languages either on-device or remotely.
availale->available
Good spot; fixed!
- `processLocally` {{optional_inline}}
  - : A boolean that specifies whether you are checking availability of the specified languages for [on-device speech recognition](/en-US/docs/Web/API/Web_Speech_API/Using_the_Web_Speech_API#on-device_speech_recognition) (`true`) or availability of the specified languages for on-device or remote speech recognition (`false`). Defaults to `false`.
### Return value
It might be a good idea to document what happens when multiple languages are specified with different availability: https://webaudio.github.io/web-speech-api/#availability-algorithm
Can you clarify what you think needs explaining?
In the description of the different possible return values, I've explained that `available` is returned if support for all the languages is available, and `unavailable` is returned if at least one language is not supported.

I've done a bit of restructuring to make this clearer. Am I misunderstanding something?
The available() method takes in a collection of languages, so if a user calls it with en-US and fr-FR for example, en-US might be "available" but fr-FR might be "downloading", so the method would resolve with "downloading" since it only returns a single status for all of the languages specified. The section on the availability algorithm in the spec describes this behavior.
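To make that behavior concrete, here's a sketch under the assumption the reviewer describes (one combined status for all requested languages, per the spec's availability algorithm):

```js
async function checkLanguages() {
  // en-US might already be installed while fr-FR is still downloading;
  // the promise resolves with a single status covering both languages,
  // so this could log "downloading" even though en-US is ready.
  const status = await SpeechRecognition.available({
    langs: ["en-US", "fr-FR"],
    processLocally: true,
  });
  console.log(`Combined availability: ${status}`);
}
```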
### Parameters

- `options`
Technically, the install method takes in the same SpeechRecognitionOptions parameter as the available method. The behavior is undefined in the spec when install is called with processLocally=false, though; we should probably fix that at some point.
Yeah, the way the spec is written makes it sound like "both take the same options object, but install() doesn't use the processLocally property", which made me think "in that case they should be defined as two separate dictionaries".
I decided to just not draw attention to it for now.
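For context, a sketch of how `available()` and `install()` might be paired given that shared options dictionary. The boolean resolution value for `install()` is my reading of the pages in this PR, not something this thread confirms:

```js
async function ensureOnDeviceEnglish() {
  const status = await SpeechRecognition.available({
    langs: ["en-US"],
    processLocally: true,
  });
  if (status === "downloadable") {
    // install() takes the same options object; as noted above, what
    // processLocally: false would mean here is currently undefined.
    const success = await SpeechRecognition.install({
      langs: ["en-US"],
      processLocally: true,
    });
    console.log(success ? "Language pack installing" : "Install refused");
  }
}
```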
```js
const phraseData = [
  { phrase: "azure", boost: 10.0 },
  // …
];
```
According to the spec, "A float representing approximately the natural log of the number of times more likely the website thinks this phrase is than what the speech recognition model knows. A valid boost must be a float value inside the range [0.0, 10.0], with a default value of 1.0 if not specified. A boost of 0.0 means the phrase is not boosted at all, and a higher boost means the phrase is more likely to appear. A boost of 10.0 means the phrase is extremely likely to appear and should be rarely set."
A boost of 10.0 for the phrase "azure" might result in phrases erroneously recognized as "azure".
OK, understood. I've put this value down to `5.0` in all the source code listings. Is that more sensible?

I've also added a note to the `SpeechRecognitionPhrase()` constructor and `boost` property pages to warn about setting your boost values too high.
Sounds good!
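For illustration, here's roughly what the toned-down listings might look like. The second phrase and the mapping into `SpeechRecognitionPhrase` objects are illustrative, following the shape of the excerpt above:

```js
const recognition = new SpeechRecognition();

// Contextual biasing: moderate boosts for domain terms. Per the review
// comment above, values near the 10.0 ceiling should rarely be used.
const phraseData = [
  { phrase: "azure", boost: 5.0 },
  { phrase: "entra", boost: 3.0 },
];
recognition.phrases = phraseData.map(
  ({ phrase, boost }) => new SpeechRecognitionPhrase(phrase, boost),
);
```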
- `audio-capture`
  - : Audio capture failed.
- `bad-grammar`
  - : There was an error in the speech recognition grammar or semantic tags, or the chosen grammar format or semantic tag format was unsupported.
FYI, the bad-grammar error no longer appears in the Web Speech API spec.
Good to know. I've put the icons for deprecated and non-standard next to the error name, and added a note saying that it has been removed, along with the concept of grammar.
Do you know if it is still available in the browser implementation? If not, we could even just remove it altogether.
Yeah, it still exists in the Chromium implementation.
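Since the code is staying documented (as deprecated), a quick sketch of where it would surface at runtime; the handler shape follows the standard `SpeechRecognitionErrorEvent` interface:

```js
const recognition = new SpeechRecognition();

recognition.addEventListener("error", (event) => {
  // event.error holds the code, e.g. "audio-capture", or the
  // deprecated "bad-grammar" in engines that still emit it.
  console.error(`Recognition error: ${event.error} (${event.message})`);
});
```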
The Web Speech API has a main controller interface for this — {{domxref("SpeechRecognition")}} — plus several related interfaces for representing results, etc.
Generally, the default speech recognition system available on the device will be used for the speech recognition — most modern OSes have a speech recognition system for issuing voice commands. Think about Dictation on macOS or Cortana on Windows. On some browsers, such as Chrome, using Speech Recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline.
I think "Cortana" was rebranded as "Microsoft Copilot"
Chrome recently launched on-device Web Speech, so it would be more accurate to say that speech recognition "may" involve a server-based recognition. Or it might be better to just use a different example like Safari that always uses server-side speech recognition.
I've replaced Cortana with Copilot — shows how long ago these docs were originally written!
I've restructured this bit to avoid the server-based versus on-device issue you highlighted. Let me know what you think.
LGTM!
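As a baseline for the restructured paragraph, the usual controller-interface flow (the `webkit` prefix fallback reflects long-standing Chrome behavior):

```js
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = "en-US";

recognition.onresult = (event) => {
  console.log(`Heard: ${event.results[0][0].transcript}`);
};

// Whether this runs on-device or via a remote service depends on the
// browser and on settings like processLocally discussed earlier.
recognition.start();
```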
Description
Chrome 142 desktop adds support for the on-device speech recognition functionality of the Web Speech API. See https://chromestatus.com/feature/6090916291674112.
This PR documents the new functionality, along with a few related bits, and makes some needed updates.
Specifically, it:

- Documents the new `SpeechRecognition` `available()` and `install()` methods.
- Documents the new `SpeechRecognition` `phrases` and `processLocally` properties.
- Documents the new `SpeechRecognitionPhrase` interface.
- Documents the new `on-device-speech-recognition` Permissions Policy directive.
- Updates the `SpeechRecognition.start()` page to include the MediaStreamTrack parameter version (see the sketch after this list).
- Marks the `SpeechGrammar` and `SpeechGrammarList` interfaces as deprecated and explains the story behind them.
- Removes the `SpeechRecognitionEvent` `emma` and `interpretation` properties, as they have not been supported for about 5 years.
- Documents the `SpeechRecognitionEvent` and `SpeechRecognitionErrorEvent` constructors.

See mdn/dom-examples#332 for a related demo addition.