An experimental but optimized polyfill extension that enables partial support for the Web Speech (SpeechRecognition) API in Firefox.

apersongithub/Speech-Recognition-Polyfill

Speech Recognition Polyfill

A Web Speech API polyfill that swaps webkitSpeechRecognition for local Whisper or cloud-based AssemblyAI transcription.

All local AI models used are free and need no significant configuration.

Tested with Duolingo and Google Translate. It may work reasonably well with other sites that use the API.
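
To make the "swap" concrete, here is a minimal sketch of how a polyfill can expose a recognizer under the names pages look for. It is not the extension's actual code: `makeRecognitionClass`, `installPolyfill`, and the injected `transcribe` function are hypothetical names, and the real extension would record audio and pass it to Whisper or AssemblyAI where the sketch calls `transcribe`.

```javascript
// Hypothetical factory: builds a recognizer class around an injected
// async transcribe(lang) function (an assumption, not the extension's API).
function makeRecognitionClass(transcribe) {
  return class SketchRecognition {
    constructor() {
      this.lang = 'en-US';
      this.continuous = false;
      this.onresult = null;
      this.onerror = null;
      this.onend = null;
    }

    async start() {
      try {
        const transcript = await transcribe(this.lang);
        // Pages typically read event.results[0][0].transcript.
        if (this.onresult) {
          this.onresult({ results: [[{ transcript, confidence: 1 }]] });
        }
      } catch (err) {
        // 'aborted' is used here only as a placeholder error code.
        if (this.onerror) this.onerror({ error: 'aborted', message: String(err) });
      } finally {
        if (this.onend) this.onend();
      }
    }
  };
}

// Install only where the page has no implementation of its own.
function installPolyfill(target, Impl) {
  if (typeof target.webkitSpeechRecognition === 'undefined') {
    target.webkitSpeechRecognition = Impl;
  }
  if (typeof target.SpeechRecognition === 'undefined') {
    target.SpeechRecognition = target.webkitSpeechRecognition;
  }
  return target.webkitSpeechRecognition;
}
```

In a content script, `target` would be the page's `window`; a site that calls `new webkitSpeechRecognition()` then receives the sketch class without changing its own code.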

Important

The extension requests the following permissions to patch sites that use webkitSpeechRecognition. Each serves a specific purpose:

| Permission | Reason |
| --- | --- |
| `<all_urls>` | Inject the content script/polyfill on any page using speech. |
| `storage` | Save defaults and per-site overrides (engine, model, language, timeout, debug, cache, etc.). |
| `tabs` | Open options page on install and manage icon state with active tabs. |
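
As an illustration of what "defaults and per-site overrides" can mean in practice, the sketch below merges three layers of settings for a given hostname. The key names (`engine`, `model`, `language`, `silenceTimeoutMs`, `debug`), the default values, and the `defaults`/`sites` storage layout are all assumptions made for this example; the extension's real storage schema may differ.

```javascript
// Hypothetical built-in defaults; names and values are illustrative only.
const BUILT_IN_DEFAULTS = {
  engine: 'local',
  model: 'Xenova/whisper-tiny.en',
  language: 'auto',
  silenceTimeoutMs: 1500,
  debug: false,
};

// `stored` stands for whatever object was read back from extension storage,
// e.g. the result of `await browser.storage.local.get()` in a real extension.
// Assumed layout: { defaults: {...}, sites: { 'hostname': {...} } }
function resolveSettings(stored, hostname) {
  const userDefaults = (stored && stored.defaults) || {};
  const siteOverrides = ((stored && stored.sites) || {})[hostname] || {};
  // Later spreads win: built-in < user defaults < per-site overrides.
  return { ...BUILT_IN_DEFAULTS, ...userDefaults, ...siteOverrides };
}

// Usage: a Spanish learner overrides language and model for one site only.
const exampleStored = {
  defaults: { language: 'en' },
  sites: { 'www.duolingo.com': { language: 'es', model: 'Xenova/whisper-base' } },
};
const exampleSettings = resolveSettings(exampleStored, 'www.duolingo.com');
```

Layering the objects this way keeps every site on the global defaults until the user explicitly overrides a field for that site.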

Installation Process

Browser Installation Steps
Recommended: Mozilla Add-ons Store
- Click Add to Firefox
- ✅ Done
- ⭐ Rate the addon

Alternative (dev build):
- Download the latest ZIP from Releases
- Go to about:debugging#/runtime/this-firefox
- Click Load Temporary Add-on… and pick manifest.json (or the ZIP)
- ✅ Done
- ⭐ Pin the mic icon to see status colors

Frequently Asked Questions + Models Available

| Q | A |
| --- | --- |
| How do I make an API Key? | Click the link, create an account, and you'll get a key right after signing up. |
| Is the cloud model paid? | AssemblyAI provides a free tier (if I recall correctly, 300–500 hours/month). Beyond that, you have to pay or switch to the local model. |
| Does audio leave my device? | Local (default): No, audio stays on-device (after the model downloads). Cloud (AssemblyAI): Yes, audio is uploaded for transcription. |
| Can you explain the icon indicators? | Color reflects recording/processing/error; badges show downloading/cached/done/cancel. A red/error icon often means canceled, missing API key, or unintelligible speech, not necessarily a bad mic. Pin the icon to monitor state. |
| How do I improve accuracy? | Speak loudly, slowly, and clearly; pick the correct mic. Use a larger Whisper model (slower) or switch to the cloud engine for better speed and quality. |
| How is silence handled? | Adaptive Voice Activity Detection plus a configurable silence timeout (global and per-site). |

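
For readers curious how voice activity detection and a silence timeout can work together, here is a minimal energy-based sketch. It is not the extension's algorithm: the function names, the 20 ms frame size, the threshold ratio, and the smoothing factor are all assumptions chosen for illustration, and the sketch assumes the stream begins with background noise rather than speech.

```javascript
// Root-mean-square level of one audio frame (samples in [-1, 1]).
function rms(frame) {
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

// Hypothetical detector: all parameter names and defaults are illustrative.
function createSilenceDetector({
  silenceTimeoutMs = 1500,
  frameMs = 20,
  ratio = 3,
  minThreshold = 0.01,
} = {}) {
  let noiseFloor = null; // running estimate of the background level
  let silentMs = 0;      // silence accumulated since the last speech frame
  let heardSpeech = false;

  return function pushFrame(frame) {
    const level = rms(frame);
    if (noiseFloor === null) noiseFloor = level;
    // "Adaptive": the speech threshold tracks the estimated noise floor.
    const threshold = Math.max(minThreshold, noiseFloor * ratio);
    const isSpeech = level > threshold;
    if (isSpeech) {
      heardSpeech = true;
      silentMs = 0;
    } else {
      // Update the noise floor only from non-speech frames.
      noiseFloor = 0.95 * noiseFloor + 0.05 * level;
      if (heardSpeech) silentMs += frameMs;
    }
    // Stop only after speech was heard and the timeout has elapsed.
    return { isSpeech, shouldStop: heardSpeech && silentMs >= silenceTimeoutMs };
  };
}
```

Each call to `pushFrame` takes one frame of samples and reports whether recording should stop; raising `silenceTimeoutMs` gives slow speakers more room to pause.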

| Engine | Model ID | Notes |
| --- | --- | --- |
| Local Whisper | Xenova/whisper-tiny.en | English-only, fastest |
| Local Whisper | Xenova/whisper-tiny | Multilingual, fast |
| Local Whisper | Xenova/whisper-base.en | English-only, balanced |
| Local Whisper | Xenova/whisper-base | Multilingual, balanced |
| Local Whisper | Xenova/whisper-small.en | English-only, higher quality (slower) |
| Local Whisper | Xenova/whisper-small | Multilingual, higher quality (slower) |
| Local Whisper | Xenova/distil-whisper-medium.en | English-only, distilled medium (larger/slower) |
| Cloud AssemblyAI | (API key required) | Remote transcription; model managed by AssemblyAI |
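
The list of local models can be read as a lookup by language and size. The sketch below encodes it that way; the `tier` labels, the `pickLocalModel` helper, and the rule of preferring an English-only variant for English are assumptions for illustration, not documented behavior of the extension. Only the model IDs come from the list above.

```javascript
// Model IDs from the list above; `tier` and `englishOnly` are labels
// added for this sketch.
const LOCAL_MODELS = [
  { id: 'Xenova/whisper-tiny.en', englishOnly: true, tier: 'tiny' },
  { id: 'Xenova/whisper-tiny', englishOnly: false, tier: 'tiny' },
  { id: 'Xenova/whisper-base.en', englishOnly: true, tier: 'base' },
  { id: 'Xenova/whisper-base', englishOnly: false, tier: 'base' },
  { id: 'Xenova/whisper-small.en', englishOnly: true, tier: 'small' },
  { id: 'Xenova/whisper-small', englishOnly: false, tier: 'small' },
  { id: 'Xenova/distil-whisper-medium.en', englishOnly: true, tier: 'medium' },
];

// Hypothetical helper: picks a model ID for a BCP 47 language tag and a tier.
function pickLocalModel(language, tier = 'tiny') {
  const isEnglish = /^en(-|$)/i.test(language);
  // English-only models are usable only when the language is English.
  const usable = LOCAL_MODELS.filter((m) => isEnglish || !m.englishOnly);
  const sameTier = usable.filter((m) => m.tier === tier);
  // Assumption: prefer the English-only variant when it applies.
  const best = sameTier.find((m) => m.englishOnly) || sameTier[0];
  // No usable model at that tier: fall back to the last usable entry,
  // which is the largest one in the list.
  return (best || usable[usable.length - 1]).id;
}
```

For example, `pickLocalModel('es', 'medium')` falls back to the multilingual small model, because the only medium entry is English-only.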

Extra Tips

  • For learning sites (e.g., Duolingo): set the site language to the one you're practicing for better speech recognition.
    • For Google Translate, automatic language detection usually suffices because the site itself indicates the language.
  • The polyfill does not support continuous speech.
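
From a page author's point of view, the tips above translate into setting `lang` explicitly and not relying on `continuous` mode. The sketch below shows one way to do that with the standard `lang`, `continuous`, `onresult`, and `start()` members of the Web Speech API; the helper names `configureRecognition` and `startDictation` are hypothetical, and the `globalObj` parameter exists only so the example can run outside a browser.

```javascript
// Apply the tips: explicit language, single-utterance mode.
function configureRecognition(recognition, practiceLanguage) {
  recognition.lang = practiceLanguage; // BCP 47 tag, e.g. 'es-ES'
  recognition.continuous = false;      // continuous speech is not supported
  return recognition;
}

// Hypothetical helper: starts one recognition pass and reports the text.
// Returns null when no recognizer is available on `globalObj`.
function startDictation(onText, practiceLanguage, globalObj = globalThis) {
  const Ctor = globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;
  if (!Ctor) return null;
  const recognition = configureRecognition(new Ctor(), practiceLanguage);
  recognition.onresult = (event) => onText(event.results[0][0].transcript);
  recognition.start();
  return recognition;
}
```

To capture several sentences without continuous mode, a page can call `startDictation` again once the previous pass has finished.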

Contributing

  • Keep permissions minimal.
  • Please organize code.
  • Use Debug Mode.

Thanks, I need it.
