A Web Speech API polyfill that swaps `webkitSpeechRecognition` for local Whisper and cloud-based AssemblyAI transcription. All local AI models are free and require no major configuration.
Tested with Duolingo and Google Translate. It may also work with other sites that use the API.
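
To illustrate what "swapping" the constructor can look like, here is a minimal sketch, not the extension's actual code. `createRecognitionPolyfill`, `installPolyfill`, and the injected `transcribe` function are hypothetical names; the sketch assumes `transcribe` is an async function that resolves to the recognized text, and it only covers a single, non-continuous recognition pass.

```javascript
// Hypothetical sketch: a minimal stand-in for webkitSpeechRecognition.
// `transcribe` is an assumed async function ({ lang }) -> text, not a real extension API.
function createRecognitionPolyfill(transcribe) {
  return class PolyfillSpeechRecognition {
    constructor() {
      this.lang = "en-US";
      this.continuous = false; // continuous speech is not supported
      this.onstart = null;
      this.onresult = null;
      this.onerror = null;
      this.onend = null;
    }

    // Run one recognition pass and report the outcome through the callbacks.
    async start() {
      if (this.onstart) this.onstart({ type: "start" });
      try {
        const transcript = await transcribe({ lang: this.lang });
        // Shape loosely mirrors SpeechRecognitionEvent: results[i][j].transcript
        const alternative = { transcript, confidence: 1 }; // confidence is a placeholder
        const result = Object.assign([alternative], { isFinal: true });
        if (this.onresult) {
          this.onresult({ type: "result", resultIndex: 0, results: [result] });
        }
      } catch (err) {
        // "aborted" is used here as a placeholder error code.
        if (this.onerror) {
          this.onerror({ type: "error", error: "aborted", message: String(err) });
        }
      } finally {
        if (this.onend) this.onend({ type: "end" });
      }
    }
  };
}

// Attach the replacement constructor to a target object (the page's window in a browser).
function installPolyfill(target, transcribe) {
  target.webkitSpeechRecognition = createRecognitionPolyfill(transcribe);
  return target.webkitSpeechRecognition;
}
```

In a page context this would be called as `installPolyfill(window, transcribe)`, after which site code that does `new webkitSpeechRecognition()` receives the replacement class.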
Important
The extension requests the following permissions to patch sites that use `webkitSpeechRecognition`. Each serves a specific purpose:
| Permission | Reason |
|---|---|
| `<all_urls>` | Inject the content script/polyfill on any page using speech. |
| `storage` | Save defaults and per-site overrides (engine, model, language, timeout, debug, cache, etc.). |
| `tabs` | Open the options page on install and manage icon state with active tabs. |
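
As a rough illustration of how saved defaults and per-site overrides could combine, the sketch below merges three layers: built-in defaults, user defaults, and an override keyed by hostname. The key names, the default values, and the `resolveSettings` helper are assumptions made for this example; the extension's real storage layout may differ.

```javascript
// Hypothetical built-in defaults; the real keys and values may differ.
const DEFAULTS = {
  engine: "local",
  model: "Xenova/whisper-tiny.en",
  language: "auto",
  silenceTimeoutMs: 1500,
  debug: false,
};

// Merge built-in defaults, user defaults, and the override stored for a hostname.
// Later layers win, so a per-site override beats a global default.
function resolveSettings(stored, hostname) {
  const globalSettings = { ...DEFAULTS, ...(stored.defaults || {}) };
  const siteOverride = (stored.overrides || {})[hostname] || {};
  return { ...globalSettings, ...siteOverride };
}

// Example data in the assumed storage shape.
const stored = {
  defaults: { debug: true },
  overrides: {
    "www.duolingo.com": { language: "es", model: "Xenova/whisper-base" },
  },
};

const settings = resolveSettings(stored, "www.duolingo.com");
console.log(settings.language, settings.model, settings.debug);
```

A site without an override simply falls back to the global values, so `resolveSettings(stored, "example.com").language` stays at `"auto"`.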
| Browser | Installation Steps |
|---|---|
| Firefox | **Recommended:** Mozilla Add-ons Store<br>- Click Add to Firefox<br>- ✅ Done<br>- ⭐ Rate the add-on<br><br>**Alternative (dev build):**<br>- Download the latest ZIP from Releases<br>- Go to `about:debugging#/runtime/this-firefox`<br>- Click Load Temporary Add-on… and pick `manifest.json` (or the ZIP)<br>- ✅ Done<br>- ⭐ Pin the mic icon to see status colors |
| Q | A |
|---|---|
| How do I make an API Key? | Click the link, create an account, and you’ll get a key right after signing up. |
| Is the cloud model paid? | AssemblyAI provides a free tier (IIRC, 300–500 hours/month). Beyond that, you have to pay or switch to the local model. |
| Does audio leave my device? | Local (default): No, audio stays on-device (after the model downloads). Cloud (AssemblyAI): Yes, audio is uploaded for transcription. |
| Can you explain the icon indicators? | Color reflects recording/processing/error; badges show downloading/cached/done/cancel. A red/error icon often means a canceled recording, a missing API key, or unintelligible speech, not necessarily a bad mic. Pin the icon to monitor state. |
| How do I improve accuracy? | Speak loudly, slowly, and clearly; pick the correct mic. Use a larger Whisper model (slower) or switch to the cloud engine for better speed and quality. |
| How is silence handled? | Adaptive Voice Activity Detection plus a configurable silence timeout (global and per-site). |
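
The silence handling described in the last row can be sketched roughly as follows. This is a simple energy-based detector with an adaptive noise floor, written as an assumption about how such a feature might work; it is not the extension's actual VAD. `createSilenceDetector`, its parameters, and the threshold constants are all hypothetical.

```javascript
// Root-mean-square energy of one frame of PCM samples in the range [-1, 1].
function rms(frame) {
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

// Returns a function that is fed one audio frame at a time and reports
// whether recording should stop. All constants are illustrative.
function createSilenceDetector({ silenceTimeoutMs, frameMs, minThreshold = 0.01 }) {
  let noiseFloor = minThreshold; // adapts slowly toward the background level
  let silentMs = 0;
  let heardSpeech = false;

  return function onFrame(frame) {
    const energy = rms(frame);
    const threshold = Math.max(minThreshold, noiseFloor * 3);
    if (energy > threshold) {
      heardSpeech = true;
      silentMs = 0; // speech resets the silence timer
    } else {
      // Track the background level with an exponential moving average.
      noiseFloor = 0.95 * noiseFloor + 0.05 * energy;
      silentMs += frameMs;
    }
    // Stop only after speech was heard and the silence window has elapsed.
    return heardSpeech && silentMs >= silenceTimeoutMs;
  };
}
```

The `silenceTimeoutMs` value is where a global or per-site timeout setting would plug in; a longer timeout tolerates pauses mid-sentence at the cost of a slower response.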
| Engine | Model ID | Notes |
|---|---|---|
| Local Whisper | `Xenova/whisper-tiny.en` | English-only, fastest |
| Local Whisper | `Xenova/whisper-tiny` | Multilingual, fast |
| Local Whisper | `Xenova/whisper-base.en` | English-only, balanced |
| Local Whisper | `Xenova/whisper-base` | Multilingual, balanced |
| Local Whisper | `Xenova/whisper-small.en` | English-only, higher quality (slower) |
| Local Whisper | `Xenova/whisper-small` | Multilingual, higher quality (slower) |
| Local Whisper | `Xenova/distil-whisper-medium.en` | English-only, distilled medium (larger/slower) |
| Cloud | AssemblyAI (API key required) | Remote transcription; model managed by AssemblyAI |
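
One way to read the table is as a lookup by speed tier and language. The sketch below uses the model IDs listed above, but the tier names and the `pickLocalModel` selection rule are assumptions made for illustration, not the extension's own logic.

```javascript
// Model IDs come from the table; the tier names and selection rule are hypothetical.
const LOCAL_MODELS = {
  fast: { en: "Xenova/whisper-tiny.en", multi: "Xenova/whisper-tiny" },
  balanced: { en: "Xenova/whisper-base.en", multi: "Xenova/whisper-base" },
  quality: { en: "Xenova/whisper-small.en", multi: "Xenova/whisper-small" },
};

// Prefer the English-only variant for English, otherwise the multilingual one.
function pickLocalModel(language, tier = "balanced") {
  const models = LOCAL_MODELS[tier];
  if (!models) throw new Error(`Unknown tier: ${tier}`);
  const isEnglish = language.toLowerCase().split("-")[0] === "en";
  return isEnglish ? models.en : models.multi;
}

console.log(pickLocalModel("en-US", "fast"));
console.log(pickLocalModel("es"));
```

The English-only variants cannot transcribe other languages, so anything that is not English is routed to a multilingual model in this sketch.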
- For learning sites (e.g., Duolingo): set the site language to the one you’re practicing for better speech recognition.
- For Google Translate, auto-language usually suffices because the site tells the polyfill which language to expect.
- Continuous speech recognition is not supported.
- Keep permissions minimal.
- Keep code organized.
- Use Debug Mode.
Thanks, I need it.