Speech Recognition Polyfill

A Web Speech API polyfill that swaps webkitSpeechRecognition for local Whisper and cloud-based AssemblyAI transcription.

All local AI models used are free and don't require any major configuration.

Tested with Duolingo and Google Translate. May work decently with other sites that use the API.
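
To illustrate what "swapping" webkitSpeechRecognition can look like, here is a minimal sketch, not the extension's actual code. The names `PolyfilledRecognition`, `installPolyfill`, `CaptureAudio`, and `Transcribe` are hypothetical, the event object only mimics the `event.results[0][0].transcript` shape that pages typically read, and the fixed confidence value is an assumption.

```typescript
// Hypothetical sketch: every name here is illustrative, not taken from the extension.

// Records one utterance and returns mono PCM samples (assumed helper).
type CaptureAudio = () => Promise<Float32Array>;
// Turns samples into text, e.g. via local Whisper or a cloud engine (assumed helper).
type Transcribe = (audio: Float32Array) => Promise<string>;

// Minimal stand-in for the result event; real events carry more fields.
interface ResultEventLike {
  results: Array<Array<{ transcript: string; confidence: number }>>;
}

class PolyfilledRecognition {
  lang = "en-US";
  onresult: ((event: ResultEventLike) => void) | null = null;
  onerror: ((event: { error: string }) => void) | null = null;
  onend: (() => void) | null = null;

  private readonly capture: CaptureAudio;
  private readonly transcribe: Transcribe;

  constructor(capture: CaptureAudio, transcribe: Transcribe) {
    this.capture = capture;
    this.transcribe = transcribe;
  }

  // One-shot recognition: capture a single utterance, transcribe it, then end.
  async start(): Promise<void> {
    try {
      const audio = await this.capture();
      const transcript = await this.transcribe(audio);
      // Confidence of 1 is a placeholder assumption.
      this.onresult?.({ results: [[{ transcript, confidence: 1 }]] });
    } catch (err) {
      this.onerror?.({ error: String(err) });
    } finally {
      this.onend?.();
    }
  }
}

// Replaces the constructor that pages look up on the global object.
function installPolyfill(
  target: Record<string, unknown>,
  capture: CaptureAudio,
  transcribe: Transcribe
): void {
  target["webkitSpeechRecognition"] = class extends PolyfilledRecognition {
    constructor() {
      super(capture, transcribe);
    }
  };
}
```

In a page context, `target` would be the window object, and a site calling `new webkitSpeechRecognition()` would receive the replacement class without changing its own code.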

Important

The following permissions are requested so the extension can patch sites that use webkitSpeechRecognition. Each serves a specific purpose:

Permissions	Reason
`<all_urls>`	Inject the content script/polyfill on any page using speech.
`storage`	Save defaults and per-site overrides (engine, model, language, timeout, debug, cache, etc.).
`tabs`	Open the options page on install and manage icon state for active tabs.
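
The "defaults and per-site overrides" model from the `storage` row can be sketched as a simple merge. This is an assumption about how such settings might be resolved, not the extension's real schema: the `Settings` fields, the `DEFAULTS` values, and `resolveSettings` are hypothetical names chosen for illustration.

```typescript
// Hypothetical settings shape; field names and default values are assumptions.
interface Settings {
  engine: "local" | "cloud";
  model: string;
  language: string;
  silenceTimeoutMs: number;
  debug: boolean;
}

const DEFAULTS: Settings = {
  engine: "local",
  model: "Xenova/whisper-tiny.en",
  language: "auto",
  silenceTimeoutMs: 1500,
  debug: false,
};

// Overrides are keyed by hostname and only list the fields they change.
type SiteOverrides = Record<string, Partial<Settings>>;

// Per-site values win over defaults; untouched fields fall through.
function resolveSettings(
  defaults: Settings,
  overrides: SiteOverrides,
  hostname: string
): Settings {
  return { ...defaults, ...(overrides[hostname] ?? {}) };
}
```

For example, an override of `{ language: "es" }` for one hostname would change only the language there and leave the engine, model, and timeout at their global values.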

Installation Process

Firefox

Recommended: Mozilla Add-ons Store
- Click Add to Firefox
- ✅ Done
- ⭐ Rate the add-on

Alternative (dev build):
- Download the latest ZIP from Releases
- Go to `about:debugging#/runtime/this-firefox`
- Click Load Temporary Add-on… and pick `manifest.json` (or the ZIP)
- ✅ Done
- ⭐ Pin the mic icon to see status colors

Frequently Asked Questions + Models Available

Q	A
How do I make an API Key?	Click the link, create an account, and you’ll get a key right after signing up.
Is the cloud model paid?	AssemblyAI provides a free tier (300–500 hours/month, if I recall correctly). Beyond that, you have to pay or switch to the local model.
Does audio leave my device?	Local (default): No, audio stays on-device (after the model downloads). Cloud (AssemblyAI): Yes, audio is uploaded for transcription.
Can you explain the icon indicators?	Color reflects recording/processing/error; badges show downloading/cached/done/cancel. A red/error icon often means the request was canceled, the API key is missing, or the speech was unintelligible, not necessarily a bad mic. Pin the icon to monitor state.
How do I improve accuracy?	Speak loudly, slowly, and clearly; pick the correct mic. Use a larger Whisper model (slower) or switch to the cloud engine for better speed and quality.
How is silence handled?	Adaptive Voice Activity Detection plus a configurable silence timeout (global and per-site).
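
The silence handling described in the last row can be pictured with a small energy-based detector. This is a simplified stand-in under stated assumptions, not the extension's actual Voice Activity Detection: the class `SilenceDetector`, the adaptive noise-floor update, and all constants are hypothetical.

```typescript
// Root-mean-square level of one audio frame (samples assumed in [-1, 1]).
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// Hypothetical detector: adapts a noise floor and reports when speech
// has been followed by silence lasting at least `timeoutMs`.
class SilenceDetector {
  private noiseFloor = 0.01; // assumed starting estimate
  private silentMs = 0;
  private heardSpeech = false;
  private readonly timeoutMs: number;
  private readonly frameMs: number;
  private readonly speechRatio: number;

  constructor(timeoutMs: number, frameMs: number, speechRatio = 3) {
    this.timeoutMs = timeoutMs;
    this.frameMs = frameMs;
    this.speechRatio = speechRatio;
  }

  // Feed one frame; returns true once recording should stop.
  push(frame: Float32Array): boolean {
    const level = rms(frame);
    if (level > this.noiseFloor * this.speechRatio) {
      // Loud relative to background: treat as speech and reset the timer.
      this.heardSpeech = true;
      this.silentMs = 0;
    } else {
      // Quiet frame: nudge the noise floor toward the current level.
      this.noiseFloor = Math.max(0.001, 0.95 * this.noiseFloor + 0.05 * level);
      this.silentMs += this.frameMs;
    }
    return this.heardSpeech && this.silentMs >= this.timeoutMs;
  }
}
```

A per-site silence timeout would simply be a different `timeoutMs` passed to the constructor; a longer value suits sites where users pause mid-sentence.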

Engine	Model ID	Notes
Local Whisper	`Xenova/whisper-tiny.en`	English-only, fastest
Local Whisper	`Xenova/whisper-tiny`	Multilingual, fast
Local Whisper	`Xenova/whisper-base.en`	English-only, balanced
Local Whisper	`Xenova/whisper-base`	Multilingual, balanced
Local Whisper	`Xenova/whisper-small.en`	English-only, higher quality (slower)
Local Whisper	`Xenova/whisper-small`	Multilingual, higher quality (slower)
Local Whisper	`Xenova/distil-whisper-medium.en`	English-only, distilled medium (larger/slower)
Cloud	`AssemblyAI` (API key required)	Remote transcription; model managed by AssemblyAI
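
As a sketch of how the local model table might be used programmatically, the snippet below encodes the listed model IDs and picks one for a given language and size. The selection rule (prefer the English-only variant for English, require a multilingual variant otherwise) and the names `LOCAL_MODELS`, `Tier`, and `pickLocalModel` are assumptions for illustration, not the extension's documented behavior.

```typescript
type Tier = "tiny" | "base" | "small" | "distil-medium";

interface LocalModel {
  id: string;
  tier: Tier;
  englishOnly: boolean;
}

// Model IDs as listed in the table above.
const LOCAL_MODELS: LocalModel[] = [
  { id: "Xenova/whisper-tiny.en", tier: "tiny", englishOnly: true },
  { id: "Xenova/whisper-tiny", tier: "tiny", englishOnly: false },
  { id: "Xenova/whisper-base.en", tier: "base", englishOnly: true },
  { id: "Xenova/whisper-base", tier: "base", englishOnly: false },
  { id: "Xenova/whisper-small.en", tier: "small", englishOnly: true },
  { id: "Xenova/whisper-small", tier: "small", englishOnly: false },
  { id: "Xenova/distil-whisper-medium.en", tier: "distil-medium", englishOnly: true },
];

// Hypothetical rule: English speakers get the `.en` variant when one exists;
// other languages need a multilingual model, or get undefined if none exists.
function pickLocalModel(language: string, tier: Tier): string | undefined {
  const isEnglish = language.toLowerCase().split("-")[0] === "en";
  const candidates = LOCAL_MODELS.filter(
    (m) => m.tier === tier && (isEnglish || !m.englishOnly)
  );
  // Put English-only variants first so they win for English.
  candidates.sort((a, b) => Number(b.englishOnly) - Number(a.englishOnly));
  return candidates.length > 0 ? candidates[0].id : undefined;
}
```

Returning `undefined` for a non-English language at the `distil-medium` tier reflects that the table lists only an English-only model at that size; a caller could fall back to `small` or to the cloud engine.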

Extra Tips

For learning sites (e.g., Duolingo): set the site language to the one you’re practicing for better speech recognition.
- For Google Translate, auto-language usually suffices since the site itself indicates which language is in use.
The polyfill does not support continuous speech.

Contributing

Keep permissions minimal.
Please organize code.
Use Debug Mode.

Support Me

Thanks, I need it.

apersongithub/Speech-Recognition-Polyfill
