This project implements real-time speech-to-text functionality in Unity using the Vosk speech recognition toolkit, optimized for Apple Silicon Macs.
- Real-time speech recognition
- Support for multiple microphones
- Customizable speech detection parameters
- Threaded audio processing for improved performance
- Unity 2022.3 or later
- macOS with Apple Silicon (M1 chip or later)
- Xcode (for building on macOS)
- Clone this repository or download the project files.
- Open the project in Unity.
- Download the Vosk model:
  - Go to the Vosk Models page.
  - Download the `vosk-model-small-en-us-0.15` model (or another model of your choice).
  - Extract the downloaded model to `Assets/StreamingAssets/models/`.
- Ensure the Vosk library is properly set up:
  - Check that `libvosk.dylib` is present in `Assets/Plugins/macOS/`.
  - Verify that the `VoskLoader.cs` script is in your project.
- Add the `ImprovedSpeechToText` script to a GameObject in your scene.
- Configure the script in the Inspector:
  - Select a microphone from the available list.
  - Adjust the silence threshold, minimum speech duration, and maximum silence duration as needed.
- Run the scene. The script automatically starts listening and processing speech.
- Speech recognition results are logged to the Console. You can modify the `ProcessRecognitionResult` method to handle the results as needed for your application.
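
To illustrate one way of handling results: Vosk returns each final result as a small JSON object with a `text` field, for example `{"text" : "hello world"}`. The sketch below pulls that field out. The class and method names (`RecognitionResultParser`, `ExtractText`) are hypothetical and not part of this project, and the signature of the project's own `ProcessRecognitionResult` is not shown here, so treat this as a starting point rather than a drop-in replacement. It uses `System.Text.Json`, which may not be available in a default Unity install; inside Unity, `JsonUtility.FromJson` with a `[Serializable]` class that has a public `text` field is a common substitute.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, not part of this project.
public static class RecognitionResultParser
{
    // Returns the recognized text from a Vosk-style JSON result, or an empty
    // string when the "text" field is missing or the input is not valid JSON.
    public static string ExtractText(string json)
    {
        if (string.IsNullOrWhiteSpace(json)) return string.Empty;

        try
        {
            using JsonDocument doc = JsonDocument.Parse(json);
            JsonElement root = doc.RootElement;

            // TryGetProperty is only valid on JSON objects, so check the kind first.
            if (root.ValueKind == JsonValueKind.Object &&
                root.TryGetProperty("text", out JsonElement text) &&
                text.ValueKind == JsonValueKind.String)
            {
                return text.GetString() ?? string.Empty;
            }
        }
        catch (JsonException)
        {
            // Malformed JSON: treat it as "nothing recognized".
        }

        return string.Empty;
    }
}
```

A handler could call `RecognitionResultParser.ExtractText(resultJson)` and ignore empty strings, which typically correspond to silence or noise.
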
- To use a different Vosk model, change the `modelName` variable in the `ImprovedSpeechToText` script.
- Adjust the `silenceThreshold`, `minSpeechDuration`, and `maxSilenceDuration` parameters to fine-tune speech detection.
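
To give a feel for how these three parameters can interact, here is a minimal sketch of a speech detector. It assumes a particular interpretation that this README does not confirm: a frame counts as speech when its RMS level exceeds `silenceThreshold`, an utterance ends after `maxSilenceDuration` seconds of continuous silence, and utterances with less than `minSpeechDuration` seconds of speech are discarded. The `SpeechDetector` class is hypothetical and is not the logic used by `ImprovedSpeechToText`.

```csharp
using System;

// Hypothetical illustration, not the project's actual detection code.
public sealed class SpeechDetector
{
    private readonly float silenceThreshold;   // RMS level at or below which a frame is silence
    private readonly float minSpeechDuration;  // seconds of speech needed to accept an utterance
    private readonly float maxSilenceDuration; // seconds of trailing silence that end an utterance

    private bool inSpeech;
    private float speechTime;
    private float silenceTime;

    public SpeechDetector(float silenceThreshold, float minSpeechDuration, float maxSilenceDuration)
    {
        this.silenceThreshold = silenceThreshold;
        this.minSpeechDuration = minSpeechDuration;
        this.maxSilenceDuration = maxSilenceDuration;
    }

    // Root-mean-square level of a frame of samples in the range [-1, 1].
    public static float Rms(float[] samples)
    {
        if (samples.Length == 0) return 0f;
        double sum = 0;
        foreach (float s in samples) sum += s * s;
        return (float)Math.Sqrt(sum / samples.Length);
    }

    // Feeds one frame lasting frameSeconds. Returns true when an utterance
    // has just ended and contained enough speech to be accepted.
    public bool Process(float[] frame, float frameSeconds)
    {
        if (Rms(frame) > silenceThreshold)
        {
            inSpeech = true;
            speechTime += frameSeconds;
            silenceTime = 0f;
            return false;
        }

        if (!inSpeech) return false;

        silenceTime += frameSeconds;
        if (silenceTime < maxSilenceDuration) return false;

        // Enough trailing silence: close the utterance and reset the state.
        bool accepted = speechTime >= minSpeechDuration;
        inSpeech = false;
        speechTime = 0f;
        silenceTime = 0f;
        return accepted;
    }
}
```

Under this interpretation, raising `silenceThreshold` makes the detector less sensitive to background noise, raising `minSpeechDuration` filters out short clicks and coughs, and raising `maxSilenceDuration` tolerates longer pauses inside a sentence at the cost of slower results.
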
- If you encounter issues with library loading, check the Console for error messages from the `VoskLoader` script.
- Ensure that the Vosk model is correctly placed in the StreamingAssets folder.
- Verify that your microphone is properly connected and recognized by your system.
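
The file-placement checks above can also be automated. The sketch below is a hypothetical helper (`VoskSetupCheck` is not part of this project) that reports whether the model folder and `libvosk.dylib` exist under a given `Assets` path. It assumes the model was extracted into a subfolder of `Assets/StreamingAssets/models/` named after the model, which may differ from your layout. Inside the Unity Editor, `Application.dataPath` can serve as the `assetsPath` argument.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of this project.
public static class VoskSetupCheck
{
    // Returns human-readable problems; an empty list means both paths were found.
    public static List<string> FindProblems(string assetsPath, string modelName)
    {
        var problems = new List<string>();

        // Assumed layout: Assets/StreamingAssets/models/<modelName>/
        string modelDir = Path.Combine(assetsPath, "StreamingAssets", "models", modelName);
        if (!Directory.Exists(modelDir))
            problems.Add("Model folder not found: " + modelDir);

        // Location given in the setup steps: Assets/Plugins/macOS/libvosk.dylib
        string library = Path.Combine(assetsPath, "Plugins", "macOS", "libvosk.dylib");
        if (!File.Exists(library))
            problems.Add("Native library not found: " + library);

        return problems;
    }
}
```

Logging each returned string to the Console before starting recognition can make a misplaced model or missing library easier to spot than a failure later in loading.
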
This project is released under the MIT License.
This project uses the Vosk Speech Recognition Toolkit, which is distributed under the Apache 2.0 license.