This solution contains custom UiPath activities, targeting the Windows framework, that wrap the Azure AI SDK. They cover Speech-to-Text (STT) operations built on the Microsoft Cognitive Services Speech SDK, along with other Azure AI functionality.
This project focuses on the SpeechToTextActivity class, which uses the Microsoft Cognitive Services Speech SDK to perform STT on a provided WAV file.
- BeginExecute: Starts the asynchronous speech recognition operation. It retrieves the input arguments, validates them, and initiates the speech recognition task, using TaskCompletionSource<T> to ensure compatibility with UiPath's AsyncCodeActivity.
- EndExecute: Handles the completion of the asynchronous operation. It retrieves the result or handles any exceptions that occurred.
- PerformSpeechRecognition: Executes the speech recognition using Azure Cognitive Services Speech SDK and returns the recognized text or an error message.
- HandleSpeechRecognitionResult: Interprets the Speech SDK's results and formats them for output.
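
The BeginExecute/EndExecute split described above can be illustrated with a small analogy. The activity itself is written in C#; the sketch below is Python only, with `concurrent.futures.Future` standing in for TaskCompletionSource<T>. The names `begin_execute`, `end_execute`, and `fake_recognize` are hypothetical and do not come from the project.

```python
import threading
from concurrent.futures import Future


def begin_execute(recognize, *args):
    """Start `recognize` on a worker thread and return a Future immediately."""
    future = Future()  # plays the role TaskCompletionSource<T> plays in C#

    def worker():
        try:
            future.set_result(recognize(*args))
        except Exception as exc:
            # Store the failure so the caller of end_execute sees it.
            future.set_exception(exc)

    threading.Thread(target=worker, daemon=True).start()
    return future


def end_execute(future, timeout=None):
    """Wait for completion; return the result or re-raise the stored error."""
    return future.result(timeout=timeout)


def fake_recognize(path):
    # Hypothetical stand-in for the real Speech SDK call.
    if not path.endswith(".wav"):
        raise ValueError("expected a WAV file")
    return "RECOGNIZED: Recognized Text: hello"
```

The point of the pattern is that the start step returns at once with a handle, and the end step is the single place where the result or the exception surfaces.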
- Visual Studio (or compatible IDE)
- Microsoft Cognitive Services Speech SDK (version 1.30.0 or newer), included as a built-in dependency
- .NET Framework 4.6.1 or newer / .NET 6.0-windows
- A valid Azure subscription with access to Azure Speech Services
- Clone or download the repository.
- Open the solution in Visual Studio.
- Restore NuGet packages by right-clicking on the solution in Solution Explorer and selecting "Restore NuGet Packages."
- Build the project to generate the necessary assemblies.
- Compile the project to create the assembly or deploy it as a custom activity in UiPath.
- Use UiPath Studio to create a workflow that utilizes the custom SpeechToTextActivity activity.
- Provide the required input arguments in UiPath Studio:
  - SubscriptionKey: Your Microsoft Cognitive Services subscription key.
  - ServiceRegion: The region where your Speech Service is hosted (e.g., westus).
  - AudioFilePath: The full path to the WAV file you want to transcribe.
  - Locale: The language locale for speech recognition (e.g., en-US).
The activity will output the result of the speech recognition process:
- RECOGNIZED: Displays the recognized text from the audio file.
- NOMATCH: Indicates that speech could not be recognized from the audio file.
- CANCELED: Provides details if the operation was canceled, including the reason and error details.
- UNKNOWN ERROR: Displays an error message if the recognition fails for unspecified reasons.
- SubscriptionKey: <your_subscription_key>
- ServiceRegion: <your_service_region>
- AudioFilePath: <path_to_audio_file>
- Locale: <language_locale>
Speech Recognition Result:
- RECOGNIZED: Recognized Text: <recognized_text>
- NOMATCH: Speech could not be recognized.
- CANCELED: Reason=<cancellation_reason>, ErrorCode=<error_code>, Details=<error_details>.
- UNKNOWN ERROR: Unable to process speech.
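
The mapping from a recognition outcome to the output strings listed above can be sketched as follows. This is an illustration in Python, not the activity's C# code. `RecognitionOutcome` and `format_result` are hypothetical names, and the exact strings the real activity emits may differ from this layout. The reason values mirror the Speech SDK's ResultReason names (RecognizedSpeech, NoMatch, Canceled).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionOutcome:
    """Hypothetical container for what the Speech SDK reports back."""
    reason: str  # "RecognizedSpeech", "NoMatch", "Canceled", or anything else
    text: str = ""
    cancel_reason: Optional[str] = None
    error_code: Optional[str] = None
    error_details: Optional[str] = None


def format_result(outcome):
    """Turn an outcome into one of the four documented output strings."""
    if outcome.reason == "RecognizedSpeech":
        return f"RECOGNIZED: Recognized Text: {outcome.text}"
    if outcome.reason == "NoMatch":
        return "NOMATCH: Speech could not be recognized."
    if outcome.reason == "Canceled":
        return (
            f"CANCELED: Reason={outcome.cancel_reason}, "
            f"ErrorCode={outcome.error_code}, "
            f"Details={outcome.error_details}."
        )
    # Any reason not handled above falls through to the generic message.
    return "UNKNOWN ERROR: Unable to process speech."
```

Keeping the formatting in one function means a workflow can branch on the leading keyword (RECOGNIZED, NOMATCH, CANCELED, UNKNOWN ERROR) without parsing the rest of the string.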
- FileNotFoundException: If the specified audio file does not exist, an error will be thrown.
- ArgumentException: Thrown if invalid arguments are provided.
- Exception: Any other unexpected errors will be captured and returned with details.
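
The error cases above amount to validating the inputs before any call to the service, then catching whatever else goes wrong. Here is a minimal Python sketch of that idea, assuming hypothetical helper names (`validate_inputs`, `run_safely`). Python's `ValueError` and `FileNotFoundError` stand in for the .NET ArgumentException and FileNotFoundException, and the "ERROR: ..." message format is an assumption.

```python
import os


def validate_inputs(subscription_key, service_region, audio_file_path, locale):
    """Raise early, before any network call, if an argument is unusable."""
    required = {
        "SubscriptionKey": subscription_key,
        "ServiceRegion": service_region,
        "AudioFilePath": audio_file_path,
        "Locale": locale,
    }
    for name, value in required.items():
        if not value or not str(value).strip():
            # Python analogue of ArgumentException
            raise ValueError(f"{name} must not be empty")
    if not os.path.isfile(audio_file_path):
        # Python analogue of FileNotFoundException
        raise FileNotFoundError(f"Audio file not found: {audio_file_path}")


def run_safely(action):
    """Run `action`; report an unexpected error as a message with details."""
    try:
        return action()
    except Exception as exc:
        return f"ERROR: {type(exc).__name__}: {exc}"
```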
- Ensure that your WAV file is properly formatted for the Azure Speech SDK.
- Check the Azure Speech SDK documentation for supported locales and regions.
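
One way to check the WAV formatting note above before running the activity is to read the file header. The sketch below uses Python's standard `wave` module and is independent of the C# activity. It assumes the Speech SDK's default input format is mono, 16-bit PCM at 8 kHz or 16 kHz, so confirm that against the Azure documentation for your SDK version. The function names are hypothetical.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample) from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getsampwidth() * 8


def looks_like_default_speech_input(path):
    """True if the file is mono, 16-bit, 8 kHz or 16 kHz PCM (assumed SDK default)."""
    try:
        channels, rate, bits = describe_wav(path)
    except (wave.Error, EOFError):
        return False  # not a PCM WAV file the `wave` module can parse
    return channels == 1 and bits == 16 and rate in (8000, 16000)
```

A file that fails this check can often be converted with an audio tool before being passed as AudioFilePath.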
This project is licensed under the MIT License. See the LICENSE file for details.