PanktiSelector with transcriber support (WebSpeechApi + MSFT) #415

manshantsingh · 2023-06-18T01:28:02Z

Select Pankti based on speech (alpha version)

Please review this PR; I will make the suggested edits along with my other upcoming changes.

Also, I am new to React, so if I did something the 'wrong' way, please let me know.

Short Summary

Use speech-to-text to follow the pankti on the current ang. Two speech-to-text providers are supported today: WebSpeechApi and Microsoft Cognitive Services.
The speech-to-text result is matched using FuzzyMatching to find the closest pankti. We search based on the last 75 characters of the transcribed text.
Works well if the user is doing paath and reading each pankti only once (i.e., it works well with paath but is not currently designed for kirtan).
Currently all transcription is done in Hindi, and the fuzzy match is done on lines converted to Hindi.
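To make the matching step above concrete, here is a minimal sketch of fuzzy-matching the tail of a transcript against the lines on the page. It is not the code in this PR: the names `editDistance` and `closestPanktiIndex` are hypothetical, Levenshtein distance is only an assumed stand-in for whatever FuzzyMatching uses, and comparing each line against an equally long slice of the tail is an illustrative choice. Only the 75-character limit comes from the summary.

```typescript
// Hypothetical helpers for illustration; the PR's real FuzzyMatching code may differ.
const TAIL_LENGTH = 75; // from the summary: only the last 75 characters are searched

// Levenshtein edit distance, computed with a single rolling row.
function editDistance(a: string, b: string): number {
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value at (i - 1, j - 1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value at (i - 1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Returns the index of the line most similar to the end of the transcript,
// or -1 when there is nothing to match.
function closestPanktiIndex(transcript: string, lines: string[]): number {
  const tail = transcript.slice(-TAIL_LENGTH);
  if (tail.length === 0) return -1;

  let bestIndex = -1;
  let bestScore = -1;
  lines.forEach((line, index) => {
    if (line.length === 0) return;
    // Assumption: compare each line with an equally long slice of the tail.
    const recent = tail.slice(-line.length);
    const longest = Math.max(recent.length, line.length);
    const score = 1 - editDistance(recent, line) / longest;
    if (score > bestScore) {
      bestScore = score;
      bestIndex = index;
    }
  });
  return bestIndex;
}
```

In this sketch the inputs are assumed to be already transformed into the same script (Hindi, per the summary) before they are compared; the romanized strings one might pass in while experimenting are placeholders only.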

Overview of code changes

Added a Transcriber base class. Classes that extend it implement a specific speech-to-text service and can transform the input lines into the format to be matched against that service's results.
Added WebSpeechApiTranscriber and MicrosoftCognitiveServicesSpeechTranscriber classes for the two services.
Added a PanktiSelector class that uses the currently selected Transcriber and fuzzy matching to decide which pankti to highlight.
~~Added a new API in backend to allow fetching auth token for MicrosoftCognitiveServicesSpeechTranscriber~~
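The class layout described above could look roughly like the sketch below. Only the name `Transcriber` comes from this PR; the method names (`transformInput`, `startRecording`, `stopRecording`), the callback type, and the `FakeTranscriber` subclass are assumptions made for illustration, not the actual signatures in the code.

```typescript
// Sketch only: method names and the callback shape are assumed, not taken from the PR.
type ResultCallback = (text: string) => void;

abstract class Transcriber {
  protected onResult: ResultCallback;

  constructor(onResult: ResultCallback) {
    this.onResult = onResult;
  }

  // Converts a line into the form that will be compared with this service's output.
  abstract transformInput(line: string): string;
  abstract startRecording(): void;
  abstract stopRecording(): void;
}

// Hypothetical stand-in for a real provider, useful for exercising the flow without a microphone.
class FakeTranscriber extends Transcriber {
  private running = false;

  transformInput(line: string): string {
    return line.trim().toLowerCase();
  }

  startRecording(): void {
    this.running = true;
  }

  stopRecording(): void {
    this.running = false;
  }

  // Pretends the service recognized `text`; results are dropped while stopped.
  simulateSpeech(text: string): void {
    if (this.running) this.onResult(this.transformInput(text));
  }
}
```

With this shape, a selector only depends on the base class, so swapping WebSpeechApi for MSFT would not change the matching code.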

How to use:

Press 'space' (or click/touch the Mic button / icon in footer) to toggle start/stop transcribing to follow pankti

How to use: (Advanced, custom Transcriber)

Note: You can use other Transcribers if you set the correct cookies. To make this easy, I added a prompt for setting a cookie, opened by pressing 'shift+space'. The prompt expects a cookieName,cookieValue pair (the default pair for WebSpeechApi is prefilled, so you just need to press Enter).
- You can also change WebSpeechApi to MSFT if you want to use the paid endpoint, but that requires setting the other cookies as well. Here are all the supported cookie key/value pairs:
  - Cookie name: TRANSCRIBER_NAME; supported values are WebSpeechApi and MSFT
  - Cookie name: SPEECH_KEY; the speech key from your Azure resource (only required if using the MSFT transcriber)
  - Cookie name: SPEECH_REGION; the speech region from your Azure resource (only required if using the MSFT transcriber)
Refresh the page for it to start using the new cookie values.
Press 'space' (or click/touch the Mic button / icon in footer) to toggle start/stop transcribing to follow pankti
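As a rough illustration of the cookie handling described above, the sketch below parses a cookieName,cookieValue prompt response and resolves the transcriber settings from a cookie string. The cookie names and the WebSpeechApi default come from this PR; the function names, the `TranscriberConfig` shape, and the choice to throw when the MSFT cookies are missing are assumptions.

```typescript
// Sketch only: function names and error behavior are hypothetical.
type TranscriberName = 'WebSpeechApi' | 'MSFT';

interface TranscriberConfig {
  name: TranscriberName;
  speechKey?: string;
  speechRegion?: string;
}

// Splits a prompt response such as "TRANSCRIBER_NAME,MSFT" at the first comma.
function parsePromptResponse(response: string): [string, string] | null {
  const comma = response.indexOf(',');
  if (comma <= 0) return null;
  return [response.slice(0, comma).trim(), response.slice(comma + 1).trim()];
}

// Parses a document.cookie-style string ("a=1; b=2") into a map.
function parseCookies(cookieString: string): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const part of cookieString.split(';')) {
    const separator = part.indexOf('=');
    if (separator < 0) continue;
    const name = part.slice(0, separator).trim();
    if (name) cookies.set(name, part.slice(separator + 1).trim());
  }
  return cookies;
}

// Falls back to WebSpeechApi when TRANSCRIBER_NAME is missing or unrecognized.
function resolveTranscriberConfig(cookieString: string): TranscriberConfig {
  const cookies = parseCookies(cookieString);
  if (cookies.get('TRANSCRIBER_NAME') !== 'MSFT') {
    return { name: 'WebSpeechApi' };
  }
  const speechKey = cookies.get('SPEECH_KEY');
  const speechRegion = cookies.get('SPEECH_REGION');
  if (!speechKey || !speechRegion) {
    // Assumption: fail loudly rather than silently falling back.
    throw new Error('MSFT transcriber needs SPEECH_KEY and SPEECH_REGION cookies');
  }
  return { name: 'MSFT', speechKey, speechRegion };
}
```

In a browser, the input would typically be `document.cookie`; passing the string in as a parameter keeps the sketch easy to test outside one.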

Known issues:

Currently it does not go to the next page or search for the pankti on the next page (TODO, will implement it next).
WebSpeechAPI
- Chrome uses Google Cloud under the hood, which is faster but stops transcribing after some time (maybe around a minute)
  - Note: Chrome (probably) uses an older version of Google Cloud for WebSpeechApi, which today does not support Punjabi. The paid Google Cloud service does include Punjabi.
- Edge uses Microsoft Cognitive Services, which is slower than Chrome but in my tests did not stop working after some time. Better suited to longer runs.
MSFT (i.e., MicrosoftCognitiveServicesSpeechTranscriber)
- This requires a secret for an Azure resource and hence is mostly a paid service. Instead, use the Edge browser, where you can see the same or similar behavior for free.
  - Why did I add MicrosoftCognitiveServicesSpeechTranscriber when I could just use Edge for free?
    - I did not realize this initially; it would have saved me a lot of time and unnecessary debugging lol (had many issues). I only realized it on the day I created this PR.
    - Technically, removing this would also get rid of all the newly added packages in this PR. I also didn't bother to check or clean up the package files; they were just updated by npm commands while I was trying to get MicrosoftCognitiveServicesSpeechTranscriber working.

manshantsingh · 2023-06-18T01:29:57Z

frontend/src/lib/speech/panktiSelector.ts

+  private static readonly transcriberNameCookieKey = 'TRANSCRIBER_NAME'
+  private static readonly msftApiKeyCookieKey = 'SPEECH_KEY'
+  private static readonly msftApiregionCookieKey = 'SPEECH_REGION'
+  private static readonly openaiApiSecretCookieKey = 'OPENAI_SECRET'


private static readonly openaiApiSecretCookieKey = 'OPENAI_SECRET'
This is unused. I was thinking of implementing an online Whisper provider as well, but then focused on other things and forgot to remove this.

PanktiSelector with transcriber support (WebSpeechApi + MSFT)

1b98147

manshantsingh added 9 commits June 17, 2023 19:48

default value + page number passed along to prevent race condition

821c6cb

fix filename case-sensitive

793451a

remove debug commented code + move frontend dependencies to frontend

c312892

remove newly added auth api

7448632

more fixup

4076476

Add mic icon for touch screen + default webspeechapi on null cookie

487df03

cookie for testing punjabi

18bc422

console log to print language

343d040

for punjabi dont concatenate old results

ae8e363
