PanktiSelector with transcriber support (WebSpeechApi + MSFT) #415

Open: wants to merge 10 commits into base branch main

@manshantsingh commented Jun 18, 2023

Select pankti based on speech (alpha version)

Please review this PR; I will make the suggested edits along with my other future changes.

Also, I am new to React, so if I did something the 'wrong' way, please let me know.

Short Summary

  • Use speech-to-text to follow the pankti on the current ang. Currently supports the WebSpeechApi and Microsoft Cognitive Services Speech-to-Text providers
  • The speech-to-text result is matched using fuzzy matching to find the closest pankti. The search is based on the last 75 characters of the transcribed text
  • Works well if the user is doing paath and reading each pankti only once (i.e., works well for paath but is not currently designed for kirtan)
  • Currently all transcription is done in Hindi, and fuzzy matching is done on lines converted to Hindi
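The matching step can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes the panktis have already been converted to the same script as the transcript, scores each one with normalized Levenshtein similarity (a stand-in for whatever fuzzy-matching library the PR actually uses), and treats MIN_SCORE as a hypothetical threshold. findClosestPankti and the helper names are illustrative.

```typescript
// Sketch: fuzzy-match the tail of the transcript against each pankti.
// Assumptions: panktis are already in the transcript's script (Hindi),
// scoring is normalized Levenshtein similarity, MIN_SCORE is arbitrary.

const WINDOW_SIZE = 75; // characters of transcript to match (from the PR description)
const MIN_SCORE = 0.3; // hypothetical: below this, report "no match"

// Dynamic-programming edit distance using a single reusable row.
function levenshtein(a: string, b: string): number {
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for (i-1, j-1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i-1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// 1.0 for identical strings, 0.0 when nothing lines up.
function similarity(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length, 1);
  return 1 - levenshtein(a, b) / maxLen;
}

// Index of the best-scoring pankti, or -1 if none beats MIN_SCORE.
function findClosestPankti(transcript: string, panktis: string[]): number {
  const tail = transcript.slice(-WINDOW_SIZE).trim();
  let bestIndex = -1;
  let bestScore = MIN_SCORE;
  panktis.forEach((line, i) => {
    const score = similarity(tail, line);
    if (score > bestScore) {
      bestScore = score;
      bestIndex = i;
    }
  });
  return bestIndex;
}
```

Scoring the whole 75-character tail against each line favours lines of similar length; a partial (substring) match would likely be more robust when the tail spans two panktis.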

Overview of code changes

  • Added a Transcriber base class that lets each extending class transform input lines into the format matched against results from its speech-to-text service implementation
  • Added WebSpeechApiTranscriber and MicrosoftCognitiveServicesSpeechTranscriber classes for these services
  • Added a PanktiSelector class that uses the currently selected Transcriber and fuzzy matching to return the pankti to highlight
  • Added a new backend API for fetching an auth token for MicrosoftCognitiveServicesSpeechTranscriber
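One way the Transcriber / PanktiSelector split could look, as a hypothetical sketch: the method names (transformLine, start, stop), the injected Matcher function, and FakeTranscriber are assumptions for illustration, not the PR's actual API. The idea is that each provider only converts lines and emits transcripts, while PanktiSelector owns the matching.

```typescript
// Hypothetical sketch of the Transcriber / PanktiSelector split.
// Method names and the Matcher type are assumptions, not the PR's API.

type TextHandler = (transcript: string) => void;
type Matcher = (transcript: string, candidates: string[]) => number;

abstract class Transcriber {
  // Convert a pankti into the form the service's transcripts are compared
  // against (for the current providers, a Hindi rendering of the line).
  abstract transformLine(line: string): string;
  // Start recognition; onText receives the accumulated transcript on each update.
  abstract start(onText: TextHandler): void;
  abstract stop(): void;
}

class PanktiSelector {
  private readonly transcriber: Transcriber;
  private readonly transformed: string[];
  private readonly match: Matcher;
  private readonly onSelect: (index: number) => void;

  constructor(
    transcriber: Transcriber,
    panktis: string[],
    match: Matcher,
    onSelect: (index: number) => void,
  ) {
    this.transcriber = transcriber;
    this.match = match;
    this.onSelect = onSelect;
    // Convert every line once, up front, using the active transcriber's format.
    this.transformed = panktis.map((p) => transcriber.transformLine(p));
  }

  start(): void {
    this.transcriber.start((transcript) => {
      const index = this.match(transcript, this.transformed);
      if (index >= 0) this.onSelect(index); // -1 means "no confident match"
    });
  }

  stop(): void {
    this.transcriber.stop();
  }
}

// Test double: replays fixed phrases instead of calling a speech service.
class FakeTranscriber extends Transcriber {
  private readonly phrases: string[];

  constructor(phrases: string[]) {
    super();
    this.phrases = phrases;
  }

  transformLine(line: string): string {
    return line.toLowerCase();
  }

  start(onText: TextHandler): void {
    let transcript = "";
    for (const phrase of this.phrases) {
      transcript += " " + phrase;
      onText(transcript);
    }
  }

  stop(): void {}
}
```

Swapping WebSpeechApi for MSFT then only means constructing a different Transcriber subclass; PanktiSelector is unchanged.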

How to use:

  • Press 'space' (or click/tap the mic button in the footer) to start/stop transcribing and following the pankti
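A DOM-free sketch of the key handling, assuming the handler only needs KeyboardEvent's code, shiftKey and repeat fields. It also covers the shift+space cookie prompt from the advanced instructions; actionForKey, MicToggle and the action names are hypothetical.

```typescript
// DOM-free sketch: 'space' toggles transcription, 'shift+space' opens the
// cookie prompt. KeyLike mirrors the KeyboardEvent fields used here;
// actionForKey and MicToggle are hypothetical names.
interface KeyLike {
  code: string;
  shiftKey: boolean;
  repeat: boolean;
}

type KeyAction = "toggle" | "promptCookie" | "none";

function actionForKey(e: KeyLike): KeyAction {
  // Ignore other keys and key auto-repeat so holding space doesn't flip state repeatedly.
  if (e.code !== "Space" || e.repeat) return "none";
  return e.shiftKey ? "promptCookie" : "toggle";
}

class MicToggle {
  private listening = false;

  // Shared by the space key and the mic button in the footer.
  toggle(): boolean {
    this.listening = !this.listening;
    return this.listening; // true => caller should start the transcriber
  }

  get isListening(): boolean {
    return this.listening;
  }
}
```

In the browser this would be wired to a keydown listener on window, calling preventDefault() whenever the action is not "none" so that space does not also scroll the page.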

How to use (advanced, custom Transcriber):

  • Note: You can use other Transcribers if you set the correct cookies. To make this easy, I added a prompt for adding a cookie, opened by pressing 'shift+space'. The prompt expects a response as a cookieName,cookieValue pair (the default name/value for WebSpeechApi is prefilled; just press Enter)
    • You can also change WebSpeechApi to MSFT if you want to use the paid endpoint, but that requires setting the other cookies as well. All supported cookie name/value pairs:
      • Cookie name: TRANSCRIBER_NAME, supported values are WebSpeechApi and MSFT
      • Cookie name: SPEECH_KEY, the speech key from your Azure resource (only required for the MSFT transcriber)
      • Cookie name: SPEECH_REGION, the speech region from your Azure resource (only required for the MSFT transcriber)
  • Refresh the page so it starts using the new cookie values
  • Press 'space' (or click/tap the mic button in the footer) to start/stop transcribing and following the pankti
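The cookie handling above could be sketched like this. The cookie names and values come from this PR; the parsing helpers, the TranscriberConfig type, and the fallback to WebSpeechApi when TRANSCRIBER_NAME is missing or unrecognized are assumptions.

```typescript
// Sketch of cookie-driven transcriber selection. Cookie names are from the PR;
// the helpers, the config type, and the WebSpeechApi fallback are assumptions.

type TranscriberConfig =
  | { name: "WebSpeechApi" }
  | { name: "MSFT"; speechKey: string; speechRegion: string };

// Parse the shift+space prompt answer, e.g. "TRANSCRIBER_NAME,MSFT".
function parseCookiePrompt(input: string): { name: string; value: string } | null {
  const comma = input.indexOf(",");
  if (comma <= 0) return null; // no comma, or empty cookie name
  const name = input.slice(0, comma).trim();
  const value = input.slice(comma + 1).trim();
  return value ? { name, value } : null;
}

// Parse a document.cookie-style string ("a=1; b=2") into a Map.
function parseCookies(cookieString: string): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const part of cookieString.split(";")) {
    const eq = part.indexOf("=");
    if (eq < 0) continue; // skip empty or malformed entries
    cookies.set(part.slice(0, eq).trim(), decodeURIComponent(part.slice(eq + 1).trim()));
  }
  return cookies;
}

// MSFT needs both SPEECH_KEY and SPEECH_REGION; anything else falls back to WebSpeechApi.
function selectTranscriber(cookies: Map<string, string>): TranscriberConfig {
  const name = cookies.get("TRANSCRIBER_NAME") ?? "WebSpeechApi";
  if (name === "MSFT") {
    const speechKey = cookies.get("SPEECH_KEY");
    const speechRegion = cookies.get("SPEECH_REGION");
    if (!speechKey || !speechRegion) {
      throw new Error("MSFT transcriber requires SPEECH_KEY and SPEECH_REGION cookies");
    }
    return { name: "MSFT", speechKey, speechRegion };
  }
  return { name: "WebSpeechApi" };
}
```

After the prompt, the pair would be written to document.cookie (URI-encoding the value) and, as noted above, the page refreshed so that selectTranscriber(parseCookies(document.cookie)) picks up the new values.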

Known issues:

  • Currently it does not go to the next page or search for the pankti on the next page (TODO, will implement next)
  • WebSpeechAPI
    • Chrome uses Google Cloud under the hood, which is faster, but it stops transcribing after some time (maybe around a minute)
      • Note: Chrome (probably) uses an older version of Google Cloud for WebSpeechApi, which today does not support Punjabi. The paid Google Cloud service includes Punjabi
    • Edge uses Microsoft Cognitive Services, which is slower than Chrome, but in my tests it did not stop working after some time, making it better suited for longer runs
  • MSFT (i.e., MicrosoftCognitiveServicesSpeechTranscriber)
    • This requires a secret for an Azure resource and is therefore mostly a paid service. Use the Edge browser instead, where you get the same/similar behavior for free
      • Why did I add MicrosoftCognitiveServicesSpeechTranscriber when I could just use Edge for free?
        • I did not realize this initially; knowing it would have saved me a lot of time and unnecessary debugging lol (had many issues). I only realized it the day I created this PR.
        • Technically, removing it would also get rid of all the newly added packages in this PR. I also didn't bother to check/clean up the package files; they were just updated by npm commands while I was trying to get MicrosoftCognitiveServicesSpeechTranscriber working
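The overview mentions a backend API for fetching an auth token for MicrosoftCognitiveServicesSpeechTranscriber. A minimal sketch of such a helper, assuming Azure's Speech STS endpoint (POST https://<region>.api.cognitive.microsoft.com/sts/v1.0/issueToken with an Ocp-Apim-Subscription-Key header); issueSpeechToken and the injected fetchFn parameter are hypothetical, not the PR's actual route.

```typescript
// Sketch: exchange an Azure Speech key for a short-lived auth token.
// Uses Azure's STS issueToken endpoint; issueSpeechToken and the injected
// fetchFn are hypothetical (injection keeps the sketch testable offline).

interface FetchResponseLike {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<FetchResponseLike>;

async function issueSpeechToken(
  speechKey: string,
  speechRegion: string,
  fetchFn: FetchLike,
): Promise<{ token: string; region: string }> {
  const url = `https://${encodeURIComponent(speechRegion)}.api.cognitive.microsoft.com/sts/v1.0/issueToken`;
  const res = await fetchFn(url, {
    method: "POST",
    headers: { "Ocp-Apim-Subscription-Key": speechKey },
  });
  if (!res.ok) throw new Error(`issueToken failed with HTTP ${res.status}`);
  // The endpoint returns the token as plain text; tokens expire after about 10 minutes.
  return { token: await res.text(), region: speechRegion };
}
```

In Node 18+ the built-in fetch can be passed as fetchFn; on the client, the returned token and region could then be handed to the Speech SDK via SpeechConfig.fromAuthorizationToken(token, region).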

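```typescript
// Cookie names read on page load to choose and configure the transcriber
private static readonly transcriberNameCookieKey = 'TRANSCRIBER_NAME' // 'WebSpeechApi' or 'MSFT'
private static readonly msftApiKeyCookieKey = 'SPEECH_KEY' // Azure speech key (MSFT only)
private static readonly msftApiregionCookieKey = 'SPEECH_REGION' // Azure speech region (MSFT only)
private static readonly openaiApiSecretCookieKey = 'OPENAI_SECRET' // unused, see author comment below
```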
@manshantsingh (Author) commented Jun 18, 2023:

openaiApiSecretCookieKey is unused. I was thinking of implementing an online Whisper provider as well, but then focused on other things and forgot to remove it.
