Operator-to-robot Text-to-Speech #64

hello-amal · 2024-07-05T00:27:19Z

Description

This PR adds operator-to-robot text-to-speech capabilities. Specifically, it adds:

Backend:
1. A TextToSpeechEngine abstract class that can be used to support multiple engines in a plug-and-play fashion. gTTS and pyttsx3 are implemented.
  1. This abstract class allows multiple voices, two speeds (slow and default), and interrupting an ongoing utterance.
2. A ROS2 node (and corresponding custom message) that takes in text and additional metadata (voice, speed, whether to interrupt) from a topic and executes it using the specified engine (currently gTTS).
Frontend:
1. A new basic component, DropdownInput, that behaves like Dropdown but has a textarea to the left of the dropdown arrow.
2. A web app component, on the same level as "Movement Recorder," that allows users to type arbitrary text, save/delete it, play it on the robot, and stop a robot's utterance.
3. The data flows through WebRTC and ROSLibJS to enable the above to work.
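
As a rough illustration of the backend abstraction described above, here is a minimal sketch of what a plug-and-play engine interface could look like. Apart from the class name `TextToSpeechEngine`, every name here (`voice_ids`, `set_voice`, `set_slow`, `say`, `stop`, `LoggingEngine`) is an assumption made for illustration and is not taken from the PR's code. The stand-in engine records utterances instead of calling gTTS or pyttsx3.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple


class TextToSpeechEngine(ABC):
    """Hypothetical base class; the interface merged in this PR may differ."""

    def __init__(self) -> None:
        self._voice_id: Optional[str] = None
        self._is_slow = False  # two speeds: slow and default

    @property
    @abstractmethod
    def voice_ids(self) -> List[str]:
        """Return the voices this engine supports."""

    def set_voice(self, voice_id: str) -> None:
        # Reject voices the concrete engine does not know about.
        if voice_id not in self.voice_ids:
            raise ValueError(f"Unsupported voice: {voice_id}")
        self._voice_id = voice_id

    def set_slow(self, is_slow: bool) -> None:
        self._is_slow = is_slow

    @abstractmethod
    def say(self, text: str) -> None:
        """Speak the text with the current voice and speed."""

    @abstractmethod
    def stop(self) -> None:
        """Interrupt the ongoing utterance, if any."""


class LoggingEngine(TextToSpeechEngine):
    """Illustrative stand-in that records utterances instead of playing audio."""

    def __init__(self) -> None:
        super().__init__()
        self.spoken: List[Tuple[str, Optional[str], bool]] = []
        self.stopped = False

    @property
    def voice_ids(self) -> List[str]:
        return ["default", "alt"]

    def say(self, text: str) -> None:
        self.spoken.append((text, self._voice_id, self._is_slow))

    def stop(self) -> None:
        self.stopped = True
```

Under this sketch, a ROS2 node would only need to hold a reference to the base class, so swapping engines would not change the node's message-handling code.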

Select design decision

On the web app side, if the user clicks "Play" while an utterance is currently playing, the second utterance is queued. This is for two reasons: (a) if the operator wants to interrupt the first utterance, they can click "Stop" followed by "Play"; (b) if the operator wants the robot to speak a long utterance, this allows them to enter it one sentence at a time, avoiding a long pause while they type.
On the web app, when the user clicks on the text area, all of its text is selected. This makes it easier to delete text when the operator is typing one sentence at a time and speed is important (e.g., live conversation).
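
The queueing behavior can be sketched as a small state machine. This is only an illustration: `UtteranceQueue` and its methods are hypothetical names, the PR implements this across the web app and the ROS2 node rather than in a single class, and the sketch assumes that "Stop" also drops any queued utterances, which the description does not state.

```python
from collections import deque
from typing import Deque, List, Optional


class UtteranceQueue:
    """Hypothetical model of the Play/Stop semantics described above."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self._current: Optional[str] = None

    @property
    def current(self) -> Optional[str]:
        return self._current

    @property
    def pending(self) -> List[str]:
        return list(self._pending)

    def play(self, text: str) -> None:
        # "Play" never interrupts: the text waits behind the current utterance.
        self._pending.append(text)
        if self._current is None:
            self._advance()

    def stop(self) -> None:
        # Assumption: "Stop" interrupts the current utterance and clears the queue.
        self._pending.clear()
        self._current = None

    def on_utterance_finished(self) -> None:
        # Called when playback of the current utterance ends.
        self._current = None
        self._advance()

    def _advance(self) -> None:
        if self._pending:
            self._current = self._pending.popleft()
```

With this model, clicking "Stop" and then "Play" replaces the current utterance, while repeated "Play" clicks let an operator feed in a long utterance sentence by sentence.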

Testing procedure

Before opening a pull request

From the top-level of this repository, run:

pre-commit run --all-files

To merge

Squash & Merge

hello-amal · 2024-07-10T23:51:11Z

Ran all tests on 3030 except the test that ensures the requirements are complete. @hello-vinitha can you run that on your robot, since it is a "clean" install (i.e., it shouldn't have any of these audio libraries)?

hello-vinitha · 2024-07-12T23:36:56Z

@hello-amal All the tests pass on 2051. A couple of questions/suggestions:

Is there any clear benefit to having pyttsx3, i.e., is there a scenario where a user would want to use pyttsx3 over gTTS? If yes, then I recommend adding this as an optional flag to the launch file and launch script. If not, I would recommend removing pyttsx3.
When an utterance is playing, I recommend changing the text from "Play" to "Add to Queue" so that the operator knows that they can queue utterances.
I recommend hiding the "Stop" button unless something is playing so we can reduce the number of buttons when possible. Also, it will draw the operator's attention when it does appear.
Make the delete icon in movement recorder the lighter red and the stop button in TTS the deeper red to be consistent with the color scheme throughout the rest of the interface.

hello-amal · 2024-07-12T23:52:18Z

Well, gTTS uses Google's unofficial Google Translate API, which they may stop supporting at any time. So I think it is important to have pyttsx3, even if its voices are not good. I'll add a launch file flag for that.
Going back to our earlier discussion, changing "Play" to "Add to Queue" and only showing "Stop" when an utterance is playing would require the text-to-speech node to provide feedback to the app, which requires changing it to an action and is a pretty involved change on both the web app and ROS node side. I have created an issue for this to be done as a separate PR (TTS: Adaptivity based on whether the robot is speaking #73).
Will make the color change.

hello-vinitha · 2024-07-12T23:55:16Z

Ah yes, I completely forgot that we had discussed (2). That sounds good, we can revisit that.

hello-amal · 2024-07-15T21:05:25Z

Addressed the changes. Here is a screenshot of the updated color scheme.

hello-amal added 4 commits July 4, 2024 15:09

[WIP] write a ROS node for TTS

fc839d2

[WIP] pyttsx3 mostly works but sounds awful

d68effc

created abstract class to allow easy switching of engines

bdd718e

gTTS works and overrides work

f2a0ca2

hello-amal marked this pull request as draft July 5, 2024 00:27

hello-amal added 7 commits July 9, 2024 18:25

[WIP] basic UI arrangement works

6f2b996

UI layout done

5639f53

Play and Stop function providers work

3c84644

Finished implementing function providers

77ee16c

Add scroll to too-big dropdown popups, check whether text exists as you type.

db9697c

Update comments

b29e1b5

Update requirements

841a688

hello-amal marked this pull request as ready for review July 10, 2024 20:54

hello-amal changed the title ~~[WIP] Operator-to-robot Text-to-Speech~~ Operator-to-robot Text-to-Speech Jul 10, 2024

hello-amal requested a review from hello-vinitha July 10, 2024 23:50

hello-amal added 2 commits July 10, 2024 16:55

Fixes from testing

d8ca787

Merge branch 'master' into amaln/operator_to_robot_tts

016fc35

hello-amal mentioned this pull request Jul 11, 2024

TTS: Adaptivity based on whether the robot is speaking #73

Open

hello-amal added 4 commits July 11, 2024 17:08

Remove unnecessary logs

4b6e8c5

Auto-select all text on click

a2fc743

Trim whitespace before storing

269b940

Removed unnecessary logs

1b4f8f5

updated style of dropdown input to be consistent with the regular dropdown

1f797d7

hello-vinitha approved these changes Jul 15, 2024

hello-amal added 2 commits July 15, 2024 12:28

Add launch arg for tts engine

93628a4

Update color scheme

2bada68

hello-amal merged commit c53a7c0 into master Jul 15, 2024
1 check passed

hello-amal deleted the amaln/operator_to_robot_tts branch July 15, 2024 21:05

hello-amal mentioned this pull request Jul 18, 2024

Add a script to configure audio devices #77

Merged

5 tasks
