[Tutorial Videos] Add transcripts and subtitles script #3251

Merged
merged 18 commits into from
Aug 12, 2024
4 changes: 4 additions & 0 deletions .gitignore
@@ -82,6 +82,10 @@ src/resources/dictionaries/*.txt
deploy/scripts/semantic_domains/json/*.json
database/semantic_domains/*

# Intermediate and output files for tutorial video subtitling
*.srt
*.mp4

# Combine installer
installer/*.run
installer/makeself-*
2 changes: 1 addition & 1 deletion Backend/Helper/GrammaticalCategory.cs
@@ -50,7 +50,7 @@ public bool Matches(string gramCat)
}
}

- // The following patterns cover all grammatical categories in Fieldworks for:
+ // The following patterns cover all grammatical categories in FieldWorks for:
// English (en), Spanish (es), French (fr), Portuguese (pt), Russian (ru), Chinese (zh)
// Omissions due to conflicting abbreviations:
// Spanish "indf" for Indefinite Pronoun (conflicts with abbrev. for Indefinite article)
11 changes: 11 additions & 0 deletions README.md
@@ -58,6 +58,7 @@ A rapid word collection tool. See the [User Guide](https://sillsdev.github.io/Th
7. [Add or Update Dictionary Files](#add-or-update-dictionary-files)
8. [Cleanup Local Repository](#cleanup-local-repository)
9. [Generate Installer Script for The Combine](#generate-installer-script-for-the-combine-linux-only)
10. [Generate Tutorial Video Subtitles](#generate-tutorial-video-subtitles)
3. [Setup Local Kubernetes Cluster](#setup-local-kubernetes-cluster)
1. [Install Rancher Desktop](#install-rancher-desktop)
2. [Install Docker Desktop](#install-docker-desktop)
@@ -544,6 +545,16 @@ To update the PDF copy of the installer README.md file, run the following from t
pandoc --pdf-engine=weasyprint README.md -o README.pdf
```

## Generate Tutorial Video Subtitles

Tutorial video transcripts are housed in `docs/tutorial_subtitles`, together with timestamps aligning transcripts with
the corresponding videos and any transcript translations downloaded from Crowdin. To generate subtitle files (and
optionally attach them to a video file), run from within a Python virtual environment:

```bash
python scripts/subtitle_tutorial_video.py -s <subtitles_subfolder_name> [-i <input_video_path> -o <output_video_path>] [-v]
```
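
The usage line brackets `-i` and `-o` together, which suggests they are meant to be supplied as a pair. The script's own argument handling is not shown here, so the following is only a minimal sketch of how such flags could be parsed with `argparse`. The long option names, the help strings, and the reading of `-v` as a verbosity switch are all assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse flags shaped like the usage line above (long names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Generate tutorial video subtitles.")
    parser.add_argument("-s", "--subtitles", required=True,
                        help="name of a subfolder of docs/tutorial_subtitles")
    parser.add_argument("-i", "--input", help="path of the video to subtitle")
    parser.add_argument("-o", "--output", help="path for the subtitled video")
    # Assumption: -v toggles more detailed progress output.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print progress details")
    args = parser.parse_args(argv)
    # -i and -o appear in the same brackets, so require both or neither.
    if (args.input is None) != (args.output is None):
        parser.error("-i and -o must be given together")
    return args
```

With this sketch, `parse_args(["-s", "my_video"])` would only request subtitle files, while passing `-i` without `-o` exits with a usage error.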

## Setup Local Kubernetes Cluster

This section describes how to create a local Kubernetes cluster using either _Rancher Desktop_ or _Docker Desktop_.
18 changes: 18 additions & 0 deletions docs/tutorial_subtitles/README.md
@@ -0,0 +1,18 @@
This folder contains the transcripts for tutorial videos and timestamps for generating video subtitles.

The `_in_progress_transcripts_` subfolder holds transcripts that are awaiting a video recording.

Every other `<vid>` subfolder holds one `times.txt` and at least one `<vid>.<lang>.txt`, where `<lang>` is the 3-character
code for the language of the transcript (`eng` as well as any other languages into which the transcript has been
translated). All these files should have the same number of lines:

- `<vid>.eng.txt`: each line is one sentence;
- `times.txt`: each line has the ending time of the corresponding English sentence in the tutorial video (format: `m:s`,
where `s` can have up to 3 digits after the decimal);
- `<vid>.<lang>.txt` for `<lang>` other than `eng`: each line has the translation for the corresponding English sentence
(and if one sentence was translated into multiple sentences, the translation should still be just one line).
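
The equal-line-count rule above is easy to check mechanically. The following is a minimal sketch, not part of the actual script: the helper names `count_lines` and `mismatched_files` are hypothetical, and it assumes the files are UTF-8 text and that `<vid>` is the subfolder's own name.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines of a UTF-8 text file."""
    return len(Path(path).read_text(encoding="utf-8").splitlines())


def mismatched_files(folder):
    """Return {file name: line count} for transcripts that disagree with times.txt."""
    folder = Path(folder)
    expected = count_lines(folder / "times.txt")
    mismatches = {}
    # Assumption: <vid> in <vid>.<lang>.txt is the name of the subfolder itself.
    for path in sorted(folder.glob(f"{folder.name}.*.txt")):
        lines = count_lines(path)
        if lines != expected:
            mismatches[path.name] = lines
    return mismatches
```

An empty result means every transcript in the subfolder lines up with `times.txt`.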

To generate the subtitles and attach them to the video, use `scripts/subtitle_tutorial_video.py`.
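
For orientation, here is a rough sketch of how one transcript and its `times.txt` could be combined into SubRip (`.srt`) text. It is not the code in `scripts/subtitle_tutorial_video.py`. The function names are hypothetical, and it assumes each subtitle starts where the previous one ended, with the first starting at 0:00.

```python
def parse_time(text):
    """Convert an 'm:s' string such as '1:02.5' to seconds."""
    minutes, seconds = text.strip().split(":")
    return int(minutes) * 60 + float(seconds)


def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def build_srt(sentences, end_times):
    """Pair each sentence with its end time and return the SRT text."""
    if len(sentences) != len(end_times):
        raise ValueError("transcript and times must have the same number of lines")
    blocks = []
    start = 0.0  # assumption: the first subtitle starts at 0:00
    for index, (sentence, end_text) in enumerate(zip(sentences, end_times), start=1):
        end = parse_time(end_text)
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{sentence}\n"
        )
        start = end  # the next subtitle begins where this one ended
    return "\n".join(blocks)
```

Because `times.txt` stores only ending times, the sketch derives each start time from the previous line, which is why the line counts of the files have to match.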

DON'T EDIT THE `.eng.txt` TRANSCRIPT IN A FOLDER WITH A `times.txt` FILE! It matches an existing video. If changes are
needed to the transcript for an updated video, put a copy into the `_in_progress_transcripts_` folder and edit it there.
@@ -0,0 +1,43 @@
The Combine is designed for Rapid Word Collection, a method of gathering words by semantic domain.
In this video, we will see how to do Data Entry in The Combine to collect words.
Let’s go to thecombine.app and log in.
When you click on a project, the semantic domain tree appears.
Selecting your domain is the first step in Data Entry.
If you are doing a different project task (for example, in data cleanup or project settings), you can get back here by clicking the “Data Entry” button in the top bar.
There is another tutorial video about navigating the semantic domain tree or changing its language.
For this video, let’s select domain “2: Person” and start gathering words!
There are 4 things that can be included in a new word.
First, the vernacular form of the word in the project’s vernacular language.
Second, a gloss for the word in the project’s primary analysis language.
(You can change a project’s analysis language in the project settings.)
Third, a note about the word.
Fourth, audio recordings of the word’s pronunciation.
After adding the content of the new word, press the Enter key.
See how the word we just entered appears in the table?
If we hover our cursor over the note icon for that word, the text of the note appears.
Let’s add another word!
Now the vernacular form is required for a new entry, but the gloss, note, and audio are optional.
At any time, you can make changes to the words you have entered.
Let’s add a gloss and a note to the second entry.
Let’s change the vernacular form and delete the note on the first entry.
What can we do with the audio recordings?
If we hover our cursor over the play button (the green triangle icon), text appears describing what options are available.
Click on the play button to listen.
Hold the shift-key and click to delete it.
A dialog box will appear to confirm whether you want to delete the recording.
(If you are using a touch-screen, you can tap on the play button to play, or press and hold the button to bring up a menu.)
When you are done entering words, click the Exit button to return to the semantic domain tree.
(Don’t worry—the words you entered are already saved even if you close the window without clicking the exit button.)
If we select the same domain to enter more words, see how the words we previously entered in this domain are listed in a panel on the side?
If you are working in a narrow web browser window, the panel of previously entered words will not automatically appear.
You can bring it up by pressing the sideways caret icon at the bottom of the data entry box.
Let’s enter more words!
If you want to delete one of the words you added, click on the delete icon at the end of its row.
Warning: this will permanently remove the word and all its content!
If you want to start over on a word you are adding, click on the delete icon here on the bottom row.
It will reset the vernacular field, the gloss field, the note, and the audio recordings for the new word.
That covers how to gather words with Data Entry in The Combine!
When gathering words by semantic domain, often the same word will be added to multiple domains, resulting in lots of duplicate words.
The Combine has ways to help avoid or manage that issue.
In the next video, we will look at entering multiple words with the same vernacular form.
Have a wonderful day!
@@ -0,0 +1,24 @@
Let’s see how to move lexical data from a project in The Combine to a project in FieldWorks.
To begin, log in at thecombine.app and open your project.
Go to project settings by clicking on the gear icon in the top bar.
Click the “Import/Export” tab.
In the “Export Project” section, click the “Export” button.
While your export is loading, the “Export” button will be disabled and have a spinning green circle.
There is also a loading icon with circling arrows in the top bar to indicate that the export is in progress.
If your project has hundreds of audio recordings, it may take a few minutes to prepare the export.
You can navigate to other pages in The Combine without interrupting the export, but do not close the window or log out!
When the export completes, it will be automatically downloaded as a ZIP file to your web browser’s default Downloads location.
To see where the export was downloaded, click on the Downloads icon in the browser, move your cursor to the most recent download, and click on the “Show in Folder” folder icon.
Now we can import that downloaded ZIP file into FieldWorks.
Open FieldWorks Language Explorer and open the project you want to import your data into.
If you are creating a new FLEx project, specify the same vernacular language as your project in The Combine.
When the project is open, click on the “File” menu, move your cursor to “Import…” near the bottom, and click on “Lexicon from The Combine…”.
In the “Import/Merge from The Combine” dialog that appears, click the “Browse…” button.
Another dialog appears with a file explorer.
Navigate to the downloads folder where the exported ZIP file is located. Select the ZIP file and click the “Open” button.
Back at the “Import/Merge” dialog, click the “OK” button.
When FieldWorks finishes the import, a summary page opens in your web browser.
Congratulations, you’ve imported your data from The Combine into FieldWorks!
If you need to update the vernacular or analysis writing systems of your FLEx project, the options are available at the bottom of the “Format” menu.
I hope this video helps you use your data from The Combine.
Have a wonderful day!
@@ -0,0 +1,39 @@
This is the second video on the Merge Duplicates tool.
In the previous video, we looked at the difference between the “Save & Continue” button and the “Defer” button, and we introduced adding a flag to words.
This video gets to the heart of the merge tool: moving, deleting, and combining senses!
Our first set of potential duplicates is a pair of words with vernacular form “fly”.
The first word has 3 senses: “move through the air” and “go through the air” are definitely the same word.
However, the third sense with gloss “winged insect” is a different concept. It’s a better fit with the second word, which has gloss “housefly”.
To move a sense from one word to another, click on the sense and drag it to the other word.
Voila! Now the correct senses are together.
Wait a second… the two glosses of the first word are not different senses. They are the exact same idea expressed redundantly.
To delete the unnecessary sense, click-and-drag it over the delete icon in the bottom corner.
When the tile turns red, release and it disappears.
When you’re satisfied with your changes, click “Save & Continue”.
In the next set of potential duplicates, we have two words with vernacular form “fine”.
Now the senses we see in the first word are in fact two different senses of the same word, so we can leave it alone.
In the second word, we see one sense with gloss “abcdefg” and another with “fee; monetary penalty”.
These two aren’t related to the first word or to each other. Let’s create a new word with the final sense.
To create a new word, click-and-drag a sense into the empty column, and release.
Voila! Now we have three words.
Wait a second… the sense “abcdefg” in the second word is nonsense.
To delete it, click-and-drag it to the delete icon in the bottom corner.
When you delete the only sense in a word, the whole word is deleted.
See how the column disappeared and we are back to two words.
Great! Click “Save & Continue” to save that work.
In this third set of potential duplicates are two words with vernacular form “toe”.
The sense of the first word has gloss “leg digit” and semantic domain 2.1.3.2.
The sense of the second word has gloss “foot digit” and semantic domain 2.1.3.3.
These are the same sense of the same word.
To combine them, click-and-drag one sense over the other. When both senses are green, release.
Voila! The senses are combined into a single sense with both semantic domains.
Note that a sidebar opened up to show the senses that are being combined.
You can close and open the sidebar by clicking the sideways caret icon.
Now the gloss is “leg digit; foot digit”.
When senses are combined, all the semantic domains are preserved and all glosses of the same language are combined.
To change which gloss comes first, click-and-drag the tiles within the sidebar to reorder them.
If you decide to keep both glosses as separate senses, click-and-drag a sense out of the sidebar and back to the word column.
Now click the “Save & Continue” button to save this merge.
I hope this video helps you to clean up the data you’ve collected in The Combine!
If you want to use the Merge tool on data that was exported from FieldWorks and imported into The Combine, please check out the third Merge Duplicates video.
Have a wonderful day!
@@ -0,0 +1,27 @@
In the first two videos about the Merge Duplicates tool, we covered all the basics.
If you are cleaning up data gathered with The Combine, the first two videos are all you need.
This video covers using the Merge Duplicates tool with data that was imported into The Combine.
Lexical data from FieldWorks can have information that is not supported in The Combine.
However, The Combine is designed to prevent accidental deletion of that information.
In this first set of potential duplicates, note that the top bar is yellow.
That indicates that these three imported entries have information that’s not visible in The Combine.
Such information could include (for example) annotations, etymologies, or variants.
Removing the final sense of a word in the Merge Duplicates tool results in that word being deleted.
Therefore, a lone sense on a protected word cannot be moved.
If we look at this second word, it has two senses.
It is a protected word, but we can move one of the senses without deleting the word.
So let’s move the sense with gloss “correct” to be a second sense of the first word.
Now that the second word only has one sense, that sense cannot be moved.
The third word is a duplicate of the first, but it cannot be deleted. So instead, we can add a flag.
Now click the “Save & Continue” button to save our work.
In this next set of potential duplicates, the tops of the words aren’t yellow but one of the senses is yellow—it is protected.
A sense of an imported word can be protected if it has sense-specific information that isn’t supported in The Combine.
Such information could include (for example) illustrations, reversals, or subsenses.
Protected senses can be moved.
However, protected senses cannot be deleted.
A protected sense also cannot be dropped into another sense.
If you want to merge two senses and one of them is protected, click-and-drag the other sense and drop it into the protected sense.
Merged senses can generally be reordered in the sidebar, but if the top sense is protected, you cannot move another sense above it.
Now click the “Save & Continue” button to save your work.
I hope this video helps you clean up lexical data imported into The Combine.
Have a wonderful day!
@@ -0,0 +1,50 @@
Let’s see how to review all the words in your project on The Combine.
Click the “Data Cleanup” button in the top bar, then select “Review Entries”.
This tool allows you to filter and sort all your word entries.
You can also edit them here, but we will look at that in another video.
Each row in the table has one entry with all its content.
The first column has the vernacular form of the entries.
Another column has the number of senses in the entry.
Another column has the entry’s glosses in the project analysis languages.
Note that the glosses from different senses are separated by a vertical line.
Another column has the semantic domains from all of the entry’s senses, sorted numerically.
Another column has the pronunciation audio recordings.
You can click on a green triangle to play the audio.
If the audio recording has a speaker selected, hover your cursor over the green triangle to see the speaker name.
Another column is for any note attached to the entry.
And another column shows if the entry was flagged (which is usually done in the Merge Duplicates tool).
Hover your cursor over the flag icon to see any text that was included in the flag.
Note that flags are only used within The Combine and will not export with your data.
There are two other columns—for definitions and for part of speech—that will only be available if the project has imported data with that information.
Other lexical info on imported data (including reversals, annotations, and morph types) is not viewable within The Combine.
But don’t worry, that information won’t be lost when you move your data back to FieldWorks!
If you want to change the order of the columns or hide any columns, click on the icon in the top corner with three vertical bars.
Click on the toggle next to a column name to hide or show that column.
Click-and-drag the two horizontal lines next to a toggle to change the order of the columns.
Note that the Vernacular column cannot be hidden or moved. It will always be visible as the first column.
And there are three buttons at the top of this menu, one to hide all columns, one to reset the order of the columns, and one to show all columns.
In the bottom corner, you can change the number of rows to show per page—the options are 10 entries per page, 100 entries per page, or all entries on a single page.
There are also buttons to go to the next page, the last page, the previous page, or the first page.
At the top of each column are several controls.
Click on the arrow icon to sort by that column.
You can sort by vernacular, alphabetically or reverse alphabetically.
You can sort by the number of senses, increasing or decreasing.
You can sort by the gloss text alphabetically or reverse alphabetically.
You can sort by lowest semantic domain number, increasing or decreasing.
You can sort by number of audio recordings, increasing or decreasing.
You can sort by note text, alphabetically or reverse alphabetically.
You can sort by whether or not the entry is flagged, and the flagged entries are sorted by the text of the flag.
The funnel icon at the top of each column can be used to add a filter.
In the vernacular, glosses, note, and flag columns, type text into that column’s filter and only entries containing the typed text in that column will be shown.
You can type a number into the filter of the number of senses column and entries with exactly that many senses will be shown.
Likewise, a number filter in the pronunciations column will only show entries with exactly that many audio recordings.
If you type a speaker name in the filter for the pronunciations column, then you can see all words with an audio recording by that speaker.
The filter on the semantic domains column uses domain ids.
Type “1.2” to show all entries that have a sense in domain 1.2.
To include a domain and all its subdomains, add a period at the end of your filter.
For example, the filter “2.5.” shows entries in domain 2.5 as well as domain 2.5.1, domain 2.5.1.1, domain 2.5.2, etc.
Finally, in the Domains, Pronunciations, and Flag columns, you can type a space for the filter to show all entries that have something, anything in that column.
You can only sort by one column at a time, but you can have an active filter in as many columns as you want.
I hope this video helps you review your lexical data in The Combine.
In another video, we will talk about editing entries in this Review Entries tool.
Have a wonderful day!