[Tutorial Videos] Add transcripts and subtitles script #3251

Merged: 18 commits merged on Aug 12, 2024
Changes from 7 commits
4 changes: 4 additions & 0 deletions .gitignore
@@ -82,6 +82,10 @@ src/resources/dictionaries/*.txt
deploy/scripts/semantic_domains/json/*.json
database/semantic_domains/*

# Intermediate and output files for tutorial video subtitling
*.srt
*.mp4

# Combine installer
installer/*.run
installer/makeself-*
2 changes: 1 addition & 1 deletion Backend/Helper/GrammaticalCategory.cs
@@ -50,7 +50,7 @@ public bool Matches(string gramCat)
}
}

-// The following patterns cover all grammatical categories in Fieldworks for:
+// The following patterns cover all grammatical categories in FieldWorks for:
// English (en), Spanish (es), French (fr), Portuguese (pt), Russian (ru), Chinese (zh)
// Omissions due to conflicting abbreviations:
// Spanish "indf" for Indefinite Pronoun (conflicts with abbrev. for Indefinite article)
11 changes: 11 additions & 0 deletions README.md
@@ -58,6 +58,7 @@ A rapid word collection tool. See the [User Guide](https://sillsdev.github.io/Th
7. [Add or Update Dictionary Files](#add-or-update-dictionary-files)
8. [Cleanup Local Repository](#cleanup-local-repository)
9. [Generate Installer Script for The Combine](#generate-installer-script-for-the-combine-linux-only)
10. [Generate Tutorial Video Subtitles](#generate-tutorial-video-subtitles)
3. [Setup Local Kubernetes Cluster](#setup-local-kubernetes-cluster)
1. [Install Rancher Desktop](#install-rancher-desktop)
2. [Install Docker Desktop](#install-docker-desktop)
@@ -544,6 +545,16 @@ To update the PDF copy of the installer README.md file, run the following from t
pandoc --pdf-engine=weasyprint README.md -o README.pdf
```

## Generate Tutorial Video Subtitles

Tutorial video transcripts are housed in `docs/tutorial_subtitles`, together with timestamps aligning transcripts with
the corresponding videos and any transcript translations downloaded from Crowdin. To generate subtitle files (and
optionally attach them to a video file), run from within a Python virtual environment:

```bash
python scripts/subtitle_tutorial_video.py -s <subtitles_subfolder_name> [-i <input_video_path> -o <output_video_path>] [-v]
```
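This diff does not show how `scripts/subtitle_tutorial_video.py` builds its `.srt` files. As a rough sketch only, assuming each cue begins where the previous sentence ended (the first at `0:00`) and ends at the matching `times.txt` entry, pairing transcript lines with end times could look like the following. `parse_end_time`, `format_srt_time`, and `build_srt` are hypothetical names, not the script's actual functions.

```python
import re

# Assumed format of a times.txt line: `m:s`, seconds with up to 3 decimal places.
TIME_PATTERN = re.compile(r"^(\d+):(\d+(?:\.\d{1,3})?)$")


def parse_end_time(text: str) -> float:
    """Convert an `m:s` end time from times.txt into seconds."""
    match = TIME_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def build_srt(sentences: list[str], end_times: list[str]) -> str:
    """Pair transcript lines with end times; assumes each cue starts when the previous one ends."""
    if len(sentences) != len(end_times):
        raise ValueError("transcript and times.txt must have the same number of lines")
    cues = []
    start = 0.0  # assumption: the first subtitle starts at the beginning of the video
    for index, (sentence, end_text) in enumerate(zip(sentences, end_times), start=1):
        end = parse_end_time(end_text)
        cues.append(f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{sentence.strip()}\n")
        start = end
    return "\n".join(cues)
```

For example, `build_srt(["Hello.", "Bye."], ["0:01.5", "0:03"])` yields two cues, `00:00:00,000 --> 00:00:01,500` and `00:00:01,500 --> 00:00:03,000`. Deriving each start time from the previous end time means `times.txt` needs only one timestamp per line.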

## Setup Local Kubernetes Cluster

This section describes how to create a local Kubernetes cluster using either _Rancher Desktop_ or _Docker Desktop_.
18 changes: 18 additions & 0 deletions docs/tutorial_subtitles/README.md
@@ -0,0 +1,18 @@
This folder contains the transcripts for tutorial videos and timestamps for generating video subtitles.

The `_in_progress_transcripts_` subfolder holds transcripts that are awaiting a video recording.

Every other `<vid>` subfolder holds one `times.txt` and at least one `<vid>.<lang>.txt`, where `<lang>` is the 3-character
code for the language of the transcript (`eng`, as well as any other language into which the transcript has been
translated). All of these files should have the same number of lines:

- `<vid>.eng.txt`: each line is one sentence;
- `times.txt`: each line has the ending time of the corresponding English sentence in the tutorial video (format: `m:s`,
  where `s` can have up to 3 digits after the decimal point);
- `<vid>.<lang>.txt` for `<lang>` other than `eng`: each line has the translation for the corresponding English sentence
(and if one sentence was translated into multiple sentences, the translation should still be just one line).

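Because every file in a `<vid>` folder must line up line for line, a quick consistency check before running the script can catch misaligned translations. The following is a hypothetical helper (`check_subtitle_folder` is not part of the repository) that assumes UTF-8 files and checks only the rules listed above: the required files exist, each `times.txt` line matches `m:s` with at most 3 decimal places, and every `<vid>.<lang>.txt` has as many lines as `times.txt`.

```python
import re
from pathlib import Path

# `m:s` with up to 3 digits after the decimal point, e.g. "0:07" or "1:05.250".
TIME_PATTERN = re.compile(r"^\d+:\d+(\.\d{1,3})?$")


def check_subtitle_folder(folder: Path) -> list[str]:
    """Hypothetical helper: return the problems found in one `<vid>` folder (empty if none)."""
    vid = folder.name
    times_path = folder / "times.txt"
    english_path = folder / f"{vid}.eng.txt"
    problems = [f"missing {p.name}" for p in (times_path, english_path) if not p.is_file()]
    if problems:
        return problems

    times = times_path.read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(times, start=1):
        if not TIME_PATTERN.match(line.strip()):
            problems.append(f"times.txt line {number}: bad timestamp {line!r}")

    # The English transcript and every translation must line up with times.txt.
    for transcript in sorted(folder.glob(f"{vid}.*.txt")):
        count = len(transcript.read_text(encoding="utf-8").splitlines())
        if count != len(times):
            problems.append(f"{transcript.name} has {count} line(s) but times.txt has {len(times)}")
    return problems
```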
To generate the subtitles and attach them to the video, use `scripts/subtitle_tutorial_video.py`.

DON'T EDIT THE `.eng.txt` TRANSCRIPT IN A FOLDER WITH A `times.txt` FILE! It matches an existing video. If changes are
needed to the transcript for an updated video, put a copy into the `_in_progress_transcripts_` folder and edit it there.
@@ -0,0 +1,43 @@
The Combine is designed for Rapid Word Collection, a method of gathering words by semantic domain.
In this video, we will see how to do Data Entry in The Combine to collect words.
Let’s go to thecombine.app and log in.
When you click on a project, the semantic domain tree appears.
Selecting your domain is the first step in Data Entry.
If you are doing something else in the project (such as data cleanup or project settings), you can get back here by clicking the “Data Entry” button in the top bar.
There is another tutorial video about navigating the semantic domain tree or changing its language.
For this video, let’s select domain “2: Person” and start gathering words!
There are 4 things that can be included in a new word.
First, the vernacular form of the word in the project’s vernacular language.
Second, a gloss for the word in the project’s primary analysis language.
(You can change a project’s analysis language in the project settings.)
Third, a note about the word.
Fourth, audio recordings of the word’s pronunciation.
After adding the content of the new word, press the Enter key.
See how the word we just entered appears in the table?
If we hover our cursor over the note icon for that word, the text of the note appears.
Let’s add another word!
Now the vernacular form is required for a new entry, but the gloss, note, and audio are optional.
At any time, you can make changes to the words you have entered.
Let’s add a gloss and a note to the second entry.
Let’s change the vernacular form and delete the note on the first entry.
What can we do with the audio recordings?
If we hover our cursor over the play button (the green triangle icon), text appears describing what options are available.
Click on the play button to listen.
Hold the Shift key and click to delete it.
A dialog box will appear to confirm whether you want to delete the recording.
(If you are using a touch-screen, you can tap on the play button to play, or press and hold the button to bring up a menu.)
When you are done entering words, click the Exit button to return to the semantic domain tree.
(Don’t worry—the words you entered are already saved even if you close the window without clicking the exit button.)
If we select the same domain to enter more words, see how the words we previously entered in this domain are listed in a panel on the side?
If you are working in a narrow web browser window, the panel of previously entered words will not automatically appear.
You can bring it up by pressing the sideways caret icon at the bottom of the data entry box.
Let’s enter more words!
If you want to delete one of the words you added, click on the delete icon at the end of its row.
Warning: this will permanently remove the word and all its content!
If you want to start over on a word you are adding, click on the delete icon here on the bottom row.
It will reset the vernacular field, the gloss field, the note, and the audio recordings for the new word.
That covers how to gather words with Data Entry in The Combine!
When gathering words by semantic domain, often the same word will be added to multiple domains, resulting in lots of duplicate words.
The Combine has ways to help avoid or manage that issue.
In the next video, we will look at entering multiple words with the same vernacular form.
Have a wonderful day!
@@ -0,0 +1,44 @@
Let’s see how to move lexical data from a project in FieldWorks to a project in The Combine.
To begin, open your project in FieldWorks.
Here I’m using an example project with words from the Naskapi language.
With the desired FLEx project open, click on the “File” menu, then select “Export…” near the bottom of the menu.
In the “Export” dialog that appears, click on the “Full Lexicon” “LIFT 0.13 XML” option, then click the “Export…” button.
Another dialog appears for you to select where the exported files will be saved.
You’ll need to create a new folder for the files.
I’m going to Desktop and clicking the “Make New Folder” button to create a Naskapi folder.
Select the new folder you just created and click the “OK” button.
In the File Explorer, go to the folder that contains this new folder.
Right click on the folder you just created for the export and select “Compress to ZIP file”.
See the ZIP file that was created?
This is what we are going to import into The Combine.
Now we open a web browser and go to thecombine.app.
Once we are logged in, we see two sections: “Select Project” and “Create Project”.
Under “Select Project” we can open a previously created project.
It IS possible to import the lexical data into an existing project, but we will look at that later.
Under “Create Project”, let’s create a new project using the export from FLEx.
First I’ll type a name for the project, in my case: Naskapi.
Notice there are fields below where we can specify the Vernacular Language and an Analysis Language for the project.
This is not necessary when we are importing data because project languages will automatically be gathered from the data.
To upload existing data, click the “Browse” button.
This brings up a file explorer dialog for you to select the LIFT data that was exported from FLEx.
I’m navigating to Desktop, where I exported my data, selecting the Naskapi ZIP file, and clicking the “Open” button.
See that The Combine has the text “File selected: Naskapi.zip”.
Great!
Under “Vernacular Language” there is now a drop-down menu.
Use it to select which of the languages in the data is to be the Vernacular Language.
The Combine only supports data entry for one vernacular language.
The vernacular language cannot be changed after the project has been created.
If you need to gather or organize lexical data for a different language, simply create another project.
Note that you cannot specify an analysis language.
This is because all analysis languages present in the ZIP file are automatically added, and you can add and remove analysis languages in the project at any time.
All that’s left is to click the “Create Project” button!
Once the project is created, you are taken to the project settings page.
To access this in the future, you can click on the gear icon in the top bar.
Note that here in the “Languages” tab, we can see the Vernacular Language as well as review and change the Analysis Languages.
Let’s click on the “Import/Export” tab.
This is where you can import lexical data into an existing project.
That option is disabled now because we have already imported data into this project.
Only one import is allowed for a project in The Combine.
This is also where we can export data from The Combine to import into FieldWorks, but that is a topic for another video.
I hope this video helps you get started with The Combine.
Have a wonderful day!
@@ -0,0 +1,24 @@
Let’s see how to move lexical data from a project in The Combine to a project in FieldWorks.
To begin, log in at thecombine.app and open your project.
Go to project settings by clicking on the gear icon in the top bar.
Click the “Import/Export” tab.
In the “Export Project” section, click the “Export” button.
While your export is loading, the “Export” button will be disabled and have a spinning green circle.
There is also a loading icon with circling arrows in the top bar to indicate that the export is in progress.
If your project has hundreds of audio recordings, it may take a few minutes to prepare the export.
You can navigate to other pages in The Combine without interrupting the export, but do not close the window or log out!
When the export completes, it will be automatically downloaded as a ZIP file to your web browser’s default Downloads location.
To see where the export was downloaded, click on the Downloads icon in the browser, move your cursor to the most recent download, and click on the “Show in Folder” folder icon.
Now we can import that downloaded ZIP file into FieldWorks.
Open FieldWorks Language Explorer and open the project you want to import your data into.
If you are creating a new FLEx project, specify the same vernacular language as your project in The Combine.
When the project is open, click on the “File” menu, move your cursor to “Import…” near the bottom, and click on “Lexicon from The Combine…”.
In the “Import/Merge from The Combine” dialog that appears, click the “Browse…” button.
Another dialog appears with a file explorer.
Navigate to the downloads folder where the exported ZIP file is located. Select the ZIP file and click the “Open” button.
Back at the “Import/Merge” dialog, click the “OK” button.
When FieldWorks finishes the import, a summary page opens in your web browser.
Congratulations, you’ve imported your data from The Combine into FieldWorks!
If you need to update the vernacular or analysis writing systems of your FLEx project, the options are available at the bottom of the “Format” menu.
I hope this video helps you use your data from The Combine.
Have a wonderful day!
@@ -0,0 +1,39 @@
This is the second video on the Merge Duplicates tool.
In the previous video, we looked at the difference between the “Save & Continue” button and the “Defer” button, and we introduced adding a flag to words.
This video gets to the heart of the merge tool: moving, deleting, and combining senses!
Our first set of potential duplicates is a pair of words with vernacular form “fly”.
The first word has 3 senses: “move through the air” and “go through the air” are definitely the same word.
However, the third sense with gloss “winged insect” is a different concept. It’s a better fit with the second word, which has gloss “housefly”.
To move a sense from one word to another, click on the sense and drag it to the other word.
Voila! Now the correct senses are together.
Wait a second… the two glosses of the first word are not different senses. They are the exact same idea expressed redundantly.
To delete the unnecessary sense, click-and-drag it over the garbage icon in the bottom corner.
When the tile turns red, release and it disappears.
When you’re satisfied with your changes, click “Save & Continue”.
In the next set of potential duplicates, we have two words with vernacular form “fine”.
Now the senses we see in the first word are in fact two different senses of the same word, so we can leave it alone.
In the second word, we see one sense with gloss “abcdefg” and another with “fee; monetary penalty”.
These two aren’t related to the first word or to each other. Let’s create a new word with the final sense.
To create a new word, click-and-drag a sense into the empty column, and release.
Voila! Now we have three words.
Wait a second… the sense “abcdefg” in the second word is nonsense.
To delete it, click-and-drag it to the garbage icon.
When you delete the only sense in a word, the whole word is deleted.
See how the column disappeared and we are back to two words.
Great! Click “Save & Continue” to save that work.
In this third set of potential duplicates are two words with vernacular form “toe”.
The sense of the first word has gloss “leg digit” and semantic domain 2.1.3.2.
The sense of the second word has gloss “foot digit” and semantic domain 2.1.3.3.
These are the same sense of the same word.
To combine them, click-and-drag one sense over the other. When both senses are green, release.
Voila! The senses are combined into a single sense with both semantic domains.
Note that a sidebar opened up to show the senses that are being combined.
You can close and open the sidebar by clicking the sideways caret icon.
Now the gloss is “leg digit; foot digit”.
When senses are combined, all the semantic domains are preserved and all glosses of the same language are combined.
To change which gloss comes first, click-and-drag the tiles within the sidebar to reorder them.
If you decide to keep both glosses as separate senses, click-and-drag a sense out of the sidebar and back to the word column.
Now click the “Save & Continue” button to save this merge.
I hope this video helps you to clean up the data you’ve collected in The Combine!
If you want to use the Merge tool on data that was exported from FieldWorks and imported into The Combine, please check out the third Merge Duplicates video.
Have a wonderful day!
@@ -0,0 +1,27 @@
In the first two videos about the Merge Duplicates tool, we covered all the basics.
If you are cleaning up data gathered with The Combine, the first two videos are all you need.
This video covers using the Merge Duplicates tool with data that was imported into The Combine.
Lexical data from FieldWorks can have information that is not supported in The Combine.
However, The Combine is designed to prevent accidental deletion of that information.
In this first set of potential duplicates, note that the top bar is yellow.
That indicates that these three imported entries have information that’s not visible in The Combine.
Such information could include (for example) annotations, etymologies, or variants.
Removing the final sense of a word in the Merge Duplicates tool results in that word being deleted.
Therefore, a lone sense on a protected word cannot be moved.
If we look at this second word, it has two senses.
It is a protected word, but we can move one of the senses without deleting the word.
So let’s move the sense with gloss “correct” to be a second sense of the first word.
Now that the second word only has one sense, that sense cannot be moved.
The third word is a duplicate of the first, but it cannot be deleted. So instead, we can add a flag.
Now click the “Save & Continue” button to save our work.
In this next set of potential duplicates, the tops of the words aren’t yellow but one of the senses is yellow—it is protected.
A sense of an imported word can be protected if it has sense-specific information that isn’t supported in The Combine.
Such information could include (for example) illustrations, reversals, or subsenses.
Protected senses can be moved.
However, protected senses cannot be deleted.
A protected sense also cannot be dropped into another sense.
If you want to merge two senses and one of them is protected, click-and-drag the other sense and drop it into the protected sense.
Merged senses can generally be reordered in the sidebar, but if the top sense is protected, you cannot move another sense above it.
Now click the “Save & Continue” button to save your work.
I hope this video helps you clean up lexical data imported into The Combine.
Have a wonderful day!