Subtitle Alignment

About

This project applies state-of-the-art sentence alignment tools to subtitle extraction and alignment, substantially improving subtitle alignment quality. Using sentence embeddings, dynamic programming, cosine similarity, and partitioning, we attained F1 scores exceeding 93% and estimate an overall improvement of 31% compared with other subtitle alignment techniques.
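As a rough illustration of how cosine similarity and dynamic programming combine in sentence alignment, here is a minimal sketch. It is a simplified one-to-one aligner written for this README, not the pipeline used in this project: the function names, the `skip_cost` parameter, and the toy 2-dimensional "embeddings" are all assumptions, and real tools also handle many-to-many alignments.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(src, tgt, skip_cost=0.3):
    """Monotonic one-to-one alignment of two embedding sequences.

    Matching src[i] with tgt[j] costs 1 - cosine; leaving a sentence
    unaligned costs skip_cost (a hypothetical tuning parameter).
    Returns a list of (src_index, tgt_index) pairs.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:
                c = cost[i - 1][j - 1] + 1.0 - cosine(src[i - 1], tgt[j - 1])
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "match"
            if i:
                c = cost[i - 1][j] + skip_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "skip_src"
            if j:
                c = cost[i][j - 1] + skip_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "skip_tgt"

    # Walk the back-pointers from the end to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i or j:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_src":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Toy vectors standing in for sentence embeddings.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[1.0, 0.1], [1.0, 1.0]]
pairs = align(source, target)
```

With these toy vectors, the second source sentence has no close counterpart in the target, so the cheapest path leaves it unaligned and pairs the other two.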

Gold Standard Subtitle Alignments

Gold alignments for 5 titles are in the gold directory. Each title's subdirectory contains alignment files with names like eng-spa-gold.txt and eng-ger-gold.txt. The subtitles themselves are in per-language sub-subdirectories (eng, spa, ger, etc.).
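Gold alignments make it possible to score a predicted alignment with precision, recall, and F1. The sketch below shows one common way to do this, counting a predicted alignment as correct only when it exactly matches a gold one. It is an assumption that this strict criterion matches the scoring used for the figures reported here, and the in-memory representation (pairs of index tuples) is hypothetical; it does not reflect the format of the gold files.

```python
def alignment_f1(predicted, gold):
    """Strict-match F1 between two collections of alignments.

    Each alignment is a hashable pair such as ((1, 2), (1,)), meaning
    source lines 1 and 2 align with target line 1 (hypothetical format).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = [((0,), (0,)), ((1, 2), (1,)), ((3,), (2,))]
predicted = [((0,), (0,)), ((1,), (1,)), ((3,), (2,))]
score = alignment_f1(predicted, gold)
```

Here two of the three predicted alignments match gold exactly, so precision and recall are both 2/3.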

SubAlign Annotation tool

The annotation tool is implemented in Python using curses. After you run scripts/run_vecalign.py on the title you want to annotate, the alignments generated by that script are loaded into a vim-like editor where you can approve or edit them. The tool supports the following operations:

Key Action
d Delete current alignment.
e Edit current alignment. Will open the current alignment in Vim.
u Union (merge) current subtitle with the following subtitle.
s Split alignment into two. This duplicates the current alignment so that you can edit both it and the duplicate that follows. Ideal for splitting alignments when multiple sentences have been merged together.
w Write (save) all alignments including those that have not yet been reviewed.
n Move to Next alignment.
p Move to Previous alignment.
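To make the list-editing operations in the table concrete, here is a minimal sketch of how d, u, and s could act on an in-memory list of alignments. It is not the tool's actual code: the data model (each alignment as a `(source_text, target_text)` pair) and every function name are assumptions made for illustration. In a curses program the key would typically come from `stdscr.getkey()`; it is passed in directly here so the example runs without a terminal.

```python
def delete(alignments, i):
    """d: drop the alignment at index i."""
    return alignments[:i] + alignments[i + 1:]


def union(alignments, i):
    """u: merge the alignment at index i with the one that follows it."""
    if i + 1 >= len(alignments):
        return list(alignments)  # nothing after the current alignment to merge with
    (src_a, tgt_a), (src_b, tgt_b) = alignments[i], alignments[i + 1]
    merged = (f"{src_a} {src_b}", f"{tgt_a} {tgt_b}")
    return alignments[:i] + [merged] + alignments[i + 2:]


def split(alignments, i):
    """s: duplicate the alignment at index i so each copy can be edited separately."""
    return alignments[:i + 1] + [alignments[i]] + alignments[i + 1:]


# Hypothetical key-to-action table mirroring part of the table above.
ACTIONS = {"d": delete, "u": union, "s": split}


def apply_key(alignments, i, key):
    """Return a new alignment list after applying the action bound to key."""
    action = ACTIONS.get(key)
    return action(alignments, i) if action else list(alignments)


alignments = [("a", "A"), ("b", "B"), ("c", "C")]
merged = apply_key(alignments, 0, "u")
duplicated = apply_key(alignments, 1, "s")
shortened = apply_key(alignments, 2, "d")
```

Each function returns a new list rather than mutating its input, which keeps unrecognized keys harmless and would make an undo feature straightforward to add.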