Replace the "sort" logic with "clean" logic and prefer Chinese translations #724
Replaced sorting of TTML translations with cleaning logic. Added a new method to clean unnecessary translations from TTML content.
Summary of Changes (Gemini Code Assist): This pull request refactors how TTML lyric translations are processed, replacing the original sort-based logic with cleaning logic that is more efficient and extensible. The core change is the removal of …
Code Review
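This pull request replaces the sortTTMLTranslations method with cleanTTMLTranslations, aiming to make TTML translation handling faster and more extensible. The new cleanTTMLTranslations method uses regular expressions to identify language tags and cleans the content according to a preset Chinese-first priority. The overall direction is good, but a few details could be refined to make the code more robust and easier to maintain.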
src/core/player/LyricManager.ts (outdated)

    + const lang_filter = (langs: string[]) : (string | null) => {
    +   if (langs.length <= 1) return null;

    -   // Reorder the translation elements
    -   translationElements.forEach((translationElement) => {
    -     element.appendChild(translationElement); // move to the end to achieve the ordering
    -   });
    - });
    +   if (langs.includes('zh-Hans')) return 'zh-Hans';
    +   if (langs.includes('zh-CN')) return 'zh-CN';
    +   if (langs.includes('zh-Hant')) return 'zh-Hant';

    +   const major = langs.find(key => key.startsWith('zh'));
    +   if (major) return major;

    +   return langs[0];

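The matching logic in the current lang_filter function is fairly simple: it checks zh-Hans, zh-CN, and zh-Hant directly, then any language starting with zh, and finally returns the first language in the list. This does not match the more complete and robust logic of the findBestMatchLanguage function mentioned in the "Future" section of your PR description (for example, falling back from en-US to en by splitting on the hyphen).
I suggest updating lang_filter to the algorithm described for findBestMatchLanguage, to better support future internationalization and improve matching accuracy. That way this core logic will not need to change again when i18n features are extended.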

    const lang_filter = (context_lang: string[]) : (string | null) => {
      if (!context_lang || context_lang.length === 0) {
        return null;
      }
      let major_lang = 'zh-Hans'; // prefer Simplified Chinese by default
      if (context_lang.includes(major_lang)) return major_lang;
      major_lang = 'zh-CN';
      if (context_lang.includes(major_lang)) return major_lang;
      major_lang = 'zh-Hant';
      if (context_lang.includes(major_lang)) return major_lang;
      // More general matching, following the findBestMatchLanguage logic in the PR description
      let currentLang = 'zh'; // assume Chinese is the primary language; this could be passed in from the user's settings
      while (true) {
        if (context_lang.includes(currentLang)) {
          return currentLang;
        }
        const prefixMatch = context_lang.find(lang => lang.startsWith(currentLang));
        if (prefixMatch) {
          return prefixMatch;
        }
        const lastHyphenIndex = currentLang.lastIndexOf('-');
        if (lastHyphenIndex !== -1) {
          currentLang = currentLang.substring(0, lastHyphenIndex);
        } else {
          break;
        }
      }
      // If none of the above matched, return the first language in the list
      return context_lang[0];
    }

src/core/player/LyricManager.ts (outdated)

    if (ttml_text.indexOf("iTunesMetadata") !== -1) {
      const translationRegex = /<translation[^>]+xml:lang="([^"]+)"[^>]*>[\s\S]*?<\/translation>/g;
      return ttml_text.replace(translationRegex, replacer);
    } else {
      const spanRegex = /<span[^>]+xml:lang="([^"]+)"[^>]*>[\s\S]*?<\/span>/g;
      return ttml_text.replace(spanRegex, replacer);
    }

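In the ttml_cleaner function, you use ttml_text.indexOf("iTunesMetadata") !== -1 to decide whether to apply translationRegex or spanRegex. A heuristic based on a plain indexOf string search may not be robust enough.
If a TTML file contains iTunesMetadata but no translation tags, or contains translation tags but no iTunesMetadata, the wrong regex may be applied and the cleaning will be inaccurate. Consider a more explicit way to distinguish the two cases, for example checking for the translation tag itself, or choosing based on the structural conventions of the TTML file.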

    if (/<translation[^>]+xml:lang="([^" ]+)"[^>]*>/.test(ttml_text)) { // check whether a <translation> tag exists
      const translationRegex = /<translation[^>]+xml:lang="([^" ]+)"[^>]*>[\s\S]*?<\/translation>/g;
      return ttml_text.replace(translationRegex, replacer);
    } else {
      const spanRegex = /<span[^>]+xml:lang="([^" ]+)"[^>]*>[\s\S]*?<\/span>/g;
      return ttml_text.replace(spanRegex, replacer);
    }

Rejecting the AI's suggestion:

    const translationRegex = /<translation[^>]+xml:lang="([^"]+)"[^>]*>[\s\S]*?<\/translation>/g;
    const spanRegex = /<span[^>]+xml:lang="([^" ]+)"[^>]*>[\s\S]*?<\/span>/g;
    return ttml_text.replace(translationRegex, replacer).replace(spanRegex, replacer);

Consolidate regex replacements for translation and span tags.
The second change is done:

    - if (ttml_text.indexOf("iTunesMetadata") !== -1) {
        const translationRegex = /<translation[^>]+xml:lang="([^"]+)"[^>]*>[\s\S]*?<\/translation>/g;
    -   return ttml_text.replace(translationRegex, replacer);
    - } else {
        const spanRegex = /<span[^>]+xml:lang="([^"]+)"[^>]*>[\s\S]*?<\/span>/g;
    -   return ttml_text.replace(spanRegex, replacer);
    + return ttml_text.replace(translationRegex, replacer).replace(spanRegex, replacer);
    - }

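To make the consolidated approach concrete, here is a minimal sketch of a cleaner in the spirit of cleanTTMLTranslations, not the PR's actual implementation. It assumes (the PR does not show this) that the replacer keeps only the element whose xml:lang equals the language chosen by the filter and drops the rest, and that the chosen language is computed once over every translation language found in the file. The names cleanTTMLSketch, collectLangs, and pickLang are hypothetical; the two regexes are the ones from the diff. Taking the chooser as a parameter lets lang_filter later be swapped for a findBestMatchLanguage-style matcher without touching the regex code.

```typescript
// Regexes copied from the diff in this PR.
const TRANSLATION_RE = /<translation[^>]+xml:lang="([^"]+)"[^>]*>[\s\S]*?<\/translation>/g;
const SPAN_RE = /<span[^>]+xml:lang="([^"]+)"[^>]*>[\s\S]*?<\/span>/g;

// Hypothetical helper: append the distinct xml:lang values captured by a global regex.
function collectLangs(text: string, re: RegExp, out: string[]): void {
  re.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    if (!out.includes(m[1])) out.push(m[1]);
  }
}

// Hypothetical sketch: keep only the language chosen by `pickLang`,
// applying both regexes in sequence as in the consolidated change.
function cleanTTMLSketch(
  ttml: string,
  pickLang: (langs: string[]) => string | null,
): string {
  const langs: string[] = [];
  collectLangs(ttml, TRANSLATION_RE, langs);
  collectLangs(ttml, SPAN_RE, langs);

  const keep = pickLang(langs);
  if (keep === null) return ttml; // nothing to choose between

  // Assumed replacer: keep the chosen language, drop every other one.
  const replacer = (match: string, lang: string): string => (lang === keep ? match : "");
  return ttml.replace(TRANSLATION_RE, replacer).replace(SPAN_RE, replacer);
}
```

Passing the PR's lang_filter as pickLang would, for a file with en, zh-Hant, and zh-Hans translations, keep only the zh-Hans element; a file with a single translation language is returned unchanged because lang_filter returns null.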
- [Refactor] Enhance the logic for cleaning unnecessary translations in TTML content.
- [Fix] Adjust parameter handling and improve language matching functionality.

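Changes

- Removed the sortTTMLTranslations method and replaced it with the cleanTTMLTranslations method.

Effect

- … only the cleanTTMLTranslations method and the lang_filter function inside it need to be modified.

Future

If i18n features are extended later and the translation needs to follow the UI language, the cleanTTMLTranslations method and its lang_filter function will need to be modified as well. The key part is the preference algorithm in lang_filter: it should narrow language codes down along their hyphen-separated segments. For example, if the file contains en-GB, en-CA, and en, a user whose UI language is en or en-US should be matched to en; if the file has no en, they should be matched to en-GB or en-CA. Below is an example function (it should be usable as-is):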
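The following is a hypothetical sketch of the hyphen-truncation matching described above; it is not the author's example function. It assumes case-insensitive comparison (BCP 47 language tags are case-insensitive) and, as an assumption, falls back to the first available language when nothing matches, mirroring lang_filter's final `return langs[0]`. At each step it tries an exact match, then a more specific tag extending the candidate, then drops the last subtag.

```typescript
// Hypothetical sketch of hyphen-truncation language matching;
// the name, parameters, and fallback behavior are assumptions.
function findBestMatchLanguage(available: string[], uiLang: string): string | null {
  if (available.length === 0) return null;
  const lower = available.map((l) => l.toLowerCase()); // BCP 47 tags are case-insensitive
  let candidate = uiLang.toLowerCase();

  while (candidate.length > 0) {
    // 1. Exact match, e.g. "en" in ["en-GB", "en-CA", "en"].
    let i = lower.indexOf(candidate);
    if (i !== -1) return available[i];

    // 2. A more specific tag that extends the candidate, e.g. "en" -> "en-GB".
    const prefix = candidate + "-";
    i = lower.findIndex((l) => l.startsWith(prefix));
    if (i !== -1) return available[i];

    // 3. Drop the last hyphen-separated subtag: "en-US" -> "en".
    const cut = candidate.lastIndexOf("-");
    if (cut === -1) break;
    candidate = candidate.slice(0, cut);
  }

  // Assumed fallback, mirroring lang_filter's `return langs[0]`.
  return available[0];
}
```

With the file languages ["en-GB", "en-CA", "en"], both "en" and "en-US" resolve to "en"; with ["en-GB", "en-CA"], both resolve to "en-GB", the first extending tag in file order.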