
Commit 82ae8f6

Stardust-minus, pre-commit-ci[bot], jiangyuxiaoxiao, Akito-UzukiP, and OedoSoldier authored
Dev no emo (yl4579#123)
* Create emo_gen.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update server.py, fix bugs in func get_text() and infer(). (yl4579#52) * Extract get_text() and infer() from webui.py. (yl4579#53) * Extract get_text() and infer() from webui.py. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add emo emb * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * init emo gen * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * init emo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * init emo * Delete bert/bert-base-japanese-v3 directory * Create .gitkeep * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Create add_punc.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix bug in bert_gen.py (yl4579#54) * Update README.md * fix bug in models.py (yl4579#56) * 更新 models.py * Fix japanese cleaner (yl4579#61) * 初步,睡觉明天继续写( * 好好好放错分支了,熬夜是大忌 * [pre-commit.ci] pre-commit autoupdate (yl4579#55) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/pre-commit/pre-commit-hooks: v4.4.0 → v4.5.0](pre-commit/pre-commit-hooks@v4.4.0...v4.5.0) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create tokenizer_config.json * update preprocess_text.py:过滤一个音频匹配多个文本的情况 (yl4579#57) * update preprocess_text.py:过滤音频不存在的情况 (yl4579#58) * 修复日语cleaner和bert * better * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Co-authored-by: Stardust·减 <star_dust_chen@foxmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sora <atri@suzakuintsubaki.com> * Apply Code Formatter Change * Add config.yml for global configuration. (yl4579#62) * Add config.yml for global configuration. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug in webui.py. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Rename config.yml to default_config.yml. Add ./config.yml to gitignore. * Add config.py to parse config.yml * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update webui.py (yl4579#65) * Update webui.py: 1. Add auto translation from Chinese to Japanese. 2. Start to use config.py in webui.py to set config instead of using the command line. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix (yl4579#68) * 加上ー * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update infer.py and webui.py. Supports loading and inference models of 1.1.1 version. (yl4579#66) * Update infer.py and webui.py. Supports loading and inference models of 1.1.1 version. 
* Update config.json * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix bug in translate.py (yl4579#69) * Supports loading and inference models of 1.1、1.0.1、1.0 version. (yl4579#70) * Supports loading and inference models of 1.1、1.0.1、1.0 version. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Delete useless file in OldVersion --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update japanese.py (yl4579#71) Handling JA long pronunciations * 使用配置文件配置bert_gen.py, preprocess_text.py, resample.py (yl4579#72) * Update bert_gen.py, preprocess_text.py, resample.py. Support using config.yml in these files. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update bert_gen.py * Update bert_gen.py, fix bug. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Delete bert/bert-base-japanese-v3 directory * Create config.json * Create tokenizer_config.json * Create vocab.txt * Update server.py. 支持多版本多模型 (yl4579#76) * Update server.py. 
支持多版本多模型 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Dev webui (yl4579#77) * 申请pr (yl4579#75) * 2023/10/11 update 界面优化 * Update webui.py 翻译英文页面为中文 * Update train_ms.py 单卡训练 * 加入图片 * Update extern_subprocess.py * Update asr_transcript.py * Update asr_transcript.py * Update asr_transcript.py * Update extern_subprocess.py * Update asr_transcript.py * Update asr_transcript.py * Update asr_transcript.py * Update all_process.py * Update extern_subprocess.py * Update all_process.py * Update all_process.py * Update asr_transcript.py * Update extern_subprocess.py * Update webui.py * Create re_matching.py * Update webui.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update all_process.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update all_process.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update all_process.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update asr_transcript.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Pack 'update' functions into a module * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update all_process.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update asr_transcript.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update extern_subprocess.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update README.md * Update README.md * Update README.md * Update 
README.md * Update README.md * Update all_process.py * Update asr_transcript.py * Update webui.py * Add files via upload * Update extern_subprocess.py * Update all_process.py * Update asr_transcript.py * Update bert_gen.py * Update extern_subprocess.py * Update preprocess_text.py * Update re_matching.py * Update resample.py * Update update_status.py * Update update_status.py * Update webui.py * Update all_process.py * Update preprocess_text.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update train_ms.py --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Stardust·减 <star_dust_chen@foxmail.com> Co-authored-by: innnky <67028263+innnky@users.noreply.github.com> * Delete all_process.py * Delete asr_transcript.py * Delete extern_subprocess.py --------- Co-authored-by: spicysama <122108331+AnyaCoder@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: innnky <67028263+innnky@users.noreply.github.com> * Create config.json * Create preprocessor_config.json * Create vocab.json * Delete emotional/wav2vec2-large-robust-12-ft-emotion-msp-dim/.gitkeep * Update emo_gen.py * Delete add_punc.py * add emotion_clustering.i * Apply Code Formatter Change * Update models.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update preprocess_text.py (yl4579#78) * Update preprocess_text.py. 检测重复以及不存在的音频 (yl4579#79) * Handle Janpanese long pronunciations (yl4579#80) * Handle Janpanese long pronunciations * Update japanese.py * Update japanese.py * Use unified phonemes for Japanese long vowel (yl4579#82) * Use an unified phoneme for Japanese long vowel `symbol.py` has not been updated to ensure compatibility with older version models. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * 增加一个按钮,点击后可以按句子切分,添加“|” (yl4579#81) * Update re_matching.py * Update webui.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix phonemer bug (yl4579#83) * Fix phonemer bug * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix long vowel handler bug (yl4579#84) * Fix long vowel handler bug * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * 加入整合包管理器的特性:长文本合成可以自定义句间段间停顿 (yl4579#85) * Update webui.py * Update re_matching.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update train_ms.py * fix' * Update cleaner.py * add en * add en * Update english.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add en * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add en * add en * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add en * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 更新 README.md * 更新 README.md * 更新 README.md * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change phonemer 
to pyopenjtalk (yl4579#86) * Change phonemer to pyopenjtalk * 修改为openjtalk便于安装 --------- Co-authored-by: Stardust·减 <star_dust_chen@foxmail.com> * 更新 english.py * Fix english_bert_mock.py. (yl4579#87) * Add punctuation execptions (yl4579#88) * Add punctuation execptions * Ellipses exceptions * remove get bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug in oldVersion. (yl4579#89) * Update requirements.txt * change to large * rollback requirements.txt * Feat: Enable 1.1.1 models using fix-ver infer. (yl4579#91) * Feat: Enable 1.1.1 models using fix-ver infer. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add Japanese accent (high-low) (yl4579#90) * Add punctuation execptions * Ellipses exceptions * Add Japanese accent * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Do not replace iteration mark (yl4579#92) * Add punctuation execptions * Ellipses exceptions * Add Japanese accent * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Do not replace iteration mark --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix: fix import error in oldVersion (yl4579#93) * Refactor: reusing model loading in webui.py and server.py. 
(yl4579#94) * Feat: Enable using config.yml in train_ms.py (yl4579#96) * 更新 emo_gen.py * Change emo_gen.py (yl4579#97) * Fix emo_gen bugs * Add multiprocess * Fix queue (yl4579#98) * Fix emo_gen bugs * Add multiprocess * Del var * Fix queue * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix training bugs (yl4579#99) * Updatge cluster notebook * Fix train * Fix filename * Update infer.py (yl4579#100) * Update infer.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add reference audio (yl4579#101) * Add reference audio * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update * Update * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Stardust·减 <star_dust_chen@foxmail.com> * Fix: fix 1.1.1-fix (yl4579#102) * Fix infer bug (yl4579#103) * Feat: Add server_fastapi.py. (yl4579#104) * Feat: Add server_fastapi.py. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix: Update requirements.txt. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix: requirements.txt. 
(yl4579#105) * Swith to deberta-v3-large (yl4579#106) * Swith to deberta-v3-large * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Feat: Update config.py. (yl4579#107) * Feat: Update config.py. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Dev fix (yl4579#108) * fix bugs when deploying * fix bugs when deploying * fix bugs when deploying * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Revert "Dev fix (yl4579#108)" (yl4579#109) This reverts commit 685e18a10498d602b1a9a26079340d11925646f0. * Dev fix (yl4579#110) * fix bugs when deploying * fix bugs when deploying * fix bugs when deploying * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix fixed bugs * fix fixed bugs * fix fixed bug 3 * fix fixed bug 4 * fix fixed bug 5 * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add emo vec quantizer (yl4579#111) Co-authored-by: Stardust·减 <star_dust_chen@foxmail.com> * Clean req and gitignore (yl4579#112) * Clean req and gitignore * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Switch to deberta-v2-large-japanese (yl4579#113) * Switch to deberta-v2-large-japanese * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix emo bugs (yl4579#114) * Fix english (yl4579#115) * Remove emo (yl4579#117) * Don't train codebook * Remove emo * Update * Update * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Merge dev into no-emo (yl4579#122) * [pre-commit.ci] pre-commit autoupdate (yl4579#95) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/astral-sh/ruff-pre-commit: v0.0.292 → v0.1.1](astral-sh/ruff-pre-commit@v0.0.292...v0.1.1) - [github.com/psf/black: 23.9.1 → 23.10.0](psf/black@23.9.1...23.10.0) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Don't train codebook (yl4579#116) * Update requirements.txt * Update english_bert_mock.py * Fix: server_fastapi.py (yl4579#118) * Fix: server_fastapi.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix: don't print debug logging. (yl4579#119) * Fix: don't print debug logging. * Feat: support emo_gen config * Fix config * Apply Code Formatter Change * 更新,修正bug (yl4579#121) * Feat: Update infer.py preprocess_text.py server_fastapi.py. * Fix resample.py. Maintain same directory structure in out_dir as in_dir. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update resample.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update server_fastapi.py to no-emo ver * Update config.py, no emo config --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: OedoSoldier <31711261+OedoSoldier@users.noreply.github.com> Co-authored-by: Stardust·减 <star_dust_chen@foxmail.com> Co-authored-by: Stardust-minus <Stardust-minus@users.noreply.github.com> * Update train_ms.py * Update latest version info (yl4579#124) --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jiangyuxiaoxiao <atri@suzakuintsubaki.com> Co-authored-by: AkitoLiu <39857739+Akito-UzukiP@users.noreply.github.com> Co-authored-by: Stardust-minus <Stardust-minus@users.noreply.github.com> Co-authored-by: OedoSoldier <31711261+OedoSoldier@users.noreply.github.com> Co-authored-by: spicysama <122108331+AnyaCoder@users.noreply.github.com> Co-authored-by: innnky <67028263+innnky@users.noreply.github.com> Co-authored-by: YYuX-1145 <138500330+YYuX-1145@users.noreply.github.com>
1 parent c1ba4c7 commit 82ae8f6


102 files changed

+175570
-1058
lines changed

.gitignore

+12
@@ -166,3 +166,15 @@ cython_debug/
 filelists/*
 !/filelists/esd.list
 data/*
+/config.yml
+/Web/
+/emotional/*/*.bin
+/bert/*/*.bin
+/bert/*/*.h5
+/bert/*/*.model
+/bert/*/*.safetensors
+/bert/*/*.msgpack
+asr_transcript.py
+extract_list.py
+/Data
+Data/*
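
The root-anchored patterns added above can be approximated in a few lines of Python. This is only a sketch: `pattern_to_regex` and `is_ignored` are hypothetical helpers, not part of the repository, and they cover only the leading-slash patterns with `*` restricted to a single path segment. Real Git ignore matching has more rules (negation, directory matches, unanchored names) that are not modeled here.

```python
import re

# Root-anchored patterns added to .gitignore in this commit.
ANCHORED_PATTERNS = [
    "/config.yml",
    "/emotional/*/*.bin",
    "/bert/*/*.bin",
    "/bert/*/*.h5",
    "/bert/*/*.model",
    "/bert/*/*.safetensors",
    "/bert/*/*.msgpack",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a root-anchored pattern; "*" never crosses a "/"."""
    body = pattern.lstrip("/")
    literal_parts = [re.escape(part) for part in body.split("*")]
    return re.compile("^" + "[^/]*".join(literal_parts) + "$")


def is_ignored(path: str) -> bool:
    """Return True if a repo-relative POSIX path matches any anchored pattern."""
    return any(pattern_to_regex(p).match(path) for p in ANCHORED_PATTERNS)
```

Under these assumptions, model weights one directory below `bert/` are ignored while the small JSON and vocabulary files next to them stay tracked.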

.gitmodules

Whitespace-only changes.

README.md

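+3 -9
@@ -10,16 +10,8 @@ VITS2 Backbone with bert
 [//]: # ()
 [//]: # (This repository began when a friend shared a video by ai峰哥 and I was amazed by the results. After trying MassTTS myself, I found that fs falls somewhat short of vits in audio quality and that its training pipeline is more complex than that of vits, so following its approach I combined bert)
 
-[//]: # (with vits to obtain better prosody. We work on open-source projects out of interest, as a labor of love, and never intended to come into conflict with anyone. However, [MaxMax2016]&#40;https://github.com/MaxMax2016&#41;)
-
-[//]: # (and its organization [PlayVoice]&#40;https://github.com/PlayVoice&#41; have repeatedly come to pick a fight, claiming that this project plagiarized their code and even talking about going to court. We therefore state explicitly in the Readme that this project has)
-
-[//]: # (no connection whatsoever with [PlayVoice/vits_chinese]&#40;https://github.com/PlayVoice/vits_chinese&#41;; the idea of combining bert also comes entirely from MassTTS)
-
-
-[//]: # (Appendix: the evidence the other party cites for this project having plagiarized his code; readers can examine it and judge for themselves: [the actual MassTTS code referenced by bert_vits2]&#40;https://github.com/PlayVoice/vits_chinese/tree/4781241520c6b9fdcf090fca289148719272e89f#bert_vits2%E5%BC%95%E7%94%A8%E7%9A%84masstts%E7%9A%84%E5%AE%9E%E9%99%85%E4%BB%A3%E7%A0%81&#41; )
-
 ## A seasoned Traveler/Trailblazer/Captain/Doctor/sensei/Witcher/喵喵露/V should consult the code and learn how to train on their own.
+
 ### Using this project for any purpose that violates the Constitution of the People's Republic of China, the Criminal Law of the People's Republic of China, the Public Security Administration Punishments Law of the People's Republic of China, or the Civil Code of the People's Republic of China is strictly prohibited.
 ### Use for any politics-related purpose is strictly prohibited.
 #### Video: https://www.bilibili.com/video/BV1hp4y1K78E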
@@ -30,6 +22,8 @@ VITS2 Backbone with bert
 + [p0p4k/vits2_pytorch](https://github.com/p0p4k/vits2_pytorch)
 + [svc-develop-team/so-vits-svc](https://github.com/svc-develop-team/so-vits-svc)
 + [PaddlePaddle/PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech)
++ [emotional-vits](https://github.com/innnky/emotional-vits)
++ [Bert-VITS2-en](https://github.com/xwan07017/Bert-VITS2-en)
 ## Thanks to all contributors for their efforts
 <a href="https://github.com/fishaudio/Bert-VITS2/graphs/contributors" target="_blank">
 <img src="https://contrib.rocks/image?repo=fishaudio/Bert-VITS2"/>
+34
@@ -0,0 +1,34 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
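
Each line in the attributes file above pairs a path pattern with `filter=lfs`, which routes matching files through Git LFS. The sketch below shows one way such a file could be inspected from Python. `lfs_patterns` and `is_lfs_tracked` are hypothetical helpers written for illustration, the sample text is a three-line excerpt, and only slash-free patterns are matched (against the file's base name), so `saved_model/**/*` is deliberately out of scope.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Three-line excerpt of the attributes file shown in the diff above.
GITATTRIBUTES_SAMPLE = """\
*.bin filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
"""


def lfs_patterns(text: str) -> list[str]:
    """Collect the path patterns whose attribute list contains filter=lfs."""
    patterns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and "filter=lfs" in fields[1:]:
            patterns.append(fields[0])
    return patterns


def is_lfs_tracked(path: str, patterns: list[str]) -> bool:
    """Approximate check: match the base name against slash-free patterns only."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, p) for p in patterns if "/" not in p)
```

A quick way to use it is `is_lfs_tracked("bert/some-model/pytorch_model.bin", lfs_patterns(GITATTRIBUTES_SAMPLE))`, where the path is a made-up example.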
+34
@@ -0,0 +1,34 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text

bert/bert-large-japanese-v2/README.md

+53
@@ -0,0 +1,53 @@
+---
+license: apache-2.0
+datasets:
+- cc100
+- wikipedia
+language:
+- ja
+widget:
+- text: 東北大学で[MASK]の研究をしています。
+---
+
+# BERT large Japanese (unidic-lite with whole word masking, CC-100 and jawiki-20230102)
+
+This is a [BERT](https://github.com/google-research/bert) model pretrained on texts in the Japanese language.
+
+This version of the model processes input texts with word-level tokenization based on the Unidic 2.1.2 dictionary (available in [unidic-lite](https://pypi.org/project/unidic-lite/) package), followed by the WordPiece subword tokenization.
+Additionally, the model is trained with the whole word masking enabled for the masked language modeling (MLM) objective.
+
+The codes for the pretraining are available at [cl-tohoku/bert-japanese](https://github.com/cl-tohoku/bert-japanese/).
+
+## Model architecture
+
+The model architecture is the same as the original BERT large model; 24 layers, 1024 dimensions of hidden states, and 16 attention heads.
+
+## Training Data
+
+The model is trained on the Japanese portion of [CC-100 dataset](https://data.statmt.org/cc-100/) and the Japanese version of Wikipedia.
+For Wikipedia, we generated a text corpus from the [Wikipedia Cirrussearch dump file](https://dumps.wikimedia.org/other/cirrussearch/) as of January 2, 2023.
+The corpus files generated from CC-100 and Wikipedia are 74.3GB and 4.9GB in size and consist of approximately 392M and 34M sentences, respectively.
+
+For the purpose of splitting texts into sentences, we used [fugashi](https://github.com/polm/fugashi) with [mecab-ipadic-NEologd](https://github.com/neologd/mecab-ipadic-neologd) dictionary (v0.0.7).
+
+## Tokenization
+
+The texts are first tokenized by MeCab with the Unidic 2.1.2 dictionary and then split into subwords by the WordPiece algorithm.
+The vocabulary size is 32768.
+
+We used [fugashi](https://github.com/polm/fugashi) and [unidic-lite](https://github.com/polm/unidic-lite) packages for the tokenization.
+
+## Training
+
+We trained the model first on the CC-100 corpus for 1M steps and then on the Wikipedia corpus for another 1M steps.
+For training of the MLM (masked language modeling) objective, we introduced whole word masking in which all of the subword tokens corresponding to a single word (tokenized by MeCab) are masked at once.
+
+For training of each model, we used a v3-8 instance of Cloud TPUs provided by [TPU Research Cloud](https://sites.research.google/trc/about/).
+
+## Licenses
+
+The pretrained models are distributed under the Apache License 2.0.
+
+## Acknowledgments
+
+This model is trained with Cloud TPUs provided by [TPU Research Cloud](https://sites.research.google/trc/about/) program.
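
The model card above describes two steps that a toy example can make concrete: WordPiece splitting of each word, and whole word masking, where every subword of a chosen word is masked together. The sketch below is not the cl-tohoku implementation. The vocabulary is a made-up four-entry set, the word list stands in for MeCab output (MeCab is not called), and `wordpiece` and `whole_word_mask` are hypothetical names. It uses the usual greedy longest-match-first WordPiece rule with a `##` prefix on continuation pieces.

```python
def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def whole_word_mask(words_as_pieces, masked_word_indices, mask="[MASK]"):
    """Replace every subword of each selected word with the mask token."""
    output = []
    for index, pieces in enumerate(words_as_pieces):
        if index in masked_word_indices:
            output.extend([mask] * len(pieces))
        else:
            output.extend(pieces)
    return output


# Toy vocabulary and an assumed word segmentation, for illustration only.
TOY_VOCAB = {"東北", "##大学", "で", "研究"}
WORDS = ["東北大学", "で", "研究"]
PIECES = [wordpiece(w, TOY_VOCAB) for w in WORDS]
```

With this toy setup, masking word 0 hides both `東北` and `##大学` at once, which is the difference from masking subword tokens independently.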
+19
@@ -0,0 +1,19 @@
+{
+  "architectures": [
+    "BertForPreTraining"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 1024,
+  "initializer_range": 0.02,
+  "intermediate_size": 4096,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 24,
+  "pad_token_id": 0,
+  "type_vocab_size": 2,
+  "vocab_size": 32768
+}
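
The sizes in this configuration determine the encoder's parameter count. The sketch below works through the arithmetic for a standard BERT encoder (embeddings, transformer layers, pooler). It assumes the usual layout with biases and LayerNorm weights and does not count the pretraining heads that `BertForPreTraining` adds. `bert_encoder_params` is a hypothetical helper, not a library function.

```python
# Values copied from the config.json added in this commit.
CONFIG = {
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "max_position_embeddings": 512,
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "type_vocab_size": 2,
    "vocab_size": 32768,
}


def bert_encoder_params(cfg: dict) -> int:
    """Count weights and biases of a standard BERT encoder plus pooler."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    # Word, position, and token-type embeddings, plus one LayerNorm (scale + bias).
    embeddings = (
        cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    ) * h + 2 * h
    # Query, key, value, and output projections, plus LayerNorm.
    attention = 4 * (h * h + h) + 2 * h
    # Two dense layers (h -> i -> h), plus LayerNorm.
    feed_forward = (h * i + i) + (i * h + h) + 2 * h
    pooler = h * h + h
    return embeddings + cfg["num_hidden_layers"] * (attention + feed_forward) + pooler


HEAD_DIM = CONFIG["hidden_size"] // CONFIG["num_attention_heads"]
TOTAL_PARAMS = bert_encoder_params(CONFIG)
```

Under these assumptions each attention head works on 64 dimensions and the encoder holds roughly 337 million parameters, of which about 34 million sit in the embedding tables.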
@@ -0,0 +1,10 @@
+{
+  "tokenizer_class": "BertJapaneseTokenizer",
+  "model_max_length": 512,
+  "do_lower_case": false,
+  "word_tokenizer_type": "mecab",
+  "subword_tokenizer_type": "wordpiece",
+  "mecab_kwargs": {
+    "mecab_dic": "unidic_lite"
+  }
+}
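
The tokenizer settings above have to agree with the model configuration, most visibly in the 512-token limit. As a rough illustration, the sketch below checks two such constraints on the values from this commit. `check_tokenizer_config` is a hypothetical helper and the rules it applies are assumptions chosen for the example, not checks performed by any library.

```python
import json

# Values copied from the two JSON files added in this commit.
MODEL_CONFIG = json.loads('{"max_position_embeddings": 512, "vocab_size": 32768}')
TOKENIZER_CONFIG = json.loads("""
{
  "tokenizer_class": "BertJapaneseTokenizer",
  "model_max_length": 512,
  "do_lower_case": false,
  "word_tokenizer_type": "mecab",
  "subword_tokenizer_type": "wordpiece",
  "mecab_kwargs": {"mecab_dic": "unidic_lite"}
}
""")


def check_tokenizer_config(tok_cfg: dict, model_cfg: dict) -> list[str]:
    """Return human-readable problems; an empty list means the configs agree."""
    problems = []
    if tok_cfg["model_max_length"] > model_cfg["max_position_embeddings"]:
        problems.append("model_max_length exceeds max_position_embeddings")
    uses_mecab = tok_cfg.get("word_tokenizer_type") == "mecab"
    if uses_mecab and "mecab_dic" not in tok_cfg.get("mecab_kwargs", {}):
        problems.append("mecab word tokenizer configured without a mecab_dic")
    return problems
```

Calling `check_tokenizer_config(TOKENIZER_CONFIG, MODEL_CONFIG)` on the committed values reports no problems.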
